A Systemic Problem in Metabolite Identification?

In a comprehensive re-evaluation of more than 6,000 human and rodent urine samples, Jeremy Nicholson and colleagues show that phenylacetylglutamine (PAGln) – a metabolite now linked to cardiovascular and neurological disease – has been systematically misidentified as phenylacetylglycine (PAGly) in almost half of all NMR-based studies reporting it in humans. The error, the team argues, isn’t new: the correct assignments have been known for decades. Yet reliance on spectral databases has allowed a fundamental mistake to propagate unchecked through the literature.

In this interview, Nicholson explains how the field arrived here, why the problem extends far beyond a single metabolite pair, and what must change if metabolomics is to remain credible in an era of clinical translation.

What originally alerted you to the possibility that PAGln and PAGly were being systematically misidentified?

This is set in the context of our expertise in animal and human small molecule metabolism, both endogenous and exogenous (including drug metabolism), over decades. We also did the original proton NMR assignment of PAG (PAGln) and PAGly in the urine of humans and rodents, respectively (in multiple papers). So, when we started to see incorrect assignments in the literature going back many years, it was rather obvious to us.

Also, having been an editor and reviewer for multiple journals for three decades, I am acutely aware of the problems that many groups have had in correctly assigning metabolite molecular structures using both NMR and mass spectrometric techniques. These errors were, and still are, unforced, widespread in the literature, and unfortunately extend well beyond PAG and PAGly. We made a decision about a year ago to choose that pair as an example to examine in detail using NMR methods alone.

Could you walk us through the strategy you used to unravel the extent of the problem?

The strategy was to examine de novo a large number of human and rodent urine samples from historic studies for the NMR-detected PAG and PAGly metabolites using 600 MHz proton NMR spectroscopy, statistical total correlation spectroscopy, and two-dimensional correlation methods. We found that out of thousands of samples, PAG was unique to humans and PAGly unique to rodents. This is because the two species have different amino acid transferases; it is a species-specific characteristic. It is possible – or even likely – that at very low levels there is some production of the glycine conjugate in humans and the glutamine conjugate in rats (this is called micrometabolism), but this is probably at much less than 1 percent and therefore is not a credible explanation of the misassignment.

Why do you think this misidentification propagated so widely, and for so long, without being corrected earlier?

Misidentification commonly occurs because of careless use of publicly available databases. Again, this is not specific to these molecules. You could also ask why we waited so long to correct it. And the simple answer is that we have multiple active studies that required our immediate attention – but it became increasingly apparent to us that we needed to draw attention to a wider problem in the literature and peer review that is holding back the field.

Most importantly, some of these metabolites are now beginning to gain clinical relevance, which means that any literature errors need to be corrected. Because, put simply, the wrong assignment means the wrong pathway, which means the wrong biology and ultimately the wrong biomedical interpretation. In this case, there is a possible role for PAG in cardiovascular risk in humans. Rats do not make PAG, so experiments performed on rats in experimental disease or diet models using PAG are not strictly biologically relevant.

Do you see this as an NMR-specific challenge, or are LC-MS workflows also vulnerable to similar misannotations?

We have focused specifically on NMR assignment problems, but it is broader than that. In NMR, the spectra of the two compounds are very different in the 2-4.5 ppm chemical shift range, but the aromatic signals – the phenyl group – are effectively identical from the point of view of shifts and couplings, so people have been using that to assign the spectrum. In mass spec, this is analogous to using one of the fragment ions of a molecule to make the complete assignment without taking into account the whole molecular ion or other fragments – this would be an obvious error in MS. The PAG vs PAGly whole molecule misassignment error is much more difficult to make in MS because the molecular ion masses are completely different.

But mass spectrometrists sometimes make the opposite error of assigning a structure based on the largest fragment without considering the total fragmentation pattern and fragment ratios, which are often the main signature components.
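The point about molecular ion masses can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the standard molecular formulas C13H16N2O4 for phenylacetylglutamine and C10H11NO3 for phenylacetylglycine, uses rounded monoisotopic atomic masses, and introduces the helper name `monoisotopic_mass` purely for this example.

```python
# Rounded monoisotopic masses (in u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Assumed neutral molecular formulas (not stated in the interview).
FORMULAS = {
    "PAGln": {"C": 13, "H": 16, "N": 2, "O": 4},  # phenylacetylglutamine
    "PAGly": {"C": 10, "H": 11, "N": 1, "O": 3},  # phenylacetylglycine
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic atomic masses over an element-count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


masses = {name: monoisotopic_mass(formula) for name, formula in FORMULAS.items()}
mass_gap = masses["PAGln"] - masses["PAGly"]

for name, mass in masses.items():
    print(f"{name}: {mass:.4f} u")
print(f"difference: {mass_gap:.4f} u")
```

Under these assumptions the two neutral molecules differ by roughly 71 u, which is why the whole-molecule confusion is hard to make when the molecular ion is actually inspected.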

How big is the problem? And what does this mean for the metabolomics community as a whole?

This is one part of a much bigger problem in biomarker assignment, quantification, and validation – especially for novel disease markers. All biomarkers – especially those that are claimed to be novel compounds – need to be: (i) chemically and structurally validated (as we do in our paper on PAG and PAGly); (ii) quantitatively validated (ideally using an independent method or using synthetic standards – which is far from standard); (iii) statistically validated (i.e., N has to be large enough to know that there is a real biological effect in humans, taking into account variables such as age, ethnicity, sex, diet, etc.); and, for disease markers, (iv) clinically validated in an independent patient cohort.

All these validations need to be in place before any clinical translation can take place. Of course, doing stages (ii) and (iii) properly is really hard, but in the case of PAG and PAGly, a lot of people are failing at stage (i), which is a worry for the field in general. So yes, there is a problem. We identified this type of error and multiple other challenges in data standardization over 20 years ago, and indeed we were the first to try to initiate the standardization of metabolic data reporting.

What steps are needed to correct the current literature and prevent similar issues in the future?

In this case, I think that it is only necessary to evaluate the data being used in a particular biological argument: are the study data correct or not? If not, then they should of course not be used. But the rule is simple: if it is human, it is PAG, and if it is rodent, it is PAGly. Other animal species would have to be selectively examined, as amino acid conjugations are highly species-specific and may include other amino acids as well (taurine conjugations and so on). A lot of this biochemistry was well known 50 or more years ago – when people had more time for academic rigor and indeed peer review. But a lot has been forgotten, if ever known, and a lot of older papers, which can represent many years of careful work, are not easily sourced in the literature databases.
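The species rule described above lends itself to a simple screening check on reported assignments. The following is a minimal sketch, not a tool from the study: the function name `check_assignment`, the species keys, and the returned messages are all hypothetical, and "PAGln" is used for the metabolite the interview abbreviates as PAG.

```python
# Hypothetical lookup encoding the rule of thumb from the interview:
# humans excrete the glutamine conjugate, rodents the glycine conjugate.
EXPECTED_CONJUGATE = {
    "human": "PAGln",
    "rat": "PAGly",
    "mouse": "PAGly",
}


def check_assignment(species, reported):
    """Flag a reported metabolite assignment that contradicts the species rule."""
    expected = EXPECTED_CONJUGATE.get(species.lower())
    if expected is None:
        # Other species must be examined individually, per the interview.
        return "unknown species: verify experimentally"
    if reported == expected:
        return "consistent"
    return f"suspect: expected {expected} in {species}"


print(check_assignment("human", "PAGly"))
print(check_assignment("rat", "PAGly"))
print(check_assignment("dog", "PAGly"))
```

A check like this only flags candidates for re-examination; it does not replace confirming the structure from the spectra themselves.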

If you could deliver one call-to-action to the metabolomics community, what would it be?

I would appeal to people to follow basic scientific principles, which include being able to prove every step of the scientific process is correct and fully documented. For example: correct analytical procedures, correct molecular assignment, correct and appropriate statistics with power calculations where possible or necessary, correct experimental design, and critically, biological and clinical cross validation.

This level of deep assignment rarely happens in most metabolic studies, even in top journals, and is non-existent where non-expert groups use third-party company analysis without the original data, or third-party software… This doesn’t mean that the conclusions are necessarily wrong, but it makes them much less likely to be right. Metabolic pathway analysis and “systems pharmacology” used in conjunction with dodgy analytical chemistry (probably to make it look more “plausible”) opens a whole new can of worms, but that is another story.
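As a rough illustration of the power calculations mentioned above, the sketch below estimates the sample size per group for a two-group comparison. It is an assumption-laden example rather than anything prescribed in the interview: it uses the common normal-approximation formula n = 2(z_{1-α/2} + z_{power})² / d², where d is a standardized effect size, and the function name `samples_per_group` is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison.

    Uses the normal approximation; an exact t-based calculation gives
    slightly larger numbers for small samples.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.2, 0.5, 1.0):
    print(f"effect size {d}: {samples_per_group(d)} samples per group")
```

The required N rises steeply as the expected effect shrinks, which is one reason statistical validation of a subtle biomarker is demanding.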

About the Author(s)

James Strachan

Over the course of my Biomedical Sciences degree it dawned on me that my goal of becoming a scientist didn’t quite mesh with my lack of affinity for lab work. Thinking on my decision to pursue biology rather than English at age 15 – despite an aptitude for the latter – I realized that science writing was a way to combine what I loved with what I was good at. From there I set out to gather as much freelancing experience as I could, spending 2 years developing scientific content for International Innovation, before completing an MSc in Science Communication. After gaining invaluable experience in supporting the communications efforts of CERN and IN-PART, I joined Texere – where I am focused on producing consistently engaging, cutting-edge and innovative content for our specialist audiences around the world.
