
Gary Patti: Metabolomics Is Not in Crisis

Our ability to measure and quantitate key metabolites remains robust, reproducible, and highly valuable – regardless of how the issue around unknowns plays out

By James Strachan | 06/12/2025 | 10 min read


Credit - Gary Patti

The debate over the role of in-source fragmentation (ISF) in untargeted metabolomics continues. Following recent contributions from Pieter Dorrestein and Yasin El Abiead (“The Dark Metabolome: No Mere Figment?”), Martin Giera and Gary Siuzdak (“The Dark Metabolome Debate Continues”), and Shuzhao Li (“A Call for Context”), Gary Patti – Michael and Tana Powell Professor, Senior Director, Center for Mass Spectrometry and Metabolic Tracing, Washington University in St. Louis, USA – now weighs in with a steadying message: the foundations of metabolomics remain as solid as ever.

Can you comment on the stakes involved in this debate?

I think the real stakes here are about how metabolomics is perceived, especially by people who aren’t steeped in it. Metabolomics is now mainstream. Lots of researchers are using the technology: clinicians, molecular biologists – people who are experts in other areas but not necessarily in metabolomics. As an example, I direct an NIH multi-omics center, and we interact with scientists from all kinds of backgrounds. Metabolomics is just one of nine omics that we employ. But when this latest claim started circulating – especially as it gained media attention – I had colleagues saying, “Wait, maybe we should just drop metabolomics from our project. If the data quality is this questionable, perhaps it's not reliable enough to include.” And that’s the problem. Because that's not what anyone was saying. But for someone outside the field, all they see are the red flags: “Maybe this data isn't trustworthy. Maybe we shouldn’t bother.”

So, I think it’s critical to clarify that this debate is about one aspect of metabolomics. And meanwhile, our ability to measure and quantitate well-known, biologically relevant metabolites – glycolytic intermediates, TCA cycle metabolites, phospholipids, and more – is still exceptionally strong. These core analyses are robust, reproducible, and highly valuable, regardless of how this specific issue around unknowns plays out. We need to make sure that message doesn’t get lost in the noise, because otherwise, people may wrongly conclude that metabolomics as a whole is suspect – and that would be a real loss for science. There could be thousands of unknown metabolites to discover in the data or very few. It doesn’t change the fact that we can reliably assess most canonical biochemical pathways with existing metabolomics technologies.

Can you speak to the importance of terminology to this debate?

Terminology is central to this whole discussion. A lot of confusion comes from how people define and use the word unknown. The term is used differently across the community, and that’s a big reason the arguments may seem more polarized than they actually are.

Technically, an unknown – in the broadest sense – is any signal or peak in a metabolomics dataset that hasn't been identified yet. By that definition, everything starts as an unknown. When you first acquire data, none of the peaks have names. Even well-characterized metabolites – say, something from glycolysis or the TCA cycle – are “unknowns” until you match them to a library entry. So in that sense, yes, there are tons of unknowns in a raw dataset. But that’s not a particularly meaningful use of the term.

When people say “unknowns” in this debate, what they usually mean is “unknown metabolites” – in other words, small molecules that have never been described before. That’s a very different thing from saying you simply have a signal you haven’t assigned a name to yet.

I’d argue we should be more specific: unknown features vs. unknown compounds vs. unknown metabolites. And that raises another question: how do we define something as an “unknown metabolite”? Some people say if a signal doesn’t match any compound in the database they’re using, it must represent a novel chemical. But with so many resources now available, a rigorous approach requires searching against all of the public databases and libraries as well as the primary literature before calling something a “new” metabolite.
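
As a concrete illustration of that point, here is a minimal sketch of the kind of multi-database check implied above. The mass values and database names are toy stand-ins rather than real library contents, and a rigorous workflow would also weigh retention time, MS/MS spectra, and the primary literature before calling anything new.

    # Toy sketch: check a feature's neutral mass against several reference lists
    # before treating it as a candidate "unknown compound". Values are illustrative.
    PPM_TOL = 5.0

    reference_masses = {
        "database_A": {"citrate": 192.0270, "glutamate": 147.0532},
        "database_B": {"lactate": 90.0317, "ATP": 506.9957},
    }

    def ppm_error(observed, reference):
        """Mass error in parts per million."""
        return abs(observed - reference) / reference * 1e6

    def annotate(observed_mass):
        """Return every database hit within tolerance; an empty list means unmatched."""
        hits = []
        for db, entries in reference_masses.items():
            for name, ref_mass in entries.items():
                if ppm_error(observed_mass, ref_mass) <= PPM_TOL:
                    hits.append((db, name))
        return hits

    feature_mass = 147.0534  # hypothetical neutral mass of an LC-MS feature
    matches = annotate(feature_mass)
    print(matches if matches else "unmatched - still not proof of a novel metabolite")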

All of this goes to show that how you define your terms – especially unknowns – shapes your entire perspective on this issue. Without clarity on definitions, it’s easy for observers to misinterpret the debate as more contentious or contradictory than it actually is.

Given the current evidence, what can we say about the existence, size, and significance of the dark metabolome?

Typically, the phrase “dark metabolome” refers to peaks in LC-MS data that represent new molecules we have yet to identify. The idea is that we know the metabolites are there because they appear in the data; we just don’t know what they are. It’s a metaphor for “dark matter” in astronomy, which we know exists through inference but can’t observe directly. But a lot of people – especially those outside the field – interpret the dark metabolome as “the total number of metabolites in biology that we haven’t discovered yet.” That is an entirely different question.

There is no doubt in my mind that many important metabolites in biology remain to be discovered. The issue is that trying to quantify the total number of undiscovered metabolites in biology from mass spectrometry data is impossible. Mass spectrometers have limited coverage. Some compounds don’t ionize, some are present below detection thresholds, and so on.

To say, “we posit that most unidentified peaks in metabolomics data are fragments – therefore the number of unknown metabolites in the universe is small,” is not valid. The absence of evidence is not evidence of absence. It’s a kind of category error. I don’t think any of the scientific publications we are discussing made that claim, but media coverage often conflates the arguments, and that is where some of the interpretation has been distorted.

What we can reasonably debate is this: among the peaks we measure in metabolomics, how many represent new compounds? That’s a fair and meaningful question. Trying to extrapolate from there to say what is left to discover in all of biology? That goes beyond what the data can support and is speculation.

I will also add that you don’t have to hunt down new metabolites to make exciting discoveries in metabolism. Even for well-established pathways, there’s still so much we don’t understand with respect to biology. Lactate is an exciting example. Just a few years ago, we thought we had a pretty good handle on the biochemical roles of lactate. But metabolomics studies over the past decade or so have revealed so much unexpected versatility. We now know that lactate can serve as a major fuel for cancer cells, it can act as a signaling molecule, it modulates immune responses, it can be attached to proteins as a post-translational modification, and so much more. One could spend their entire career just studying this one fascinating metabolite.

Why is it so difficult to determine the size of the dark metabolome?

LC-MS data are complex. There’s no question about that. It’s not just in-source fragments though. I have this histogram that I like to show, highlighting all of the different types of signals that appear in untargeted metabolomics LC-MS data. Over the years, we seem to keep adding more and more categories to it as we and many others develop a deeper appreciation of the data.

The challenge is that the only definitive way to prove that a peak is truly a novel metabolite is to solve its structure – demonstrating that you have a previously unreported chemical. That tends to be a slow and arduous process. Although some exciting new innovations might speed it up, it’s currently not practical to solve the structure of every unknown metabolite in an LC-MS dataset. Hence, we are left with inferring that a peak represents an unknown metabolite by ruling out all other possibilities.

Much of the recent discussion has centered on whether unidentified signals are novel metabolites or in-source fragments. Unfortunately, it is even more complicated than that. It’s essential to recognize other complexities in the data beyond in-source fragments. In our experience, among the most prominent are signals that do not originate from the biological sample itself. If those signals aren’t accounted for, the calculated frequency of in-source fragments from metabolites will be overestimated. For example, the fraction of signals that are fragments of a sample’s metabolites cannot exceed the fraction of LC-MS signals in an experiment that originate from the biological specimen.
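
To illustrate the bound in that last sentence, here is a minimal back-of-the-envelope sketch. The feature counts and background fraction below are invented for the example: whatever share of features comes from solvent, plasticware, or electronic noise puts a hard ceiling on how many features could possibly be in-source fragments of the sample's metabolites.

    # Illustrative arithmetic only; the numbers below are hypothetical.
    total_features = 10_000              # features detected in a hypothetical untargeted run
    background_fraction = 0.55           # hypothetical share of contaminant/noise features
    sample_derived = total_features * (1 - background_fraction)

    # Even if every sample-derived feature were an in-source fragment, the fragment
    # rate over the whole dataset could not exceed the sample-derived share.
    max_isf_rate = sample_derived / total_features
    print(f"Upper bound on metabolite-derived ISF rate: {max_isf_rate:.0%}")  # 45%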

Are there any lessons that we should take away from this debate?

I’m all for doing a better job of annotating the meaningful peaks in metabolomics data. I often share a story from years ago, when I stood in front of hundreds of people at AACR and claimed to have found a novel metabolite, which at the time we thought was an analog of ATP. It turned out to be a dimer formed in the source of the mass spectrometer rather than a new metabolite. I learned my lesson. Each peak in a metabolomics dataset must be carefully vetted before jumping to the conclusion that it is a novel molecule. We have systematically annotated datasets and, in our experience, unknown metabolites certainly do exist in the data. But it would be presumptuous to assume that every signal you can’t identify is a new compound. Clearly, there are lots of signals in a dataset that represent unusual things with no biological relevance; in-source fragments are just one of the complications.

What do you think newcomers to metabolomics should understand about ISF, “unknowns,” and the dark metabolome when entering the field?

When I was starting out as a professor, I’ll admit, I was a little more liberal in my use of the word “unknowns.” I would try to recruit students to my lab by saying, “Look, we can only identify 2 percent of the signals – 98 percent are exciting unknown metabolites to be discovered!” But I had a student come up to me after one of those talks and say something I’ve never forgotten: “Untargeted metabolomics will make a good analytical chemist sick to their stomach.”

He pointed out, rightly, that I was calling peaks unknown metabolites without any real evidence – just the fact that I couldn’t identify them. He said, “They could be contaminants, artifacts, oligomers, fragments, adducts...” And he was right.

He actually joined my lab and helped, with others, to develop annotation approaches such as credentialing, which uses isotope labeling to distinguish real biological features from noise. These techniques, together with many others from pioneers around the globe, have given us much more insight into the nature of the datasets.
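
For readers unfamiliar with credentialing, the core idea can be sketched in a few lines. This is a simplified, hypothetical illustration rather than the published workflow: samples are prepared from cells grown on unlabeled versus uniformly 13C-labeled nutrients, and only features with a partner peak shifted by a whole number of 13C-12C mass differences are treated as biological. The peak list and tolerances below are invented for the example.

    # Simplified sketch of the credentialing idea; toy m/z values and tolerances.
    C13_SHIFT = 1.003355   # mass difference between 13C and 12C, in Da
    MZ_TOL = 0.005         # hypothetical matching tolerance

    peaks = [180.0634, 186.0835, 279.1594, 301.1410]  # 186.0835 sits 6 x C13_SHIFT above 180.0634

    def credentialed(mz, peak_list, max_carbons=60):
        """True if a fully 13C-labeled partner peak exists for some plausible carbon count."""
        for n in range(1, max_carbons + 1):
            partner = mz + n * C13_SHIFT
            if any(abs(p - partner) <= MZ_TOL for p in peak_list):
                return True
        return False

    for mz in peaks:
        label = "biological candidate" if credentialed(mz, peaks) else "unlabeled signal or artifact"
        print(mz, label)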

So I would say, lots of people have rigorously looked at this annotation problem and specifically in-source fragments. And those who are well-established in metabolomics – those with extensive experience doing solid work – are not giving out misidentified data. There’s no reason users of experienced cores should be afraid that citrate or glutamate, for example, is actually an in-source fragment. I think the idea that people need to retract papers over the recent in-source fragmentation claim is really misguided.

This debate is more relevant to those researchers focused on interpreting previously unidentified signals – the ones most applied scientists aren’t really paying attention to, partly because they’re so difficult to interpret. If you have a signal that might represent a completely new molecule, how do you even begin to figure out the biology? It’s not straightforward.

Very few people have the expertise to do that. Most researchers are focusing on known metabolites. It's important to recognize that those are well vetted – we have strong, solid, quantitative, reliable technologies to measure them.

So whichever side of this debate you end up on, it doesn’t really affect the day-to-day ability to conduct rigorous science with knowns. That’s what most metabolism researchers are doing, and that foundation remains very solid. Going after the unknown metabolome is an exciting challenge for measurement scientists, and it will probably keep us busy for many years to come. But it’s not a direction that I would recommend to newcomers or those who are exclusively focused on biomedical applications.


About the Author(s)

James Strachan

Over the course of my Biomedical Sciences degree it dawned on me that my goal of becoming a scientist didn’t quite mesh with my lack of affinity for lab work. Thinking on my decision to pursue biology rather than English at age 15 – despite an aptitude for the latter – I realized that science writing was a way to combine what I loved with what I was good at. From there I set out to gather as much freelancing experience as I could, spending 2 years developing scientific content for International Innovation, before completing an MSc in Science Communication. After gaining invaluable experience in supporting the communications efforts of CERN and IN-PART, I joined Texere – where I am focused on producing consistently engaging, cutting-edge and innovative content for our specialist audiences around the world.

