A new open-source tool, PeakClimber, is aiming to make HPLC data analysis more precise and informative by improving the quantification of individual peaks in complex chromatograms. Developed to overcome the limitations of standard software, PeakClimber uses an exponentially modified Gaussian model to better capture peak shapes – particularly in biological samples where clean separation is rare.
In proof-of-concept experiments with lipid profiles from fruit flies, PeakClimber revealed compound-level changes that were masked when grouping lipids by class, helping to uncover new biological patterns. The tool is intended as a complement to mass spectrometry, enabling users to identify which peaks or compound clusters warrant further analysis, potentially reducing costs and analysis time.
We spoke with the creator, Josh Derrick, PhD candidate in the Cell, Molecular, Developmental Biology and Biophysics Program at Johns Hopkins University, and a researcher in the Ludington and Farber Labs, USA, to learn more about the motivation behind PeakClimber, its potential applications in lipidomics (and beyond), and the challenges of developing open-source tools for the chromatography community.
What sparked the idea behind PeakClimber?
One of the initial experiments in my PhD was to compare HPLC lipid profiles between flies fed a high-fat or standard diet (yeast and sugar), both with and without the presence of bacteria. The overarching goal was to build a system to understand how microbes impact digestive processes in the intestine. After integrating large peak regions that group lipids by class, I did observe significant differences, though their magnitudes were quite small. This type of integration seemed like a waste of the HPLC’s power to me, as I was condensing 20-30 compounds into a single metric. I started to think that perhaps there were larger changes present in individual compounds that were masked when integrating by class.
I then set out to use the stock software that came with our instrument to quantify each individual peak. The peak-to-peak variation proved to be extremely high, soon convincing me that I couldn’t analyze individual peaks using the algorithms provided in the software. Since I am a computational as well as a bench scientist, I thought I could develop a better peak quantification solution myself. When I first started work on the software, many of the alternative programs (PeakLab, hplc.io) had not yet been written, so I felt compelled to develop my own solution.
PeakClimber was originally something I was going to write in 48 hours for personal use, but I kept finding things I could improve, and so it slowly morphed into something I planned to release to the public.
Why does better peak quantification matter? Can you explain why more accurate area fitting has such a big impact?
Traditional chromatography, often performed in analytical chemistry labs, relies on good compound separation for quantification. In that setting, peak integration, perhaps with some level of background subtraction, is sufficient to accurately quantify the level of a given compound. Conversely, in biological samples – where there are hundreds, if not thousands, of different compounds – there’s no way you’ll observe clean peak separation for everything. Without accurate quantification, it is very difficult to generate hypotheses; noise will obscure any real change in ratios between specific peaks. Just as more accurate balances allowed Lavoisier to discover the stoichiometric relationships between oxygen and metals in ores and other compounds, more accurate peak-area fitting can highlight previously overlooked relationships between certain metabolites.
What’s something that’s often misunderstood about HPLC data analysis?
I think there’s a fear among many chromatographers that peak fitting is a biased, and perhaps inaccurate, way to quantify analytes. To some extent, they have a point; it would be better if we could simply integrate the peak area under the trace. However, for many biological samples, full peak separation is practically impossible. To draw any conclusions from these data – besides simply noting the presence/absence of individual peaks – you must perform some kind of peak fitting. Even the valley-to-valley algorithm is a peak fit, as the algorithm must decide where the integration region starts and stops. The alternative to peak fitting, which is merely analyzing the presence/absence of certain peaks, seems a waste of potentially useful data, in my opinion.
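To make the point about integration boundaries concrete, here is a minimal Python sketch. It is not PeakClimber's code, nor any vendor's algorithm: it builds a synthetic trace from two overlapping Gaussian peaks with known areas, then splits the trace with a simple perpendicular drop at the valley between them. All peak parameters and names (gaussian_peak, drop_a, drop_b) are assumptions chosen for illustration.

```python
import math

def gaussian_peak(t, area, mu, sigma):
    """Gaussian peak scaled so that its integral equals `area`."""
    height = area / (sigma * math.sqrt(2.0 * math.pi))
    return height * math.exp(-0.5 * ((t - mu) / sigma) ** 2)

def trapezoid(ys, dt):
    """Trapezoidal integral of evenly spaced samples."""
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# Hypothetical "true" peaks: a large peak A and a smaller neighbour B.
TRUE_A, TRUE_B = 1.0, 0.3
MU_A, MU_B, SIGMA = 5.0, 5.9, 0.3
T0, DT = 3.0, 0.001

times = [T0 + i * DT for i in range(5001)]  # 3.0 to 8.0 (arbitrary time units)
signal = [
    gaussian_peak(t, TRUE_A, MU_A, SIGMA) + gaussian_peak(t, TRUE_B, MU_B, SIGMA)
    for t in times
]

# Locate the valley: the lowest point of the trace between the two peak centres.
i_a = round((MU_A - T0) / DT)
i_b = round((MU_B - T0) / DT)
valley = min(range(i_a, i_b + 1), key=lambda i: signal[i])

# Perpendicular drop: everything left of the valley is assigned to A,
# everything right of it to B.
drop_a = trapezoid(signal[: valley + 1], DT)
drop_b = trapezoid(signal[valley:], DT)

print(f"valley at t = {times[valley]:.3f}")
print(f"peak A: true {TRUE_A:.3f}, boundary-based estimate {drop_a:.3f}")
print(f"peak B: true {TRUE_B:.3f}, boundary-based estimate {drop_b:.3f}")
```

In this synthetic case the total area is preserved, but the smaller peak is underestimated by more than 10% because part of its leading edge lies on the wrong side of the boundary. Choosing where a region starts and stops is itself a modeling decision.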
What was your biggest “eureka” moment during the development or testing of PeakClimber?
We were initially using a Gaussian distribution to model peaks, as is standard in mass spec. However, we found that this approach often missed the tail region present in most peaks. There were several distributions we could have used to model this tail. Looking back, I was very surprised to see that the exponentially modified Gaussian model was by far the most effective. Despite the tool performing excellently in the proof-of-concept experiments, my big fear throughout most of the process was that PeakClimber wouldn’t be able to shed any light on the fly lipid data. I was delighted that this was not the case.
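For readers unfamiliar with the model, the sketch below writes out the standard textbook form of an exponentially modified Gaussian (a Gaussian convolved with an exponential decay) in plain Python. It is an illustration only and is not taken from PeakClimber's source. The parameter names (area, mu, sigma, tau) and the example values are assumptions.

```python
import math

def emg_peak(t, area, mu, sigma, tau):
    """Exponentially modified Gaussian scaled so that its integral equals `area`.

    mu and sigma describe the Gaussian part; tau > 0 is the time constant
    of the exponential tail (larger tau gives a longer tail).
    """
    lam = 1.0 / tau
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * t)
    z = (mu + lam * sigma ** 2 - t) / (math.sqrt(2.0) * sigma)
    return area * 0.5 * lam * math.exp(exponent) * math.erfc(z)

def trapezoid(ys, dt):
    """Trapezoidal integral of evenly spaced samples."""
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# Hypothetical peak parameters, in arbitrary time units.
AREA, MU, SIGMA, TAU = 3.0, 5.0, 0.2, 0.5
DT = 0.01
times = [i * DT for i in range(2001)]  # 0.0 to 20.0
trace = [emg_peak(t, AREA, MU, SIGMA, TAU) for t in times]

# The model integrates to the requested area, and its mean sits at mu + tau.
area_check = trapezoid(trace, DT)
mean_check = trapezoid([t * y for t, y in zip(times, trace)], DT) / area_check

# Tailing: the signal is far higher 0.6 units after mu than 0.6 units before it.
after_mu = emg_peak(MU + 0.6, AREA, MU, SIGMA, TAU)
before_mu = emg_peak(MU - 0.6, AREA, MU, SIGMA, TAU)

print(f"area = {area_check:.4f}, mean = {mean_check:.4f}")
print(f"signal after mu = {after_mu:.4f}, before mu = {before_mu:.6f}")
```

A symmetric Gaussian would give identical values at equal distances either side of its centre, which is why it misses the tail. This simple form can overflow for extreme parameter combinations, so a real fitting routine would likely need a numerically safer formulation.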
Your paper focuses on lipids in fruit flies. Why did you choose this area? And what other fields or systems do you see benefiting from this tool?
As a PhD student, I chose to work on fruit flies because of the heavily analytical approach they enable in terms of understanding how the microbiome influences whole body physiology. What’s amazing about this system is that it flourishes with just a few species of bacteria in its gut compared to the thousands in the intestine of animals like us. The problem seemed much more tractable using fruit flies as the model system.
We are hoping that PeakClimber will benefit many different types of analysis of biological samples that produce complex chromatograms, including lipidomics, glycomics, proteomics, and metabolomics in general. We envision PeakClimber as a supplement to mass spec. HPLC users without mass-spec systems can use PeakClimber to decide which compounds (or compound clusters within single peaks) are worth identifying by mass spectrometry, potentially yielding significant savings in cost and time.
What do you think are the biggest barriers to wider adoption of open-source analytical tools like this one?
One barrier is the resistance of much of the chromatography community to peak fitting in general. However, I think the larger problem is simply a lack of knowledge and awareness that these tools are out there.
Are there any other big challenges in chromatography that better software tools could help with?
There are two that immediately spring to mind: the first is better identification of peak shoulders, which are often separate compounds. I think PeakClimber can eventually tackle this. The second would be tools to help design better solvent gradients. There are some solutions out there already (Boelrijk, 2023), but I think there’s still some room for improvement.
What’s next for PeakClimber?
PeakClimber was developed to help me analyze data that I was generating for my main PhD project, focused on understanding how flies absorb and process lipids, as well as how this process is modulated by microbes. This will be my main focus over the next year or so. However, when I do have some extra time, there are two improvements that I would like to make. The first is to alter the least-squares algorithm to support both fronted and tailed peaks in the same run. The second is to improve the shoulder-finding algorithm, as I’m finding a lot of peaks in my data that would be better fit if the shoulder were correctly identified. Of course, the beauty of open-source software is that the user can make these modifications themselves, and I’d like to think I’ve provided documentation that will be very helpful for that!