Chemometrics: A Bridge to the AI Age

I believe we’re well and truly living in the “AI age” – and we have been for some time (1). This has led those of us working in the related field of chemometrics to ask: what is the relationship between chemometrics and AI? What does the rise of AI mean for our field? And should we respond?

The term “chemometrics” – “kemometri,” in Swedish – was used for the first time in 1971 by a young organic chemist, Svante Wold, when he was applying for a research grant at the University of Umeå, Sweden (2). Svante was the son of Herman Wold, an eminent statistician working in the econometrics field, and Anna-Lisa Arrhenius, one of the daughters of Svante Arrhenius, recipient of the Nobel Prize for Chemistry in 1903 (3) – you might say Svante was born to be the first chemometrician. He coined the name by analogy with other disciplines, such as “econometrics” or “psychometrics,” where mathematics and its related branches (statistics, computer science, etc.) are applied by experts in the given subjects (economics or psychology, in this case) to extract useful information from data collected in those fields. Accordingly, chemometrics was defined by Svante Wold as “the art of extracting chemically relevant information from data produced in chemical experiments” (4) – a branch that was gaining increasing attention at the time thanks to the development of computer systems.

Chemometrics started to grow thanks to the input of several pioneering scientists, such as Bruce Kowalski, an analytical chemist based at the University of Washington, Seattle. In 1974, Bruce and Svante founded the International Chemometrics Society (ICS), initially chaired by Bruce. The second President was Michele Forina, who later became my professor in Genoa.

I often think that, had he written his application nowadays, Svante would likely have used terms like artificial intelligence and machine learning. Artificial intelligence is defined on Wikipedia as “the capability of computational systems to perform tasks typically associated with human intelligence.” Machine learning, in turn, is classed as “a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data.” Since the field’s inception, chemometrics has commonly addressed all of these tasks. In fact, chemometric models are typically data-driven, meaning that they are based on experimental data (usually generated from analytical platforms) rather than on theoretical bases. Unsupervised exploratory models – like those obtained by principal component analysis (PCA) – provide analysts with powerful visualisation tools capable of revealing patterns such as groupings or trends of samples in the multivariate space. They also make it possible to understand correlation structures between variables and the relationship between variables and samples. By doing this, they perform tasks to a higher degree of competency than “human intelligence” alone. For instance, if we examine an input data table by visual inspection alone, our ability to extract the rich and valuable information that a simple PCA provides is limited.
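To make the PCA idea concrete, here is a minimal sketch in Python using only NumPy. It computes scores (sample coordinates) and loadings (variable contributions) from the singular value decomposition of the column-centred data table. The data are synthetic and purely illustrative, and the function name `pca` and the two simulated groups are my own assumptions, not part of any particular software package.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centred data table (rows = samples)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # variable contributions
    explained = (s ** 2) / np.sum(s ** 2)              # variance fraction per PC
    return scores, loadings, explained[:n_components]

# Hypothetical data: two groups of 10 samples, 5 variables each.
# The groups differ only in the first two variables.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(10, 5))
group_b = rng.normal(0.0, 0.1, size=(10, 5)) + np.array([1.0, 1.0, 0.0, 0.0, 0.0])
X = np.vstack([group_a, group_b])

scores, loadings, explained = pca(X)
print("PC1 scores, group A:", np.round(scores[:10, 0], 2))
print("PC1 scores, group B:", np.round(scores[10:, 0], 2))
print("PC1 loadings:", np.round(loadings[:, 0], 2))
```

In this toy case, the two groups should fall on opposite sides of the first component, and the loadings should point to the first two variables as those responsible for the separation. That is the kind of pattern that is hard to spot by reading the raw table.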

On the other hand, supervised chemometric methods provide analysts with predictive models, trained with experimental data, that can be used to estimate properties of interest in new samples “unseen” by the model. These properties, either qualitative or quantitative, are typically estimated by supervised classification or regression models, respectively.

“Traditionally” developed and applied within chemometrics, all of the aforementioned methods share one important property: they enable the extraction of model coefficients. This provides the ability to interpret the role of input experimental variables in the process leading to the model output, ensuring chemical interpretability – a fundamental feature of the chemometric framework. This is what distinguishes chemometricians from general data scientists: the capability to make choices and draw conclusions based on a deep understanding of the experimental system under study. Often, this is the key to success in data processing workflows.
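As a simple illustration of what “extracting model coefficients” means, the sketch below fits an ordinary least-squares regression with NumPy and then inspects the coefficients. I use plain least squares here only as a stand-in for the regression methods more common in chemometrics. The data are simulated, and the names `true_coef`, `X_train` and `X_new` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: 40 samples, 3 measured variables.
# By construction, the response depends on variables 0 and 2 only.
X_train = rng.normal(size=(40, 3))
true_coef = np.array([2.0, 0.0, -1.0])
y_train = X_train @ true_coef + rng.normal(scale=0.05, size=40)

# Fit y = intercept + X @ coef by ordinary least squares.
A = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
intercept, coef = beta[0], beta[1:]

# The coefficients are available for interpretation:
# a value near zero flags a variable with little role in the prediction.
for i, c in enumerate(coef):
    print(f"variable {i}: coefficient {c:+.3f}")

# Prediction on new samples "unseen" by the model.
X_new = rng.normal(size=(5, 3))
y_pred = intercept + X_new @ coef
```

Here the fitted coefficients can be read directly: the second variable should come out close to zero, telling the analyst that it contributes little to the estimated property.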

As another example, let’s look at data pre-processing – a preliminary phase of data analysis whose importance is often underrated. Software for multivariate analysis generally includes tens, even hundreds, of mathematical transforms that can be applied – individually or in combination – to pre-process experimental data. To make the right choice, the user should comprehensively understand not only the mathematical implications of the pre-processing tools, but also the nature of the experimental data and the exact goals of the study. For instance, a standard normal variate (SNV) transform – able to correct for baseline vertical shifts and global intensity effects that may affect signal profiles – would be an appropriate choice in the context of vibrational spectroscopy if our goal was to quantify analytes in the presence of scattering effects. Conversely, if our aim was to assess physical properties (e.g. particle size) using the same data, the same transform would remove the very information we are looking for (5). Such knowledge is expected among analytical chemists, but not necessarily among general data scientists. There must, of course, be strong and interdisciplinary interactions with the data science community, not only to exchange ideas and develop new strategies, methods and tools, but also to remember that the success of a chemometric process always starts from solid “chemo” knowledge, understanding and awareness.
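The SNV example can be sketched in a few lines of Python with NumPy. Each spectrum (row) is centred on its own mean and divided by its own standard deviation. The synthetic band, offsets and gains below are hypothetical values chosen only to mimic vertical shifts and global intensity effects.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Hypothetical data: the same band recorded three times,
# each with a different baseline offset and multiplicative gain.
x = np.linspace(0.0, 1.0, 50)
band = np.exp(-((x - 0.5) ** 2) / 0.01)
offsets = np.array([0.0, 0.3, -0.1])
gains = np.array([1.0, 1.8, 0.6])
raw = offsets[:, None] + gains[:, None] * band

corrected = snv(raw)
print("max difference between corrected spectra:",
      np.abs(corrected - corrected[0]).max())
```

After the transform, the three profiles coincide, which is exactly what we want when offsets and gains are nuisance effects. It also shows the other side of the coin: if the gain carried the physical information of interest, it would no longer be recoverable from the corrected data.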

As I said, we’re living in the AI age. In the last decade, the power of ML techniques has been (re)discovered in chemistry, addressing the need to process more efficiently the highly complex datasets produced by advanced analytical platforms such as multidimensional chromatography, high-resolution mass spectrometry, and spectral imaging, in both targeted and untargeted approaches. With this, the number of chemometric courses continues to increase at undergraduate, master’s, and doctoral levels.

Following the general trend, attention is frequently focused on methods of the deep learning (DL) family. These approaches, often inspired by biological neural learning structures, implement multilayered artificial neural networks (ANN) to address issues in data classification and regression. Artificial neurons are sets of interconnected coefficients that are tuned – generally through a supervised training and optimization process – to provide desired response values starting from the input data. These methods have been investigated in a chemometric context since at least the 1980s. In a review and perspective paper published in 1991, entitled “Neural networks: A new method for solving chemical problems or just a passing phase?”, Jure Zupan pointed out one of the main issues (6): “The problem, of course, is that careful planning of the sampling for the training set will probably lead to a number of studies not being done because a good set of data is simply not available.”

ANN methods, especially the most advanced and complex ones, work efficiently when trained with a very large number – thousands, or even millions – of observations. This is not a problem for applications like speech or handwriting recognition, where a specialized company can afford to collect such vast amounts of data. But for an analytical chemist, this means analyzing thousands, or millions, of samples in the lab to train a model – a feat that’s seldom achievable. Another major issue is that coefficients of ANN models are rarely available to the user and almost impossible to interpret, rendering DL a “black box,” of sorts. Again, this may not represent a hurdle in speech recognition, where the desired output is a correctly recognized sentence with no need to know which features helped the model provide an accurate result. But in chemical applications, we usually want to complement accurate model predictions with an in-depth and critical interpretation of the results, relating them to the domain of the experimental input variables. This need was well summarized by Wold in his editorial paper published in 1995, celebrating the first twenty years of chemometrics: “In essence, we must remain chemists and adapt statistics to chemistry instead of vice versa” (4).

Thirty years on, and in the midst of a flurry of media coverage on AI, I often see two opposing reactions from chemometricians. The first is typical of a smaller group – I’ll describe them as “mature” chemometricians – irritated by the use of terms such as AI and ML in association with chemometrics. This fundamentalist vision overlooks not only the definitions of AI and ML discussed above, but also the fact that these words have been used in chemometric environments from the field’s inception. In 1971, the very same year Wold coined “chemometrics” as a term, Thomas Isenhour and Peter Jurs published in Analytical Chemistry their evocatively named paper, “Some chemical applications of machine intelligence” (7). The second reaction, more typical of younger colleagues, is a much more enthusiastic one – and in my view, enthusiasm is almost always a more positive attitude to have. The main risk I foresee is that, as we continue to put our trust in the power of ML methodologies, we may undermine the importance of training data and the quality of input information.

Once that’s made clear, however, I see great potential within the new generations of chemometricians to merge the increasing power of emerging ML tools with the solid foundations laid by the pioneers. Data mining and processing tools aside, effective platforms are becoming increasingly available to support research at different levels, from bibliographic searches and meta-analysis (8) to the organization and optimization of workflows or the grammatical revision of scientific texts.

With all of this considered – as well as the increasing analytical challenges and a renewed interest in these topics – I expect a significant, and successful, increase in the implementation of chemometrics and AI strategies in analytical laboratories in the coming years.

References

1. M Tegmark, Life 3.0: Being Human in the Age of Artificial Intelligence. Alfred A. Knopf: 2017.
2. S Wold, “Personal memories of the early PLS development,” Chemometrics Intell Lab Syst, 58, 83–84 (2001). DOI: 10.1016/S0169-7439(01)00152-6
3. N Kettaneh-Wold, A Memoir of Love, Science and Adventure – My Life with Svante Wold. Outskirts Press: 2023.
4. S Wold, “Chemometrics: what do we mean with it, and what do we want from it?,” Chemometrics Intell Lab Syst, 30, 109–115 (1995). DOI: 10.1016/0169-7439(95)00042-9
5. P Oliveri et al., “The impact of signal pre-processing on the final interpretation of analytical outcomes – a tutorial,” Anal Chim Acta, 1058, 9–17 (2019). DOI: 10.1016/j.aca.2018.10.055
6. J Zupan, J Gasteiger, “Neural networks: a new method for solving chemical problems or just a passing phase?,” Anal Chim Acta, 248, 1–30 (1991). DOI: 10.1016/S0003-2670(00)80865-X
7. TL Isenhour, PC Jurs, “Some chemical applications of machine intelligence,” Anal Chem, 43, 20A–38A (1971). DOI: 10.1021/ac60304a715
8. MP Schneider et al., “Enhancing one-class classification performance through variable selection: a review based on advanced literature search approaches,” Chemometrics Intell Lab Syst, 265, 105491 (2025). DOI: 10.1016/j.chemolab.2025.105491

About the Author(s)

Paolo Oliveri

Full Professor in the Department of Pharmacy at the University of Genoa, Italy
