Researchers in Brazil and Poland have developed a proteomics-driven machine learning model that quantifies tumor aggressiveness based on how closely a tumor’s molecular profile resembles that of pluripotent stem cells. Their study presents the tool, called “PROTsi” (protein-based stemness index), as a resource for guiding future diagnostics and therapeutic strategies across multiple cancer types.
The model was trained on mass spectrometry–based proteomic data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), covering more than 1,300 tumor samples across 11 cancer types, including breast, pancreatic, uterine, and pediatric brain cancers. Protein abundance profiles were benchmarked against a reference dataset of 207 pluripotent stem cell samples, allowing the team to assign each tumor a continuous “stemness” score ranging from 0 (least stem-like) to 1 (most stem-like).
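The article does not say how the 0-to-1 scale is computed. One common way to turn a protein signature into a bounded index, sketched below under stated assumptions, is to rank-correlate each sample’s protein profile with a weight vector learned from the stem cell reference and then min-max scale those correlations across the cohort. The weight vector, the toy profiles, and every function name here are hypothetical; this is an illustration, not PROTsi’s actual implementation.

```python
"""Hypothetical sketch: map protein profiles to a 0-1 stemness score."""

from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def stemness_scores(profiles, weights):
    """Score each sample, then min-max scale the cohort to [0, 1].

    profiles: dict of sample id -> protein abundances (same protein order as weights)
    weights: hypothetical per-protein weights assumed to come from stem cell samples
    """
    raw = {sid: spearman(p, weights) for sid, p in profiles.items()}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:  # no spread across samples, so nothing to rescale
        return {sid: 0.0 for sid in raw}
    return {sid: (r - lo) / (hi - lo) for sid, r in raw.items()}


# Toy example: four proteins, three tumors (all values made up).
weights = [0.8, -0.2, 0.5, 0.1]
profiles = {
    "tumor_A": [5.0, 1.0, 4.0, 2.0],  # same ordering as the weights
    "tumor_B": [1.0, 5.0, 2.0, 4.0],  # reversed ordering
    "tumor_C": [3.0, 2.0, 4.0, 1.0],
}
scores = stemness_scores(profiles, weights)
print({k: round(v, 3) for k, v in scores.items()})
# → {'tumor_A': 1.0, 'tumor_B': 0.0, 'tumor_C': 0.8}
```

Rank correlation makes the raw score insensitive to the absolute scale of each sample’s abundances, and min-max scaling pins the least and most stem-like samples at 0 and 1, so in this sketch a score is relative to the cohort being scored.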
“Proteins are the functional drivers of biology,” said Tathiane Malta, co-lead author of the study. “By anchoring the model to proteomic data rather than transcriptomic or epigenetic markers, we move closer to clinically actionable insights.”
The researchers used statistical dimensionality reduction and machine learning algorithms – including elastic net regression – to identify proteomic features most predictive of stemness. These features were then cross-validated against independent data and previously published transcriptomic stemness scores, confirming their biological and predictive relevance.
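Elastic net regression combines an L1 (lasso) penalty, which can shrink the coefficients of uninformative proteins exactly to zero, with an L2 (ridge) penalty, which stabilizes selection among correlated predictors. The minimal sketch below fits an elastic net by cyclic coordinate descent, using the `alpha`/`l1_ratio` parametrization popularized by scikit-learn. The toy data, the continuous target, and the function names are assumptions for illustration; the study’s exact formulation, and the dimensionality reduction step, are not shown.

```python
"""Hypothetical sketch: elastic net feature selection by coordinate descent.

Minimizes (1/(2n)) * ||y - X b||^2
          + alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||^2).
Assumes the columns of X and y are already centered (no intercept is fitted).
"""


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, and beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new
    return beta


# Toy data: 4 samples x 3 "proteins" (orthogonal, centered columns).
# The made-up target depends on proteins 0 and 1 only.
X = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [2 * r[0] - 1 * r[1] for r in X]
beta = elastic_net(X, y)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print([round(b, 4) for b in beta], selected)
# → [1.8571, -0.9048, 0.0] [0, 1]
```

The nonzero entries of `beta` act as the selected features, and both are shrunk below their unpenalized values of 2 and -1. In practice `alpha` and `l1_ratio` would typically be tuned by cross-validation rather than fixed as in this sketch.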
PROTsi distinguished tumors from healthy tissue and, within specific subtypes, separated low-grade from high-grade tumors. It also flagged a set of stemness-associated proteins that may serve as therapeutic targets, some of which are already under investigation in cancer and other diseases. Among the tumors the model distinguished best were types that are typically more aggressive or harder to treat, such as uterine, pancreatic, and pediatric brain cancers.
The group is now exploring refinements to PROTsi, incorporating additional data and testing further machine learning frameworks. “This kind of proteomic modeling could help bridge basic research and precision oncology,” explained co-author Renan Santos Simões. “Our goal is to support earlier interventions, better treatment matching, and deeper biological insight – starting at the level of the proteome.”