From early transcriptomics to increasingly complex multi-omics approaches, single-cell analysis has advanced rapidly since the field’s inception. Yet some researchers argue that true insight requires more than technological scale – it demands deeper, more integrated understanding of cellular function.
In this installment of our single-cell series, we speak with James Eberwine, Elmer Holmes Bobst Professor of Pharmacology at the University of Pennsylvania, USA. He discusses why the future of the field depends not just on scaling up data generation, but on deepening our analytical precision, integrating molecular layers, and aligning research more closely with meaningful biological questions.
If we were to look at the last decade or so – focusing on single-cell analysis as a field – are there any specific developments, tools, or events that you think have catalyzed the evolution of the field?
That's a bit of a tough question. We published the first single-cell transcriptomics papers back in 1990, where we looked at an individual RNA from a single cell, and in 1992, where we looked at many RNAs from single cells. We also published the first single-cell library in 1994, so my view of the field stretches far beyond just the last ten years. There's been a lot of innovation since that time, but I would say the greatest change isn’t a brand-new, specific technology, but a shift in how people think about and approach single-cell analysis today.
The idea of spatial transcriptomics, for instance, is something people have talked about for a long time; we didn’t call it that back then, but the concept was out there nonetheless. Over the past decade, it has received much more attention from the scientific community, and today we have commercial instruments available for it. They're not as efficient as we would like, and there are still issues with some of the instrumentation, but the idea of single-cell spatial analysis has become much more important and is driving scientific discovery and commercial development.
Another idea that has gained traction (and again is something that’s been discussed since the mid-90s) is the need to look beyond just the transcriptome. This is the recognition that we need multi-omics approaches that look at multiple molecular entities in individual cells – proteins, small molecules, RNA, DNA mutations and, more recently, the RNAs associated with organelles, such as those on their surfaces – a range of components that work together to elicit cell function. People are starting to see and understand this now, and while progress has been made towards it, it hasn’t been as rapid as many of us would like.
Over the past decade, there's also been a huge emphasis on analyzing millions of cells, which, in my opinion, may be a little misguided. The question to ask is: how many cells do you really need to analyze in order to answer your specific biological question? I understand the need to create atlases – and I am glad to see people doing that – but if those atlases are generated with low read depth per cell so that many cells can be analyzed, as some are, they’ll have to be redone later with higher read depth using better technologies.
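To make that trade-off concrete, here is a minimal back-of-the-envelope sketch in Python; the total read budget and cell counts are illustrative assumptions, not figures from the interview:

```python
# Illustrative only: with a fixed sequencing budget, every increase in
# cell number comes directly out of per-cell read depth.
TOTAL_READS = 2_000_000_000  # hypothetical fixed read budget

for n_cells in (10_000, 100_000, 1_000_000):
    reads_per_cell = TOTAL_READS // n_cells
    print(f"{n_cells:>9,} cells -> {reads_per_cell:>9,} reads per cell")

# 100x more cells means 100x shallower per-cell profiles, which is why
# atlases built at low depth may need to be redone with better technologies.
```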
One of the drivers behind analyzing massive numbers of single cells now is the need for many “data points” to inform proper informatics analysis. Informaticists want large datasets to train machine learning algorithms to better model cell states and function, which makes sense, seeing as those algorithms need tons of data to avoid drawing incorrect conclusions. Whether there’s a biological need for that many cells, however, is a different question.
A concern I have with all these large, high-profile papers using millions of cells is that researchers will start to see that data as definitive – as though it’s "ground truth.” But the reality is that it's a version of truth – a snapshot under specific conditions – and it’s important to remember that biology is dynamic. If someone publishes a massive atlas of the heart, for example, other researchers or funders might think the heart doesn’t merit further study at the single-cell level. In reality, such a dataset is just one piece of the puzzle – from the heart of a particular individual with a specific age, health status, circadian state, and so on. It is, of course, an important foundation from which knowledge can be gleaned, but it needs to be expanded, deepened, and diversified. It’s important not to let those big datasets give us a false sense of completeness, because there’s still a long way to go. And I think what's driving the data collection isn't always aligned with the biological questions being asked.
We also have to recognize that transcriptomes are dynamic: they vary with the state of the cell, which is dictated by circadian rhythms, diet, environmental factors, and many other stimuli. Any one "snapshot" therefore needs to be interpreted with caution and supplemented with further context.
I dare say that everything that’s been done on the transcriptome has been interesting – and some of it quite useful – but as science progresses we learn new things that require us to rethink our “scientific certainties,” prompting new insights and experimentation. Among these considerations for single-cell analysis is that RNA is chemically modified in cells, giving rise to the “epitranscriptome.” These modifications dramatically affect RNA function – translation efficiency, localization, degradation, and interactions with RNA-binding proteins – so the massive transcriptomic datasets currently being produced provide some information about the cell’s biology, but they will need to be supplemented with information about RNA modifications to better understand the intricacies of the regulatory role of RNA within cells.
Can you tell us a bit about the focus of your current research?
My lab is divided between single-cell and single-organelle analysis. Firstly, we're working with a group of clinicians here at Penn who provide us with neurosurgically resected human tissue, including tissue from patients who present with epilepsy at the time of neurosurgery. We've learned how to culture human neural cells, including neurons, and organotypic slice cultures from these samples, allowing molecular analysis of human neuronal cells. We’ve analyzed several hundred cells from patient biospecimens for mitochondrial single nucleotide variants (SNVs) to assess whether any of these variants correlate with aspects of epilepsy symptomology, including seizures. This has been really exciting, and we're getting some very interesting data. For instance, there seem to be mito SNV correlations with sex differences in how some patients respond to drugs and manifest seizures, which is fascinating and may be impactful clinically.
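As an illustration of the kind of association testing described above – a hedged sketch, not the lab’s actual pipeline, with an invented table layout and invented values – one could tabulate a mitochondrial SNV call against a binary seizure phenotype and apply Fisher's exact test:

```python
# Hypothetical sketch: does carrying a given mitochondrial SNV associate
# with a seizure phenotype? All values are invented, and a real analysis
# would also need to account for cells sharing a donor (non-independence).
import pandas as pd
from scipy.stats import fisher_exact

cells = pd.DataFrame({
    "has_snv":  [1, 1, 0, 0, 1, 0, 1, 0, 0, 1],  # SNV detected in the cell
    "seizures": [1, 1, 0, 1, 1, 0, 1, 0, 0, 0],  # donor's seizure status
})

table = pd.crosstab(cells["has_snv"], cells["seizures"])
odds_ratio, p_value = fisher_exact(table)
print(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```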
Another major focus is analysis of the chemical modifications of RNA in a single cell using a methodology that permits simultaneous detection/quantification of many modifications and the RNA-binding proteins that regulate them. We’re working hard to finish the first paper on that now – it'll include data on three different RNA modifications from individual cells, neurons and astrocytes that have been challenged using different manipulations. The key here is not just to look at the amount of RNA, but to understand the proportion of modified to unmodified RNA, since this ratio influences how that RNA functions – whether it gets translated more or less efficiently, how it's localized, and so on. Analysis of the epitranscriptomic landscape in parallel with RNA abundances is critical to understanding RNA’s role in modulating cell biologies.
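To illustrate the modified-to-unmodified ratio idea – a minimal sketch with invented counts, not the lab’s method – the quantity of interest per transcript per cell is a stoichiometry rather than an abundance:

```python
# Hypothetical sketch: modification stoichiometry per transcript per cell.
# Counts are invented; real data would come from a modification-aware assay.
import pandas as pd

counts = pd.DataFrame({
    "cell":       ["c1", "c1", "c2", "c2"],
    "transcript": ["Actb", "Fos", "Actb", "Fos"],
    "modified":   [30, 5, 12, 40],   # e.g., reads carrying a given mark
    "unmodified": [70, 95, 48, 10],
})

counts["total"] = counts["modified"] + counts["unmodified"]
counts["mod_fraction"] = counts["modified"] / counts["total"]
print(counts)

# Two cells can express a transcript at the same level (same 'total') yet
# differ sharply in mod_fraction - and hence in how that RNA behaves.
```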
We're also looking at RNAs that sit on the surface of organelles, such as lysosomes, Golgi, and vesicles. We've been asking questions such as: how do those RNAs get there? Are they regulated? Could they be linked to disease? And we're investigating whether these RNAs influence local organelle function as they move around the cell. This is interesting because the subset of RNAs on the organelle surface provides a selected cohort of RNAs that we believe work in concert to modulate cell function local to wherever the organelle is positioned.
Another exciting project – last year we published our work looking at endogenous single-stranded DNA in the cell nucleus as an indicator of open chromatin, using a method that we developed to explore this feature of nuclear architecture in single cells. Surprisingly, we were able to show that some of those single-stranded DNA regions can act enzymatically, similar to ribozymes – they are DNAzymes that function within cells. This is a novel regulatory mechanism that emerged from single-cell analysis.
But perhaps the most time-consuming thing we've been working on relates to some work we pioneered a long time ago: RNA-mediated transdifferentiation. Concurrent with Yamanaka's work with transcription factor DNA transfection, we were the first lab to reprogram cells using RNA. Back when I was a postdoc at Stanford, I coined the phrase "expression profile of the cell," which refers to the relative RNA abundances in the RNA population that exists within a cell. Since then I've wondered: why does a cell maintain a specific RNA profile? Why not just regulate everything at the level of translation (which would be energetically untenable, but theoretically possible)? With that in mind, we’ve been developing new approaches to functionalize that RNA profile in order to understand what those abundances mean in terms of function.
We published a method called “transcriptome-induced phenotype remodeling” (TIPeR) to show that the cell’s transcriptome contains the phenotypic memory of that cell. This was accomplished by using light to transfect RNA populations – with the RNAs present at their normal cellular abundances – into a distinct cell type (phototransfection), with the result that the host cell phenoconverted into the cell type from which the RNA was isolated. Working with engineers here at Penn, we've now developed highly scalable microfluidic devices that allow us to flow RNA across live cells and transfect them using light. We can introduce multiple RNAs, control how much RNA goes into any particular cell, and then phenotype the response of cells directly on the device. This is a single-cell functional genomics methodology that provides mechanistic insight into the coordinated biological activity of cohorts of mRNAs.
I believe this is going to be a major area for future discovery. All of us have lots of transcriptomics data from cells, but what does this information actually tell us? Currently, people test the functional significance using CRISPR or transgenics – usually one gene at a time. The problem is that biology doesn't operate one gene at a time. Cells require the coordinated function of cohorts of mRNAs and proteins working together to elicit the cellular physiologies that are more than the sum of their constituents. With the TIPeR device, we can deliver multiple RNAs in biologically relevant stoichiometries and then directly measure the phenotypic effects. This may revolutionize how we functionally analyze transcriptomic and epitranscriptomic datasets.
Are there any current single-cell technologies or advancements that you’re particularly excited about?
One of the things that’s exciting to me is the potential to analyze lipids, carbohydrates and post-translational modifications (PTMs) in single cells. Researchers are detecting and quantifying proteins in single cells now, and while progress isn’t as fast as we might like, you can see real momentum starting to build in that area. Given that post-translational modifications of proteins – including phosphorylation, glycosylation, lipidation, SUMOylation, and other PTMs – alter their function, if you want to understand single-cell function, you can’t just look at which proteins are made; you have to look at how they’re modified. This is analogous to the need to study the chemical modifications of RNA that I mentioned previously. And that’s where much of the field’s future growth is heading.
No longer can you just detect things – you have to quantify them and be analytically precise in the analysis. It’s not enough to say something is there; we need to know how much, how it’s modified and when it is there. Quantification of the dynamics of the complete repertoire of RNAs and proteins present in a cell will provide an important building block to understanding cell function as more than simply “the sum of a cell’s parts”.
So while much emphasis has been placed on high-throughput science, I strongly believe that we also need to be employing “high-thought-put” in the design, implementation and interpretation of our single cell experiments.
What would you say are the main challenges in single-cell research today?
The number one issue at the moment is cost; most procedures in single-cell research are expensive. Sure, companies say they’ve reduced the cost of sequencing from $5 a base down to $0.02 or less a base, but you still have to deal with the initial investment in equipment, and that equipment is by no means cheap! And it's not just that – library generation and sequencing reagents are a recurring cost that can dwarf the cost of the instrument. You’re rarely sequencing just a single library, but hundreds or thousands of libraries, and those reagent kits add up very quickly.
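A toy cost model makes the point; every number below is a hypothetical assumption for illustration only:

```python
# Illustrative cost model with invented numbers: past a few hundred
# libraries, recurring reagent spend overtakes the one-time instrument cost.
INSTRUMENT_COST = 500_000      # one-time purchase (hypothetical)
KIT_COST_PER_LIBRARY = 2_000   # recurring reagents (hypothetical)

for n_libraries in (10, 100, 1_000):
    reagents = n_libraries * KIT_COST_PER_LIBRARY
    total = INSTRUMENT_COST + reagents
    print(f"{n_libraries:>5} libraries: reagents ${reagents:>9,} "
          f"= {reagents / total:.0%} of ${total:,} total spend")
```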
Something that drives up costs for all of us is the need to repeat experiments over and over again. The field would benefit from the development of a consistent core set of standards – a framework to ensure data is generated in a consistent manner, so you know how a dataset was generated without having to repeat an experiment yourself. Standards currently exist in a “hodge-podge” fashion and aren’t adhered to consistently across studies, or when data is considered for publication in journals or by funding agencies. Workable, discipline-wide standards would reduce costs, improve access to data, and facilitate knowledge generation from querying well-controlled, field-accepted data.
Another worry is that, due to the cost of the needed equipment and associated reagents, this kind of single-cell research has become, and will continue to be, concentrated in first-world countries where the necessary resources exist. The democratization – or rather, the lack of democratization – of these technologies is an area of concern. There’s untapped creativity in places around the world that don’t have the funds to do this work, and as a scientific community we are missing out on the advances that would come from appropriate access. To address some of these issues, several regional centers have been established worldwide that permit samples to be sent for analysis. While quite important and scientifically helpful, there is an increasing number of researchers who can’t avail themselves of these resources because of increasing costs and decreasing funding.
I worry that we’re going to be slower to make progress and our efforts will be focused in parochial directions, missing the creative scientific input and discovery from those who are unable to contribute due to limited resources.
How do you anticipate the synergy between single-cell analysis and omics technologies evolving?
In answer to this, I suppose that repetition is good. As mentioned before, multi-omics is absolutely critical – we need to be able to go beyond just the transcriptome. For example, the mRNA transcriptome tells you the potential of a cell to make proteins, but it doesn't tell you which proteins are actually being made, or how they are processed or functionalized. To understand cell function, we need to determine how these omics layers interact – the transcriptome, the proteome, the metabolome, and so on. These regulatory interconnections, and the dynamics of these interactions, work together to produce the functioning cell.
One technical issue with multiomics that I’d like to highlight is that people often marry these different omics layers by splitting samples (taking part of the cell for transcriptomics, another part for proteomics, etc.). This works to some extent, but inevitably you lose sample and decrease the chance of detecting low-abundance molecules as the biological material is split, which means we may miss important biological insights. To ensure this integration is seamless and meaningful, it’s important that we develop alternative techniques that don't require splitting the cell – so we obtain a complete picture.
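The detection penalty from splitting can be made concrete with a simple binomial sketch; this illustrates the general point rather than modeling any specific assay:

```python
# If a cell holds n copies of a molecule and an assay receives a fraction f
# of the cell's contents, P(assay sees >= 1 copy) = 1 - (1 - f)**n.
def detection_prob(n_copies: int, fraction: float) -> float:
    """Chance the assay receives at least one copy under random splitting."""
    return 1 - (1 - fraction) ** n_copies

for n in (1, 5, 20):
    probs = {f: detection_prob(n, f) for f in (1.0, 0.5, 0.25)}
    print(f"{n:>2} copies:", "  ".join(f"f={f}: {p:.2f}" for f, p in probs.items()))

# Low-abundance molecules (a handful of copies) are precisely the ones most
# likely to vanish when one cell is divided among several omics assays.
```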
This need highlights one of the themes of our discussion, namely that additional technology development is necessary to move the field of single-cell analysis forward. Technology is the foundation for scientific discovery. A corollary of this is that for continued scientific development to occur, new technology also needs to be developed. It is important to note that the needed technology development goes beyond wet-bench science to computational analysis. New, faster computational methods are needed to assess the massive datasets that are being generated. Further, analysis of multiomics datasets is in its infancy, as it isn’t clear how to analyze transcriptome data in the context of same-cell proteomics and metabolomics datasets beyond simple correlation. There is a clear need for new technology development.
Could you tell me about the potential impact of single-cell analysis; for example, on precision medicine or other areas? In other words, what big problems could it help the field solve or answer?
Both therapeutically and diagnostically, single-cell approaches have significant potential for applications in precision medicine. For example, there’s a lot of work at the moment on extracellular vesicles from single cells for diagnostics and for monitoring therapeutic efficacy. Another example: consider immune system T cells – they are excellent sensors of a variety of chemicals and a beacon of a person’s health. If individual T cells are analyzed multi-omically, it may be possible to assess the general health or disease status of an individual (e.g., whether the patient is fighting an infection, manifesting epilepsy, or even Alzheimer’s disease). Analysis of such “sentinel cells” would be incredibly powerful for early diagnosis and treatment of an individual.
There is also a side to single-cell analysis that is underappreciated: I, and many others, are focusing on transcriptome (and now epitranscriptome) analysis of disease-associated cells, which produces an incredible amount of data that needs to be creatively analyzed. One of the goals of the computational analysis of these data is to suggest which particular RNAs may be therapeutically effective. The question becomes: “Is RNA an appropriate therapeutic?” We know from the work of others that it can be for certain diseases, where siRNAs and microRNAs have been shown to be clinically therapeutic. Further, the success of the COVID mRNA vaccines shows that mRNA can be effective in creating a therapeutic immune response. The success of RNA therapeutics rests on several factors, not the least of which is the need to introduce the RNA into the appropriate cells. Currently, there are efforts to create multivalent immunogens through simultaneous use of multiple antigen-encoding mRNAs. For many diseases, however, there will be multiple mRNAs/proteins associated with the disease, even for those that arise from single dominant gene defects.
Simultaneous use of multiple RNAs as a therapeutic is difficult and requires more efficient and cell-directed approaches than are generally available. But even as this becomes more tenable, there will always be cells that are difficult to transfect with RNA due to the size and charge of these large nucleic acid molecules. To overcome such issues, what if we could use single-cell data to identify the key changes in multiomics measures that are predicted to be therapeutically relevant, and then identify small molecules that mimic those changes – without ever having to deliver RNA? Currently, small molecules are better therapeutics than nucleic acids because they are generally easier to deliver. Imagine being able to say: "If in this disease we see these 8 RNAs changing in abundance simultaneously with changes in the abundances of a particular glycosylated protein and these lipids, perhaps there is a combination of small molecules that, when introduced to the cells, can compensate for these changes to achieve a therapeutic effect."
Such a therapeutic strategy is best explored using disease-expressing single-cell multiomics datasets, where disease-associated changes are not diluted by signals from other, non-disease-expressing cells. To get there, we need to analyze single-cell data deeply and cross-analyze omics layers using machine learning approaches, all while heeding the underlying biology. There are biotech companies exploring this realm of “small molecule compensation,” but most neglect to use single-cell data, which would likely enhance their efforts significantly. This is the future of therapeutic development that single-cell analysis has enabled.
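As one sketch of how such “compensation” might be scored computationally – in the spirit of generic signature-reversal (connectivity-mapping) approaches, not any company’s actual pipeline, with all names and values invented – compounds can be ranked by how strongly their induced expression changes anti-correlate with the disease signature:

```python
# Hypothetical signature-reversal scoring: rank small molecules by how well
# their induced changes oppose a disease signature. All data are invented.
import numpy as np

# Per-gene changes (e.g., log-fold) in the disease state vs healthy cells.
disease_delta = np.array([2.0, -1.5, 1.0, 0.5, -2.0, 1.2, -0.8, 0.3])

# Per-gene changes induced by candidate compounds (hypothetical).
compound_deltas = {
    "cmpd_A": np.array([-1.8, 1.2, -0.9, -0.4, 1.9, -1.0, 0.7, -0.2]),
    "cmpd_B": np.array([0.5, 0.1, 1.1, -0.3, 0.2, 0.9, -0.1, 0.4]),
}

def reversal_score(disease: np.ndarray, compound: np.ndarray) -> float:
    """Pearson r; strongly negative means the compound opposes the disease."""
    return float(np.corrcoef(disease, compound)[0, 1])

for name, delta in sorted(compound_deltas.items(),
                          key=lambda kv: reversal_score(disease_delta, kv[1])):
    print(f"{name}: r = {reversal_score(disease_delta, delta):+.2f}")

# cmpd_A nearly mirrors the disease signature (r close to -1), making it the
# stronger "compensation" candidate under this toy scoring.
```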
Looking towards the future, what’s your long-term vision for the field over the next 5 to 10 years?
The success of single cell analyses has opened many directions for scientific growth over the next several years, only a few of which can be touched on here. We’re going to see many more cells sequenced, while multi-omics approaches are going to become more sensitive, robust and routine. We’ll also see spatial multi-omics becoming better established, with the ability to simultaneously analyze spatially resolved transcriptomics, proteomics, glycomics, lipidomics, and metabolomics within the same cell within a complex multicellular tissue.
There is going to be more commercial instrumentation available – whether it will be better or not is another question – but more options will be on the market for selected analyses, making it easier for researchers to employ single-cell analysis in their work. Technologies will continue to become more sensitive and will provide higher subcellular resolution. For instance, getting down to single-organelle resolution, such as single-mitochondrion transcriptomic analysis, will help us understand the regulation of “intracellular spatial organelle omics.” As a result of this progress, single-cell datasets will expand massively, which will require improved analytical methods – especially machine learning approaches that can sort through these complex, seemingly disparate datasets more rapidly and effectively than current approaches allow.
Better experimental validation of single cell dataset information will be enabled by increased availability of better experimental models – including use of new animal models, better primary cell models, and better “3-D tissue in a dish” models where appropriate cell-cell interactions and communication are present. Such experimental systems are needed to appropriately test single cell generated biological hypotheses.
I don’t have time to discuss this in detail, but eventually the goal of single-cell analysis is to assess multiomics dynamics in living cells in real time. Single-cell biosensors for a variety of intracellular and membrane-associated molecules have improved dramatically over the last few years and will continue to do so, improving both the sensitivity of detection and the range of compounds detected. We developed a preliminary approach to detect mRNA in live cells, but significant technology development over the next few years will enable better detection of RNA and other cellular chemicals in live cells. Detecting, quantifying and manipulating single-cell biologies in real time will provide the ultimate understanding of normal cell function and how disease processes alter it.
All of this is on the horizon; whether it will immediately lead to deeper biological insights, however, is harder to predict. We’ll certainly be able to detect and quantify more molecules in single cells, but turning this into real biological understanding will require, as mentioned before, a focus on the biology itself, not just the detection of molecular entities.
I’m a basic scientist at heart, and my scientific focus is on understanding the mysteries of cell function. I’m convinced that as single cell technology keeps improving and our willingness to think mechanistically matures, what seems like science fiction to us now will become reality, and that’s exciting to me!