The term artificial intelligence (AI) is often associated with chatbots, self-driving cars, or recommendation systems. Yet, one vital application is often overlooked: AI’s transformative potential in scientific research. AI is already an integral part of the scientific process, assisting scientists in interpreting large datasets, designing future experiments, and distilling data and knowledge into hypotheses.
To delve deeper into AI’s research potential at EMBL and beyond, we spoke to Professor Oliver Stegle. Stegle is a distinguished scientist who bridges the gap between the life sciences and the world of AI, serving as a group leader at EMBL in the Genome Biology Unit, and as head of the Computational Genomics and Systems Genetics division at the German Cancer Research Center (DKFZ).
You obtained a PhD in Physics, and now you are using AI to shed light on the complexity of life. What motivated you to perform research in the field of life sciences?
There are many ways to conduct interdisciplinary research, and I think that’s where innovation happens. My own journey led me from physics to machine learning, and finally to the life sciences. I personally enjoy contributing to the quest of understanding living things, one of the most exciting puzzles our world has to offer. Life is complex, it’s multifaceted, and it’s also a little messy, which makes it an interesting quest.
What is your research group currently working on and how are you using AI for it?
In our research work, we are interested in understanding the differences between individuals, and how these are connected to the molecular makeup of tissues, including during disease. A major part of our work is to develop algorithms that can read the genome, the code of life. Intuitively, the genome is like a thick book that encodes not only all the heritable information we’re given from our parents at birth that makes us human, but also some of the differences between us. Our goal is to read and interpret this complex book. We’re developing AI-based algorithms that are able to identify and make sense of individual changes in our DNA. By applying these pattern-hunting algorithms, we’re able to identify which differences in our genomes actually matter.
We are increasingly studying this phenomenon at the level of individual cells, because many of the things we care about in health actually originate in a single cell where something goes wrong. From mining large datasets, we are now able to train algorithms that allow us to read relevant bits of the genome and deduce specific aspects of how they matter for the individuals who carry that genome sequence.
How is AI influencing the work of other life scientists?
AI is a very broad tool, and you can use these types of approaches to solve virtually any problem for which large datasets are now becoming available. Biology has always been shaped by technological innovations, and we are increasingly able to measure important phenotypes, such as the properties of individual cells, tissues, or organs, at high fidelity and scale. The datasets derived from such technologies open up fundamentally new worlds in terms of biological discovery. Much of the low-hanging fruit in terms of AI will now appear where there are well-defined biological questions for which we have access to data at scale. As a result of the rise of AI, computational and experimental research groups are working together ever more closely, which changes the way in which we conduct research.
An example of a relatively recent breakthrough of this type is AlphaFold, an AI tool that allows us to predict how proteins fold from their amino acid sequence alone. Another area with rapid innovation is biological imaging. I think it’s fair to say that today, tools and methods based on AI really permeate the whole scientific tool chain, from machines creating new data, to interpreting and finding patterns in complex data, and ultimately, perhaps in the future, to guiding which experiments we should be doing next as well.
How do you envision research at EMBL in 10 years, considering the changes made possible by AI?
AI’s just going to be everywhere. And I think we’ll stop realising that we’re using it. One prospect that is now becoming real (and we saw this with protein structure prediction already, which will translate to other fields) is to conduct in silico research: you can ask AI models questions rather than doing the actual experiment. This will help to speed up research and will obviously completely change the scientific process. The turnaround time will be much faster because you can get answers to complex questions quickly, without conducting an experiment. Innovation in science will then depend on our ability to formulate well-defined questions that are suitable for AI systems and to correctly judge the answers from these models: what they know and where there are gaps in their predictions. This is a big challenge right now, but I think it will be solvable. But even if AI speeds up research, I very much believe that experiments will remain at the heart of research, because we need to confirm the predictions that AI gives us, and use the knowledge gained from AI to push scientific discovery forward. Ultimately, we will have more informed hypotheses to start from.
Right now is a really exciting time for the intersection between the AI and the biology communities, and EMBL is uniquely positioned at the interface of these two fields. Solving bold questions across scales and fields and bringing scientists and questions together in new ways are things that have happened at EMBL before and this is exactly what I expect to see in the area of AI.
Where do you see EMBL’s potential as a leading organisation in AI research?
The spirit of EMBL is highly collaborative. It is a multinational community, where young talent comes together to collaborate. And this is a fantastic starting point. Secondly, EMBL has access to large amounts of biological data, including data stored in the biodata archives hosted at the European Bioinformatics Institute (EMBL-EBI). EMBL-EBI is the home of the large European biodata resources, where biological data across scales, such as molecular imaging, sequencing, and cellular data, are collected, curated, and shared with the community. Such data are the seed for training AI models, and for testing and validating them.
The third dimension of EMBL I want to mention is the scientific culture and the ability to adapt to new opportunities. EMBL is an institution that is intrinsically shaped by change in direction, by turnover, which allows it to bring new ideas forward quickly. This is exemplified by the speed at which EMBL science is moving in new directions in the current Molecules to Ecosystems programme, which started in 2022. In many ways, the change we are seeing now in how AI is influencing the life sciences is similar to what we saw a few decades ago when bioinformatics emerged as a new field in the life sciences. This change will happen here at EMBL very naturally, and it will lead to great discoveries. We already see it today.
You are also the unit director for the ELLIS unit Heidelberg that focuses on AI in life science and health research. What makes the Heidelberg area a suitable site for this field?
First of all, research has always been about teamwork and there’s fantastic scientific excellence all over Europe. This European collaboration spirit is in the DNA of the European Laboratory for Learning and Intelligent Systems, ELLIS, a European network that is designed to foster collaboration and to bring the brightest minds together on a European scale. But within this sort of collaborative network, I think Heidelberg is really a great place to make scientific progress. The density of life science research here is just exceptional. And that allows the community to create new interdisciplinary bridges by bringing in AI expertise to serve the most pertinent, most relevant, and most innovative technologies and questions.
Via the networks these institutions bring along, Heidelberg is also a gateway to European data resources at large. And that’s important because we need to fuel AI innovations with data. Large datasets can be curated and brought together here in unique ways. And I think that’s really the spirit of what ELLIS unit Heidelberg is trying to embrace and the progress we are hoping to make.
I also want to highlight the importance of building scientific communities locally. Last month, the AI Health Innovation Cluster organised the really fantastic AI InScide Out unconference at EMBL to bring together AI scientists from the Heidelberg and Mannheim area. It was wonderful to witness this vibrant exchange of knowledge and to discuss the advances, issues, and challenges in our field.
Is there anything else you would like to emphasise?
Technological advances have led to a change in biological research, transitioning from a lab-based approach to an increasingly collaborative team science effort. This process will be greatly accelerated through AI. Biology research will be even more diverse and global, which is why an organisation such as EMBL, which is so deeply ingrained in the European science community, will be a crucial component of the scientific ecosystem.
By harnessing AI further, we will be able to accelerate scientific understanding and knowledge. This is especially crucial in areas relevant to human and planetary health, which we specifically address in EMBL’s current research programme. We are thinking about molecular biology across scales and in the context of life, from molecules to ecosystems. As usual at EMBL, the data we obtain in our interdisciplinary research are being curated and openly shared with the global scientific community for further research.
EMBL is already playing a vital role in building major pillars that this AI transformation requires, from coordinating large biological datasets to making AI models and outputs available for the world. And ultimately, it is wonderfully positioned to connect communities in order to identify the most important biological questions that are ready to be tackled using AI.
What are artificial intelligence and machine learning?
Artificial intelligence (AI) refers to a broad class of algorithms that have the ability to perform tasks that normally require human intelligence. These tasks range from speech recognition to natural language processing and data analysis.
Machine learning (ML) is a subfield of AI in which algorithms learn and improve from experience without being explicitly programmed. This is similar to infants who improve at a task, such as crawling or speaking, by repeatedly doing it and collecting new information. ML algorithms typically require large amounts of data and use statistical methods to make predictions and improve their performance. Ultimately, the algorithm’s goal is to successfully interpret data that it has never seen before.
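The idea of learning from examples and then being judged on unseen data can be illustrated with a very small sketch. Everything in it is an assumption made for illustration and none of it comes from the interview: the data are synthetic (a noisy straight line), the model is a simple least-squares line fit, and the names `fit_line` and `mean_squared_error` are hypothetical helpers. Real biological applications use far larger datasets and far more flexible models.

```python
import random


def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between the model's predictions and the true values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic "measurements" (illustrative assumption): y = 2x + 1 plus random noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

# The algorithm only sees the first 150 examples during training.
train_x, train_y = xs[:150], ys[:150]
# The remaining 50 examples are held back to play the role of unseen data.
test_x, test_y = xs[150:], ys[150:]

model = fit_line(train_x, train_y)
test_error = mean_squared_error(model, test_x, test_y)

print(f"learned slope={model[0]:.2f}, intercept={model[1]:.2f}")
print(f"mean squared error on unseen data: {test_error:.3f}")
```

The learned slope and intercept should land close to the values used to generate the data, and a low error on the held-out examples suggests the model has captured the underlying pattern rather than memorising the training set.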
In the life sciences, machine learning or AI-based methods are essentially trying to assemble or identify patterns in complex datasets that are predictive or informative. At the core, one primary objective when deploying these approaches is to gain further insights into the processes of life.