
Petroni Group

AI-Driven Systems for Scientific Discovery

We build AI solutions that combine multimodal reasoning, agentic exploration, tool use, knowledge retrieval, and lab-in-the-loop interaction to accelerate scientific discovery.


Previous & current research

Our work has shown that large language models (LLMs) store vast amounts of knowledge in their parameters and can serve as knowledge bases. We also demonstrated that their answers become far more reliable when they can retrieve external evidence on demand, which led us to introduce the retrieval-augmented generation (RAG) framework. We then moved beyond static RAG by enabling LLMs to dynamically choose from a suite of specialised tools, execute code, reflect on their own output, and iterate until the task is complete. These tool-using agents can cross-reference a wide range of heterogeneous sources (e.g., omics data and literature) and help scientists uncover new insights.

Future projects & goals

Today’s AI still struggles with complex scientific tasks, such as identifying hidden knowledge gaps or connecting concepts across disciplines. Our goal is to create realistic benchmarks that capture these challenges and to build AI-based agents capable of reasoning over large, multimodal data sources, pushing AI beyond search and toward proactive discovery.

We will embed our agents directly into experimental workflows, enabling them to monitor live data and collaborate with scientists in real time in a lab-in-the-loop setting. Drawing on EMBL’s state-of-the-art biomolecular infrastructure, these agents will shorten iteration cycles, optimise resource use, and evolve into continuously learning discovery engines.

Figure 1: Comparing symbolic and neural memory access, from “Language Models as Knowledge Bases?”

Figure 2: Overview of the original RAG framework.