
Rolf Apweiler: what I’ve learned
From student helper to EMBL-EBI Director, Rolf Apweiler has shaped the journey of EMBL and bioinformatics for over four decades

As a biology student in the 1980s, Rolf Apweiler applied for a student helper role at EMBL Heidelberg. Little did he know it was the beginning of an illustrious career in a field that was about to take off – bioinformatics.
We caught up with Rolf in the run-up to his retirement from EMBL to capture some of his memories and highlights, and his predictions for how open data and AI are changing the field.
“I came to Heidelberg to study biology in 1984, and I had to work to support my young family,” remembered Apweiler. “At the beginning, I worked in factories during the school holidays, but one day I saw an advert for a student helper post at EMBL. The pay was better – 12 Deutsche Marks per hour. The job requirements included solid knowledge of biology, fluent English, and computer skills. Naturally, I applied – despite English being my worst subject and never having touched a computer. I phoned up, and to my surprise, I got the job. I was really proud of myself until I learned I had been the only person brave enough to apply!”

Apweiler began curating data for the Swiss-Prot project, which later evolved into UniProt, the world’s leading resource of protein sequence and functional information. UniProt is jointly run by EMBL, the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR) in the USA, and is used by millions of scientists worldwide. Apweiler would read scientific papers and summarise important information to annotate the data in Swiss-Prot. Over time, he took on more responsibilities and became the Founder and Principal Investigator of UniProt.
Alongside his EMBL work, Apweiler continued his studies at the University of Heidelberg, earning his undergraduate and PhD degrees.
Bioinformatics in the age of dial-up internet
“Back then, the data volumes were much smaller, and to find literature on a protein or gene of interest, you had to spend hours or entire days digging through the library. Nowadays, researchers can do all of this and more online in an instant,” Apweiler said.
EMBL alumni Patricia Kahn and Graham Cameron, whom Apweiler worked alongside, were instrumental in persuading editors of scientific journals that the nucleotide sequences from papers should be sent to EMBL. These sequences were added to EMBL’s Data Library, the institute’s first public data resource, established in 1980. Believe it or not, sequences initially had to be typed into the database manually.
“My EMBL start coincided with the early days of the internet. You could connect to something called BITNET and send an email to someone in the US, but it took two hours to arrive. Similarly, to get the data to our users, we sent magnetic tapes through the post. I reckon we had a couple thousand users back then. Things are completely different now in scale, speed, and complexity. These days, EMBL-EBI’s data resources get over 120 million web requests every day, from 40 million IP addresses annually. The field has completely exploded during my time,” said Apweiler.
“Rolf doesn’t saunter into a room, he bounces into a room, and this is incredibly energising for everyone.”
Setting up EMBL’s European Bioinformatics Institute
In 1994, Apweiler was among the seven colleagues who moved from EMBL Heidelberg to Hinxton, UK, to set up EMBL-EBI. The new institute would share a campus with the Wellcome Sanger Institute, which at the time was doing much of the DNA sequencing work for the Human Genome Project. “When we moved to Hinxton, our colleague Peter Stoehr put the computer running the Oracle Database still under a VMS operating system in the trunk of his car and drove it over to the UK. We must have had about 80GB of data back then. Nowadays, EMBL-EBI holds approximately half an exabyte of disk space,” Apweiler said.

Every 18 months or so, Apweiler and colleagues saw the data double in size, a pace comparable to Moore’s Law, the observation that the number of transistors on a microchip doubles approximately every two years. “Of course, we couldn’t double the number of staff as quickly, so we had to find ways to improve productivity, storage, and tools. I don’t think this challenge will ever go away, but being able to scale our resources is one of the big success stories of EMBL, in my view,” Apweiler said.

Growth, excitement, and innovation
Like bioinformatics itself, Apweiler’s career was on an upward trajectory. His major contributions to proteomics were recognised with the Human Proteome Organization’s (HUPO) Distinguished Achievement Award in Proteomics in 2004, and in 2007 he was elected President of HUPO. In 2012, he was elected a member of EMBO, and in 2015 he became a fellow of the International Society for Computational Biology (ISCB).
By this time, EMBL-EBI had grown from seven members of staff to around 600. In 2015, following Professor Dame Janet Thornton’s tenure at the helm of EMBL-EBI, Rolf Apweiler and Ewan Birney became Joint Directors of the institute.

As technologies improved and the volumes and complexity of data increased further, Apweiler and Birney played essential roles in the development of data standards for proteomics and genomics. “Having shared taxonomies and data standards is incredibly important because it makes the data open for everyone,” Apweiler said. “In my view, open data and open science create the ideal conditions for collaboration and innovation. This is an argument we have to keep making loud and clear, so it never gets taken for granted.”

Bringing academia, industry, and funders together
Apweiler also played a crucial role in developing Open Targets, a public-private partnership that aims to improve how scientists systematically identify and prioritise drug targets. Open Targets has been running for over a decade and has been a tremendous success. Nine out of 10 drug discovery programmes fail, usually after several years of work and millions of pounds of investment. But studies show that when a target is supported by genetic evidence, the likelihood that a drug reaches the market doubles. Open Targets aims to make this evidence available in the public domain and ultimately drive up the success rate of clinical trials.
“Rolf has always wanted to influence and change the environment where he is, and to drive change.”
As part of his mission to secure long-term, sustainable funding for public biodata resources, Apweiler was among the initiators of the Global Biodata Coalition, which brings together funders to help them coordinate and collaborate on the management and growth of biodata infrastructure worldwide. This approach helps ensure that biodata resources remain freely available to researchers around the globe.
During the COVID-19 pandemic, Apweiler was a leading voice for open data sharing, global coordination, and collaboration. He spearheaded EMBL-EBI’s efforts to develop the European COVID-19 Data Platform, with support from the European Union. Apweiler advised European governments, including the then German Chancellor Angela Merkel, and advocated for data-driven responses to keep people safe and slow the spread of the virus.
Below are a few of Apweiler’s reflections on his career so far, and predictions for the future, in his own words.
Life at EMBL
In the beginning, I was too shy to take educational opportunities at EMBL. It took a while to understand that this is what EMBL does: giving learning opportunities to young people.
We never fell into the trap of thinking that the tools we developed were the best. The nature of EMBL has always been very collaborative, so we always tried to use the best thing available. I think this openness is part of EMBL’s success story.
In the 80s and 90s, EMBL Heidelberg was one of the founding places for bioinformatics. A lot of the PhDs from that time became world-renowned bioinformaticians.

The importance of open data
There are always pressures to patent sequences and put up paywalls, but they can really damage scientific progress. Look at the Human Genome Project. If the data hadn’t been made open, genomics would never have exploded like it did.
Having open dialogue between academia, industry, government, and charities is crucial. We’ve seen with Open Targets that as long as people buy into the concept of making the data open, what we can achieve by working together dwarfs individual efforts.
I disagree with the idea that industry should be treated differently from academic users. We want to embrace the private sector as a user community, to make sure data have the highest societal impact. Ultimately, it’s private companies that bring new products to the market, so we need them.
Science is for everyone and belongs to everyone. It is important to emphasise and propagate this crucial fact.

How AI will change science and society
AI will transform the way we work. The importance of well-annotated experimental data in science is higher than ever before. If you want to have good AI predictions, you need good training data. The best example is the Nobel Prize-winning AlphaFold AI, which wouldn’t have been possible without public data resources, including PDB, UniProt, MGnify, and so on.
EMBL-EBI also uses AI tools to speed up data annotation. Of course, AI-powered annotations are still predictions and can contain errors, but this is where the role of curators becomes even more important, because they have the expertise and experience to evaluate AI annotations. This increases productivity without diminishing the importance of our specialist curators.
I think AI will be revolutionary in the health sector, but the adoption will be slower than in research – and rightly so – because of the highly regulated environment and sensitive nature of the data.
EMBL-EBI and public data resources are foundational for the AI revolution in the life sciences. Without high-quality, well-annotated data, there is no AI.
Find out more about Rolf’s career and the history of EMBL-EBI in the video interview below, conducted by Angus Lamond and edited by Ruairi McEvoy.