Cast your mind back to 1994. It was the year Brazil won the World Cup after a penalty shoot-out with Italy, and the hit TV series Friends debuted on NBC. In the world of technology, Amazon and Yahoo had just been set up and the first commercial web browser, Netscape Navigator, was launched. The internet, previously used mostly by scientists and scholars, was beginning to look like the next big thing.
In September 1994, a small group of researchers from EMBL Heidelberg travelled to a remote campus in the Cambridgeshire countryside to set up a home for the growing volumes of biological data being generated around the world: the European Bioinformatics Institute (EMBL-EBI).
“The idea for EMBL-EBI was born in the mind of Graham Cameron, who ran the EMBL Data Library,” explains Rolf Apweiler, Joint Director of EMBL-EBI. “He believed sequencing would be transformative for biology, but only as long as there was a place that would archive, analyse and annotate the sequences, and, most importantly, make them publicly available.”
Today, EMBL-EBI’s two buildings accommodate around 800 employees from over 60 countries, but back in 1994, things were very different.
“EMBL-EBI was a couple of Portakabins and a hole in the ground that Graham Cameron very proudly gave us a tour of,” remembers Claire O’Donovan, Head of Metabolomics. “What’s funny is that today we still have Portakabins on site, but only because the institute is growing so fast that we often have to use them for staff overspill.”
And it’s not just the number of people that is on the rise. Maria Martin, who joined EMBL-EBI as a database developer in 1996 and now runs the Protein Function Development team, reflects on how much the data volumes have changed. “Back then, Swiss-Prot – today part of UniProt – had about 80 000 entries. We thought this was a lot and were wondering how to handle the amount of data that was coming in from collaborators. Nowadays we have over 150 million protein sequences, and growing.”
All about the data
In 1994, the two data resources for EMBL-EBI were the EMBL Nucleotide Data Bank – now the European Nucleotide Archive (ENA) – and Swiss-Prot. Alongside these, there was also a small research group and a huge sense of excitement for what was to come.
Over time, the volume and diversity of data increased significantly. “In the late 90s, the microarray revolution started in Stanford,” explains Alvis Brazma, Head of Molecular Atlas Services. “I remember that industry was particularly interested in the topic. In fact, ArrayExpress was one of the first data resources set up with industry contributions.”
These days, genomics, single-cell sequencing, metagenomics and imaging data are just some of the many data types EMBL-EBI resources accommodate. “We have always been very good at pre-empting the next big thing and adapting to it,” says Alvis.
A computing revolution
As data volumes grew, so did the demand for infrastructure. The first computer room consisted of only a few racks. When the time came to expand the computer room, it sparked a big debate.
Mark Green, former Head of Administration, remembers: “The table tennis room was quite a large social area, so we thought it would be a terrible waste to convert it into a computer room, as it would take us years to fill it with kit. In the end, we bit the bullet and converted it. Within 18 months, the place was rammed full of kit and we were running out of space yet again. Soon after, we set up a data centre on campus. Now, we have three data centres plus cloud storage, which is constantly on the rise.”
Another technical milestone was setting up the first web servers for EMBL-EBI data resources, in the late 90s, the early days of the internet. “There were lots of problems with connectivity back then, so getting data from the United States required special traffic permissions,” recalls Rodrigo Lopez, Head of Web Production.
“From the beginning, the internet was all about search,” continues Rodrigo. “And all of a sudden, you didn’t have to go to the library, sit in a queue or wait for books. You simply sat at your desk and connected to the network. It was a huge shift in how science worked.”
“We used to have these crazy coding competitions to see who wrote more code, and we would count lines and mistakes to determine the winner.
“We had a hell of a good time back then; we were writing code, we were developing methods, we were doing science. It was all cutting edge and there was an amazing atmosphere. Even today I think we’re just scratching the surface of what we can do with the web.”
So what about the people who made all these things happen? “Before I joined EMBL in Heidelberg, I had been told that EMBL was a bit like the Hole-in-the-Wall gang – they didn’t follow rules, they made their own. And I discovered that EMBL-EBI was a bit like the Hole-in-the-Wall gang’s Hole-in-the-Wall gang,” says Mark Green.
“It felt more like a group of friends working together, a young institute where everybody was a colleague and we had regular international cuisine parties,” remembers Maria Martin.
As the institute grew, it became impossible to know everybody, but to this day, teams work together closely through “glue projects”, ensuring that data are interoperable. Knowledge exchange and collaboration within and outside EMBL-EBI are pillars of open data and open science.
Looking to the future
Much has changed in 25 years, but some things remain the same. EMBL-EBI is still collecting, analysing and opening up data for its users. It just happens on a much larger, more diverse scale. And while in the past data were used mainly by bioinformaticians, they now power discoveries in human health and disease, precision medicine, agri-tech, biodiversity and beyond.
So, what’s next? “The big unknown now is the functional part,” says Rolf Apweiler. “We know only a small number of the functions of genes, transcripts and proteins, but we need to work out their full characterisation. Sequencing only scratches the surface. The functional question is a much bigger one and will take a very, very long time to answer. But when we crack it, it may allow us to do things we can now only dream of.”