From packets to planets
When Vinton Cerf was hired by Google and asked to choose his title, “archduke” seemed like a good candidate… but since the last one was murdered and started a world war, he settled for a less risky option: Google Chief Internet Evangelist.
The title says it all: since he started studying the electronic transfer of data in “packets” via a network of recipients and senders (the TCP/IP protocol) in the 1970s, Vinton Cerf has been instrumental at every step on the way to the World Wide Web as we know it. He visited EMBL Heidelberg during the Heidelberg Laureate Forum, on 29 August, and shared some thoughts with us.
You are often referred to as a “founding father” of the Internet, a network that has accelerated development in many ways: how does it feel?
The Internet is an infrastructure on top of which people build what they need. It is like a road system with some simple rules: the cars need to be a certain size, drive only on one side of the road, the buildings need to be on the side… There is no rule for what the cars or the buildings should look like or contain, so people can tailor them to their needs and build the structure that will serve their purpose. That is why it has worked so well: it is extremely flexible and adaptable to people’s needs. I feel very happy to have been part of this creation 40 years ago, but while I am one of the ‘fathers’ of the Internet, Bob Kahn and I just defined the rules – then it took hundreds of thousands of people to create a network like the current Internet.
In what directions is this network going now?
As the Chief Internet Evangelist, I am happy to say that it is getting bigger and bigger. In 1973, when it was first set up, the Internet could accommodate 4.3 billion addresses – the buildings in the previous analogy. Now all these addresses have found an owner, so we need to supply more: that is why we are currently implementing a new version of the IP (Internet Protocol) system. This implementation is partly driven by the multiplication of connected objects – from remotely controlled security systems to telescopes – that contain a computer and can communicate: each of them requires an address to function. Mobile and radio communications that allow more people to easily access the network are also an important factor. This trend will continue in the short term. Longer term, I am also working on an interplanetary Internet that should facilitate communication in space and support exploration. There is still a lot more that can be done with the infrastructure that was set up four decades ago.
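The 4.3 billion figure follows from the 32-bit address format of the original Internet Protocol (IPv4). As a minimal sketch, and assuming the “new version” mentioned here is IPv6 with its 128-bit addresses (the interview does not name it), the two address spaces can be compared with Python's standard ipaddress module:

```python
import ipaddress

# Every possible IPv4 address: 32 bits per address gives 2**32 of them.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses

# Every possible IPv6 address: 128 bits per address gives 2**128.
# (Assumption: IPv6 is the "new version of the IP" referred to above.)
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: {ipv4_total:,} addresses")    # IPv4: 4,294,967,296 addresses
print(f"IPv6: {ipv6_total:.3e} addresses")  # IPv6: 3.403e+38 addresses
```

These are theoretical totals; in practice some ranges are reserved and not every address can be assigned to a device.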
That sounds very promising and positive, but on the other hand you have also warned of a “digital dark age” – what do you mean by this?
The current rapid development means that how we store data – both software and hardware – is changing rapidly. Storing bytes of data is useful only if you have the software they were created with, and the hardware to read them: otherwise you might not be able to make sense of the carefully stored bits. At the moment I have piles of floppy disks and CD-ROMs but nothing to extract the data they contain. There is a risk that we may lose a lot of the information currently stored if we don’t find a way to also preserve the programs and machines needed to understand it: this is what I call the “digital dark age”.
This is especially relevant for scientific data, but the big public databases that contain petabytes of sequences and measurements are constantly being maintained, so one hopes that what is stored there should still be accessible in 100 years. Other types of information are more at risk of just vanishing when we change software, and there is a real danger we could lose this knowledge.
“Big data” is a daily challenge for big research institutions like EMBL. How do you see it evolving?
The development of machines that are both more efficient and connected has made data collection easier than ever before. This is leading to an accumulation of information that we often call “big data”. It is currently stored in databases and moved around when scientists need to use it; however, moving such immense datasets of raw information uses a lot of power and bandwidth, takes a lot of time, and in some cases can be simply impossible.
The current trend, and I think it will continue, is to move these large sets of data to specific data centres, such as EMBL-EBI or others around the world, which have the capacity to both store and process the information. They would allow scientists to analyse the subset of the data they need remotely and download only the results, instead of having to move around immense raw data sets.
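To give a rough sense of why moving raw data is costly, here is a back-of-the-envelope sketch. The figures are illustrative assumptions, not numbers from the interview: a one-petabyte raw data set, a one-gigabyte result file, and a sustained 1 Gbit/s link with no protocol overhead.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link, ignoring protocol overhead."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day

PETABYTE = 10**15            # decimal petabyte, in bytes (assumed raw data size)
GIGABYTE = 10**9             # assumed size of a downloaded result file
GIGABIT_PER_SECOND = 10**9   # assumed sustained link speed, in bits per second

raw_days = transfer_days(PETABYTE, GIGABIT_PER_SECOND)
result_seconds = transfer_days(GIGABYTE, GIGABIT_PER_SECOND) * 86_400

print(f"Raw data set: {raw_days:.1f} days")        # Raw data set: 92.6 days
print(f"Result file:  {result_seconds:.0f} seconds")  # Result file:  8 seconds
```

Under these assumptions, analysing the data where it is stored and downloading only the result turns a transfer of about three months into one of a few seconds.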