Structural biology at EMBL covers the full cycle of discovery: curiosity, training, research, instrumentation, experimentation, analysis, quality control, data deposition, data sharing and re-use, molecular design and back again.
Structural biology has come a long way since Max Perutz and John Kendrew resolved the structures of myoglobin and haemoglobin more than half a century ago, and new technologies have radically reduced the time it takes to model biomolecular structures.
In 1971, two years after Dorothy Hodgkin resolved the structure of insulin, the Protein Data Bank was established. It is one of the oldest public archives for sharing biological information, and is maintained by teams in the US, Japan and Europe that together form the Worldwide Protein Data Bank (wwPDB) organisation. In Europe, the PDBe team at EMBL-EBI handles European and African depositions to the PDB, while the Electron Microscopy Data Bank (EMDB) team works to provide an accurate, useful and flexible resource that keeps pace with a fast-growing field.
A new angle
“When the PDB was first established, it contained only data from X-ray crystallography experiments,” says Gerard Kleywegt, team leader of PDBe at EMBL-EBI. “Over the past few decades, we have seen increasing amounts of data produced by new methods such as NMR, electron microscopy and solution scattering. This diversity is great for structural biologists, but we are not always able to handle diverse data types. Collaboration, dialogue and close ties with groups focused on different methods are absolutely essential to providing the best possible service to the entire community.”
A case in point is small-angle scattering (SAS) data. In the past, SAS-based models were occasionally deposited in the PDB alongside X-ray crystallography data, but as SAS techniques improved and grew in popularity, the wwPDB partners recognised a need for new expertise to manage these data. A wwPDB task force was established, and in 2014 the Small-Angle Scattering Biological Data Bank (SASBDB) was launched by Dmitri Svergun’s group at EMBL Hamburg. Since the SASBDB’s inception, the SASBDB and PDBe teams have worked closely together, transferring knowledge and data between Hinxton and Hamburg.
“The PDBe team are very open,” says Alexey Kikhney, Senior Technical Officer in the Svergun group. “Their expertise has been invaluable – they have been able to advise us on problems and issues we might run into in terms of infrastructure, policies, standards and programming.”
“A crystallographer’s single, painstakingly acquired structure is, for me, one of many making up a wealth of data I can mine.” [Alejandro Panjkovich, EIPOD postdoc]
The Hinxton and Hamburg teams have several joint projects. “We are working on creating an interface that makes it easier for users to move between the SASBDB and the PDB,” says Kikhney.
Another technique on the rise is cryo-electron microscopy (cryo-EM). By firing beams of electrons at frozen samples, scientists obtain images that they can piece together to create atomic models and 3D representations, or ‘volume maps’, of the molecules in the sample. In 2002, the EMDB was established at EMBL-EBI as an archive for these volume maps. But it was not designed to handle the raw data.
“The raw data are crucial for scientists who want to understand and validate the structures, or develop new software tools for handling cryo-EM data,” explains Ardan Patwardhan, who heads the EMDB at EMBL-EBI. In 2014, the Electron Microscopy Pilot Image Archive (EMPIAR) was established by the PDBe to fill this gap.
“Developing EMPIAR was very challenging in terms of storing and transferring huge amounts of data,” says Patwardhan. “The largest entry is over 12 terabytes, which would require thousands of DVDs if you were trying to store it yourself. The EM field is developing rapidly. We are working on tools for users to visualise 3D structures right from the atomic to the cellular level, opening up new avenues for knowledge dissemination and discovery.”
Thanks to these efforts behind the scenes, scientists now have a wealth of data at their fingertips.
“I look for common patterns in biology, and for that the PDBe is a very important resource,” says Alejandro Panjkovich, an EMBL Interdisciplinary Postdoc (EIPOD) supervised jointly by Svergun in Hamburg and Kleywegt at EMBL-EBI. “When we develop new computational approaches, we need to know if they will work on existing structures. The PDB represents a huge source of data we can search for reoccurring patterns, test new methods on, collect statistics and use as benchmarks.”
“The PDB is really essential to the work we do,” adds Joana Pereira, a PhD student in Victor Lamzin’s group at EMBL Hamburg who develops methods for validating protein models. “The PDB is a very well maintained archive, but even the ‘bad’ entries are useful for us. We can test our methods to improve these entries and find new methods for validating them.”
Bringing it together
Structural biology at EMBL covers the full spectrum of discovery: curiosity, training, research, analysis, quality control, data deposition, data sharing and molecule design. As part of research infrastructure networks such as ELIXIR, EMBL’s services are well placed to grow, continuing to enable research and discovery as technologies change. Through ELIXIR, the PDBe team recently teamed up with scientists in the Czech Republic to train researchers from Masaryk University in enriching structure data with value-added information. If successful, the project will broaden the European base for structure annotation and bolster data expertise in the Czech Republic’s life-science community.
“The PDB plays a crucial role in structural biology research and development,” says Sameer Velankar, who leads PDBe content and integration at EMBL-EBI. “What started as a central repository for predominantly X-ray crystallography structures has evolved and diversified through constant dialogue with the community it serves. We are very much looking forward to another 20 years of helping scientists bring ‘structure to biology’.”