Building an EMPIAR
EMPIAR, a new resource for raw, 2D electron microscopy images, lets researchers take a closer look at the images used to build 3D molecular structures.
- Researchers can browse, download and reprocess the thousands of raw, 2D images used to build a 3D structure
- Easy access to state-of-the-art raw data will drive the development of new and better validation methods, which will lead to better 3D structures
- Better structural data enables the development of medicines
Because the shape of a molecule determines how it functions, high-quality three-dimensional structures are critical for developing interventions such as medicines or herbicides. Structural biologists can reconstruct 3D images of molecules from the high-resolution 2D images produced by electron cryo-microscopy (cryo-EM), but there is as yet no single, agreed method for doing so. As a result, some published structures come into dispute, and resolving those disputes is difficult without access to the underlying raw data.
Now there is a place where the structural biology and imaging communities can access the raw data used to derive these structures and work toward a consensus. EMPIAR (pdbe.org/empiar) lets researchers upload their raw data, which often amounts to hundreds of gigabytes, and download other groups' raw datasets. EMPIAR sits alongside the Electron Microscopy Data Bank (EMDB), where 3D structures are stored, and uses the fault-tolerant Aspera platform for data transfers. Downloads of very large files sometimes stall mid-transfer, but Aspera lets users resume a failed download rather than start over.
“The structural biology community has been crying out for these raw datasets, because validation is so important in what we do,” says Ardan Patwardhan, EMDB Coordinator at EMBL-EBI. “Now, the technology has matured to the point where we can build an archive that is fit for purpose. Anyone can now validate not only the published structure, but the methods used to calculate the resolution of a molecular structure.”
EMPIAR launches with 13 datasets, including raw data from the controversial HIV-1 glycoprotein structures. The datasets are all quite large; the biggest is almost half a terabyte. Once EMPIAR is up and running smoothly, the development team will focus its efforts on integrating EMPIAR with the EMDB and PDB, and other EMBL-EBI resources including UniProt and the Complex Portal.
EMPIAR (Electron Microscopy Pilot Image Archive) is funded by the UK’s Medical Research Council (MRC) and Biotechnology and Biological Sciences Research Council (BBSRC). EMPIAR is developed by the Protein Data Bank in Europe team at EMBL-EBI.
This post was originally published on EMBL-EBI News.