Finding and Accessing Archived Structural SARS-CoV-2 Data

30-Jul-2021

Since the COVID-19 pandemic started in early 2020, much of the world's research effort has gone into studying the virus, producing an immense amount of structural biology data on the SARS-CoV-2 proteins.

For continuing research into the virus, it was crucial that these data be made openly available to researchers. EMBL-EBI and its partners (including the European Commission, EU member states and ELIXIR) created the COVID-19 Data Portal, which collates data from a number of public archives and provides the latest structural data on SARS-CoV-2. Representatives of the structure resources at EMBL-EBI provide a quick guide to this wealth of structural data.

The COVID-19 Data Portal was set up in April 2020 and brings together datasets from various sources, such as the Protein Data Bank (PDB), the Protein Data Bank in Europe Knowledge Base (PDBe-KB), the Electron Microscopy Data Bank (EMDB) and the Electron Microscopy Public Image Archive (EMPIAR).

This overarching resource (Figure 1) has proven invaluable in giving researchers across Europe access to the latest SARS-CoV-2 data.

Figure 1. The EMPIAR page in the COVID-19 Data Portal, also showing the total number of PDB, EMDB and EMPIAR entries related to the virus (circled in red).

The sections below describe the individual structure resources and how to navigate them.

Protein Data Bank in Europe - Knowledge Base

The PDBe-KB COVID-19 portal is a useful resource for finding out which viral proteins have structures in the PDB. This portal, too, was launched very early in the pandemic by the PDBe-KB team at EMBL-EBI. Its main page lists all viral proteins and, for each one, summarises the number of structures in the PDB, the ligands observed and other information (Figure 2).

Figure 2. Homepage of the PDBe-KB COVID-19 portal, summarising available structural information in the PDB archive.

The portal also links to PDBe-KB pages that provide a detailed summary and analysis of all structures of each unique protein across the PDB (Figure 3).

Figure 3. Example of a protein page in the PDBe-KB COVID-19 portal, in this case for the spike protein.

The PDBe-KB protein pages also offer interactive 3D views of a representative structure from each unique conformational cluster observed for the protein, as well as a superposition of these cluster representatives with all observed ligands (Figure 4).

Figure 4. Representatives of the unique conformational clusters and, superimposed on these, all bound ligands observed in hundreds of structures of the SARS-CoV-2 spike protein.

At the time of writing, the PDB, and therefore PDBe-KB, contains structures for all but a handful of SARS-CoV-2 proteins. For some of the remaining proteins, DeepMind has made available models predicted with its AlphaFold AI system (Figure 5); these can be downloaded from the DeepMind website.

Protein Data Bank in Europe

To find all SARS-CoV-2 structures currently in the PDB, a single PDBe search on the organism name suffices (Figure 5). The facets on the left (circled in red) make it easy to drill down to a subset of interest, for example by protein name or experimental method. With the tabs (circled in blue), aggregated information about the unique macromolecules, ligands and so on in the selected set of entries is a mere mouse-click away.

Figure 5. Search results at PDBe for all SARS-CoV-2 structures in the PDB.
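The same organism-name query can, in principle, also be run programmatically. The sketch below only builds a request URL and does not fetch anything. It assumes the PDBe Solr-based search service at `https://www.ebi.ac.uk/pdbe/search/pdb/select` and the field names `organism_scientific_name`, `pdb_id`, `molecule_name` and `experimental_method`; none of these are given in this article, and `build_pdbe_query` is a hypothetical helper, so check everything against the current PDBe API documentation before relying on it. The `facet.field` parameters play roughly the role of the facets on the results page.

```python
from urllib.parse import urlencode

# ASSUMPTION: base URL of the PDBe Solr-based search service; not given in
# the article, so verify it against the current PDBe API documentation.
PDBE_SEARCH_URL = "https://www.ebi.ac.uk/pdbe/search/pdb/select"


def build_pdbe_query(organism, facet_fields, rows=100):
    """Hypothetical helper: build a search URL for PDB entries of one organism.

    The field names used below (organism_scientific_name, pdb_id,
    molecule_name, experimental_method) are assumptions about the
    PDBe search schema, not confirmed by the article.
    """
    params = [
        # Phrase query on the (assumed) organism-name field.
        ("q", f'organism_scientific_name:"{organism}"'),
        # Return only a few (assumed) fields per entry to keep responses small.
        ("fl", "pdb_id,molecule_name,experimental_method"),
        ("rows", str(rows)),
        ("wt", "json"),
        # Request facet counts, analogous to the facets on the results page.
        ("facet", "true"),
    ]
    # One facet.field parameter per field to aggregate over.
    params += [("facet.field", field) for field in facet_fields]
    # urlencode percent-encodes quotes and colons and turns spaces into '+'.
    return PDBE_SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_pdbe_query(
        "Severe acute respiratory syndrome coronavirus 2",
        ["experimental_method", "molecule_name"],
    )
    print(url)
```

Keeping URL construction separate from fetching makes the query easy to inspect in a browser first; the layout of any JSON response is defined by PDBe and is not covered here.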


EMDB and EMPIAR

EMDB is the archive for cryo-EM maps and tomograms, many of which underpin structures in the PDB, while some have no structural interpretation. EMPIAR contains over a petabyte of data, mostly raw cryo-EM data but increasingly also 3D data from electron and X-ray imaging modalities not covered by EMDB. EMDB recently launched a new website (Figure 6) that includes a powerful metadata-based system for searching both archives. However, to find all SARS-CoV-2 entries in EMDB and EMPIAR, simply click the green “SARS-CoV-2 entries” button.

Figure 6. The new EMDB homepage, highlighting the search facility and the bespoke button to find all SARS-CoV-2 entries in EMDB and EMPIAR.

Figure 7. Results of searching EMDB and EMPIAR for all SARS-CoV-2 entries, using the bespoke button on the EMDB homepage (see Figure 6).

Note: All searches and screen grabs were made on 28 and 29 July 2021 and therefore reflect the contents of the archives and portals at that time.