19-Dec-2022
The AlphaFold database, co-developed by DeepMind and EMBL-EBI, is home to hundreds of millions of protein structures predicted using artificial intelligence. Building on years of experimental work by structural biologists, AlphaFold provides structure predictions, with clear initial benefits including “improved structure studies and hypotheses in the near future”, according to Professor Sir David Stuart at the time of launch.
However, some gaps have been identified in the AlphaFold database. Primarily, most predicted structures do not account for flexible regions, or for the fact that many proteins adopt a functional conformation in the presence of small molecules, cofactors and ligands.
The AlphaFill Databank was developed by the team at The Netherlands Cancer Institute (NKI), part of Instruct-NL, to address some of these gaps. AlphaFill is a database of enriched AlphaFold models, in which small molecules are added to the predicted structures.
This addition provides greater context for the predicted structures found in the AlphaFold database, which improves the prospects of determining both the structure and the function of proteins.
Figure 1. Comparison between AlphaFold and AlphaFill, using a human myoglobin structure. a, AlphaFold diagram of human myoglobin. b, Diagram of the heme-shaped cavity in the AlphaFold model, with histidine side chains. c, AlphaFill diagram of the heme-shaped cavity, wherein the binding site is ‘filled’ with the transplanted heme group and the CO and O2 ligands.
Robbie Joosten of NKI said, “With AlphaFill, we hope to accelerate and deepen structure analysis by providing more context to 3D structure models. Knowing which co-factor binds tells you so much more about a protein's function than just having the protein part of the complex.”
The AlphaFill algorithm uses crystallographic structures from the PDB-REDO databank that have more than 25% sequence identity to an AlphaFold model over an aligned sequence of 85 residues, and identifies the co-factors, ligands and analogs of interest that they contain. In total, 2,694 compounds were used in the study.
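The selection step described above can be sketched in a few lines of Python. This is a minimal illustration and not the AlphaFill implementation: the `Hit` record, its field names and the example identifiers are hypothetical, and the sketch assumes that the 85-residue figure acts as a minimum aligned length.

```python
from dataclasses import dataclass

MIN_IDENTITY = 0.25       # "more than 25% identity", as stated in the text
MIN_ALIGNED_LENGTH = 85   # assumption: treated here as a minimum aligned length


@dataclass
class Hit:
    """One aligned PDB-REDO entry (hypothetical record layout)."""
    entry_id: str          # placeholder identifier, not a real entry
    identical: int         # number of identical residues in the alignment
    aligned_length: int    # number of aligned residues
    compounds: tuple       # compound codes found in the entry


def sequence_identity(hit: Hit) -> float:
    """Fraction of aligned residues that are identical."""
    if hit.aligned_length == 0:
        return 0.0
    return hit.identical / hit.aligned_length


def select_donors(hits, compounds_of_interest):
    """Keep hits that pass both thresholds and carry a compound of interest."""
    donors = []
    for hit in hits:
        if hit.aligned_length < MIN_ALIGNED_LENGTH:
            continue  # alignment too short to trust
        if sequence_identity(hit) <= MIN_IDENTITY:
            continue  # not similar enough to the AlphaFold model
        wanted = [c for c in hit.compounds if c in compounds_of_interest]
        if wanted:
            donors.append((hit.entry_id, wanted))
    return donors


# Made-up example data: only the first entry passes every filter.
example_hits = [
    Hit("donor-A", identical=60, aligned_length=100, compounds=("HEM", "HOH")),
    Hit("donor-B", identical=20, aligned_length=100, compounds=("HEM",)),
    Hit("donor-C", identical=50, aligned_length=60, compounds=("HEM",)),
    Hit("donor-D", identical=90, aligned_length=120, compounds=("SO4",)),
]
print(select_donors(example_hits, {"HEM"}))
```

In this sketch the compounds of interest are passed in as a plain set of codes; in the study that list comprised the 2,694 compounds mentioned above.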
The algorithm then performs a local alignment of the “donor” models with the AlphaFold models, using backbone atoms within 6 Å of the compound of interest. After the alignment, the compound is transplanted into the AlphaFold model. This procedure was validated by comparing the root-mean-square deviation (r.m.s.d.) of the transplants and their binding sites against experimental structures with 100% sequence identity.
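To make the alignment and transplant step concrete, the sketch below superposes the donor backbone onto the model backbone with the Kabsch algorithm, restricted to atoms within 6 Å of the compound, and then applies the same rigid transform to the compound. It is an illustration under stated assumptions rather than the AlphaFill code: Kabsch superposition is a standard choice that the article does not name, the function names are hypothetical, NumPy is assumed to be installed, and the two backbone arrays are assumed to hold matched atoms in the same order.

```python
import numpy as np

CUTOFF = 6.0  # Å, the distance quoted in the text


def atoms_near(backbone, ligand, cutoff=CUTOFF):
    """Boolean mask of backbone atoms within `cutoff` of any ligand atom."""
    dists = np.linalg.norm(backbone[:, None, :] - ligand[None, :, :], axis=2)
    return (dists <= cutoff).any(axis=1)


def kabsch(mobile, target):
    """Rotation R and translation t such that mobile @ R.T + t best fits target."""
    mobile_centre = mobile.mean(axis=0)
    target_centre = target.mean(axis=0)
    covariance = (mobile - mobile_centre).T @ (target - target_centre)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    translation = target_centre - rotation @ mobile_centre
    return rotation, translation


def transplant_ligand(donor_backbone, donor_ligand, model_backbone):
    """Fit the donor binding site onto the model, then move the ligand with it.

    Assumes row i of both backbone arrays is the same aligned atom.
    """
    mask = atoms_near(donor_backbone, donor_ligand)
    if mask.sum() < 3:
        raise ValueError("need at least three backbone atoms near the ligand")
    rotation, translation = kabsch(donor_backbone[mask], model_backbone[mask])
    return donor_ligand @ rotation.T + translation


def rmsd(a, b):
    """Root-mean-square deviation between two matched coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```

All coordinate arguments are `(N, 3)` arrays in Å. The `rmsd` helper mirrors the validation idea in the text: the transplanted compound can be compared with the position of the same compound in an experimental structure of the identical protein.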
The AlphaFill databank gives researchers a tool for exploring complexes with common ligands, which can inform their structural biology research. The authors have also added an option for users to fill their own structure models.
Robbie Joosten concluded, “We have also seen people using AlphaFill to figure out what functional state of a protein is represented in an AlphaFold model. We have even had crystallographers use AlphaFill to help identify an unknown metal in their crystal structure.”