The latest webinar in the Instruct-ERIC Structure Meets Function series features speakers from the Fragment-Screen project. The webinar series offers an insight into the cutting edge of fragment screening research in structural biology, including the development of new methods and technologies for fragment-based drug discovery (FBDD).
This month we have two speakers integral to the Fragment-Screen project, who are developing innovative solutions for optimising fragment screening technology and for data management and interoperability. Genevieve Evans of EMBL-EBI will explore how FBDD data and the Protein Data Bank in Europe (PDBe) can work together, then Jannis Born of IBM Research will outline his team's work on AI-based approaches to FBDD.
Talk 1: Genevieve Evans, EMBL-EBI
Title: Facilitating open science by considering: the fragment screening data, existing databases & extending data structure
Talk 2: Jannis Born, IBM Research
Title: Leveraging scientific language models for molecular discovery
Abstract: The discovery of novel molecules with desired properties is pivotal to our success in combating global challenges such as emerging diseases. However, navigating the discrete and practically infinite chemical search space while respecting a cascade of constraints presents a formidable challenge. Meanwhile, the success of language models, particularly Transformers, has extended into scientific domains, giving rise to “scientific language models” that operate on, for example, small molecules, proteins or polymers. In this talk, we exploit analogies between natural language and organic chemistry to develop language models that may accelerate molecular discovery across various stages.
We begin by blending natural language with chemical language (SMILES) and propose a prompt-based multitask model that effectively solves various tasks (e.g., molecule captioning, text-based molecule design, reaction prediction or retrosynthesis). We then present the Generative Toolkit for Scientific Discovery (GT4SD), an open-source Python package that provides a harmonized interface for researchers to train, fine-tune and deploy 30+ state-of-the-art molecular generative models.
Last, we present a computational workflow for protein-based molecular design that was developed in the FragmentScreen consortium. Given some binding affinity data for a target of interest, this workflow showcases the development of a virtual screening model and a two-stage generative process combining graph neural networks and language models. For the best in silico hits of the generated virtual library, multistep retrosynthesis pathways can be produced with the IBM RXN for Chemistry webapp, thus aiding experimental synthesis. Our open-source workflow has been containerized and can be reproduced without the need for programming expertise.