Investigating biomolecular rare events with artificial intelligence augmented molecular dynamics

Wang, Yihang
Tiwary, Pratyush
Deciphering the mechanisms of biomolecular interactions plays an important role in understanding the mystery of life. Understanding the mechanism of receptor-ligand interactions is of great importance in the context of both fundamental biology and medical applications. Limited by their spatial or temporal resolution, wet-lab experiments may not capture enough detail to unravel complicated interaction mechanisms. Molecular dynamics (MD) simulation, a computational method for studying many-body systems at atomic resolution, has emerged as a powerful tool for investigating the physical or biochemical properties of biomolecules. Although MD simulation offers higher spatial and temporal resolution than experimental methods, a large gap remains between the time scales that MD simulations can reach and the time scales of the biological processes we want to study. In this thesis, I explore two frameworks that combine statistical mechanics, molecular dynamics, and machine learning to bridge this gap, and thus enable simulation studies of ligand-receptor interactions without an overwhelming demand on computational resources. Firstly, I propose the reweighted autoencoded variational Bayes for enhanced sampling (RAVE) method, a new iterative scheme that uses the deep learning framework of the information bottleneck to enhance sampling in molecular simulations. RAVE iterates between molecular simulations and deep learning to produce an increasingly accurate probability distribution along a low-dimensional latent space that captures the key features of the molecular simulation trajectory. RAVE determines an optimum, yet nonetheless physically interpretable, reaction coordinate and optimum probability distribution.
Both then directly serve as the biasing protocol for a new biased simulation, which is once again fed into the deep learning module with appropriate weights accounting for the bias, the procedure continuing until estimates of the desired thermodynamic observables have converged. The usefulness and reliability of RAVE are demonstrated by applying it to two test cases, studying processes slower than milliseconds and calculating free energies, kinetics, and critical mutations. I also systematically study the following questions: (a) the choice of a predictive time-delay in RAVE, that is, how far into the future the machine learning model should try to predict the state of a given system output from MD, and (b) for short time-delays, how much of an error is made in approximating the biased propagator for the dynamics as the unbiased propagator. I demonstrate through a master equation framework why the exact choice of time-delay is irrelevant as long as a small non-zero value is adopted. I also derive a correction to the objective function by reweighting the biased propagator, which better approximates the unbiased objective function without incurring extra computational overhead. To advance our understanding of RNA-ligand interactions at the molecular level, I use RAVE and collaborate with experimentalists to study the interplay between two ligands and the PreQ1 riboswitch, a widely studied model for RNA-small molecule recognition. I show that site-specific flexibility profiles from our simulations are in excellent agreement with in vitro measurements of flexibility using Selective 2’ Hydroxyl Acylation analyzed by Primer Extension and Mutational Profiling (SHAPE-MaP). With the orders-of-magnitude simulation speedup attained by RAVE, I can directly observe ligand dissociation for cognate and synthetic ligands from a PreQ1 riboswitch system.
The artificial intelligence-augmented simulations reproduce known binding affinity profiles for the cognate and synthetic ligands, and pinpoint how the two ligands make use of different aspects of riboswitch flexibility. On the basis of the dissociation trajectories, I also make and validate predictions of pairs of mutations for both ligand systems that would show differing binding affinities. These mutations are distal to the binding site and could not have been predicted solely on the basis of structure. Secondly, I develop a framework based on statistical mechanics and generative artificial intelligence that uses simulations or experiments performed at some set of temperatures to learn about the physics or chemistry at some other arbitrary temperature. Specifically, I use denoising diffusion probabilistic models, and show how these models, in combination with replica exchange molecular dynamics, achieve superior sampling of the biomolecular energy landscape at temperatures that were never even simulated, without assuming any particular slow degrees of freedom. The key idea is to treat the temperature as a fluctuating random variable rather than as a control parameter, as is usually done. This allows us to sample directly from the joint probability distribution in configuration and temperature space. The results are demonstrated for a chirally symmetric peptide and a single-stranded ribonucleic acid undergoing conformational transitions in all-atom water. I demonstrate how we can discover transition states and metastable states that were previously unseen at the temperature of interest, and even bypass the need to perform further simulations for a wide range of temperatures. At the same time, any unphysical states are easily identifiable through their very low Boltzmann weights. The procedure, while shown here for a class of molecular simulations, should be more generally applicable to mixing information across simulations and experiments with varying control parameters.
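
As an illustration of the last point about Boltzmann weights, the following is a minimal Python sketch of how generated configurations might be screened at a target temperature. It is not the procedure used in the thesis: the function names, the example energies, and the cutoff `log_threshold` are all hypothetical, and the sketch assumes that a potential energy (in kJ/mol) is available for each generated sample.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def boltzmann_log_weights(energies, temperature):
    """Log Boltzmann weights relative to the lowest-energy sample.

    energies: per-sample potential energies in kJ/mol (hypothetical input).
    temperature: target temperature in kelvin.
    """
    energies = np.asarray(energies, dtype=float)
    beta = 1.0 / (K_B * temperature)
    # Subtracting the minimum keeps the exponent well behaved;
    # the best sample gets log-weight 0.
    return -beta * (energies - energies.min())


def flag_unphysical(energies, temperature, log_threshold=-20.0):
    """Mark samples whose relative log-weight falls below an assumed cutoff."""
    return boltzmann_log_weights(energies, temperature) < log_threshold


# Hypothetical energies for four generated samples; the last one is far
# above the others and should be flagged at 300 K.
energies = np.array([-120.0, -118.5, -119.2, 40.0])
mask = flag_unphysical(energies, temperature=300.0)
print(mask)
```

The cutoff is a judgment call that depends on the system; working with log-weights rather than the weights themselves avoids numerical underflow for high-energy samples.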