Molecular dynamics simulation and machine learning study of biological processes
Abstract
In this dissertation, I use computational techniques, especially molecular dynamics (MD) simulations and machine learning, to study important biological processes. MD simulations can be used to investigate biologically relevant systems at length scales and timescales that are otherwise inaccessible to experimental techniques. Applications include, but are not limited to, the thermodynamics and kinetics of protein folding, protein-ligand binding free energies, the interaction of proteins with membranes, and the rational design of new therapeutics for diseases. The first chapter describes the computational methods in detail, including MD, Markov state modeling, and deep learning. In the second chapter, we study membrane-active peptides using MD simulations and machine learning. Two cell-penetrating peptides, MPG and Hst5, were simulated in the presence of a model membrane. We show that MPG enters the membrane through its N-terminal hydrophobic residues, whereas Hst5 remains attached to the phosphate layer. Formation of a helical conformation helps MPG insert more deeply into the membrane. We also used natural language processing (NLP) and deep generative modeling, specifically a variational autoencoder (VAE) with variational attention, to generate novel antimicrobial peptides. These in silico generated peptides are of high quality, with physicochemical properties similar to those of real antimicrobial peptides. In the third chapter, we study the kinetics of protein folding using Markov state models (MSMs) and machine learning. MSM analysis of misfolding in β2-microglobulin (β2m) gives insight into its metastable states, in which the outer strands are unfolded and the hydrophobic core is exposed to solvent; these states are highly amyloidogenic. Next, we propose a machine learning model, the Gaussian mixture variational autoencoder (GMVAE), for simultaneous dimensionality reduction and clustering of MD simulation data.
The last part of this chapter introduces GraphVAMPNet, a novel machine learning model that combines graph neural networks with the variational approach for Markov processes (VAMP) for kinetic modeling of protein folding. In the last chapter, we use MD simulations to study two membrane proteins: the spike protein of SARS-CoV-2 and the EAG potassium channel. Binding free energy calculations using MMPBSA show that the receptor binding domain (RBD) of SARS-CoV-2 binds its receptor ACE2 with higher affinity than that of SARS-CoV, which is one of the major reasons for its higher infection rate. Interaction hotspots at the interface were also identified. Glycans on the spike protein shield it from antibodies, and our MD simulation of the full-length spike shows that glycan dynamics give the spike an effective shield. However, network analysis revealed breaches in this shield at the RBD in the open state that could be exploited by therapeutics. In the last section, we study ligand binding to the PAS domain of the EAG potassium channel and show that residue Tyr71 blocks the binding pocket. Ligand binding inhibits the current through the EAG channel.
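The Markov state modeling mentioned above rests on a simple core estimator. The following is a minimal sketch, not the dissertation's pipeline. It assumes that MD frames have already been assigned to discrete states (the clustering step is not shown) and that every state is visited, so no row of the count matrix is empty. Under those assumptions, it counts transitions at a lag time tau, row-normalizes them into a transition matrix, and reads the stationary distribution and implied timescales t_k = -tau / ln|lambda_k| off the eigenvalues. The function names, the two-state toy chain `T_true`, and the plain non-reversible row-normalization estimator are illustrative choices. Dedicated packages such as PyEMMA or deeptime provide reversible estimators and proper state decomposition.

```python
import numpy as np

def simulate_chain(T, n_steps, rng, start=0):
    """Sample a discrete trajectory from transition matrix T (toy stand-in for clustered MD)."""
    cum = np.cumsum(T, axis=1)
    u = rng.random(n_steps)
    traj = np.empty(n_steps, dtype=int)
    traj[0] = start
    for t in range(1, n_steps):
        nxt = np.searchsorted(cum[traj[t - 1]], u[t], side="right")
        traj[t] = min(int(nxt), T.shape[0] - 1)  # guard against round-off in cumsum
    return traj

def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j between frames separated by `lag` steps (sliding window)."""
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1)
    return C

def transition_matrix(C):
    """Row-normalize counts: simple non-reversible maximum-likelihood estimate."""
    return C / C.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return pi / pi.sum()

def implied_timescales(T, lag):
    """t_k = -lag / ln|lambda_k| for all eigenvalues except the stationary one."""
    mags = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(mags[1:])

# Hypothetical two-state system: a "folded"/"unfolded" toy, not real protein data.
rng = np.random.default_rng(0)
T_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
dtraj = simulate_chain(T_true, 100_000, rng)

lag = 1
T_est = transition_matrix(count_matrix(dtraj, 2, lag))
pi_est = stationary_distribution(T_est)
its_est = implied_timescales(T_est, lag)
print("estimated T:\n", T_est)          # close to T_true
print("stationary distribution:", pi_est)
print("implied timescale (steps):", its_est)
```

For this toy chain, the exact stationary distribution is (2/3, 1/3), and the single implied timescale is -1/ln(0.85), roughly 6.2 steps, so the estimates can be checked directly. In practice, implied timescales are computed over a range of lag times, and a lag is chosen where they level off, which indicates that the model is approximately Markovian.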