TOWARDS FULLY AUTOMATED ENHANCED SAMPLING OF NUCLEATION WITH MACHINE-LEARNING METHODS
Abstract
Molecular dynamics (MD) simulation has become a powerful tool for modeling complex molecular processes in physics, materials science, biology, and many other fields because of the temporal and spatial resolution it provides. However, many phenomena of common research interest, such as nucleation, protein conformational changes, and ligand binding, are rare events: they occur on timescales far beyond what brute-force all-atom MD simulations can reach within practical computing time. This makes it difficult to study the thermodynamics and kinetics of rare events with MD simulation alone. It is therefore common practice to employ enhanced sampling techniques to accelerate the sampling of rare events. Many of these methods require dimensionality reduction from atomic coordinates to a low-dimensional representation that captures the key information needed to describe such transitions.
To better understand the current challenges in studying crystal nucleation with computer simulations, I first applied previously developed dimensionality reduction methods to such systems. Here, I present two studies that apply different machine learning (ML) methods to crystal nucleation under different conditions, namely in vacuum and in solution. I investigated how meaningful low-dimensional representations, termed reaction coordinates (RCs), can be constructed as linear or non-linear combinations of features. Using these representations together with enhanced sampling methods, I achieved robust back-and-forth transitions between states. In particular, I focused on urea, a small molecule composed of 8 atoms that can be sampled easily and is commonly used as a fertilizer in agriculture and as a nitrogen source in organic synthesis. I then analyzed my samples and benchmarked them against other experimental and computational studies.
Given the challenges in studying crystal nucleation with molecular dynamics simulations, I aim to introduce new methods to facilitate research in this field. In the second half of the dissertation, I present novel methods that use advanced machine learning techniques to learn low-dimensional representations directly from atomic coordinates, without the aid of a priori known features. To test these methods, I applied them to several representative model systems: the Lennard-Jones 7 cluster, alanine dipeptide, and alanine tetrapeptide. The first system is known for its well-documented dynamics of colloidal rearrangements relevant to materials science, while the latter two represent conformational-change problems in biophysics. Beyond model systems, I also applied my methods to more complex physical systems in materials science, specifically iron atoms and glycine molecules. Notably, enhanced sampling integrated with my approaches successfully produced robust state-to-state transitions between allotropes of iron and polymorphs of glycine.