COMPUTATIONAL METHODS IN MISSENSE MUTATION ANALYSIS: PHENOTYPES, PATHOGENICITY, AND PROTEIN ENGINEERING
Understanding the molecular, phenotypic, and pathogenic effects of mutations is of enormous importance in human disease research and protein engineering. Both fields create a demand for computational methods that leverage the explosion of new sequence data and explore the vast space of possible protein modifications and designs. The work in this dissertation demonstrates the value of computational methods in these areas. First, I developed a new ensemble method to predict continuous phenotype values as well as binary pathogenicity, and tested it objectively in CAGI (Critical Assessment of Genome Interpretation). In two recent CAGI challenges, the method ranked third in predicting the enzyme activity of missense mutations in NAGLU (N-acetyl-alpha-glucosaminidase) and second in predicting the relative growth rate of mutated human SUMO-ligase in a yeast complementation assay. I also demonstrated the effectiveness of the new ensemble method in addressing a key problem that limits the clinical use of current mutation interpretation methods: identifying which mutations can be assigned a pathogenic or benign status with high confidence. Next, I characterized and compared missense variants in monogenic disease and in cancer. The study revealed a number of properties of mutations in these two types of disease, including: (a) methods based on sequence conservation are as effective for identifying cancer driver mutations as they are for monogenic disease mutations; (b) mutations in disordered regions of protein structure play a relatively small role in both classes of disease; (c) mutations in oncogenes tend to lie on the protein surface, whereas those in tumor suppressors are concentrated in the core; (d) a large fraction of tumor suppressor mutations act by destabilizing protein structure; and (e) mutations in passenger genes display a surprisingly high level of deleteriousness.
Finally, I applied computational methods to screen for mutations that enhance the thermostability of a catalytic domain of PlyC. This bacteriophage-derived endolysin has demonstrated antimicrobial potential, but its practical use is limited by its inherent thermosusceptibility. To prioritize stabilizing mutations, I conducted a rapid exhaustive survey of point mutations, followed by validation using protein modeling and expert knowledge. The approach yielded three stabilizing mutants that were experimentally verified by our collaborators, one of which was particularly successful in terms of both thermal denaturation temperature and kinetic stability.
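As a rough illustration of what an exhaustive survey of point mutations involves, the sketch below enumerates every single amino acid substitution in a sequence (19 per position) and ranks them with a caller-supplied scoring function. It is a minimal sketch under stated assumptions and not the pipeline used in the dissertation: the function names, the convention that lower scores are better, and the `toy_score` placeholder (a Kyte-Doolittle hydropathy difference, which is not a stability predictor) are all hypothetical and chosen only to make the example runnable.

```python
from typing import Callable, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Kyte-Doolittle hydropathy values, used only by the toy score below.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

ScoreFn = Callable[[str, int, str, str], float]


def enumerate_point_mutations(sequence: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, wild-type residue, mutant residue)
    for every single substitution in the sequence."""
    for index, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield index + 1, wild_type, mutant


def rank_mutations(sequence: str, score_fn: ScoreFn,
                   top_n: int = 10) -> List[Tuple[str, float]]:
    """Score every point mutant and return the top_n lowest-scoring ones.

    Assumption: lower scores are better (e.g. a predicted change in
    folding free energy where negative means stabilizing).
    """
    scored = [
        (f"{wt}{pos}{mt}", score_fn(sequence, pos, wt, mt))
        for pos, wt, mt in enumerate_point_mutations(sequence)
    ]
    scored.sort(key=lambda item: item[1])  # stable sort keeps positional order on ties
    return scored[:top_n]


def toy_score(sequence: str, position: int, wild_type: str, mutant: str) -> float:
    """Hypothetical placeholder score: hydropathy of wild type minus mutant.
    A real survey would call a stability predictor here instead."""
    return HYDROPATHY[wild_type] - HYDROPATHY[mutant]


if __name__ == "__main__":
    example_sequence = "MKTAYIAKQR"  # arbitrary example, not the PlyC domain
    for label, score in rank_mutations(example_sequence, toy_score, top_n=5):
        print(f"{label}\t{score:+.1f}")
```

Because the survey grows only linearly with sequence length (19 candidates per residue), even a fairly expensive scoring function remains tractable for a single domain, which is what makes this kind of exhaustive screen a practical first filter before more careful modeling and expert review.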