COMPUTATIONAL METHODS IN PROTEIN STRUCTURE, EVOLUTION AND NETWORKS.
MetadataShow full item record
The advent of new sequencing technology has resulted in the accumulation of a large amount of information on human DNA variation. In order to make sense of these data in the context of biology and medicine, new methods are needed both for analysis and for integration with other resources. In this work: 1) I studied the distribution pattern of human DNA variants across populations using data from the 1000 genomes project and investigated several evolutionary biology questions from the perspective of population genomics. I found population level support for trends previously observed between species, including selection against deleterious variants, and lower frequency of variants in highly expressed genes and highly connected genes. I was also able to show that the correlation between synonymous and non-synonymous variant levels is a consequence of both mutation prevalence variation across the genome and shared selection pressure. 2) I performed a systematic evaluation of the effectiveness of GWAS (Genome Wide Association Studies) for finding potential drug targets and discovered the method is very ineffective for this purpose. I proposed two reasons to explain this finding, selection against variants in drug targets and the relatively short length of drug target genes. I discovered that GWAS genes and drug targets are closely associated in the biological network, and on that basis, developed a machine learning algorithm to leverage the GWAS results for the identification of potential drug targets, making use of biological network information. As a result, I identified some potential drug repurposing opportunities. 3) I developed a method to increase the number of protein structure models available for interpreting the impact of human non-synonymous variants, important for not only the understanding the mechanisms of genetic disease but also in the study of human protein evolution. The method enables the impact of approximately 40% more missense variants to be reliably modeled. In summary, these three projects demonstrate that value of computational methods in addressing a wide range of problems in protein structure, evolution, and networks.