High-throughput sequencing methods now provide extensive data on disease-related human genetic variants. New methods are required to make full use of these data for better understanding and treatment of human diseases. This dissertation describes my work on three aspects of this challenge: determining disease-causative variants; representing the mechanisms by which genetic variants cause disease phenotypes; and quantitatively analyzing genetic disease mechanisms.

First, I developed a variant prioritization algorithm, VarP, and objectively tested it in CAGI (Critical Assessment of Genome Interpretation). It was ranked best in the CAGI challenge on interpreting panel sequencing data for 106 patients, in which the task was to determine each patient's disease class and the corresponding causative variant(s). VarP correctly identified the disease class for 36 cases, including 10 where the original clinical pipeline failed, and found seven cases with strong evidence of a disease other than the one tested for. Over-reliance on pathogenicity annotations in the HGMD mutation database led to several incorrect assignments. Post hoc analysis showed that protein structure data could have helped to interpret the impact of many prioritized missense variants.

Next, I co-developed and implemented MecCog, a web-based graphical framework to represent mechanisms by which genetic variants cause disease phenotypes. A MecCog mechanism schema displays the propagation of system perturbations across stages of biological organization, using graphical notations to symbolize perturbed entities and activities, knowledge gaps, ambiguities and uncertainties, and hyperlinked evidence. The web platform enables a user to construct, store, publish, browse, query, and comment on schemas. MecCog facilitates better comprehension of disease mechanisms and helps identify critical unanswered questions on causal relationships and possible new sites of therapeutic intervention.

Finally, I developed a framework to quantitatively represent and analyze mechanisms relating genetic variants to complex trait disease. It involves generating a computable circuit from MecCog schemas by assigning node functions and parameters to represent the behavior of the schema components. I demonstrate that such a circuit can be used to analyze the effect size of a variant contributing to disease risk as a function of the genetic background in an individual and the extent to which epistatic effects may be masked in population averages. I also show that the circuit functions and parameters can be learned in a data-driven manner using a hybrid neural network approach.
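
As an illustration only, the idea of a computable circuit can be sketched with a toy model of two variants acting on one shared pathway. Everything in the sketch is an assumption made for illustration: the saturating node function `hill`, the parameter values, and the names `disease_risk`, `effect_of_a`, and `freq_b` are hypothetical and are not taken from the dissertation or from MecCog.

```python
def hill(x, k=0.5, n=4):
    """Saturating node function (a hypothetical choice of node behavior)."""
    return x**n / (k**n + x**n)


def disease_risk(has_a, has_b, loss_a=0.6, loss_b=0.6):
    """Toy circuit: each variant scales down the activity of a shared
    pathway component, and risk rises as the pathway output falls.
    The scaling factors are assumed values, not fitted parameters."""
    activity = 1.0
    if has_a:
        activity *= loss_a
    if has_b:
        activity *= loss_b
    return 1.0 - hill(activity)


def effect_of_a(has_b):
    """Effect size of variant A (risk difference) in a given background."""
    return disease_risk(True, has_b) - disease_risk(False, has_b)


# Effect of A with and without variant B in the genetic background.
effect_without_b = effect_of_a(False)
effect_with_b = effect_of_a(True)

# Population-average effect of A when B is rare (assumed carrier frequency).
freq_b = 0.1
average_effect = (1 - freq_b) * effect_without_b + freq_b * effect_with_b

print(f"effect of A without B: {effect_without_b:.3f}")
print(f"effect of A with B:    {effect_with_b:.3f}")
print(f"population average:    {average_effect:.3f}")
```

In this toy circuit the nonlinear node function makes the effect of variant A larger when variant B is also present, while the population average stays close to the effect in the common background, which is one way an epistatic effect can be masked.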