
COMPUTATIONAL MODELING OF THE RELATIONSHIP BETWEEN SNPS AND DISEASE

dc.contributor.advisor: Moult, John (en_US)
dc.contributor.author: Yue, Peng (en_US)
dc.date.accessioned: 2006-02-04T07:16:25Z
dc.date.available: 2006-02-04T07:16:25Z
dc.date.issued: 2005-12-01 (en_US)
dc.identifier.uri: http://hdl.handle.net/1903/3157
dc.description.abstract: We have developed two models, the stability model and the profile model, to identify non-synonymous single base changes (the most common cause of monogenic disease) that have deleterious effects on protein function in vivo. The stability model analyzes the effect of the resulting amino acid change on protein stability by utilizing structural information such as reduction in hydrophobic area and loss of electrostatic interactions. The profile model makes use of the conservation and type of residues observed at a base change position within a protein family. In each model, a machine learning technique, the support vector machine (SVM), was trained on a set of disease-causing mutations and a control set of non-disease-causing mutations. In jack-knifed testing, the stability model identifies 74% of disease mutations, with a false positive rate of 15%; the profile model identifies 80% of disease mutations, with a false positive rate of 10%. Evaluation of a set of in vitro mutagenesis data with the stability model established that the majority of disease mutations affect protein stability by 1 to 3 kcal/mol. The stability model's effective distinction between disease and non-disease variants strongly supports the hypothesis that loss of protein stability is a major factor contributing to monogenic disease. Both models are used to identify deleterious SNPs in the human population. After carefully controlling for errors, we find that approximately one-fourth of the known non-synonymous SNPs are deleterious, thus providing a set of possible SNPs contributing to human complex disease traits. A web resource has been developed to provide information on disease/gene relationships at the molecular level. The resource has three primary modules. The first module is used to publish the deleterious SNPs identified by the two above-mentioned models. The second module identifies candidate genes for a specific disease, and the third module provides information about the relationships between the sets of candidate genes. Disease/candidate gene relationships and gene-gene relationships are derived from the literature using a simple but effective text profiling method. (en_US)
dc.format.extent: 3092495 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.title: COMPUTATIONAL MODELING OF THE RELATIONSHIP BETWEEN SNPS AND DISEASE (en_US)
dc.type: Dissertation (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.contributor.department: Biology (en_US)
dc.subject.pqcontrolled: Biology, General (en_US)

