We have investigated three sets of human missense variants (cancer somatic mutations, monogenic disease-causing mutations, and population SNPs) in terms of their impact on molecular function, their distribution across different protein structural environments, and their disease mechanisms.

Cancer genome sequencing projects have identified a large number of somatic missense mutations in cancers. We used two analysis methods in the SNPs3D software package to assess the impact of these variants on protein function in vivo. One method identifies mutations that significantly destabilize three-dimensional protein structure; the other uses sequence conservation to detect all types of effect on protein function. We analyzed data from a set of breast and colorectal tumors. In known cancer genes, nearly 100% of missense mutations are found to impact protein function, supporting the view that these methods are appropriate for identifying driver mutations. Overall, we estimate that 50% to 60% of all somatic missense mutations have a high impact on structural stability or, more generally, affect the function of the corresponding proteins. This fraction is similar to that for all possible missense mutations and much higher than that for human population SNPs (about 30%). We found that most mutations in tumor suppressors destabilize protein structure, whereas mutations in oncogenes act in more varied ways, including destabilization of the less active conformational states. We also propose a set of high-impact candidate driver mutations.
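As a rough illustration of how a "high impact" fraction can be tallied when two predictors are combined, the sketch below flags a variant if either a stability score or a conservation-based score falls below a cutoff, then reports the flagged fraction. The score fields, the cutoff of 0.0 (negative taken to mean deleterious), and the toy scores are hypothetical assumptions, not the actual SNPs3D interface, thresholds, or data.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    stability_score: float  # hypothetical score from a structure/stability model
    profile_score: float    # hypothetical score from a sequence-conservation model


# Assumed convention (not taken from SNPs3D): scores below the cutoff are deleterious.
CUTOFF = 0.0


def is_high_impact(v: Variant, cutoff: float = CUTOFF) -> bool:
    """Flag a variant if at least one of the two models predicts a deleterious effect."""
    return v.stability_score < cutoff or v.profile_score < cutoff


def high_impact_fraction(variants: list[Variant], cutoff: float = CUTOFF) -> float:
    """Fraction of variants flagged by at least one model (0.0 for an empty list)."""
    if not variants:
        return 0.0
    flagged = sum(is_high_impact(v, cutoff) for v in variants)
    return flagged / len(variants)


# Made-up scores purely for illustration.
toy = [
    Variant("A", -1.2, 0.5),  # flagged by the stability model
    Variant("B", 0.8, -0.3),  # flagged by the conservation model
    Variant("C", 1.1, 0.9),   # not flagged
    Variant("D", 0.4, 1.5),   # not flagged
]
print(high_impact_fraction(toy))
```

Taking the union of the two calls reflects the idea that a stability-only model misses non-structural effects, which a conservation-based model can still catch.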

We also studied a set of germline missense variants in phenylalanine hydroxylase (PAH) found in phenylketonuria (PKU) patients. Using SNPs3D, we supported the earlier finding that a high proportion of disease-causing missense mutations affect protein stability rather than other aspects of protein structure and function. We then examined the relationship between these stability-damaging missense mutations and the corresponding experimental data for the level and activity of the PAH protein product under in vivo-like conditions. We found that, overall, destabilizing mutations result in substantially lower protein levels while maintaining wild-type-like specific activity. The agreement between predicted stability impact and experimental evidence for lower protein levels is high and consistent with previous estimates of the error rates for the methods.
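The agreement between prediction and experiment can be viewed as a simple concordance rate between a binary prediction (destabilizing or not) and a binary experimental call (reduced protein level or not). The sketch below computes that rate; the threshold of 50% of wild-type level for "reduced" and the example records are hypothetical assumptions, not the measurements or criteria used in this work.

```python
def concordance(records, level_threshold=0.5):
    """Fraction of mutants where the prediction matches the experimental call.

    records: iterable of (predicted_destabilizing, relative_level) pairs, where
    relative_level is the mutant protein level as a fraction of wild type.
    A mutant is called 'reduced' if relative_level < level_threshold (assumed cutoff).
    """
    records = list(records)
    if not records:
        raise ValueError("no records to compare")
    agree = sum(pred == (level < level_threshold) for pred, level in records)
    return agree / len(records)


# Made-up records purely for illustration.
toy_records = [
    (True, 0.2),   # predicted destabilizing, low level: agree
    (True, 0.3),   # predicted destabilizing, low level: agree
    (False, 0.9),  # predicted neutral, near-normal level: agree
    (True, 0.8),   # predicted destabilizing, near-normal level: disagree
    (False, 0.4),  # predicted neutral, low level: disagree
]
print(concordance(toy_records))
```

Comparing the disagreement rate with a method's independently estimated error rate is one way to judge whether the two are consistent.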

We next investigated missense single-base variants located at the interfaces between interacting proteins and their role in disease. This work consisted of three steps: first, mapping variants onto protein structures and identifying those in interaction interfaces; second, analyzing distribution enrichment across three structural locations (protein interior, surface, and interface); and third, assessing impact with SNPs3D. Nearly a quarter of disease-causing mutations map onto protein interfaces, with a strong propensity for heteromeric interfaces, indicating that disruption of functional contacts between proteins is a significant disease mechanism. We found that the enrichment propensity at interfaces is intermediate between that of the protein surface and the interior for all three types of variants considered, namely SNPs, inter-species variants, and disease mutations. We also found that missense SNPs and inter-species variants share the same enrichment pattern, with a relatively high density on the protein surface and depletion in the interior. In contrast, disease mutations display the reverse pattern, with the interior and interfaces the most susceptible locations.
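One common way to express an enrichment propensity, assumed here rather than taken from the text, is the share of variants falling in a structural location divided by the share of all residues in that location: values above 1 indicate enrichment and values below 1 depletion. The sketch below computes this per location; the toy counts are invented and were chosen only to mimic the disease-mutation ordering described above (interior > interface > surface).

```python
from collections import Counter

LOCATIONS = ("interior", "surface", "interface")


def enrichment_propensity(variant_locations, residue_locations):
    """Return {location: propensity}, where propensity is
    (share of variants at the location) / (share of residues at the location).
    Locations with no residues (or an empty variant list) are omitted as undefined."""
    v_counts = Counter(variant_locations)
    r_counts = Counter(residue_locations)
    n_v = sum(v_counts.values())
    n_r = sum(r_counts.values())
    result = {}
    for loc in LOCATIONS:
        if n_v == 0 or r_counts[loc] == 0:
            continue
        result[loc] = (v_counts[loc] / n_v) / (r_counts[loc] / n_r)
    return result


# Invented counts: 100 residues and 20 variants, each labeled by location.
residues = ["interior"] * 40 + ["surface"] * 45 + ["interface"] * 15
variants = ["interior"] * 12 + ["surface"] * 4 + ["interface"] * 4

props = enrichment_propensity(variants, residues)
for loc, p in props.items():
    print(f"{loc}: {p:.2f}")
```

How each residue is assigned to interior, surface, or interface (for example, by a solvent-accessibility cutoff and contact with a partner chain) is a separate modeling choice not shown here.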