Cell Biology & Molecular Genetics

Permanent URI for this community: http://hdl.handle.net/1903/11811

  • Critical assessment of methods of protein structure prediction (CASP)—Round XV
    (Wiley, 2023-11-02) Kryshtafovych, Andriy; Schwede, Torsten; Topf, Maya; Fidelis, Krzysztof; Moult, John
    Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein–ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.
  • Breaking the conformational ensemble barrier: Ensemble structure modeling challenges in CASP15
    (Wiley, 2023-10-23) Kryshtafovych, Andriy; Montelione, Gaetano T.; Rigden, Daniel J.; Mesdaghi, Shahram; Karaca, Ezgi; Moult, John
    For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.
  • SNPs3D: Candidate gene and SNP selection for association studies
    (Springer Nature, 2006-03-22) Yue, Peng; Melamud, Eugene; Moult, John
    The relationship between disease susceptibility and genetic variation is complex, and many different types of data are relevant. We describe a web resource and database that provides and integrates as much information as possible on disease/gene relationships at the molecular level. The resource http://www.SNPs3D.org has three primary modules. One module identifies which genes are candidates for involvement in a specified disease. A second module provides information about the relationships between sets of candidate genes. The third module analyzes the likely impact of non-synonymous SNPs on protein function. Disease/candidate gene relationships and gene-gene relationships are derived from the literature using simple but effective text profiling. SNP/protein function relationships are derived by two methods, one using principles of protein structure and stability, the other based on sequence conservation. Entries for each gene include a number of links to other data, such as expression profiles, pathway context, mouse knockout information and papers. Gene-gene interactions are presented in an interactive graphical interface, providing rapid access to the underlying information, as well as convenient navigation through the network. Use of the resource is illustrated with aspects of the inflammatory response and hypertension. The combination of SNP impact analysis, a knowledge-based network of gene relationships and candidate genes, and access to a wide range of data and literature allows a user to quickly assimilate available information, and so develop models of gene-pathway-disease interaction.
  • GWAS and drug targets
    (Springer Nature, 2014-05-20) Cao, Chen; Moult, John
    Genome wide association studies (GWAS) have revealed a large number of links between genome variation and complex disease. Among other benefits, it is expected that these insights will lead to new therapeutic strategies, particularly the identification of new drug targets. In this paper, we evaluate the power of GWAS studies to find drug targets by examining how many existing drug targets have been directly 'rediscovered' by this technique, and the extent to which GWAS results may be leveraged by network information to discover known and new drug targets. We find that only a very small fraction of drug targets are directly detected in the relevant GWAS studies. We investigate two possible explanations for this observation. First, we find evidence of negative selection acting on drug target genes as a consequence of strong coupling with the disease phenotype, so reducing the incidence of SNPs linked to the disease. Second, we find that GWAS genes are substantially longer on average than drug targets and than all genes, suggesting there is a length related bias in GWAS results. In spite of the low direct relationship between drug targets and GWAS reported genes, we found these two sets of genes are closely coupled in the human protein network. As a consequence, machine-learning methods are able to recover known drug targets based on network context and the set of GWAS reported genes for the same disease. We show the approach is potentially useful for identifying drug repurposing opportunities. Although GWA studies do not directly identify most existing drug targets, there are several reasons to expect that new targets will nevertheless be discovered using these data. Initial results on drug repurposing studies using network analysis are encouraging and suggest directions for future development.
  • Insights from GWAS: emerging landscape of mechanisms underlying complex trait disease
    (Springer Nature, 2015-06-18) Pal, Lipika R; Yu, Chen-Hsin; Mount, Stephen M; Moult, John
    There are now over 2000 loci in the human genome where genome wide association studies (GWAS) have found one or more SNPs to be associated with altered risk of a complex trait disease. At each of these loci, there must be some molecular level mechanism relevant to the disease. What are these mechanisms and how do they contribute to disease? Here we consider the roles of three primary mechanism classes: changes that directly alter protein function (missense SNPs), changes that alter transcript abundance as a consequence of variants close-by in sequence, and changes that affect splicing. Missense SNPs are divided into those predicted to have a high impact on in vivo protein function, and those with a low impact. Splicing is divided into SNPs with a direct impact on splice sites, and those with a predicted effect on auxiliary splicing signals. The analysis was based on associations found for seven complex trait diseases in the classic Wellcome Trust Case Control Consortium (WTCCC1) GWA study and subsequent studies and meta-analyses, collected from the GWAS catalog. Linkage disequilibrium information was used to identify possible candidate SNPs for involvement in disease mechanism in each of the 356 loci associated with these seven diseases. With the parameters used, we find that 76% of loci have at least one of these mechanisms. Overall, except for the low incidence of direct impact on splice sites, the mechanisms are found at similar frequencies, with changes in transcript abundance the most common. But the distribution of mechanisms over diseases varies markedly, as does the fraction of loci with assigned mechanisms. Many of the implicated proteins have previously been suggested as relevant, but the specific mechanism assignments are new. In addition, a number of new disease relevant proteins are proposed. The high fraction of GWAS loci with proposed mechanisms suggests that these classes of mechanism play a major role. Other mechanism types, such as variants affecting expression of genes remote in the DNA sequence, will contribute in other loci. Each of the identified putative mechanisms provides a hypothesis for further investigation.