Biology Research Works
Permanent URI for this collectionhttp://hdl.handle.net/1903/13
Browse
3 results
Search Results
Item Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA(Springer Nature, 2004-09-16) Cummings, Michael P; Myers, Daniel SRNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular biological mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified. Data for analysis were derived from the the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. We selected datasets based on the 20 nucleotides 5' and the 20 nucleotides 3' of edited sites and an equivalently sized and appropriately constructed null-set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited/non-edited sites and on the estimated folding energies of those regions. Tree-based statistical methods based on primary sequence data surrounding edited/non-edited sites and estimates of free energy of folding yield models with optimistic re-substitution-based estimates of ~0.71 accuracy, ~0.64 sensitivity, and ~0.88 specificity. Random forest analysis yielded better models and more exact performance estimates with ~0.74 accuracy, ~0.72 sensitivity, and ~0.81 specificity for the combined observations. Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria. Our analysis shows that the identity of the nucleotide -1 to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables that distinguish most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions.Item Few amino acid positions in rpoB are associated with most of the rifampin resistance in Mycobacterium tuberculosis(Springer Nature, 2004-09-28) Cummings, Michael P; Segal, Mark RMutations in rpoB, the gene encoding the β subunit of DNA-dependent RNA polymerase, are associated with rifampin resistance in Mycobacterium tuberculosis. Several studies have been conducted where minimum inhibitory concentration (MIC, which is defined as the minimum concentration of the antibiotic in a given culture medium below which bacterial growth is not inhibited) of rifampin has been measured and partial DNA sequences have been determined for rpoB in different isolates of M. tuberculosis. However, no model has been constructed to predict rifampin resistance based on sequence information alone. Such a model might provide the basis for quantifying rifampin resistance status based exclusively on DNA sequence data and thus eliminate the requirements for time consuming culturing and antibiotic testing of clinical isolates. Sequence data for amino acid positions 511–533 of rpoB and associated MIC of rifampin for different isolates of M. tuberculosis were taken from studies examining rifampin resistance in clinical samples from New York City and throughout Japan. We used tree-based statistical methods and random forests to generate models of the relationships between rpoB amino acid sequence and rifampin resistance. The proportion of variance explained by a relatively simple tree-based cross-validated regression model involving two amino acid positions (526 and 531) is 0.679. The first partition in the data, based on position 531, results in groups that differ one hundredfold in mean MIC (1.596 μg/ml and 159.676 μg/ml). The subsequent partition based on position 526, the most variable in this region, results in a > 354-fold difference in MIC. When considered as a classification problem (susceptible or resistant), a cross-validated tree-based model correctly classified most (0.884) of the observations and was very similar to the regression model. Random forest analysis of the MIC data as a continuous variable, a regression problem, produced a model that explained 0.861 of the variance. The random forest analysis of the MIC data as discrete classes produced a model that correctly classified 0.942 of the observations with sensitivity of 0.958 and specificity of 0.885. Highly accurate regression and classification models of rifampin resistance can be made based on this short sequence region. Models may be better with improved (and consistent) measurements of MIC and more sequence data.Item Toward reconstructing the evolution of advanced moths and butterflies (Lepidoptera: Ditrysia): an initial molecular study(Springer Nature, 2009-12-02) Regier, Jerome C; Zwick, Andreas; Cummings, Michael P; Kawahara, Akito Y; Cho, Soowon; Weller, Susan; Roe, Amanda; Baixeras, Joaquin; Brown, John W; Parr, Cynthia; Davis, Donald R; Epstein, Marc; Hallwachs, Winifred; Hausmann, Axel; Janzen, Daniel H; Kitching, Ian J; Solis, M Alma; Yen, Shen-Horn; Bazinet, Adam L; Mitter, CharlesIn the mega-diverse insect order Lepidoptera (butterflies and moths; 165,000 described species), deeper relationships are little understood within the clade Ditrysia, to which 98% of the species belong. To begin addressing this problem, we tested the ability of five protein-coding nuclear genes (6.7 kb total), and character subsets therein, to resolve relationships among 123 species representing 27 (of 33) superfamilies and 55 (of 100) families of Ditrysia under maximum likelihood analysis. Our trees show broad concordance with previous morphological hypotheses of ditrysian phylogeny, although most relationships among superfamilies are weakly supported. There are also notable surprises, such as a consistently closer relationship of Pyraloidea than of butterflies to most Macrolepidoptera. Monophyly is significantly rejected by one or more character sets for the putative clades Macrolepidoptera as currently defined (P < 0.05) and Macrolepidoptera excluding Noctuoidea and Bombycoidea sensu lato (P ≤ 0.005), and nearly so for the superfamily Drepanoidea as currently defined (P < 0.08). Superfamilies are typically recovered or nearly so, but usually without strong support. Relationships within superfamilies and families, however, are often robustly resolved. We provide some of the first strong molecular evidence on deeper splits within Pyraloidea, Tortricoidea, Geometroidea, Noctuoidea and others. Separate analyses of mostly synonymous versus non-synonymous character sets revealed notable differences (though not strong conflict), including a marked influence of compositional heterogeneity on apparent signal in the third codon position (nt3). As available model partitioning methods cannot correct for this variation, we assessed overall phylogeny resolution through separate examination of trees from each character set. Exploration of "tree space" with GARLI, using grid computing, showed that hundreds of searches are typically needed to find the best-feasible phylogeny estimate for these data. Our results (a) corroborate the broad outlines of the current working phylogenetic hypothesis for Ditrysia, (b) demonstrate that some prominent features of that hypothesis, including the position of the butterflies, need revision, and (c) resolve the majority of family and subfamily relationships within superfamilies as thus far sampled. Much further gene and taxon sampling will be needed, however, to strongly resolve individual deeper nodes.