Biology Research Works
Permanent URI for this collection: http://hdl.handle.net/1903/13
Die Balz des Kolibris Selasphorus platycercus [The courtship display of the hummingbird Selasphorus platycercus]
(Zoologische Jahrbücher. Abteilung für Systematik, Ökologie und Geographie der Tiere, 1948)
Wagner, H. O.; Inouye, David W.
This article describes the behavior of Broad-tailed Hummingbirds (Selasphorus platycercus) in Mexico, which the author interprets (perhaps mistakenly) as mating behavior.

Ascidian gene-expression profiles
(Springer Nature, 2002-09-25)
Jeffery, William R
With the advent of gene-expression profiling, a large number of genes can now be investigated simultaneously during critical stages of development. This approach will be particularly informative in studies of ascidians, basal chordates whose genomes and embryology are uniquely suited for mapping developmental gene networks.

Visualization and analysis of microarray and gene ontology data with treemaps
(Springer Nature, 2004-06-28)
Baehrecke, Eric H; Dang, Niem; Babaria, Ketan; Shneiderman, Ben
The increasing complexity of genomic data presents several challenges for biologists. The limited view that a computer monitor offers of complex data, together with the dynamic nature of data in the midst of discovery, makes it harder to integrate experimental results with information resources. The use of Gene Ontology enables researchers to summarize results of quantitative analyses in this framework, but the limitations of typical browser presentation restrict data access. Here we describe extensions to the treemap design to visualize and query genome data. Treemaps are a space-filling visualization technique for hierarchical structures that show attributes of leaf nodes by size and color-coding. Treemaps enable users to rapidly compare sizes of nodes and sub-trees, and we use Gene Ontology categories, levels of RNA, and other quantitative attributes of DNA microarray experiments as examples. Our implementation of treemaps, Treemap 4.0, allows user-defined filtering to focus on the data of greatest interest, and the resulting query files can be exported for secondary analyses.
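As a concrete illustration of the space-filling idea, the sketch below implements the classic slice-and-dice treemap layout. Each leaf receives a rectangle whose area is proportional to its size, and the split direction alternates at each level of the hierarchy. This is a generic textbook layout assumed for illustration, not the Treemap 4.0 code. The `Node` class and `slice_and_dice` function are hypothetical names, and leaf sizes stand in for quantities such as the number of genes in a Gene Ontology category.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hierarchy node; only leaves carry a size (e.g. genes in a GO category)."""
    name: str
    size: float = 0.0
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        # An internal node's weight is the summed size of its subtree.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0,
                   out: dict[str, tuple[float, float, float, float]] | None = None):
    """Map each leaf name to a rectangle (x, y, width, height).

    Even depths split the parent rectangle along its width and odd depths
    along its height, so each leaf's area stays proportional to its size.
    """
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    total = node.total()
    offset = 0.0  # fraction of the parent rectangle already assigned
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


if __name__ == "__main__":
    root = Node("root", children=[
        Node("A", children=[Node("a1", 2), Node("a2", 2)]),
        Node("B", 4),
    ])
    for name, rect in slice_and_dice(root, 0, 0, 100, 100).items():
        print(name, rect)
```

Slice-and-dice is the simplest layout. It can produce long, thin rectangles for deep or unbalanced hierarchies, which is why squarified layouts are often preferred in practice.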
Links to model system web pages from Treemap 4.0 give users access to details about specific genes without leaving the query platform. Treemaps allow users to view and query the data from an experiment on a single computer monitor screen. Treemap 4.0 can be used to visualize various genome data, and is particularly useful for revealing patterns and details within complex data sets.

Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA
(Springer Nature, 2004-09-16)
Cummings, Michael P; Myers, Daniel S
RNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified. Data for analysis were derived from the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. We selected data sets based on the 20 nucleotides 5' and the 20 nucleotides 3' of edited sites and an equivalently sized and appropriately constructed null set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited/non-edited sites and on the estimated folding energies of those regions.
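The data-set construction described above can be sketched as follows. For each C at a known position, take the 20 nt on either side (a 41 nt window) and encode each flanking position relative to the central C. This makes features such as the identity of the -1 nucleotide available to a tree-based classifier. It is a minimal sketch assuming 0-based coordinates on a single uppercase sequence string. The function names are hypothetical, and the folding-energy feature, which would require an RNA secondary-structure tool, is not shown.

```python
def edit_site_window(seq: str, pos: int, flank: int = 20) -> str:
    """Return the (2 * flank + 1)-nt window centered on the C at 0-based index pos."""
    if seq[pos] != "C":
        raise ValueError(f"position {pos} holds {seq[pos]!r}, not C")
    if pos - flank < 0 or pos + flank >= len(seq):
        raise ValueError("window extends past the end of the sequence")
    return seq[pos - flank:pos + flank + 1]


def window_features(window: str, flank: int = 20) -> dict:
    """Label each flanking base by its offset from the central C: '-20'..'-1', '+1'..'+20'."""
    features = {}
    for i, base in enumerate(window):
        offset = i - flank
        if offset != 0:  # skip the central C itself
            features[f"{offset:+d}"] = base
    return features


def candidate_null_sites(seq: str, edited: set, flank: int = 20) -> list:
    """Positions of Cs with a full window that are not known edited sites."""
    return [i for i in range(flank, len(seq) - flank)
            if seq[i] == "C" and i not in edited]
```

An equally sized null set could then be sampled from `candidate_null_sites`. How the authors "appropriately constructed" theirs is not specified here.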
Tree-based statistical methods using primary sequence data surrounding edited/non-edited sites and estimates of free energy of folding yield models with optimistic re-substitution-based estimates of ~0.71 accuracy, ~0.64 sensitivity, and ~0.88 specificity. Random forest analysis yielded better models and more exact performance estimates, with ~0.74 accuracy, ~0.72 sensitivity, and ~0.81 specificity for the combined observations. Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria. Our analysis shows that the identity of the nucleotide at position -1 relative to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables distinguishing most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions.

Few amino acid positions in rpoB are associated with most of the rifampin resistance in Mycobacterium tuberculosis
(Springer Nature, 2004-09-28)
Cummings, Michael P; Segal, Mark R
Mutations in rpoB, the gene encoding the β subunit of DNA-dependent RNA polymerase, are associated with rifampin resistance in Mycobacterium tuberculosis. Several studies have measured the minimum inhibitory concentration (MIC) of rifampin and determined partial DNA sequences of rpoB in different isolates of M. tuberculosis. (MIC is defined as the minimum concentration of the antibiotic in a given culture medium below which bacterial growth is not inhibited.) However, no model has been constructed to predict rifampin resistance based on sequence information alone.
Such a model might provide the basis for quantifying rifampin resistance status based exclusively on DNA sequence data, and thus eliminate the need for time-consuming culturing and antibiotic testing of clinical isolates. Sequence data for amino acid positions 511–533 of rpoB and the associated MIC of rifampin for different isolates of M. tuberculosis were taken from studies examining rifampin resistance in clinical samples from New York City and throughout Japan. We used tree-based statistical methods and random forests to generate models of the relationships between rpoB amino acid sequence and rifampin resistance. The proportion of variance explained by a relatively simple tree-based cross-validated regression model involving two amino acid positions (526 and 531) is 0.679. The first partition in the data, based on position 531, results in groups that differ one hundredfold in mean MIC (1.596 μg/ml and 159.676 μg/ml). The subsequent partition, based on position 526 (the most variable in this region), results in a >354-fold difference in MIC. When considered as a classification problem (susceptible or resistant), a cross-validated tree-based model correctly classified most (0.884) of the observations and was very similar to the regression model. Random forest analysis of the MIC data as a continuous variable (a regression problem) produced a model that explained 0.861 of the variance. Random forest analysis of the MIC data as discrete classes produced a model that correctly classified 0.942 of the observations, with a sensitivity of 0.958 and a specificity of 0.885. Highly accurate regression and classification models of rifampin resistance can be made based on this short sequence region.
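The accuracy, sensitivity, and specificity figures reported in these abstracts follow the standard confusion-matrix definitions: accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The sketch below computes them, treating "resistant" (or "edited") as the positive class, which is an assumption about labeling. The counts in the usage are hypothetical, not the study's data. The sketch also checks the fold difference quoted for the first partition (159.676 / 1.596 ≈ 100).

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    accuracy: float
    sensitivity: float  # true-positive rate, e.g. resistant isolates called resistant
    specificity: float  # true-negative rate, e.g. susceptible isolates called susceptible


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Metrics:
    """Standard confusion-matrix summaries for a two-class problem."""
    total = tp + fn + tn + fp
    return Metrics(
        accuracy=(tp + tn) / total,
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


def fold_difference(a: float, b: float) -> float:
    """Ratio of the larger to the smaller of two positive values (e.g. mean MICs)."""
    return max(a, b) / min(a, b)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=8, fn=2, tn=9, fp=1))
    print(round(fold_difference(1.596, 159.676), 1))
```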
Models may be better with improved (and consistent) measurements of MIC and more sequence data.

Appendices to "Colonization of thistles by biocontrol agents"
(2005-02-02T13:25:30Z)
Dodge, Gary; Louda, Svata; Inouye, David
Appendices B, C, and D for a manuscript from Gary Dodge's dissertation research (Biology Department, UMCP).

Construction of a bacterial artificial chromosome library from the spikemoss Selaginella moellendorffii: a new resource for plant comparative genomics
(Springer Nature, 2005-06-14)
Wang, Wenming; Tanurdzic, Milos; Luo, Meizhong; Sisneros, Nicholas; Kim, Hye Ran; Weng, Jing-Ke; Kudrna, Dave; Mueller, Christopher; Arumuganathan, K; Carlson, John; Chapple, Clint; de Pamphilis, Claude; Mandoli, Dina; Tomkins, Jeff; Wing, Rod A; Banks, Jo Ann
The lycophytes are an ancient lineage of vascular plants that diverged from the seed plant lineage about 400 Myr ago. Although the lycophytes occupy an important phylogenetic position for understanding the evolution of plants and their genomes, no genomic resources exist for this group of plants. Here we describe the construction of a large-insert bacterial artificial chromosome (BAC) library from the lycophyte Selaginella moellendorffii. Based on flow cytometry, this species has the smallest genome size among the different lycophytes tested, including Huperzia lucidula, Diphasiastrum digitatum, Isoetes engelmannii and S. kraussiana. The arrayed BAC library consists of 9126 clones; the average insert size is estimated to be 122 kb. Inserts of chloroplast origin account for 2.3% of the clones. The BAC library contains an estimated ten genome-equivalents, based on DNA hybridizations using five single-copy and two duplicated S. moellendorffii genes as probes. The S. moellendorffii BAC library, the first to be constructed from a lycophyte, will be useful to the scientific community as a resource for comparative plant genomics and evolution.

Microarray analysis of Pseudomonas aeruginosa reveals induction of pyocin genes in response to hydrogen peroxide
(Springer Nature, 2005-09-08)
Chang, Wook; Small, David A; Toghrol, Freshteh; Bentley, William E
Pseudomonas aeruginosa, a pathogen that infects people with cystic fibrosis, encounters toxicity from phagocyte-derived reactive oxidants, including hydrogen peroxide, during active infection. P. aeruginosa responds with adaptive and protective strategies against these toxic species to effectively infect humans. Despite advances in our understanding of the responses to oxidative stress in many specific cases, the connectivity between targeted protective genes and the rest of cell metabolism remains obscure. The induction of pyocin genes in response to hydrogen peroxide suggests that pyocin production might be another novel defensive scheme against oxidative attack by host cells.

Genome re-annotation: a wiki solution?
(Springer Nature, 2007-02-01)
Salzberg, Steven L
The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately, annotation is rarely if ever updated, and resources to support routine reannotation are scarce.
Wiki software, which would allow many scientists to edit each genome's annotation, offers one possible solution.

A cricket Gene Index: a genomic resource for studying neurobiology, speciation, and molecular evolution
(Springer Nature, 2007-04-25)
Danley, Patrick D; Mullen, Sean P; Liu, Fenglong; Nene, Vishvanath; Quackenbush, John; Shaw, Kerry L
As the development costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (ESTs) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. We report the sequencing of 14,502 ESTs from clones derived from a nerve cord cDNA library of the Hawaiian trigonidiine cricket Laupala kohalensis, and the subsequent construction of a Gene Index from these sequences. The Gene Index contains 8607 unique sequences, comprising 2575 tentative consensus (TC) sequences and 6032 singletons. For each unique sequence, we attempted to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification, through sequence-based comparison with known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page. Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively.
Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resource for basal insects. As such, this resource is a valuable comparative tool for understanding invertebrate molecular evolution. The sequences presented here will provide much-needed genomic resources for three distinct but overlapping fields of inquiry: neurobiology, speciation, and molecular evolution.

A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana
(Springer Nature, 2007-05-21)
Pertea, Mihaela; Mount, Stephen M; Salzberg, Steven L
Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein-coding and non-coding regions. However, exonic splicing enhancers (ESEs) have been shown to enhance the utilization of nearby splice sites. We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, 84 putative exonic splicing enhancer hexamers were identified in Arabidopsis thaliana. Then a Gibbs sampling program called ELPH was used to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of A. thaliana exons. Second, integrating our regulatory motifs into two different splice site recognition programs significantly improved the software's ability to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open-source software.
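The abstract does not spell out how the 84 candidate hexamers were chosen. The sketch below shows one common generic approach, purely as an assumed illustration and not as the authors' method. It counts every hexamer in a foreground set (for example, exonic sequence next to splice sites) and in a background set, then ranks hexamers by the log-odds of their pseudocount-smoothed frequencies. The function names and toy sequences are hypothetical, and motif refinement with a Gibbs sampler such as ELPH is not reproduced.

```python
import math
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer (hexamers by default) across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def rank_by_log_odds(foreground, background, k=6, pseudocount=1.0):
    """Return (k-mer, log2 enrichment) pairs, most enriched first.

    Frequencies are smoothed with a pseudocount so that k-mers absent from
    one set still get a finite score.
    """
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    scores = {
        kmer: math.log2(((fg[kmer] + pseudocount) / fg_total)
                        / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in vocab
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    for kmer, score in rank_by_log_odds(["GAAGAAGAA"], ["ACGTACGTAC"])[:3]:
        print(kmer, round(score, 2))
```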
Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy.

Structure and evolution of a proviral locus of Glyptapanteles indiensis bracovirus
(Springer Nature, 2007-06-26)
Desjardins, Christopher A; Gundersen-Rindal, Dawn E; Hostetler, Jessica B; Tallon, Luke J; Fuester, Roger W; Schatz, Michael C; Pedroni, Monica J; Fadrosh, Douglas W; Haas, Brian J; Toms, Bradley S; Chen, Dan; Nene, Vishvanath
Bracoviruses (BVs), a group of double-stranded DNA viruses with segmented genomes, are mutualistic endosymbionts of parasitoid wasps. Virus particles are replication deficient and are produced only by female wasps from proviral sequences integrated into the wasp genome. Virus particles are injected along with eggs into caterpillar hosts, where viral gene expression facilitates parasitoid survival and therefore perpetuation of proviral DNA. Here we describe a 223 kbp region of Glyptapanteles indiensis genomic DNA that contains part of the G. indiensis bracovirus (GiBV) proviral genome. Eighteen of ~24 GiBV viral segment sequences are encoded by 7 non-overlapping sets of BAC clones, revealing that some proviral segment sequences are separated by long stretches of intervening DNA. Two overlapping BACs, which contain a locus of 8 tandemly arrayed proviral segments flanked on either side by ~35 kbp of non-packaged DNA, were sequenced and annotated. Structural and compositional analyses of this cluster revealed that its G+C content and nucleotide composition are distinct from those of the flanking DNA. By analyzing sequence polymorphisms in the 8 GiBV viral segment sequences, we found evidence for widespread selection acting on both protein-coding and non-coding DNA. Comparative analysis of viral and proviral segment sequences revealed a sequence motif, involved in the excision of proviral genome segments, that is highly conserved in two other bracoviruses.
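The compositional comparison described above, a proviral cluster whose G+C content differs from its flanks, can be illustrated with a sliding-window G+C calculation. This is a minimal sketch, not the authors' pipeline. The window and step sizes are hypothetical parameters, the function names are illustrative, and ambiguous bases such as N are excluded from the denominator.

```python
from __future__ import annotations


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases in sequence")
    return (seq.count("G") + seq.count("C")) / acgt


def sliding_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(start, G+C fraction) for each full window, advancing by `step` bases."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def region_vs_flanks(seq: str, start: int, end: int) -> tuple[float, float]:
    """G+C of seq[start:end] versus the combined sequence on either side of it."""
    flanks = seq[:start] + seq[end:]
    return gc_fraction(seq[start:end]), gc_fraction(flanks)
```

Plotting `sliding_gc` output along a BAC-sized sequence would show a compositional island as a run of windows departing from the flanking baseline.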
Contrary to current concepts of bracovirus proviral genome organization, our results demonstrate that some but not all GiBV proviral segment sequences exist in a tandem array. Unexpectedly, non-coding DNA in the 8 proviral genome segments, which typically occupies ~70% of BV viral genomes, is under selection pressure, suggesting that it serves some function(s). We hypothesize that selection acting on GiBV proviral sequences maintains the genetic island-like nature of the cluster of proviral genome segments described herein. In contrast to large differences in the predicted gene composition of BV genomes, sequences that appear to mediate processes of viral segment formation, such as proviral segment excision and circularization, appear to be highly conserved, supporting the hypothesis of a single origin for BVs.

Characterization of the dsDNA prophage sequences in the genome of Neisseria gonorrhoeae and visualization of productive bacteriophage
(Springer Nature, 2007-07-05)
Piekarowicz, Andrzej; Kłyż, Aneta; Majchrzak, Michał; Adamczyk-Popławska, Monika; Maugel, Timothy K; Stein, Daniel C
Bioinformatic analysis of the genome sequence of Neisseria gonorrhoeae revealed the presence of nine probable prophage islands. The distribution, conservation and function of many of these sequences, and their ability to produce bacteriophage particles, are unknown. Our analysis of the genomic sequence of strain FA1090 identified five genomic regions (NgoΦ1–5) that are related to dsDNA lysogenic phage. The genetic content of the dsDNA prophage sequences was examined in detail and found to contain blocks of genes encoding proteins homologous to those responsible for phage DNA replication, to phage structural proteins, and to proteins responsible for phage assembly. The DNA sequences from NgoΦ1, NgoΦ2 and NgoΦ3 contain some significant regions of identity. A unique region of NgoΦ2 showed very high similarity with the Pseudomonas aeruginosa generalized transducing phage F116.
Comparative analysis at the nucleotide and protein levels suggests that NgoΦ1 and NgoΦ2 encode functionally active phages, while NgoΦ3, NgoΦ4 and NgoΦ5 encode incomplete genomes. Expression of the NgoΦ1 and NgoΦ2 repressors in Escherichia coli inhibits the growth of E. coli and the propagation of phage λ. The NgoΦ2 repressor was able to inhibit transcription of N. gonorrhoeae genes and from Haemophilus influenzae HP1 phage promoters. The holin gene of NgoΦ1 (identical to that encoded by NgoΦ2), when expressed in E. coli, could serve as a substitute for the phage λ S gene. We were able to detect DNA derived from NgoΦ1 in cultures of N. gonorrhoeae. Electron microscopy analysis of culture supernatants revealed the presence of multiple forms of bacteriophage particles. These data suggest that the genes similar to dsDNA lysogenic phage present in the gonococcus are generally conserved in this pathogen and that they are able to regulate the expression of other neisserial genes. Because phage particles were present in culture supernatants only after induction with mitomycin C, the gonococcus also appears to regulate the expression of bacteriophage genes.

Genome assembly forensics: finding the elusive mis-assembly
(Springer Nature, 2008-03-14)
Phillippy, Adam M; Schatz, Michael C; Pop, Mihai
We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline to both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at http://amos.sourceforge.net.

Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A
(Springer Nature, 2008-05-01)
Salzberg, Steven L; Sommer, Daniel D; Schatz, Michael C; Phillippy, Adam M; Rabinowicz, Pablo D; Tsuge, Seiji; Furutani, Ayako; Ochiai, Hirokazu; Delcher, Arthur L; Kelley, David; Madupu, Ramana; Puiu, Daniela; Radune, Diana; Shumway, Martin; Trapnell, Cole; Aparna, Gudlur; Jha, Gopaljee; Pandey, Alok; Patil, Prabhu B; Ishihara, Hiromichi; Meyer, Damien F; Szurek, Boris; Verdier, Valerie; Koebnik, Ralf; Dow, J Maxwell; Ryan, Robert P; Hirata, Hisae; Tsuyumu, Shinji; Lee, Sang Won; Ronald, Pamela C; Sonti, Ramesh V; Van Sluys, Marie-Anne; Leach, Jan E; White, Frank F; Bogdanove, Adam J
Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We report here the complete genome sequence of strain PXO99A and its comparison to two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another. The PXO99A genome is a single circular chromosome of 5,240,075 bp, considerably longer than the genomes of the other strains (4,941,439 bp and 4,940,217 bp, respectively), and it contains 5083 protein-coding genes, including 87 not found in KACC10331 or MAFF311018. PXO99A contains a greater number of virulence-associated transcription activator-like effector genes and has at least ten major chromosomal rearrangements relative to KACC10331 and MAFF311018. PXO99A contains numerous copies of diverse insertion sequence elements, members of which are associated with 7 of the 10 major rearrangements. A rapidly evolving CRISPR (clustered regularly interspaced short palindromic repeats) region contains evidence of dozens of phage infections unique to the PXO99A lineage. PXO99A also contains a unique, near-perfect tandem repeat of 212 kilobases close to the replication terminus. Our results provide striking evidence of genome plasticity and rapid evolution within Xanthomonas oryzae pv. oryzae. The comparisons point to sources of genomic variation and to candidates for strain-specific adaptations of this pathogen that help to explain the extraordinary diversity of Xanthomonas oryzae pv. oryzae genotypes and races that have been isolated from around the world.

Visual sensitivities tuned by heterochronic shifts in opsin gene expression
(2008-05-23)
Carleton, Karen L.; Spady, Tyrone C; Streelman, J. Todd; Kidd, Michael R.; McFarland, William N.; Loew, Ellis R.
Background: Cichlid fishes have radiated into hundreds of species in the Great Lakes of Africa. Brightly colored males display on leks and vie to be chosen by females as mates. Strong discrimination by females causes differential male mating success, rapid evolution of male color patterns and, possibly, speciation. In addition to differences in color pattern, Lake Malawi cichlids also show some of the largest known shifts in visual sensitivity among closely related species. These shifts result from modulated expression of seven cone opsin genes. However, the mechanisms for this modulated expression are unknown.
Results: In this work, we ask whether these differences might result from changes in the developmental patterning of cone opsin genes. To test this, we compared the developmental pattern of cone opsin gene expression of the Nile tilapia, Oreochromis niloticus, with that of several cichlid species from Lake Malawi. In tilapia, quantitative polymerase chain reaction showed that opsin gene expression changes dynamically from a larval gene set through a juvenile set to a final adult set. In contrast, Lake Malawi species showed one of two developmental patterns. In some species, the expressed gene set changes slowly, either retaining the larval pattern or progressing only from larval to juvenile gene sets (neoteny). In the other species, the same genes are expressed in both larvae and adults but correspond to the tilapia adult genes (direct development).
Conclusion: Differences in visual sensitivities among species of Lake Malawi cichlids arise through heterochronic shifts relative to the ontogenetic pattern of the tilapia outgroup. Heterochrony has previously been shown to be a powerful mechanism for change in morphological evolution. We found that altering developmental expression patterns is also an important mechanism for altering sensory systems. These resulting sensory shifts will have major impacts on visual communication and could help drive cichlid speciation.

Flowering phenology data, Rocky Mountain Biological Laboratory, 2005
(2008-06-19)
Inouye, David William
These spreadsheets summarize data from a long-term study of the timing and variation of flowering, collected by David Inouye (Professor, Department of Biology, UMCP) from permanent 2 × 2 m plots at the Rocky Mountain Biological Laboratory. This submission contains a separate spreadsheet for each plot for 2005. Metadata for this project are available at the Web site of the Rocky Mountain Biological Laboratory (www.rmbl.org) and will also be deposited in DRUM.

Flowering phenology data, Rocky Mountain Biological Laboratory, 2006
(2008-06-23)
Inouye, David William
These spreadsheets summarize data from a long-term study of the timing and variation of flowering, collected by David Inouye (Professor, Department of Biology, UMCP) from permanent 2 × 2 m plots at the Rocky Mountain Biological Laboratory. This submission contains a separate spreadsheet for each plot for 2006. Metadata for this project are available at the Web site of the Rocky Mountain Biological Laboratory (www.rmbl.org) and will also be deposited in DRUM.

Summary of long-term flowering data for Delphinium nuttallianum at the Rocky Mountain Biological Laboratory
(2008-06-24)
Inouye, David William
These data come from a long-term study (still in progress as of 2008) of the phenology and abundance of flowering at the Rocky Mountain Biological Laboratory by David Inouye.
Individual plots (e.g., Rocky Meadow Plot #7) are 2 × 2 m and are visited every other day to count all flowers of all species. This file summarizes data from 1973–2007 for 8 plots, including the first date of flowering, the date of the peak number of flowers, the date of the peak number of inflorescences, the peak number of flowers counted, the peak number of inflorescences counted, and the length of the flowering period.

Summary of long-term flowering data for Delphinium barbeyi at the Rocky Mountain Biological Laboratory
(2008-06-24)
Inouye, David William
These data come from a long-term study (still in progress as of 2008) of the phenology and abundance of flowering at the Rocky Mountain Biological Laboratory by David Inouye. Individual plots (e.g., Wet Meadow Plot #1) are 2 × 2 m and are visited every other day to count all flowers of all species. This file summarizes data from 1973–2007 for 14 plots, including the first date of flowering, the date of the peak number of flowers, the date of the peak number of inflorescences, the peak number of flowers counted, the peak number of inflorescences counted, and the length of the flowering period.
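The per-plot summary variables listed above (first flowering date, peak date, peak count, and length of the flowering period) can be derived from a season of every-other-day censuses. The sketch below assumes hypothetical input: a list of (day-of-year, open-flower count) pairs for one species in one plot. It takes the length of the flowering period to be the last census day with flowers minus the first. That definition is an assumption, since the files' exact definition is not given here. With censuses every other day, each date is resolved only to within about two days. The same function applies unchanged to inflorescence counts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FloweringSummary:
    first_day: int      # first census day with at least one open flower
    peak_day: int       # census day with the most flowers (earliest, if tied)
    peak_count: int
    last_day: int       # last census day with at least one open flower
    duration_days: int  # last_day - first_day (an assumed definition)


def summarize_season(census: list[tuple[int, int]]) -> FloweringSummary | None:
    """Summarize one plot-season of (day_of_year, count) censuses; None if nothing flowered."""
    flowering = [(day, count) for day, count in sorted(census) if count > 0]
    if not flowering:
        return None
    # max() returns the first maximal item, i.e. the earliest peak day after sorting.
    peak_day, peak_count = max(flowering, key=lambda dc: dc[1])
    first_day, last_day = flowering[0][0], flowering[-1][0]
    return FloweringSummary(first_day, peak_day, peak_count, last_day,
                            last_day - first_day)


if __name__ == "__main__":
    # Hypothetical every-other-day counts for one species in one plot.
    season = [(150, 0), (152, 3), (154, 10), (156, 10), (158, 4), (160, 0)]
    print(summarize_season(season))
```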