Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay before a given thesis/dissertation appears in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 10 of 15
  • Item
    Cellular Pattern Quantification and Automatic Benchmarking Dataset Generation on confocal microscopy images
    (2010) Cui, Chi; JaJa, Joseph; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The distribution, directionality, and motility of the actin fibers control cell shape, affect cell function, and differ in cancer versus normal cells. Quantification of actin structural changes is important for further understanding differences between cell types and for elucidating the effects and dynamics of drug interactions. We propose an image analysis framework to quantify the F-actin organization patterns in response to different pharmaceutical treatments. The main problems addressed include which features to quantify and what quantification measurements to compute when dealing with unlabeled confocal microscopy images. The resulting numerical features are effective for profiling the functional mechanism and facilitate the comparison of different drugs. The analysis software was originally implemented in Matlab; more recently, the most time-consuming part of the feature extraction stage was implemented on an NVIDIA GPU using CUDA, where we obtain 15- to 20-fold speedups, depending on image size. We also propose a computational framework for generating synthetic images for validation purposes. The feature extraction is validated by visual inspection, and the quantification is validated by comparison with well-known biological facts. Future studies will further validate the algorithms and elucidate the molecular pathways and kinetics underlying the F-actin changes. This is the first study quantifying different structural formations of the same protein in intact cells. Since many anti-cancer drugs target the cytoskeleton, we believe that the quantitative image analysis method reported here will have broad applications to understanding the mechanisms of candidate pharmaceuticals.
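    The abstract above does not specify the exact features used; as a rough illustration of how fiber directionality might be quantified from a grayscale confocal image, the Python sketch below computes a coherence-weighted orientation histogram from the image's structure tensor. The function name, smoothing parameters, and bin count are assumptions for illustration, not the dissertation's implementation (which also offloaded the heavy filtering to the GPU via CUDA).

```python
# Illustrative only: structure-tensor orientation/coherence features for
# fiber-like patterns. Not the dissertation's feature set.
import numpy as np
from scipy import ndimage

def fiber_orientation_features(img, sigma=2.0, nbins=18):
    """Return a coherence-weighted orientation histogram for a 2-D image."""
    img = img.astype(float)
    ix = ndimage.sobel(img, axis=1)               # horizontal gradient
    iy = ndimage.sobel(img, axis=0)               # vertical gradient
    # Smoothed structure-tensor components
    jxx = ndimage.gaussian_filter(ix * ix, sigma)
    jyy = ndimage.gaussian_filter(iy * iy, sigma)
    jxy = ndimage.gaussian_filter(ix * iy, sigma)
    # Local dominant gradient orientation in [0, pi) and anisotropy in [0, 1]
    theta = (0.5 * np.arctan2(2.0 * jxy, jxx - jyy)) % np.pi
    coherence = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / (jxx + jyy + 1e-12)
    hist, _ = np.histogram(theta, bins=nbins, range=(0.0, np.pi), weights=coherence)
    return hist / (hist.sum() + 1e-12)            # normalized orientation profile

# A strongly oriented synthetic "fiber" image concentrates the histogram in one bin.
_, x = np.mgrid[0:128, 0:128]
stripes = np.sin(0.5 * x)
print(fiber_orientation_features(stripes).round(3))
```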
  • Item
    BIOLOGY AND EVOLUTION OF CHROMALVEOLATE PROTISTS
    (2010) Miller, John James; Delwiche, Charles F; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Dinoflagellates and haptophytes are both prominent members of the marine phytoplankton and are considered chromalveolates. The interactions of the parasitic dinoflagellate Amoebophrya with its host dinoflagellate Akashiwo sanguinea were studied using cell biological techniques. The free-swimming dinospore stage of Amoebophrya has two flagella, trichocysts, striated strips, condensed chromatin resembling heterochromatin, and electron-dense bodies. When entering the host cytoplasm, and again when entering the host nucleus, the electron-dense bodies appear in a tube of microtubules close to the surface of the host or its nucleus. Host entry is inhibited by cytochalasin D, implying a role for microfilament polymerization in the entry process. While in the host cytoplasm, Amoebophrya appears to be separated from the host cytoplasm by two membranes. After entering the host nucleus, the parasite grows and undergoes mitosis, forming a multinucleated trophont. The mastigocoel is an internal cavity that contains flagella and becomes the outside of the parasite after it leaves the host. This study indicates that the mastigocoel forms as a result of vesicle fusion. Eventually, Amoebophrya fills the host nucleus and takes on a beehive appearance. The beehive stage contains numerous trichocysts and striated strips. The level of chromatin condensation in intracellular trophonts is highly variable. The parasite then exits its host as a multinucleated, vermiform organism, which then splits up into individual infective dinospores. A phylogenomic pipeline was designed to analyze the genome and evolutionary history of the haptophyte Emiliania huxleyi. It appears to have genes linking it to three lineages: heterokonts, green algae, and red algae. Genes with shared phylogenetic affinities appear to fit into limited functional categories and to be physically localized in the genome. The phylogenetic affinities of E. huxleyi with the green algae may be an artifact of the much greater number of sequenced genomes from the Viridiplantae (= plants + green algae) compared to the rhodophytes. The evolutionary history of E. huxleyi is still unclear, although haptophytes do seem to be similar in many ways to heterokonts and are generally believed to have red-algal-derived plastids.
  • Item
    IDENTIFICATION OF PUTATIVE O-REPEAT BIOSYNTHETIC GENES IN NEISSERIA SICCA 4320
    (2010) Miller, Clinton; Stein, Daniel C; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Lipopolysaccharide (LPS) and lipooligosaccharide (LOS) are important virulence determinants found in Gram-negative bacteria. LOS differs from LPS in that it lacks the O-repeat characteristic of LPS. While the genetic basis of LOS production in the pathogenic Neisseria has been extensively studied, little research has focused on the genetics underlying LOS production and the resulting diversity in commensal Neisseria. A commensal strain that caused a fatal case of bacterial endocarditis, Neisseria sicca 4320, was found to produce a unique polysaccharide similar to the O-repeat of LPS in addition to typical Neisseria LOS. N. sicca 4320 was analyzed by bioinformatic and molecular biological gene-finding screens to identify putative O-repeat biosynthesis genes. Twenty-one open reading frames (ORFs) with similarity to other polysaccharide biosynthesis genes were located in these screens. Two open reading frames with similarity to glycosyltransferases were found to be unique to N. sicca 4320.
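    The gene-finding screens themselves are not described in the abstract; the sketch below is only a minimal illustration of the simplest building block of such a screen, a six-frame open reading frame (ORF) scan. The length threshold and helper names are assumptions, not the study's actual pipeline.

```python
# Illustrative ORF scan (not the screen used in the study): report ORFs of at
# least `min_codons` codons, from ATG to the first in-frame stop, in six frames.
STOPS = {"TAA", "TAG", "TGA"}
COMP = str.maketrans("ACGT", "TGCA")

def orfs_in_frame(seq, frame, min_codons=100):
    found, i = [], frame
    while i + 3 <= len(seq):
        if seq[i:i + 3] == "ATG":
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                j += 3                      # extend until an in-frame stop
            if j + 3 <= len(seq) and (j + 3 - i) // 3 >= min_codons:
                found.append((i, j + 3))    # record (start, end) of the ORF
            i = j + 3
        else:
            i += 3
    return found

def find_orfs(seq, min_codons=100):
    seq = seq.upper()
    rev = seq.translate(COMP)[::-1]         # reverse complement
    hits = []
    for frame in range(3):
        hits += [("+", s, e) for s, e in orfs_in_frame(seq, frame, min_codons)]
        hits += [("-", s, e) for s, e in orfs_in_frame(rev, frame, min_codons)]
    return hits

print(find_orfs("ATG" + "GCA" * 120 + "TAA", min_codons=50))
```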
  • Item
    Patterns and Complexity in Biological Systems: A Study of Sequence Structure and Ontology-based Networks
    (2010) Glass, Kimberly; Girvan, Michelle; Physics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Biological information can be explored at many different levels, with the most basic information encoded in patterns within the DNA sequence. Through molecular level processes, these patterns are capable of controlling the states of genes, resulting in a complex network of interactions between genes. Key features of biological systems can be determined by evaluating properties of this gene regulatory network. More specifically, a network-based approach helps us to understand how the collective behavior of genes corresponds to patterns in genetic function. We combine Chromatin-Immunoprecipitation microarray (ChIP-chip) data with genomic sequence data to determine how DNA sequence works to recruit various proteins. We quantify this information using a value termed "nmer-association." "Nmer-association" measures how strongly individual DNA sequences are associated with a protein in a given ChIP-chip experiment. We also develop the "split-motif" algorithm to study the underlying structural properties of DNA sequence independent of wet-lab data. The "split-motif" algorithm finds pairs of DNA motifs which preferentially localize relative to one another. These pairs are primarily composed of known transcription factor binding sites and their co-occurrence is indicative of higher-order structure. This kind of structure has largely been missed in standard motif-finding algorithms despite emerging evidence of the importance of complex regulation. In both simple and complex regulation, two genes that are connected in a regulatory fashion are likely to have shared functions. The Gene Ontology (GO) provides biologists with a controlled terminology with which to describe how genes are associated with function and how those functional terms are related to each other. We introduce a method for processing functional information in GO to produce a gene network. We find that the edges in this network are correlated with known regulatory interactions and that the strength of the functional relationship between two genes can be used as an indicator of how informationally important that link is in the regulatory network. We also investigate the network structure of gene-term annotations found in GO and use these associations to establish an alternate natural way to group the functional terms. These groups of terms are drastically different from the hierarchical structure established by the Gene Ontology and provide an alternative framework with which to describe and predict the functions of experimentally identified groups of genes.
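    The split-motif algorithm itself is not given in the abstract; the sketch below only mimics its spirit, scoring how often two candidate motifs co-occur within a fixed window relative to shuffled sequences. The window size, motif strings, and simple character-shuffle background are illustrative assumptions.

```python
# Illustrative only: how often do two short motifs co-occur within a fixed
# window, compared with simple character shuffles (which do not preserve
# dinucleotide content)? This mimics the spirit, not the details, of a
# split-motif style search for paired motifs.
import random
import re

def pair_count(seqs, m1, m2, window=50):
    """Count sequences where m1 and m2 occur within `window` bp of each other."""
    hits = 0
    for s in seqs:
        p1 = [m.start() for m in re.finditer(m1, s)]
        p2 = [m.start() for m in re.finditer(m2, s)]
        if any(abs(a - b) <= window for a in p1 for b in p2):
            hits += 1
    return hits

def pair_enrichment(seqs, m1, m2, window=50, shuffles=100, seed=0):
    rng = random.Random(seed)
    observed = pair_count(seqs, m1, m2, window)
    null = []
    for _ in range(shuffles):
        shuffled = ["".join(rng.sample(s, len(s))) for s in seqs]
        null.append(pair_count(shuffled, m1, m2, window))
    return observed, sum(null) / len(null)    # observed vs. mean under shuffles

seqs = ["".join(random.choice("ACGT") for _ in range(500)) for _ in range(20)]
print(pair_enrichment(seqs, "TATAAA", "GGGCGG"))
```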
  • Item
    High Performance Computing for DNA Sequence Alignment and Assembly
    (2010) Schatz, Michael Christopher; Salzberg, Steven L; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyses, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine, but are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation across large compute grids. This dissertation demonstrates how these technologies can be used to accelerate these computations by orders of magnitude, making otherwise infeasible computations practical.
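    As a hedged illustration of the MapReduce pattern mentioned above, the sketch below counts k-mers in a set of reads with an explicit map phase and reduce phase, using a local process pool rather than the Hadoop/cloud setting used in the dissertation. The value of k and the function names are assumptions for illustration.

```python
# Illustrative only: the map/shuffle/reduce pattern applied to k-mer counting
# over sequencing reads, run on a local process pool instead of a cluster.
from collections import Counter
from functools import reduce
from multiprocessing import Pool

K = 21

def map_read(read):
    """Map step: one read -> counts of its k-mers."""
    return Counter(read[i:i + K] for i in range(len(read) - K + 1))

def reduce_counts(a, b):
    """Reduce step: merge two partial k-mer count tables."""
    a.update(b)
    return a

def count_kmers(reads, processes=4):
    with Pool(processes) as pool:
        partial = pool.map(map_read, reads)            # "map" phase, in parallel
    return reduce(reduce_counts, partial, Counter())   # "reduce" phase

if __name__ == "__main__":
    reads = ["ACGTACGTACGTACGTACGTACGT", "TTTTACGTACGTACGTACGTACGA"]
    print(count_kmers(reads).most_common(3))
```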
  • Item
    Whole-genome sequence analysis for pathogen detection and diagnostics
    (2010) Phillippy, Adam Michael; Salzberg, Steven L; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation focuses on computational methods for improving the accuracy of commonly used nucleic acid tests for pathogen detection and diagnostics. Three specific biomolecular techniques are addressed: polymerase chain reaction, microarray comparative genomic hybridization, and whole-genome sequencing. These methods are potentially the future of diagnostics, but each requires sophisticated computational design or analysis to operate effectively. This dissertation presents novel computational methods that unlock the potential of these diagnostics by efficiently analyzing whole-genome DNA sequences. Improvements in the accuracy and resolution of each of these diagnostic tests promise more effective diagnosis of illness and rapid detection of pathogens in the environment. For designing real-time detection assays, an efficient data structure and search algorithm are presented to identify the most distinguishing sequences of a pathogen that are absent from all other sequenced genomes. Results are presented that show these "signature" sequences can be used to detect pathogens in complex samples and differentiate them from their non-pathogenic, phylogenetic near neighbors. For microarrays, novel pan-genomic design and analysis methods are presented for the characterization of unknown microbial isolates. To demonstrate the effectiveness of these methods, pan-genomic arrays are applied to the study of multiple strains of the foodborne pathogen Listeria monocytogenes, revealing new insights into the diversity and evolution of the species. Finally, multiple methods are presented for the validation of whole-genome sequence assemblies, which are capable of identifying assembly errors even in finished genomes. These validated assemblies provide the ultimate nucleic acid diagnostic, revealing the entire sequence of a genome.
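    As a toy illustration of the "signature" idea described above, the sketch below collects k-mers present in a target genome and absent from a set of background genomes using plain Python sets. The efficient data structure and search algorithm of the dissertation are not reproduced here, and the value of k and the toy sequences are assumptions.

```python
# Illustrative only: candidate "signature" k-mers present in a target genome
# but absent from every background genome. Real signature design scales to
# all sequenced genomes and also screens candidates for assay suitability.
def kmers(seq, k):
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def signature_kmers(target, backgrounds, k=18):
    candidates = kmers(target, k)
    for genome in backgrounds:              # subtract everything seen elsewhere
        candidates -= kmers(genome, k)
    return candidates

target = "ATGCGTACGTTAGCATGCCGATTACA"        # hypothetical pathogen sequence
near_neighbors = ["ATGCGTACGTTAGC", "GCATGCCGATT"]
print(sorted(signature_kmers(target, near_neighbors, k=10))[:5])
```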
  • Item
    Novel Methods for Metagenomic Analysis
    (2010) White, James Robert; Pop, Mihai; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    By sampling the genetic content of microbes at the nucleotide level, metagenomics has rapidly established itself as the standard for characterizing the taxonomic diversity and functional capacity of microbial populations throughout nature. The decreasing cost of sequencing technologies and the simultaneous increase in throughput per run have given scientists the ability to deeply sample highly diverse communities on a reasonable budget. The Human Microbiome Project is representative of the flood of sequence data that will arrive in the coming years. Despite these advancements, there remains the significant challenge of analyzing massive metagenomic datasets to make appropriate biological conclusions. This dissertation is a collection of novel methods developed for improved analysis of metagenomic data: (1) We begin with Figaro, a statistical algorithm that quickly and accurately infers and trims vector sequence from large Sanger-based read sets without prior knowledge of the vector used in library construction. (2) Next, we perform a rigorous evaluation of methodologies used to cluster environmental 16S rRNA sequences into species-level operational taxonomic units, and discover that many published studies utilize highly stringent parameters, resulting in overestimation of microbial diversity. (3) To assist in comparative metagenomics studies, we have created Metastats, a robust statistical methodology for comparing large-scale clinical datasets with up to thousands of subjects. Given a collection of annotated metagenomic features (e.g., taxa, COGs, or pathways), Metastats determines which features are differentially abundant between two populations. (4) Finally, we report on a new methodology that employs the generalized Lotka-Volterra model to infer microbe-microbe interactions from longitudinal 16S rRNA data. It is our hope that these methods will enhance standard metagenomic analysis techniques to provide better insight into the human microbiome and microbial communities throughout our world. To assist metagenomics researchers and those developing methods, all software described in this thesis is open-source and available online.
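    As a hedged illustration of the kind of comparison Metastats performs, the sketch below runs a simple two-group permutation test on one feature's relative abundances. It is not the published Metastats method; the data and the test statistic are assumptions for illustration only.

```python
# Illustrative only: a two-sample permutation test on one feature's relative
# abundances, in the spirit of (but much simpler than) Metastats.
import random

def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a, extreme = len(group_a), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                 # relabel subjects at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        extreme += diff >= observed
    return (extreme + 1) / (n_perm + 1)     # p-value with pseudocount

# Hypothetical relative abundance of one taxon in two groups of subjects.
healthy = [0.02, 0.03, 0.01, 0.02, 0.04, 0.03]
diseased = [0.08, 0.07, 0.10, 0.06, 0.09, 0.07]
print(permutation_test(healthy, diseased))
```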
  • Item
    Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection
    (2009) Lotze, Thomas Harvey; Shmueli, Galit; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The automatic collection and increasing availability of health data provide a new opportunity for techniques to monitor this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, there exists the potential to detect disease outbreaks earlier than traditional laboratory disease confirmation results. This research is particularly important for a modern, highly connected society, where the onset of a disease outbreak can be swift and deadly, whether caused by a naturally occurring global pandemic such as swine flu or a targeted act of bioterrorism. In this dissertation, we first describe the problem and current state of research in disease outbreak detection, then provide four main additions to the field. First, we formalize a framework for analyzing health series data and detecting anomalies: using forecasting methods to predict the next day's value, subtracting the forecast to create residuals, and finally using detection algorithms on the residuals. The formalized framework makes explicit the link between the accuracy of the forecasting method and the performance of the detector, and can be used to quantify and analyze the performance of a variety of heuristic methods. Second, we describe improvements for the forecasting of health data series. The application of weather as a predictor, cross-series covariates, and ensemble forecasting each provide improvements to forecasting health data. Third, we describe improvements for detection. This includes the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we also provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can provide a detector optimized for rapid detection or for maximizing the probability of detection within a given timeframe. Finally, we describe a method for improved comparison of detection methods. We provide tools to evaluate how well a simulated data set captures the characteristics of the authentic series, as well as time-lag heatmaps, a new way of visualizing daily detection rates and of displaying the comparison between two methods more informatively.
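    As a minimal sketch of the forecast-residual-detect framework described above, the code below forecasts a daily count series with a day-of-week mean, forms residuals, and applies a one-sided CUSUM detector. The forecaster, the CUSUM parameters, and the injected outbreak are illustrative assumptions, not the dissertation's methods.

```python
# Illustrative only: the forecast -> residual -> detect pipeline on a daily
# count series, with a day-of-week mean forecaster and a one-sided CUSUM.
import numpy as np

def dow_forecast(series, train_days=56):
    """Predict each day after the training window by its day-of-week mean."""
    preds = np.full(len(series), np.nan)
    for t in range(train_days, len(series)):
        history = series[:t]
        same_dow = history[np.arange(t) % 7 == t % 7]
        preds[t] = same_dow.mean()
    return preds

def cusum(residuals, k=0.5, h=4.0):
    """Return indices where the standardized one-sided CUSUM exceeds h."""
    r = (residuals - np.nanmean(residuals)) / (np.nanstd(residuals) + 1e-12)
    s, alarms = 0.0, []
    for t, x in enumerate(r):
        if np.isnan(x):
            continue                         # skip the training window
        s = max(0.0, s + x - k)
        if s > h:
            alarms.append(t)
            s = 0.0                          # reset after an alarm
    return alarms

rng = np.random.default_rng(0)
counts = rng.poisson(20, 200).astype(float)
counts[150:160] += 15                        # injected "outbreak"
print(cusum(counts - dow_forecast(counts)))
```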
  • Item
    Highly Scalable Short Read Alignment with the Burrows-Wheeler Transform and Cloud Computing
    (2009) Langmead, Benjamin Thomas; Salzberg, Steven L; Pop, Mihai; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Improvements in DNA sequencing have both broadened its utility and dramatically increased the size of sequencing datasets. Sequencing instruments are now used regularly as sources of high-resolution evidence for genotyping, methylation profiling, DNA-protein interaction mapping, and characterizing gene expression in the human genome and in other species. With existing methods, the computational cost of aligning short reads from the Illumina instrument to a mammalian genome can be very large: on the order of many CPU months for one human genotyping project. This thesis presents a novel application of the Burrows-Wheeler Transform that enables the alignment of short DNA sequences to mammalian genomes at a rate much faster than existing hashtable-based methods. The thesis also presents an extension of the technique that exploits the scalability of Cloud Computing to perform the equivalent of one human genotyping project in hours.
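    As a toy illustration of the Burrows-Wheeler idea behind this work, the sketch below builds a BWT by sorting rotations and counts exact occurrences of a pattern with FM-index backward search. Real aligners such as Bowtie construct the index far more efficiently and handle mismatches, quality values, and genome-scale data; none of that is reproduced here.

```python
# Illustrative only: exact-match counting with a BWT/FM-index, the core idea
# behind Burrows-Wheeler read aligners.
from collections import Counter

def bwt(text):
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def fm_index(b):
    counts = Counter(b)
    first_smaller, total = {}, 0
    for c in sorted(counts):                 # C[c]: count of characters < c
        first_smaller[c] = total
        total += counts[c]
    occ = {c: [0] * (len(b) + 1) for c in counts}
    for i, ch in enumerate(b):               # Occ[c][i]: count of c in b[:i]
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first_smaller, occ

def count_matches(pattern, b, C, occ):
    lo, hi = 0, len(b)                       # half-open row range [lo, hi)
    for ch in reversed(pattern):             # backward search, one char at a time
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

genome = "GATTACAGATTACA"
b = bwt(genome)
C, occ = fm_index(b)
print(count_matches("ATTA", b, C, occ))      # -> 2 occurrences
```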
  • Item
    A framework for discovering meaningful associations in the annotated life sciences Web
    (2009) Lee, Woei-jyh; Raschid, Louiqa; Tseng, Chau-Wen; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    During the last decade, life sciences researchers have gained access to the entire human genome, reliable high-throughput biotechnologies, affordable computational resources, and public network access. This has produced vast amounts of data and knowledge captured in the life sciences Web, and has created the need for new tools to analyze this knowledge and make discoveries. Consider a simplified Web of three publicly accessible data resources: Entrez Gene, PubMed, and OMIM. Data records in each resource are annotated with terms from multiple controlled vocabularies (CVs). The links between data records in two resources form a relationship between the two resources. Thus, a record in Entrez Gene, annotated with GO terms, can have links to multiple records in PubMed that are annotated with MeSH terms. Similarly, OMIM records annotated with terms from SNOMED CT may have links to records in Entrez Gene and PubMed. This forms a rich web of annotated data records. The objective of this research is to develop the Life Science Link (LSLink) methodology and tools to discover meaningful patterns across resources and CVs. In a first step, we execute a protocol to follow links, extract annotations, and generate datasets of termlinks, which consist of data records and CV terms. We then mine the termlinks of the datasets to find potentially meaningful associations between pairs of terms from two CVs. Biologically meaningful associations of pairs of CV terms may yield innovative nuggets of previously unknown knowledge. Moreover, the bridge of associations across CV terms will reflect how scientists annotate data across linked data repositories. Contributions include a methodology for creating background datasets, metrics for mining patterns, the application of semantic knowledge for generalization, tools for discovery, and validation with biological use cases. Inspired by research in association rule mining and linkage analysis, we develop two metrics to determine support and confidence scores for the associations of pairs of CV terms. Associations that have a statistically significant high score and are biologically meaningful may lead to new knowledge. To further validate the support and confidence metrics, we develop a secondary test for significance based on the hypergeometric distribution. We also exploit the semantics of the CVs. We aggregate termlinks over siblings of a common parent CV term and use them as additional evidence to boost the support and confidence scores of the associations of the parent CV term. We provide a simple discovery interface where biologists can review associations and their scores. Finally, a cancer informatics use case validates the discovery of associations between human genes and diseases.
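    As a hedged sketch of how a single pair of CV terms might be scored, the code below computes support, two confidence values, and a hypergeometric p-value from termlink counts. The exact LSLink metric definitions and significance test are given in the dissertation and may differ; the counts used here are hypothetical.

```python
# Illustrative only: score the association between one pair of CV terms from
# termlink counts. The actual LSLink support/confidence definitions and
# significance test may differ from this sketch.
from scipy.stats import hypergeom

def score_pair(n_total, n_term_a, n_term_b, n_both):
    """n_total termlinks; counts annotated with term A, with term B, and with both."""
    support = n_both / n_total
    confidence_a_to_b = n_both / n_term_a    # P(B | A)
    confidence_b_to_a = n_both / n_term_b    # P(A | B)
    # Probability of seeing at least n_both co-annotations by chance when
    # drawing n_term_a termlinks from n_total, of which n_term_b carry term B.
    p_value = hypergeom.sf(n_both - 1, n_total, n_term_b, n_term_a)
    return support, confidence_a_to_b, confidence_b_to_a, p_value

# Hypothetical counts: 10,000 termlinks; a GO term on 150 of them, a MeSH term
# on 120 of them, and both terms on 40 of them.
print(score_pair(10_000, 150, 120, 40))
```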