Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 5 of 5
  • Semantic Foundations For Formalizing Brain Cancer Profiles
    (2019) Abraham, Joel; Austin, Mark; Celiku, Orieta; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    With the advent of whole-genome DNA sequencing technologies, tailoring of medical treatment to individual patients based on their genetic makeup has become the vanguard of modern medicine. One area that can benefit from individualized medicine is that of brain and other Central Nervous System (CNS) cancers. The prognosis of malignant brain cancers is among the worst due to the heterogeneity and complexity of these tumors and their micro-environment. We present a framework that combines data mining and machine learning techniques with semantic approaches for building a clinically relevant knowledge base of brain cancer profiles. We construct clusters of patients based on the similarity of their profiles using the k-means clustering algorithm and extract relevant molecular attributes of these clusters to classify instances of the clusters. We create a semantic model with ontologies, rule checking, and reasoning to enable rational therapeutic regimen selection. Finally, we lay the foundation to incorporate this framework into a digital twin architecture of a patient.
  • Genomic analysis of bacteriophages from non-O157 shiga toxin-producing Escherichia coli
    (2015) Tang, Shuai; Meng, Jianghong; Food Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Shiga toxin-producing Escherichia coli (STEC) is the fiercest pathotype among all diarrheic E. coli. STEC O157:H7 has been a predominant serotype for STEC in the United States. However, an increasing number of cases of infections by STEC other than O157 have been reported in recent years. Shiga toxin (Stx) is the most important virulence factor of STEC and is encoded by stx, which is introduced into the STEC genome by bacteriophages through gene transduction. A detailed understanding of Stx bacteriophages is necessary to reveal the emergence and pathogenicity of STEC. The very unstable genomes of Stx bacteriophages result in a dynamic phenotypic versatility including virulence, host cell repertoire and tolerance to adversities. Sequencing technology enables us to generate genomic sequence data of bacteriophages at an affordable cost. The project aimed at obtaining genomic DNA sequences of Stx bacteriophages of non-O157 STEC isolates of diverse serotypes. Thirteen bacteriophages were successfully induced from 83 STEC isolates of serotypes O74, O111, O121, O130, O163, O179 and O183. The bacteriophage DNA samples were collected and sequenced using a MiSeq Desktop Sequencer (Illumina®, Inc.). Automatically assembled sequences were manually compared to the E. coli genome sequence available from NCBI (NC_000913.3) to verify the reliability of the sequencing results. Nine verified bacteriophage sequences were aligned to two Stx bacteriophage genomes from NCBI (NC_000924.1 and NC_018846.1) and visible alignment results were obtained. A phylogenetic relationship of the nine phages and the two reference sequences was constructed and gene profiles of each sample sequence were identified. The comparative analysis indicated that recombination events had occurred and left traces in the probacteriophages. Similarity of bacteriophage genomes correlated with the serotypes of host bacteria, based on the comparison of the phylogenetic tree and STEC serotypes. Gene identification results showed that nucleotide variance does not show region specificity, silent mutations are frequent in housekeeping genes, and virulence genes are conserved among phage samples.
  • Novel methods for comparing and evaluating single and metagenomic assemblies
    (2015) Hill, Christopher Michael; Pop, Mihai; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments “read” by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still heavily relies on the availability of independently determined standards, such as manually curated genome sequences, or independently produced mapping data. The focus of this work is to develop reference-free computational methods to accurately compare and evaluate genome assemblies. We introduce a reference-free likelihood-based measure of assembly quality which allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics. Despite the unresolved challenges of single genome assembly, the decreasing costs of sequencing technology have led to a sharp increase in metagenomics projects over the past decade. These projects allow us to better understand the diversity and function of microbial communities found in the environment, including the ocean, Arctic regions, other living organisms, and the human body. We extend our likelihood-based framework and show that we can accurately compare assemblies of these complex bacterial communities. After an assembly has been produced, it is not easy to determine what parts of the underlying genome are missing, what parts are mistakes, and what parts are due to experimental artifacts from the sequencing machine. Here we introduce VALET, the first reference-free pipeline that flags regions in metagenomic assemblies that are statistically inconsistent with the data generation process. VALET detects mis-assemblies in publicly available datasets and highlights the current shortcomings in available metagenomic assemblers. By providing the computational methods for researchers to accurately evaluate their assemblies, we decrease the chance of incorrect biological conclusions and misguided future studies.
  • RNA-SEQUENCING ANALYSIS: READ ALIGNMENT AND DISCOVERY AND RECONSTRUCTION OF FUSION TRANSCRIPTS
    (2013) Kim, Daehwan; Salzberg, Steven L; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    RNA-sequencing technologies, which sequence the RNA molecules being transcribed in cells, allow us to explore the process of transcription in exquisite detail. One of the primary goals of RNA sequencing analysis is to reconstruct the full set of transcripts (isoforms) of genes that were present in the original cells. In addition to the transcript structures, experimenters need to estimate the expression levels for all transcripts. The first step in the analysis process is to map the RNA-seq reads against the reference genome, which provides the location from which the reads originated. In contrast to DNA sequence alignment, RNA-seq mapping algorithms have two additional challenges. First, any RNA-seq alignment program must be able to handle gapped alignment (or spliced alignment) with very large gaps due to introns, typically from 50 to 100,000 bases in mammalian genomes. Second, the presence of processed pseudogenes from which introns have been removed may cause many exon-spanning reads to map incorrectly. In order to cope with these problems effectively, I have developed new alignment algorithms and implemented them in TopHat2, a second version of TopHat (one of the first spliced aligners for RNA-seq reads). The new TopHat2 program can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length insertions and deletions with respect to the reference genome. TopHat2 combines the ability to discover novel splice sites with direct mapping to known transcripts, producing more sensitive and accurate alignments, even for highly repetitive genomes or in the presence of processed pseudogenes. These new capabilities will contribute to improvements in the quality of downstream analysis. In addition to the splice junction mapping algorithm, I have developed novel algorithms to align reads across fusion break points, which result from the breakage and re-joining of two different chromosomes, or from rearrangements within a chromosome. Based on this new fusion alignment algorithm, I have developed TransFUSE, one of the first systems for reconstruction and quantification of full-length fusion gene transcripts. TransFUSE can be run with or without known gene annotations, and it can discover novel fusion transcripts that are transcribed from known or unknown genes.
  • Comparative genomic analysis of Vibrio cholerae O31: capsule, O-antigen, pathogenesis and genome
    (2006-11-21) Chen, Yuansha; Morris, J Glenn; Marine-Estuarine-Environmental Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Vibrio cholerae is the causative agent of cholera. To understand the genetic basis underlying the emergence of novel epidemic strains of V. cholerae, the genetics of surface polysaccharide biogenesis, and the role of lateral gene transfer in the evolution of this species, we investigated two serogroup O31 V. cholerae strains, NRT36S and A5. Both are NAG-ST producing and cholera toxin negative. NRT36S is encapsulated and causes diarrhea when administered to volunteers; A5 is acapsular and does not colonize or cause illness in humans. The structure of the capsular polysaccharide (CPS) in NRT36S was determined by NMR. The gene cluster of CPS biogenesis was identified by transposon mutagenesis combined with whole genome sequencing data. The CPS gene cluster shared the same genetic locus as the gene cluster for biogenesis of the O-antigen of lipopolysaccharide (LPS). The LPS biogenesis regions in A5 were similar to those in NRT36S, except that a 6.5 kb fragment in A5 replaced a 10 kb fragment in NRT36S in the middle of the LPS gene cluster. The genome of NRT36S was sequenced to a draft containing 174 contigs plus the superintegron region. Besides confirming the existence of NAG-ST, we also identified the genes for a type three secretion system (TTSS), a putative exotoxin, and two different RTX genes. Four pili systems were also identified. Therefore, the genome of non-O1 Vibrio cholerae NRT36S demonstrates the presence of pathogenic mechanisms that are distinct from those of O1 V. cholerae. We conclude that lateral gene transfer plays a critical role in the emergence of new strains. The co-location of CPS and LPS could provide a mechanism for simultaneous emergence of new O and K antigens in a single strain. Our data also highlight the apparent mobility within the CPS/LPS region that would provide a basis for the large number of observed V. cholerae serogroups and the emergence of novel epidemic strains.