Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be a delay of up to 4 months before a given thesis/dissertation appears in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.


Search Results

Now showing 1 - 8 of 8
  • GENOMICS ENABLED GENE DISCOVERY IN DIPLOID AND POLYPLOID WHEAT
    (2024) Yadav, Inderjit Singh; Tiwari, Vijay; Plant Science and Landscape Architecture (PSLA); Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Hexaploid bread wheat (Triticum aestivum) is one of the most important staple food crops for humans. Sustainable genetic improvement in wheat is critical for ensuring global food security and requires the introduction of new genes and alleles into elite wheat cultivars. The progenitor species and wild wheat relatives are a reservoir of genetic diversity for wheat improvement. This doctoral thesis demonstrates the application of genomic resources and bioinformatics pipelines to characterize the wild germplasm and to streamline the gene discovery pipeline using five diverse species involving wheat progenitor, wild, and related species. Genomics-assisted characterization of the genetic diversity present in gene banks is a major step towards the systematic utilization of unexploited germplasm to ensure the sustainable development of new varieties. Toward this end, we used genomics datasets to curate wild and related accessions of tetraploid wheat from two distinct species Triticum turgidum and Triticum timopheevii. Using Genotyping by sequencing (GBS) data and a unique similarity matrix and powercore analysis, a set of 102 accessions were identified as the core set accessions that represent 20 and 35 percent of the total accessions of the WGRC tetraploid wheat collection of T. turgidum and T. timopheevii, respectively. Further, three distinct centers of rich genetic diversity were identified for wild and domesticated emmer and T. timopheevii in the Fertile Crescent. GWAS analysis of the genotypic and phenotypic dataset identified a novel QTL for leaf rust resistance on chromosome 2B in T. timopheevii. Triticale is a man-made cereal derived from a cross between tetraploid and hexaploid wheat with diploid rye. There are large numbers of triticale germplasm available in different gene banks; however, in many cases, the ploidy information is not accurate and affects the quality of work with large triticale germplasm. 
In this work, a pipeline based on low-cost GBS datasets was developed to detect contamination in the UMD triticale collection and to classify ploidy accurately, ensuring the purity of the triticale germplasm. This approach identified contamination of 11 wheat accessions and enabled the correct classification of 236 hexaploid and 12 octoploid triticale; these results were further confirmed through GISH experiments. Wild and related germplasm is considered a goldmine of genetic diversity for wheat improvement. Modern wheat cultivars have gone through several rounds of heavy selection for yield-related traits and have lost genetic diversity against several abiotic and biotic stresses. Wild relatives of wheat, on the other hand, have been growing naturally without any substantial artificial selection pressure, which has allowed them to preserve their genetic diversity. This study investigates the genetic diversity of a selected set of genes to visualize the differences between wild wheat relatives and polyploid wheat cultivars. To study these differences, the group 5 chromosomes of Aegilops geniculata and Aegilops umbellulata, which belong to the tertiary gene pool, were assembled. Comparative analysis revealed a higher rate of pseudogenization in bread wheat than in these two wild relatives, primarily due to differences in exon/intron length between the genes, rendering these genes non-functional. Diploid einkorn wheat (Triticum monococcum), with its inherent disease resistance, offers a valuable resource for wheat improvement. To facilitate its proper utilization, two reference genomes, one wild (T. monococcum ssp. aegilopoides) and one domesticated (T. monococcum ssp. monococcum), were assembled in this study. Kmer-GWAS identified seven novel QTLs associated with powdery mildew resistance, three for leaf rust resistance, and two for stem rust resistance.
These QTLs harbor diverse gene classes encoding for resistance gene analogs, cysteine-rich receptor kinases, transcription factors, and G-type lectins. Overall, the knowledge and resources developed in this research would contribute to the characterization of vast germplasm and the development of climate-resilient wheat.
  • Algorithms for scalable and efficient population genomics and metagenomics
    (2022) Javkar, Kiran Gajanan; Pop, Mihai; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Microbes strongly impact human health and the ecosystem of which they are a part. Rapid improvements and decreasing costs in sequencing technologies have revolutionized the field of genomics and enabled important insights into microbial genome biology and microbiomes. However, new tools and approaches are needed to facilitate the efficient analysis of large sets of genomes and to associate genomic features with phenotypic characteristics better. Here, we built and utilized several tools for large-scale whole-genome analysis for different microbial characteristics, such as antimicrobial resistance and pathogenicity, that are important for human health. Chapters 2 and 3 demonstrate the needs and challenges of population genomics in associating antimicrobial resistance with genomic features. Our results highlight important limitations of reference database-driven analysis for genotype-phenotype association studies and demonstrate the utility of whole-genome population genomics in uncovering novel genomic factors associated with antimicrobial resistance. Chapter 4 describes PRAWNS, a fast and scalable bioinformatics tool that generates compact pan-genomic features. Existing approaches are unable to meet the needs of large-scale whole-genome analyses, either due to scalability limitations or the inability of the genomic features generated to support a thorough whole-genome assessment. We demonstrate that PRAWNS scales to thousands of genomes and provides a concise collection of genomic features which support the downstream analyses. In Chapter 5, we assess whether the combination of long and short-read sequencing can expedite the accurate reconstruction of a pathogen genome from a microbial community. We describe the challenges for pathogen detection in current foodborne illness outbreak monitoring. 
Our results show that the recovery of a pathogen genome can be accelerated using a combination of long and short-read sequencing after limited culturing of the microbial community. We evaluated several popular genome assembly approaches and identified areas for improvement. In Chapter 6, we describe SIMILE, a fast and scalable bioinformatics tool that enables the detection of genomic regions shared between several assembled metagenomes. In metagenomics, microbial communities are sequenced directly without culturing. Although metagenomics has furthered our understanding of the microbiome, comparing metagenomic samples is extremely difficult. We describe the need and challenges in comparing several metagenomic samples and present an approach that facilitates large-scale metagenomic comparisons.
  • Three Variations of Precision Medicine: Gene-Aware Genome Editing, Ancestry-Aware Molecular Diagnosis, and Clone-Aware Treatment Planning
    (2021) Sinha, Sanju; Ruppin, Eytan; Mount, Steve; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    During my Ph.D., I developed several computational approaches to advance precision medicine for cancer prevention and treatment. My thesis presents three such approaches addressing these emerging challenges by analyzing large-scale cancer omics data from both pre-clinical models and patient datasets. In the first project, we studied the cancer risk associated with CRISPR-based therapies. Therapeutics based on CRISPR technologies (for which the Nobel Prize in Chemistry was awarded in 2020) are poised to become widely applicable for treating a variety of human genetic diseases. However, preceding our work, two experimental studies had reported that genome editing by CRISPR-Cas9 can induce a p53-mediated DNA damage response in primary cells, hampering their growth. This could lead to an undesired selection of cells with pre-existing p53 mutations. Motivated by these findings, we conducted the first comprehensive computational and experimental investigation of the risk of CRISPR-induced selection of cancer gene mutants across many different cell types and lineages. I further studied whether this selection depends on the Cas9/sgRNA delivery method and/or the gene being targeted. Importantly, we asked whether other cancer driver mutations may also be selected during CRISPR-Cas9 gene editing and found that pre-existing KRAS mutants may also be selected for. In summary, we established that the risk of selection for pre-existing p53 or KRAS mutations is non-negligible, which calls for careful monitoring of pre-existing p53 and KRAS mutations in patients undergoing CRISPR-Cas9-based editing for clinical therapeutics. In the second project, we aimed to delineate some of the molecular mechanisms that may underlie the observed differences in cancer incidence across cancer patients of different ancestries, focusing mainly on lung cancer.
We found that lung tumors from African American (AA) patients exhibit higher genomic instability, homologous recombination deficiency, and aggressive molecular features such as chromothripsis. We next demonstrated that these molecular differences extend to many other cancer types. The prevalence of germline homologous recombination deficiency (HRD) is also higher in tumors from AAs, suggesting that at least some of the somatic differences observed may have genetic origins. Importantly, our findings provide a therapeutic strategy for treating tumors from AAs with high HRD using agents such as PARP and checkpoint inhibitors, which is now being explored further by our experimental collaborators. In the third project, we developed a new computational framework that leverages single-cell RNA-seq from patients’ tumors to guide optimal combination treatments that can target multiple clones in the tumor. We first showed that our predicted viability profiles for multiple cancer drugs significantly correlate with the activity of their targeted pathways at single-cell resolution, as one would expect. We then applied this framework to predict the response to monotherapy and combination treatment in cell lines, patient-derived cell lines, and, most importantly, a clinical trial of multiple myeloma patients. Following these validations, we charted the landscape of optimal combination treatments of existing FDA-approved drugs in multiple myeloma, providing a resource that could potentially guide combination trials. Taken together, these results demonstrate the power of multi-omics analysis of cancer data to identify potential cancer risks and a strategy to mitigate them, to shed light on the molecular mechanisms underlying cancer disparity in AA patients and point to possible ways to improve their treatment, and to develop a new approach to treating cancer patients based on single-cell transcriptomics of their tumors.
  • GENOME ASSEMBLY AND VARIANT DETECTION USING EMERGING SEQUENCING TECHNOLOGIES AND GRAPH BASED METHODS
    (2018) Ghurye, Jay; Pop, Mihai; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The increased availability of genomic data and the increased ease and lower costs of DNA sequencing have revolutionized biomedical research. One of the critical steps in most bioinformatics analyses is the assembly of the genome sequence of an organism using the data generated from the sequencing machines. Despite the long length of sequences generated by third-generation sequencing technologies (tens of thousands of basepairs), the automated reconstruction of entire genomes continues to be a formidable computational task. Although long read technologies help in resolving highly repetitive regions, the contigs generated from long read assembly do not always span a complete chromosome or even an arm of the chromosome. Recently, new genomic technologies have been developed that can "bridge" across repeats or other genomic regions that are difficult to sequence or assemble and improve genome assemblies by "scaffolding" together large segments of the genome. The problem of scaffolding is vital both in single genome assembly of large eukaryotic genomes and in metagenomics, where the goal is to assemble multiple bacterial genomes in a sample simultaneously. First, we describe SALSA2, a method we developed that uses the interaction frequency between any two loci in the genome, obtained using Hi-C technology, to scaffold fragmented eukaryotic genome assemblies into chromosomes. SALSA2 can be used with either short or long read assemblies to generate highly contiguous and accurate chromosome-level assemblies. Hi-C data are known to introduce small inversion errors in the assembly, so we included the assembly graph in the scaffolding process and used the sequence overlap information to correct the orientation errors. Next, we present our contributions to metagenomics. We developed MetaCarvel, a scaffolding and variant detection method for metagenomic datasets.
Several factors, such as the presence of inter-genomic repeats, coverage ambiguities, and polymorphic regions in the genomes, complicate the task of scaffolding metagenomes. Variant detection is also tricky in metagenomes because the different genomes within these complex samples are not known beforehand. We showed that MetaCarvel was able to generate accurate scaffolds and find genome-wide variations de novo in metagenomic datasets. Finally, we present EDIT, a tool for clustering millions of DNA sequence fragments originating from the highly conserved 16S rRNA gene in bacteria. We extended the classical Four Russians speedup to banded sequence alignment and showed that our method clusters highly similar sequences efficiently. This method can also be used to remove duplicate or near-duplicate sequences from a dataset. With the increasing volume of data being generated in different genomic and metagenomic studies using emerging sequencing technologies, our software tools and algorithms are well timed to meet the needs of the community.
  • GENETIC CONFLICT IN LAKE MALAWI CICHLIDS: B CHROMOSOMES AND SEX DETERMINATION
    (2019) Clark, Frances Elizabeth; Kocher, Thomas D.; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    B chromosomes (Bs) are selfish genetic elements known to manipulate various cellular processes. These manipulations increase their transmission to the next generation, a process known as drive. After the recent discovery of Bs in African cichlid fish, sequence amplification methodologies were used to quantify B chromosome distribution in 7 species of Lake Malawi cichlids. In these species, Bs are limited to females and are haploid in the diploid genome. Considering various possible drive mechanisms, I propose this B chromosome drives by manipulating meiosis I in females. Genetic crosses quantifying B transmission in Metriaclima lombardoi confirmed transmission above Mendelian expectations. The transmission of this B also skews the sex ratio among progeny towards females. M. lombardoi individuals lacking Bs were shown, via a genetic linkage analysis, to have a male heterogametic (XY) sex determination system. A similar linkage analysis of families segregating B chromosomes indicated only the progeny lacking a B were influenced by this XY system. This substantiates the hypothesis that this B is a female sex determiner. Individuals of all 7 species were re-sequenced with short-reads and read coverage across the genome was compared in a coverage ratio analysis that resulted in the detection of 1.37 Mb in the reference genome with copies on the B, shared by all 7 species. Accounting for copy number of each sequence, 12-44 Mb of shared B sequence was identified. Amongst this sequence were 144 loci containing genes and gene fragments. A differential expression analysis found hundreds to thousands of differentially expressed loci between individuals with and without Bs, biased towards decreased expression in B individuals. Transcriptomes were analyzed for B-specific SNPs revealing 53 loci transcribed from the B chromosome and six candidate genes that might contribute to drive. 
I have described the distribution and behavior of the Lake Malawi cichlid B as well as captured a large portion of its sequence. This, combined with the genomic resources available for cichlids, makes this model system a valuable tool for future studies of the molecular mechanisms of drive, sequence structure and evolution of B chromosomes, and the association between B and sex chromosomes.
  • UNRAVELING THE EVOLUTIONARY HISTORY OF NOCTURNALITY IN THE STRISORES
    (2017) White, Noor; Carleton, Karen L; Braun, Michael J; Behavior, Ecology, Evolution and Systematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Tracing the processes of adaptation is a fundamental practice in the study of evolutionary biology. By combining multiple lines of evidence, we can elucidate the processes of diversification, speciation, and ultimately, evolution. For my doctoral dissertation, I studied the evolutionary history of a superorder of birds (Strisores) that have undergone a dramatic life history transition, the shift from a day-living (diurnal) to a night-living (nocturnal) lifestyle. A previous study found that the diurnal Apodiformes (swifts and hummingbirds) are nested deep within the clade of nocturnal or crepuscular Caprimulgiformes (nightbirds). However, resolution of the other major lineages eluded previous efforts, precluding analysis of the evolution of nocturnality in this group. To resolve the phylogeny of Strisores, I utilized a novel class of genome-scale markers, ultraconserved elements (UCEs). UCEs are operationally defined regions of extreme conservation between two or more genomes. I collected and sequenced ~4,000 UCEs from each of 191 species of birds representing every major extant lineage, plus two crocodilian outgroups, a greater number of elements than had ever been collected or studied before. With these data, I have resolved the phylogeny of the largest and oldest (Caprimulgidae and Nyctibiidae, respectively) lineages of nightbirds, as well as the superorder Strisores, and have shed light on best practices for the use of UCEs in phylogenomics. With a phylogeny representing the evolutionary history of Strisores, I then ask when, and where, potential adaptations to nocturnality occurred. To this end, I have developed a molecular tool to efficiently enrich 47 genes comprising the phototransduction cascade, a network of genes that converts the absorption of a photon by an opsin into a neural signal. I demonstrated that this tool is effective in 33 bird species chosen to cover extant avian diversity.
The data captured using this array will facilitate the identification of potential molecular adaptations to nocturnality, enable the improvement of models predicting opsin sensitivity from sequence data, and allow strong inference about the perception of color across birds and other vertebrates.
  • MULTIVARIATE METHODS FOR HIGH-THROUGHPUT BIOLOGICAL DATA WITH APPLICATION TO COMPARATIVE GENOMICS
    (2015) Hsiao, Chiao-wen; Corrada Bravo, Héctor; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Phenotypic variation in multi-cellular organisms arises as a result of complex gene regulation mechanisms. The modern development of high-throughput technology opens up the possibility of genome-wide interrogation of aspects of these mechanisms across molecular phenotypes. Multivariate statistical methods provide convenient frameworks for modeling and analyzing data obtained from high-throughput experiments probing these complex aspects. This dissertation presents multivariate statistical methods to analyze data arising from two specific high-throughput molecular assays: (1) ribosome footprint profiling experiments, and (2) flow cytometry (FCM). Ribosome footprint profiling describes an in vivo translation profile in a living cell and offers insights into the process of post-transcriptional gene regulation. Translation efficiency (TE) is a measure that quantifies the rate at which active translation is occurring for each gene, defined as the ratio of ribosome-protected fragment count to mRNA fragment count. We introduce pairedSeq, an empirical covariance shrinkage method for differential testing of translation efficiency from sequencing data. The method draws on variance decomposition techniques in mixed-effect modeling and analysis of variance. Benchmark tests against existing methods reveal that pairedSeq effectively detects signals in genes with high variation in expression measurements across samples due to high co-variability between ribosome occupancy and transcript abundance. In contrast, existing methods tend to mistake genes with negative co-variability for signals, as a result of variance underestimation when negative co-variability is not accounted for. We then present a genome-wide survey of primate species divergence at the translational and post-translational layer of gene regulation. FCM is routinely employed to characterize cellular characteristics such as mRNA and protein expression at the single-cell level.
While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. FlowMap-FR can be used to quantify the similarity between cell populations under scenarios of proportion differences and modest position shifts, and to identify situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. It has been implemented as a stand-alone R/Bioconductor package easily incorporated into current FCM data analytical workflows.
  • High Performance Computing for DNA Sequence Alignment and Assembly
    (2010) Schatz, Michael Christopher; Salzberg, Steven L; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyses, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine, but are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation across large compute grids. This dissertation demonstrates how these technologies can be used to accelerate these computations by orders of magnitude, and have the potential to make otherwise infeasible computations practical.
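
For readers unfamiliar with the read alignment task described in the last abstract above (mapping short DNA sequences to a reference genome), the following is a minimal illustrative sketch. It is not taken from the dissertation and does not use GPUs or MapReduce. It assumes a simple seed-and-verify scheme: index every k-mer of the reference, use the k-mers of a read as seeds, and accept ungapped placements within a mismatch budget. The function names build_kmer_index and align_read and the parameters k and max_mismatches are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map each k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read: str, reference: str, index: dict, k: int,
               max_mismatches: int = 1) -> list:
    """Return sorted (start, mismatches) pairs for ungapped placements of the read.

    Illustrative assumption: a placement is reported only if at least one
    k-mer of the read matches the reference exactly (the seed) and the full
    read differs from the reference window in at most max_mismatches bases.
    """
    hits = {}
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for ref_pos in index.get(seed, []):
            start = ref_pos - offset
            # Skip placements that fall off the reference or were already accepted.
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


reference = "ACGTACGTTAGC"
index = build_kmer_index(reference, 4)
print(align_read("CGTTAGC", reference, index, 4))  # [(5, 0)]
print(align_read("CGTTAGA", reference, index, 4))  # [(5, 1)]
```

The first read matches the reference exactly at position 5; the second is placed at the same position with one mismatch, the kind of difference that would point to a polymorphic site. Real aligners add gapped alignment, compressed indexes, and parallel execution, none of which this sketch attempts.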