Cell Biology & Molecular Genetics Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2750

Now showing 1 - 10 of 24
  • Item
    Cell Population Shifts and Clinical Heterogeneity in Sjögren's Disease
    (2024) Pranzatelli, Thomas J; Johnson, Philip L.F.; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Sjögren's disease (SjD) is a systemic autoimmune disease that causes loss of function of the salivary and lacrimal glands. Those with the disease, overwhelmingly female with an onset of disease in the fourth or fifth decade of life, commonly suffer from dry mouth, cavities and damage to the eyes. Patients present with a wide variety of clinical phenotypes, with variation in degree of immune infiltration and glandular damage as well as positivity for autoantibodies. This thesis uncovers the changes in cell population and gene expression in the gland that underpin diversity in disease severity. SjD patients lose the majority of a specific epithelial population in their labial salivary glands and, as the number of immune infiltrates grows, the surviving members of this population can be found colocalizing with invading GZMK+ T cells and expressing markers of increased proliferation. Standard differential gene expression analysis highlighted gene markers of cell types changing in proportion with disease, an unenlightening result when the cell population changes are well-characterized. To avoid this pitfall, an ensemble of random forests was trained to find genes predictive of patient subtypes without being correlated with diagnosis. Genes with high importance for autoantibody positivity were enriched for GO terms related to antigen processing and presentation. A master regulator of salivary gland identity, ZBTB7B, was identified from chromatin accessibility data. Mice with this transcription factor knocked out lose salivary flow and develop pockets of tissue in their glands that resemble other glands, e.g., labial gland epithelium inside of parotid glands. This work supports a clinical presentation-specific approach to therapy and paves the path for reengineering the glands to correct the effects of disease.
  • Item
    TRANSLATION, REPLICATION AND TRANSCRIPTOMICS OF THE SIMPLEST PLUS-STRAND RNA PLANT VIRUSES
    (2024) Johnson, Philip Zhao; Simon, Anne E; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Plus (+)-strand RNA viruses are among the most common pathogens of plants and animals. Furthermore, they present model systems for the study of basic biological processes, including protein translation and RNA replication, and shed light on the versatile roles that RNA structures play in these processes. After cell entry, the next step in the (+)-strand RNA viral life cycle is translation of the viral genome to produce the viral RNA-dependent RNA polymerase (RdRp) and associated replication proteins necessary for viral replication to occur. For many (+)-strand RNA viruses lacking a 5´ cap and 3´ poly(A) tail, translation depends upon RNA structural elements within their genomes capable of hijacking the host translation machinery, which for plant viruses are commonly located in their 3´ proximal regions and are termed 3´ cap-independent translation enhancer (CITE) elements. In Chapter 2, I report upon my work characterizing a new subclass of panicum mosaic virus-like translation enhancer (PTE) elements, which bind and co-opt for viral use the host translation initiation factor 4E (eIF4E) – the translation initiation factor normally responsible for binding and recognition of mRNA 5´ caps during canonical eukaryotic translation initiation. Thus, PTE 3´ CITEs present a novel mechanism for co-opting the critical host factor eIF4E. My work characterizing a new subclass of PTE 3´ CITEs further revealed characteristics common among all PTE 3´ CITEs pertaining to their mechanism of binding eIF4E. After translation of the necessary viral replication proteins, replication of the viral RNA occurs, which again is in large part mediated by RNA structural elements within the viral genome that can bind to the viral RdRp and/or host factors involved in viral replication. Indeed, RNA structural elements often serve dual roles in viral translation and replication and/or are located proximal to RNA structural elements involved in the alternate function.
In Chapter 3, I discuss my work characterizing novel replication elements in the 3´ terminal regions of umbraviruses (family Tombusviridae). The uncovered replication elements appear to be specific to umbraviruses and are located immediately upstream of replication/translation elements that are common throughout the Tombusviridae, lending greater complexity to the already complex 3´ proximal structures of umbraviruses. While the study of (+)-strand RNA viruses has historically focused on their protein-coding transcripts, (+)-strand RNA viruses also commonly produce additional non-coding transcripts, including recombinant defective RNAs, typically containing 5´ and 3´ co-terminal viral genome segments, and (+/-)-foldback RNAs, composed of complementary (+)- and (-)-strand viral sequences joined together. Long non-coding RNAs that accumulate to high levels have also been reported for plant and animal (+)-strand RNA viruses in recent years, and truncations of viral transcripts also commonly arise due to host nuclease activity and/or premature termination of replication elongation by the viral RdRp. The rise of long-read high-throughput sequencing technologies such as nanopore sequencing presents an opportunity to fully map the complexity of (+)-strand RNA viral transcriptomes. In Chapter 4, I present my work performing this analysis, employing direct RNA nanopore sequencing, in which the transcripts present in an RNA sample of interest are directly sequenced. This analysis revealed for the umbra-like virus citrus yellow vein-associated virus (CY1): (i) three novel 5´ co-terminal long non-coding RNAs; (ii) D-RNA population dynamics; (iii) a common 3´ terminal truncation of 61 nt among (+)-strand viral transcripts; (iv) a missing 3´ terminal CCC-OH motif in virtually all (-)-strand reads; (v) major timepoint- and tissue-specific differences; and (vi) an abundance of (+/-)-foldback RNAs at later infection timepoints in leaf tissues.
This work also sheds light on the current shortcomings of direct RNA nanopore sequencing as a technique. Finally, the importance of RNA structural biology in the study of (+)-strand RNA viruses presents the need for specialized RNA structure drawing software with functionality to easily control the layout of nucleobases, edit base-pairs, and annotate/color the nucleobases and bonds in a drawing. It is through the visual exploration of RNA structures that RNA biologists routinely improve upon the outputs of RNA structure prediction programs and perform crucial phylogenetic analyses among related RNA structures. Large RNA structures, such as whole viral genomes thousands of nucleotides long, can only be studied in their entirety with the aid of RNA structure visualization tools. To this end, I have developed over the course of my doctoral education the 2D RNA structure drawing application RNAcanvas, which is available as a web app and has grown popular among the RNA biology community. RNAcanvas emphasizes graphical mouse-based interaction with RNA structure drawings and has special functionality well suited for the drawing and exploration of large RNA structures, such as automatic layout adjustment and maintenance, complementary sequence highlighting, motif finding, and performance optimizations. Large viral structures such as that of the 2.7 kb CY1 genomic RNA could not have been characterized without the aid of RNAcanvas. In Chapter 5, I present my work developing RNAcanvas.
  • Item
    Mixture Models for Nucleic Acid Sequence Feature Analysis
    (2023) Wang, Bixuan; Mount, Stephen M; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Signals in nucleotide sequences play a crucial role in interactions among macromolecules and the regulation of biological functional processes such as transcription, the splicing of messenger RNA precursors and translation. Recognition of signals in nucleotide sequences is the first step in functional annotation, which is critical for the identification of deleterious mutations and the identification of targets for disease treatment. One of the essential steps in gene expression, RNA splicing removes introns from newly transcribed RNA, ligating exons to generate mature RNA. Splicing involves the formation and recycling of the spliceosome, a large macromolecular complex whose assembly requires complex coordination by splicing factors through the recognition of RNA-protein binding sites. One potential method to reveal unknown subtypes of samples and identify distinctively distributed features is by applying a mixture model called the admixture model or Latent Dirichlet Allocation (LDA), which allows samples to have partial memberships of different clusters that can be interpreted for functional motif identification. By applying mixture models to RNA sequences, I found splicing signals such as the polypyrimidine tract and the branch point in intron sequences. Mixture models also showed motifs associated with reading frames from coding sequences, which further revealed potential coding regions from 5’ untranslated regions and long non-coding RNAs. Dynamic single-molecule imaging of nascent RNAs coupled with multiple genome-wide assays reveals that splicing happens far more often than expected, and partial intron removal can be captured prior to completion of the entire transcript. I hypothesize that the spliceosome progressively removes large introns in small pieces through 'recursive splicing' instead of removing the whole intron at once. 
However, the sequence features that distinguish sites of recursive splicing from canonical splice sites remain to be discovered. Here, I applied mixture models to sequences from human introns to identify sequence features associated with recursive splicing. This method helped me to recognize and visualize splicing signals from annotated intron sequences and identify potential coding sequences from human 5' untranslated regions and long non-coding RNA. After applying mixture models to the sequences surrounding recursive and canonical splicing sites, I found that transcripts where large introns can be recursively spliced can be distinguished from those without recursive splicing by the presence of CG-rich motifs flanking 5' splice sites upstream of first introns, and the absence of DNA methylation at these sites. In addition to applications of mixture models, I also explored RNA Bind-N-Seq data reflecting the binding activities of the splicing factor U2AF and found that the recursive 3' splice sites have higher U2AF binding affinities than the downstream canonical 3' splice sites. The observations suggest that, first, mixture models have the potential to identify functional motifs, including subtle signals in sequences such as the branch sites that only occur in a subgroup of introns. Second, the usage of recursive splicing sites is associated with sequence features in the first exons of the transcripts, suggesting a testable model for the regulation of recursive splicing in human introns.
  • Item
    Methods for Efficient Processing and Comprehensive Analysis of Single Cell Sequencing Data
    (2024) He, Dongze; Patro, Rob R.P.; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Over the past decade, the rapid development of single-cell RNA-sequencing (scRNA-seq) technology has revolutionized the understanding of cellular differentiation, heterogeneity, transcriptional dynamics, and many other biological processes. Despite the explosive growth of data analysis methods that aid in biological discovery, there are still many unsolved questions in raw data processing (also known as preprocessing) of scRNA-seq data, that is, the procedure for analyzing the raw sequenced fragments to generate the quantitative measurements of gene expression. In this dissertation, we first describe a computational ecosystem we developed that provides an end-to-end pipeline for accurately and efficiently processing single-cell sequencing data. Then, we discuss the computational and analytical challenges we found during the development of alevin-fry and the solutions we provided for tackling these challenges. Chapters 2 and 3 demonstrate the computational successes we achieved for single-cell data processing. In Chapter 2, we present a novel computational framework, alevin-fry, for rapid, accurate, and memory-frugal quantification of single-cell sequencing data. In Chapter 3, we discuss an augmented execution context, simpleaf, of alevin-fry that not only provides a simplified user interface to the alevin-fry framework, but also offers many high-level simplifications for single-cell data processing, and for assisting with data provenance propagation and reproducible analyses. Our results demonstrate that, with the help of alevin-fry and simpleaf, we are able to process single-cell data from both "standard" chemistries and more advanced and complex data types, and achieve the same level of accuracy as existing best-in-class methods, while being substantially faster and more memory efficient. Chapter 4 introduces Forseti, a mechanistic model to probabilistically assign a splicing status to scRNA-seq reads.
As the first probabilistic and mechanistic model for solving the ambiguity of splicing status in tagged-end, short-read scRNA-seq data, we show that Forseti can be used to accurately and efficiently infer the splicing status of scRNA-seq reads, and to help identify the correct gene origin for multigene-mapped reads. In Chapter 5, we describe the results of a comprehensive analysis of "off-target" reads (reads whose mappings cannot be accounted for under the presumed and intended components of the underlying protocol) in scRNA-seq. Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that utilize off-target reads to extend the reach and accuracy of single-cell data analysis pipelines.
  • Item
    Investigation of progerin expression in non-Hutchinson-Gilford Progeria Syndrome individuals
    (2023) Yu, Reynold; Cao, Kan; Mount, Steve; Molecular and Cell Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Hutchinson-Gilford Progeria Syndrome (HGPS) is a premature aging disease caused by a point mutation in the LMNA gene, which encodes A-type lamins. This mutation activates a cryptic splice donor in exon 11 and leads to the production of a toxic lamin variant called progerin. Interestingly, small amounts of progerin have also been found in cells and tissues of normal individuals. Here we examine the expression of progerin in publicly available RNA-seq data from normal individuals of the GTEx project. Among the 30 available tissues, progerin expression in normal individuals is highest in sun-exposed skin samples, and its expression in different tissues of the same donor is correlated. In addition, telomere shortening is significantly correlated with progerin expression. Transcriptome-wide correlation analyses suggest that the level of progerin expression is highly correlated with switches in gene isoform expression patterns, perhaps reflecting widespread isoform shifts in these samples. Differential expression analyses show that progerin expression is correlated with significant changes in the level of transcripts from genes involved in splicing regulation and a significant reduction of mitochondrial transcripts. Interestingly, 5’ splice sites whose use is correlated (either positively or negatively) with progerin expression have significantly altered frequencies of consensus trinucleotides within the core 5’ splice site. Furthermore, introns whose alternative splicing is correlated with progerin have reduced GC content. Together, our study suggests that progerin expression in normal individuals is part of a global shift in splicing patterns and provides insight into the mechanism behind these changes.
  • Item
    A MULTI-OMICS APPROACH TO CHARACTERIZING THREE HEALTH RELEVANT FUNCTIONS OF THE HUMAN GUT MICROBIOME
    (2022) Braccia, Domenick James; Hall, Brantley; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The human gut is home to trillions of microorganisms that routinely interact with their human host in both beneficial and detrimental ways. The advent of next-generation sequencing and high-throughput “omics” technologies has created new opportunities to examine the role that the human gut microbiome plays in human health, especially in regard to gastrointestinal diseases such as Inflammatory Bowel Disease and colorectal cancer. In my dissertation, I utilize genomic, transcriptomic, metabolomic, and protein sequence datasets to characterize three health-relevant functions of the human gut microbiome. First, I performed a multi-omic, bioinformatic analysis to identify the bacterial enzyme bilirubin reductase. While bilirubin reduction to urobilinogen and stercobilinogen is a well-known function of the human gut microbiome, the enzyme(s) responsible for the conversion of bilirubin to non-toxic reduced products have yet to be fully characterized. In this chapter, I review how I leveraged publicly available metabolomic, metagenomic, and metatranscriptomic data to explore over 2 million putative reductase genes and identify a candidate operon encoding bilirubin reductase. Second, I examined sources of microbial hydrogen sulfide (H2S) production by bacteria of the human gut microbiome. H2S is a sulfuric gas produced by various bacterial phyla of the human gut microbiome and is implicated in the etiology of gastrointestinal diseases such as Inflammatory Bowel Disease and colorectal cancer. In this chapter, I show via bioinformatic analysis that the capacity to produce H2S via cysteine degradation is ubiquitous in the human gut. Third, I explored the bacterial prodrug activation required for immune system modulators such as sulfasalazine.
After curating amino acid sequences of known azoreducing genes and performing a protein sequence search across the Unified Human Gastrointestinal Genomes (UHGG) collection containing 4,644 genomes, I identified putative azoreducing and non-azoreducing bacterial strains to be experimentally validated. Together, these results highlight a successful multi-omic approach to characterizing three diverse but health-relevant functions of the human gut microbiome.
  • Item
    INVESTIGATING MOLECULAR MECHANISMS SPECIFYING DIVERSE ROSACEAE FRUIT TYPES THROUGH COMPARATIVE TRANSCRIPTOMIC ANALYSIS
    (2022) Li, Muzi; Liu, Zhongchi; Mount, Stephen; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Rosaceae is a plant family with over 3,000 species including a number of economically important fruit-bearing species. Although plants in the Rosaceae family have a similar basic flower structure, their fruit flesh comes from distinct floral tissues. In drupe fruit, such as peach and plum, the ovary wall becomes enlarged and fleshy. In pome fruit, such as apple and pear, the fruit flesh is mainly derived from the hypanthium that encases the ovary. In drupetum fruit, such as raspberry, numerous unfused ovaries each grow into a fleshy drupelet. In achenetum fruit, such as strawberry, the numerous unfused ovaries eventually dry up, but the receptacle, the stem tip that supports these ovaries, instead develops into the fruit flesh. By investigating and comparing the transcriptomes from these four Rosaceae fruits, peach (Prunus persica), apple (Malus x domestica), strawberry (Fragaria vesca), and raspberry (Rubus idaeus), at the earliest stages of fruit development, we gain important insights into the genetic mechanisms underlying fleshy fruit diversity. The expression of B class MADS-box genes, PISTILLATA, APETALA3 and TM6, shows negative correlation with the ability to form fleshy fruit tissues. Based on RNA transcript and phylogenetic analysis, FBP9, a MADS-box gene related to the E class, appears to be necessary but insufficient for flesh formation. In addition to the regulatory roles MADS-box genes play in fruit identity specification, extensive lignification of the strawberry ovary wall may contribute to the inability of the strawberry ovary to become fleshy. Finally, a database (ROsaceae Fruit Transcriptome database, ROFT) is established for researchers to query for orthologous genes and their expression patterns during fruit development in the four species as well as to query for the tissue-specific and tissue- and stage-specific genes.
Together, these findings provide the framework for functional investigations of fruit type specification and insights into the evolution of diverse fruit types in the Rosaceae family. The knowledge gained will advance our understanding of the evolution of fleshy fruits, a defining feature of angiosperms, and enable the creation of new fruit types for consumers.
  • Item
    STRATEGIES AND RESOURCES FOR RATIONAL VACCINE DESIGN AND ANTIBODY-ANTIGEN DOCKING AND AFFINITY PREDICTION
    (2022) Guest, Johnathan Daniel; Pierce, Brian G.; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Antibody recognition of antigens is a unique class of protein-protein interactions, and increased knowledge regarding the determinants of these interactions has advanced fields such as computational vaccine design and protein docking. However, the diversity and flexibility of antibodies and antigens can hinder generation of potent vaccine immunogens or prediction of correct antibody-antigen interfaces, slowing progress in the design of vaccines and antibody therapeutics. In this thesis, we present strategies to design vaccine candidates for a difficult viral target and describe expanded resources for benchmarking and training antibody-antigen docking and affinity prediction algorithms. We utilized rational design to develop candidate immunogens for a vaccine against hepatitis C virus (HCV), which represents a global disease burden despite recent advances in antiviral treatments. This design strategy produced a soluble and secreted E1E2 glycoprotein heterodimer with native-like antigenicity and immunogenicity by fusing ectodomains with a leucine zipper scaffold and a furin cleavage site. We developed additional constructs that incorporated synthetic or non-eukaryotic scaffolds or alternative ectodomains that included consensus sequences designed using a large reference database. Finally, we utilized previously published data on HCV antibody neutralization and E1E2 mutagenesis to predict residues that impact antibody neutralization and E1E2 heterodimerization, offering potential insights that can aid vaccine design. To improve our knowledge of and accuracy in modeling antibody-antigen recognition, we assembled a set of antibody-antigen complex structures from the Protein Data Bank (PDB) that expanded Docking Benchmark 5, a widely used benchmark for protein docking.
These complexes more than doubled the number of antibody-antigen structures in the benchmark and, based on tests of current algorithms, highlight significant challenges for docking and affinity prediction. Building on this resource, we assembled and curated a dataset of ~400 antibody-antigen affinities and corresponding structures, forming an expanded and updated benchmark to guide ΔG prediction of antibody-antigen interactions. Using this dataset, we retrained combinations of terms from existing scoring functions and potentials, demonstrating that this resource can be used to improve antibody-antigen ΔG prediction. Overall, these findings can advance HCV vaccine design and antibody-antigen docking and affinity prediction, helping to better elucidate the determinants of antibody-antigen interactions and to better display vaccine immunogens for induction of neutralizing antibodies.
  • Item
    Using Phylotranscriptomics to Study the Evolution of the Green Algae
    (2021) Ferranti, David Anthony; Delwiche, Charles F; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The colonization of land by plants approximately 500 million years ago (Ma) is one of the most important events in the history of complex life. Land plants, hereafter referred to as “embryophytes,” comprise the ecological foundation of every major terrestrial biome, making them an essential lineage to the origin and maintenance of biodiversity in those habitats. The embryophytes form a monophyletic clade within one of the two major phyla of the green algae, the charophytes. Estimates from both fossil data and molecular clock analyses suggest that the charophytes diverged from the other main phylum of green algae, the chlorophytes, by as much as 1500 Ma. Here I present a phylogenetic analysis using transcriptomic and genomic data of 62 green algae and embryophyte operational taxonomic units, 31 of which were assembled de novo for this project. I focus on identifying the charophyte lineage that is sister to embryophytes, and show that the Zygnematophyceae have the strongest support, although the Charophyceae also have moderate support. I demonstrate that this phylogenetic tree topology is robust across different phylogenetic models and methods. Furthermore, I examine amino acid and codon usage across the tree and find that patterns in these data broadly follow the phylogenetic tree. I conclude by searching my dataset for the presence/absence of several protein domains and gene families known to be important in embryophytes, including the ethylene signaling pathway and various ion transporters. Many of these domains and genes have homologous sequences in the charophyte lineages, indicating that those green algae were particularly well-suited to the colonization of land.
  • Item
    Discovery and Characterization of Antiterminator Proteins in Bacteria
    (2018) Goodson, Jonathan Ryan; Winkler, Wade; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Transcription is a discontinuous process, where each nucleotide incorporation cycle offers a decision between elongation, pausing, halting, or termination. In bacteria, many regulators—including protein antiterminators or cis-acting regulatory RNAs, such as riboswitches—exert their influence over transcription elongation. Through such mechanisms, these regulators can couple physiological or environmental signals to transcription attenuation, a process where RNA structure directly influences formation of transcription termination signals. However, through another regulatory mechanism called processive antitermination (PA), RNA polymerase can become induced to bypass termination sites over much greater distances than transcription attenuation can offer. These mechanisms are widespread in bacteria, although only a few mechanistic classes have been discovered overall. The aim of the research in this dissertation is two-fold: to identify novel genetic regulatory mechanisms targeting transcription termination and to systematically study the diversity and breadth distribution of these mechanisms among bacteria. This research focuses on two distinct mechanisms, each representing one of these mechanisms of antitermination. First, I detail discovery of LoaP, a specialized paralog of the universally conserved NusG transcription elongation factor. Our data demonstrate that Bacillus velezensis LoaP controls gene expression of antibiotic biosynthesis gene clusters by promoting readthrough of transcription termination sites. Additionally, we show that, unlike other bacterial NusG proteins, LoaP binds RNA with high affinity, and with apparent specificity for a sequence in the 5′ leader regions of its target operons. Second, we describe the interaction between a family of antitermination proteins containing the ANTAR RNA-binding domain with its target RNA. We show that ANTAR-containing proteins bind a tandem stem-loop RNA motif to prevent formation of terminator structures. 
Using a combination of mutagenesis strategies, we elucidate some of the RNA-binding requirements of a representative ANTAR protein. Finally, we employed bioinformatic and phylogenetic approaches to place these regulators in the context of their entire protein families, learning about the distribution of these mechanisms, their association with particular potential regulons, and the sequence composition of different protein subfamilies.