Biology
Permanent URI for this community: http://hdl.handle.net/1903/11810
16 results
Search Results
Item Algorithmic approaches for investigating DNA Methylation in tumor evolution and heterogeneity (2024) Li, Xuan; Sahinalp, S. Cenk; Mount, Stephen M.; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Intratumor heterogeneity and tumor diversity of cancer impose significant challenges on the prospect of personalized cancer diagnosis, treatment, and prognostics. While many studies seek to understand the complex dynamics of cancer with theoretically well-suited biomarkers like DNA mutations, the relative molecular rigidity and sparsity of mutations often make it challenging to reconstruct reliable tumor lineages using mutation profiles in practice. Epigenetic markers like DNA methylation, on the other hand, serve as a promising alternative to elucidate intratumor heterogeneity and tumor diversity. However, systematic research leveraging algorithmic approaches to investigate DNA methylation in the context of tumor evolution and heterogeneity remains limited. Aiming to address critical gaps in computational cancer research, this dissertation presents novel computational frameworks for analyzing DNA methylation at both single-cell and bulk levels and offers insights into methylation-based tumor heterogeneity, tumor evolutionary dynamics, and cellular composition in tumor samples for characterization of the complex epigenetic landscape of tumors. Chapter 2 and Chapter 3 introduce Sgootr (Single-cell Genomic methylatiOn tumOr Tree Reconstruction), the first distance-based computational method to jointly select tumor lineage-informative CpG sites and reconstruct tumor lineages from single-cell methylation data. Sgootr lays the groundwork for understanding tumor evolution through the lens of single-cell methylation profiles.
Motivated by the need highlighted in Chapter 2 to overcome imbalances in single-cell methylation data across patient samples for interpretable comparative patient analysis, Chapter 4 presents FALAFL (FAir muLti-sAmple Feature seLection). With integer linear programming (ILP) serving as its algorithmic backbone, FALAFL provides a fast and reliable solution to fairly select CpG sites across different single-cell methylation patient samples to optimally represent the entire patient cohort and identify reliable tumor lineage-informative CpG sites. Finally, Chapter 5 shifts the scope from single-cell to bulk tissue contexts and introduces Qombucha (Quadratic prOgraMming Based tUmor deConvolution with cell HierArchy), which is designed to tackle the challenges of bulk tissue analysis by inferring the methylation profiles of progenitor brain cells and determining cell type composition in bulk glioblastoma (GBM) samples. The work presented in this dissertation demonstrates the power of algorithmic and data science approaches to tackle some of the most pressing challenges in understanding the complexity of cancer epigenomics. With novel computational tools addressing current limitations in methylation data analysis, this work paves the way for further research in tumor evolution, personalized cancer treatment, and biomarker discovery. Overall, the computational frameworks and findings presented here bridge the gap between complex molecular data and clinically meaningful insights in the battle against cancer.
Item EVOLUTION OF THE CRISPR IMMUNE SYSTEM FROM ECOLOGICAL TO MOLECULAR SCALES (2024) Xiao, Wei; Johnson, Philip LF; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Bacteria and archaea inhabit environments that constantly face viral infections and other external genetic threats. They have evolved an arsenal of defense strategies to protect themselves.
My research delves into the CRISPR immune system, the only known adaptive immune system of prokaryotes. My work explores three different dimensions of the CRISPR immune system, ranging from ecological to molecular scales. From an evolutionary perspective, CRISPR is widely distributed across the prokaryotic tree, underscoring its immune effectiveness. However, the CRISPR distribution is uneven and some lineages are devoid of CRISPR. Here, I identify two ecological drivers of the CRISPR immune system. By analyzing both 16S rRNA data and metagenomic data, I find the CRISPR system is favored in less abundant prokaryotes in the saltwater environment and in more diverse prokaryotic communities in the human oral environment. On the molecular level, the CRISPR system selects and cleaves its “favorite” DNA segments (also known as “spacers”) from invading viral genomes to form immune memories. I explore how the spacer sequence composition affects its acquisition rate by the CRISPR system. I develop a convolutional neural network model to predict the spacer acquisition rate based on the spacer sequence composition in natural microbial communities. The model interpretation reveals that the PAM-proximal end of the spacer is more important in predicting the spacer abundance, which is consistent with previous findings from controlled experimental studies. Combining these scales, CRISPR repeat sequences coevolve with the rest of the genome. Thus, I explore the potential of utilizing CRISPR repeat sequences for taxonomy profiling. I find a strong relationship between unique repeat sequences and taxonomy in both the RefSeq database and a human metagenomic dataset. Then I show high accuracy when utilizing repeat sequences in taxonomy annotation of human metagenomic contigs.
This method not only aids in annotating CRISPR arrays but also introduces a novel tool for metagenomic sequence annotation.
Item Application of advanced machine learning strategies for biomedical research (2023) Chou, Renee Ti; Cummings, Michael P.; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Biomedical research delves deeply into understanding individual health and disease mechanisms. Recent advancements in technologies have further transformed the field with large-scale data sets, enabling data-driven approaches to identify important patterns and relationships from large data sets. However, these data sets are often noisy and unstructured. Moreover, missing values and high dimensionality further complicate the analysis processes aimed at yielding meaningful results. With examples in ocular diseases and malaria, this dissertation presents novel strategies employing machine learning to tackle some of the challenges in biomedical research. In ocular diseases, sustained ocular drug delivery is critical to retain therapeutic levels and improve patient adherence to dosing schedules. To enhance the sustained delivery system, we engineer peptide sequences as an adapter to impart desired properties to ocular drugs. Specifically, we develop machine learning models separately for three properties: melanin binding, cell penetration, and non-toxicity. We employ data reduction techniques to reduce the number of features while maintaining the machine learning model performance and apply interpretable machine learning techniques to explain model predictions on the three properties. Experimental validation in rabbits shows a two-fold increase in drug retention time with the selected peptide candidate. The developed machine learning framework can be further tailored to engineer other properties in molecular sequences with a wide variety of potential in biomedical applications.
Malaria is an infectious disease caused by protozoans of the genus Plasmodium and has been a burden on global health. Developing malaria vaccines is challenging due to the diversity in parasite antigen sequences, which may lead to immune escape. To facilitate the vaccine development process, we leverage the wealth of systems data collected from various sources. For facile data management, a database is constructed to store the structured data processed from the results of the bioinformatics tools. Because only a small fraction of Plasmodium proteins are labeled as known antigens, and the remaining proteins are not known to be antigens or non-antigens, a positive-unlabeled machine learning method is applied to identify potential vaccine antigen candidates. Beyond malaria, our approach provides a promising framework for identifying and prioritizing vaccine antigen candidates for a broad range of disease pathogens.
Item The genomics of species divergence in drosophila (2023) Carpinteyro Ponce, Javier; Machado, Carlos A; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
How new species arise and diverge has been a fundamental question in evolutionary biology. The process of species divergence can be studied at many different levels of biological organization. However, it was not until recent advances in genome sequencing technologies that genome-wide signatures of species divergence started to unveil the complex genomic landscape of speciation. In this dissertation we investigate the landscape of genomic divergence using a classic pair of Drosophila species. We generated four new high-quality genome assemblies for Drosophila pseudoobscura and D. persimilis to explore the genomic differences at three different levels. We first characterized the structural variation landscape between D. pseudoobscura and D.
persimilis and established its association with transposable elements and tested how intrinsic genomic factors, such as recombination, influence the accumulation of structural variation associated with transposable elements in both species. With a combination of high-quality genome assemblies and a comprehensive population genomics data set, we also explored how recombination rate and introgression promote sequence divergence with the potential to form species barriers. Moreover, we investigated how the potential rewiring of gene co-expression networks between species contributes to the divergence landscape between D. pseudoobscura and D. persimilis. Our work highlights the complex landscape of species divergence occurring at multiple levels of organization. Moreover, the integration of potential species drivers identified at different scales sheds light on the molecular mechanisms involved in speciation.
Item SYSTEMS IMMUNOLOGY OF IMMUNE IMPRINTS INDUCED BY ACUTE VIRAL INFECTIONS (2023) Liu, Can; Johnson, Philip L.F.; Tsang, John S.; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Upon encountering perturbations such as viral infections, the immune system initiates a cascade of molecular and cellular responses. These alterations may persist even after recovery, resulting in enhanced or diminished responses to subsequent stimuli compared to the naïve state. Such persistent changes, referred to as immune imprints or long-term non-specific memory, indicate an incomplete resolution from immunological perturbations.
The primary focus of this dissertation is to systemically investigate the immune imprints resulting from acute infections and how they shape the baseline immune status to future heterologous challenges. First, we employed cutting-edge single-cell multi-omics and computational approaches to assess the immune response during the COVID-19 disease course and severity correlates at an unprecedented resolution. We identified gene expression profiles – apoptosis in plasmacytoid dendritic cells and IL-15-linked increase of fatty acid (FA) metabolism in CD56dimCD16hi NK cells – as primary correlates of disease severity. This increase of FA signature with disease severity was also concomitant with an attenuated inflammation, indicating a dysfunctional or exhaustion-like state of these NK cells. While the depressed inflammation signature in severe patients was also found in different cell types near hospitalization, it increased temporally at later time points, indicating a critical late-stage juncture in the disease course. Next, we took the opportunity of the period following the first wave of the COVID-19 pandemic to study immune imprints in human cohorts who had recovered from COVID-19 before widespread vaccination and reinfection occurred. We demonstrated through single-cell transcriptomic profiling that individuals who recovered from mild COVID-19 exhibit distinct immune signatures. Male recoverees also showed heightened responses to the seasonal influenza vaccine compared to healthy individuals without a history of COVID-19 and female recoverees. These sex-dimorphic imprints highlight the interplay between intrinsic factors like sex and non-intrinsic factors such as prior SARS-CoV-2 infection in shaping an individual's immune system over time. Lastly, we also investigated the immune imprints after acute viral infection using a controlled experimental mouse model of influenza infection.
After examining cellular and gene expression profiles in various organs after the infection, we found persistent changes in both adaptive and innate immune components across multiple organs. Moreover, these changes affected subsequent local IL-17 inflammatory response and secondary heterologous vaccinations in anatomically distinct organs. Together, both human and mouse studies here are important pieces toward an improved understanding of long-term immune imprints after perturbations, which can be leveraged to develop more effective and personalized vaccines and disease treatments.
Item ANALYTICAL APPROACHES FOR COMPLEX MULTI-BATCH -OMICS DATASETS AND THEIR APPLICATION TO NEURONAL DEVELOPMENT (2023) Alexander, Theresa Ann; Speer, Colenso M; El-Sayed, Najib M; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
High-throughput sequencing methods are extremely powerful tools to quantify gene expression in bulk tissue and individual cells. Experimental designs often aim to quantify expression shifts to characterize developmental trajectories, disease states, or cellular drug responses. Experimental and genetic methods are also rapidly evolving to capture specific aspects of gene expression such as in targeting individual cell types, regulatory stages, and spatially resolved cell subcompartments. These studies frequently involve a variety of experimental conditions that require many samples to guarantee sufficient statistical power for subsequent analyses. These studies are frequently processed in multiple batches due to limitations on the number of samples that can be collected, processed, and sequenced at once. To eliminate erroneous results in subsequent analyses, it is necessary to deconvolve non-biological variation (batch effect) from biological signal.
Here, we explored variational contributions in multi-batch high throughput sequencing experiments by developing new methods, evaluating heterogeneity-contributors in an axon-TRAP-RiboTag protocol case-study, and highlighting biological results from this protocol. First, we describe iDA, a novel dimensionality reduction method for high-throughput sequencing data. High-dimensional data in complex, multi-batch experiments often result in discrete clustering of samples or cells. Existing unsupervised linear dimensionality reduction methods like PCA often do not resolve discreteness simply with projections of maximum variance. We show that iDA can produce better projections for separating discrete clustering that correlates with known experimental biological and batch factors. Second, we provide a case study of special considerations for a complex, multi-batch high throughput experiment. We investigated the multi-faceted heterogenic contributions of a study using the axon-TRAP-RiboTag translatomic isolation protocol in a neuronal cell type. We show that popular batch-correction methods may reduce signal due to true biological heterogeneity in addition to technical noise. We offer metrics to help identify biological signal-driven batch-differences. Lastly, we employ our understanding of variational contributions in the intrinsically photosensitive retinal ganglion cell (ipRGC) -omics case study to explore the biological transcriptomic and translatomic coordination. Our analysis revealed ipRGCs participate in subcompartment-specific local protein translation. 
Genetic perturbations of photopigment-driven neuronal activity led to global tissue transcriptomic shifts in both the retina and brain targets, but the ipRGC axonal-specific translatome was unaltered.
Item Systems Approaches to Immunology in Acute COVID19, Monogenic Immune Disorders, and Childhood Development (2022) Rachmaninoff, Nicholas; Johnson, Philip F; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In recent years, advances in immune profiling technologies have allowed us to generate data at an unprecedented scale and to interrogate human immune systems in ways that were not previously possible. In this dissertation, I use these approaches in three different contexts. First, I explore how the cells of the immune system respond to acute COVID-19 infection and how this depends on the severity of the disease. Using CITEseq, simultaneous profiling of surface markers and RNA in peripheral blood mononuclear cells, I identify differentially expressed gene expression programs associated with COVID-19 infection and gene expression programs associated with disease severity. In addition, I explore how phenotypes of memory T cells, including the clonal nature and exhaustion signatures, are associated with severity of COVID-19 infection. Second, I address what it means to be immunologically healthy through a multi-omics study of a cohort of patients at the NIH clinical center with various monogenic immune disorders. I identify supervised and unsupervised axes of immune health that can separate disease from healthy controls, and additionally track changes to the immune system as people age, showing the parallels between disease-associated inflammation and aging-associated inflammation. I verify the utility of these metrics in several contexts outside of the original cohort and show that the signatures reflect broad changes to various cells of the immune system.
Last, I explore the development of the immune system in childhood and the maintenance of temporally stable gene expression patterns. In a cohort of children who were tracked longitudinally over six years in Nicaragua, I utilize whole blood transcriptomics to explore both how the immune system changes as children grow older and which aspects of the immune system show large amounts of individuality or persistent inter-subject variation in their levels. I show that persistent inter-subject variation in gene expression and cellular frequencies is quite pronounced throughout childhood and attempt to identify when certain aspects of the immune system begin to stabilize in terms of their levels for an individual.
Item METHOD VALIDATION AND DEVELOPMENT FOR THE METAGENOMIC EXPLORATION OF MICROBIAL COMMUNITIES (2022) Commichaux, Seth; Pop, Mihai; Rand, Hugh; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Our world is inhabited and shaped by diverse and complex microbial communities which we are only beginning to characterize and understand. With the advent of affordable high-throughput sequencing, the study of the genomic content of microbial communities, metagenomics, has accelerated our understanding of their impact on human and environmental health. The increasing number of datasets produced by metagenomic studies provides many opportunities for novel bioinformatic analyses and for the development of computational methods. However, careful benchmarking and validation are also important undertakings to ensure the integrity of methods and research in such a rapidly developing field. Here, we explored several problems in metagenomics by benchmarking existing methods and technologies, developing new methods, recommending best practices, and highlighting opportunities for future work.
First, microbial gene catalogs document and organize the genes found in microbial communities and provide a reference for the standardized analysis of metagenomic data. Although commonly used to explore the intersection between microbiomes, humans, and ecosystems, the methods used for their construction and effectiveness for metagenomic analyses had not been critically evaluated. Our analysis highlighted important limitations of gene catalogs, opportunities for future research, and allowed us to recommend best practices. Second, we assessed if nanopore long read sequencing could expedite the accurate reconstruction of a pathogen genome from a microbial community. The investigation of foodborne illness outbreaks routinely uses short-read whole genome sequencing of pure culture pathogen colonies. However, culturing is a bottleneck and short reads cannot span all bacterial genomic repeats, often leading to fragmented assemblies. Our results showed that the integration of long-read sequencing could expedite the public health response by reconstructing complete pathogen genomes from a microbial community after limited culturing. Additionally, our evaluation of state-of-the-art assembly tools identified biases and areas for improvement. Third, we describe taxaTarget, a supervised learning approach for the taxonomic classification of microeukaryotes in metagenomic data. Metagenomics has been underutilized for microeukaryotes due to the many computational challenges they present. Existing tools often implement universal sequence similarity cutoffs which ignore that sequences can evolve at different rates and, thus, have different discriminatory power. We show that a data-driven approach to determining classification thresholds can result in higher sensitivity and precision than existing tools. Fourth, we explored the use of horizontally transferred plasmids to relate an outbreak strain to the microbiome of a suspected environmental source. 
The investigation of the 2020 red onion outbreak recovered the outbreak strain from patients but not the farms implicated as the likely source of contamination. Our analysis identified highly similar plasmids in the outbreak strain and environmental isolates collected from the farms, which supported a connection between the outbreak strain and the implicated farms. Additionally, we highlighted the need for more detailed and accurate metadata, more extensive environmental sampling, and a better understanding of plasmid molecular evolution before such analyses can be added to the public health response.
Item INSIGHTS INTO DINOFLAGELLATE NATURAL PRODUCT SYNTHESIS VIA CATALYTIC DOMAIN INTERACTIONS (2022) Williams, Ernest Patrick; Place, Allen R; Marine-Estuarine-Environmental Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Dinoflagellates are protists that can be split into two evolutionary groups, the parasitic syndinians and the largely photosynthetic “core” dinoflagellates. They represent a major portion of aquatic biomass which means that they are responsible for large portions of carbon that are both fixed and released. Other than biomass, the fixed carbon can be made into natural products such as polyunsaturated fatty acids that support the biota of many ecosystems or toxins that are harmful to aquatic life and humans. DNA and RNA analyses have been used to discover the putative genes that may make these compounds, but their non-colinear arrangement in the genome is very different from model organisms and their gene copy number is very high, making it nearly impossible to determine the exact biosynthetic pathways. The goal of my studies was to develop methods to differentiate biosynthetic pathways such as lipid and toxin synthesis by comparing the ability of domains to interact with each other with the assumption that domains that preferentially interact are more likely to participate in the same pathway.
Initially, a survey was performed on available dinoflagellate transcriptomes to enumerate domains potentially involved in natural product synthesis and bin them based on sequence similarity to identify genes that could be used in biochemical assays. An interesting integration of analogous genes involved in lipid synthesis with those involved in natural product synthesis was observed as well as trends in domain expansion and contraction during core dinoflagellate evolution. Ultimately, the domain that scaffolds natural product synthesis, the thiolation domain, was chosen for further study because it exhibited two clear functional bins and is acted on directly by another enzyme, a phosphopantetheinyl transferase (PPTase). The PPTase activates the thiolation domain by transferring the phosphopantetheinate group from Coenzyme A to the thiolation domain, creating a free thiol group upon which the natural products are synthesized. These PPTases were then enumerated in dinoflagellates and characterized by looking for sequence motifs and observing expression patterns over a diel cycle as well as during growth in the model species Amphidinium carterae, a basal toxic dinoflagellate. Amphidinium carterae appears to have three PPTases, two of which (PPTase 1 and 2) are very similar, except that PPTase 2 does not appear to have a stop codon and has never been observed as a full-size protein. The remaining two PPTases (PPTase 1 and 3) had alternating expression patterns that did not appear to directly correlate to the acyl carrier protein, the thiolation domain required specifically for lipid biosynthesis. This carrier protein, like other enzymes for natural product synthesis in dinoflagellates, had a chloroplast targeting sequence while the three PPTases did not. To investigate the ability of these three PPTases to activate various thiolation domains, a total of 8 domains from A. 
carterae were substituted into the blue pigment synthesizing gene BpsA from Streptomyces lavendulae. These recombinant constructs were used for coexpression in E. coli as well as in vitro to reduce as many artifacts as possible and assess the interactions of each PPTase with the thiolation domains. Some of the recombinant BpsA genes were able to make blue dye with all three PPTases, while others never made blue dye either in E. coli or in vitro. In vitro quantification of free thiol added by the PPTase showed that all the thiolation domains, as well as the acyl carrier protein, could be phosphopantetheinated by all the PPTases. This generalist substrate recognition, along with the alternating expression patterns and lack of chloroplast signaling peptide, indicates that the two active PPTases are performing the same function on all available thiolation domains, probably before export to the chloroplast. This lack of pathway segregation by PPTases is a completely novel way of synthesizing natural products compared to bacteria and fungi, likely due to the acquisition of both photosynthesis and natural product/lipid biosynthesis during dinoflagellate evolution that was not present in the common ancestor. Additionally, the techniques to identify genes of interest and perform biochemical characterization developed here are useful for future experiments annotating the function of dinoflagellate genes.
Item DEVELOPMENT OF A BIOINFORMATICS PLASMID-SEARCH ENGINE FOR CRONOBACTER SPECIES. (2021) Negrete, Flavia; El-Sayeed, Najib NE; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Cronobacter species are foodborne pathogens that cause serious disease in neonates, infants, and adults. Plasmid classification lays the groundwork for understanding the stable coexistence of various extrachromosomal replicons in a single bacterium, and thus the organization of its genome.
This study developed a bioinformatics plasmid-search engine to identify genomic attributes contained on Cronobacter plasmids. A database containing 32 Cronobacter plasmid sequences from all seven Cronobacter species was developed. Another database containing 683 draft and closed plasmids and genomes was also developed. Each strain’s plasmid content was sorted into six different categories based on their genetic attributes: virulence, Type-IV, heavy-metal, cryptic, multi-drug resistant, or mobilization. An in-house BLAST+-python script was used to perform a Linux-BLAST analysis to create a formatted %ID output matrix of plasmid genes. This thesis represents the first bioinformatics plasmid-search engine developed for Cronobacter. Understanding the role of plasmids in virulence and persistence underpins future mitigation strategies to be developed for controlling this pathogen.
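The abstract above mentions a script that turns BLAST results into a formatted %ID matrix of plasmid genes but does not describe it. As a rough illustration only, the following Python sketch shows one way such a matrix could be built from BLAST+ tabular output (`-outfmt 6`, whose first three columns are query ID, subject ID, and percent identity). The gene and plasmid names, the sample hits, and the function names are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical BLAST+ tabular output (-outfmt 6); IDs and values are invented.
BLAST_TABULAR = """\
geneA\tplasmid1\t98.5\t300\t4\t0\t1\t300\t10\t309\t1e-150\t540
geneA\tplasmid2\t75.0\t280\t70\t0\t1\t280\t5\t284\t1e-40\t180
geneB\tplasmid1\t100.0\t150\t0\t0\t1\t150\t400\t549\t1e-80\t278
geneA\tplasmid1\t90.0\t120\t12\t0\t1\t120\t900\t1019\t1e-30\t150
"""


def percent_identity_matrix(tabular_text):
    """Return {query: {subject: best percent identity}} from outfmt 6 text."""
    best = defaultdict(dict)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        # Keep only the highest-identity hit for each query/subject pair.
        if subject not in best[query] or pident > best[query][subject]:
            best[query][subject] = pident
    return {query: dict(hits) for query, hits in best.items()}


def format_matrix(matrix):
    """Render the matrix as tab-separated text; pairs with no hit show 0.0."""
    subjects = sorted({s for hits in matrix.values() for s in hits})
    rows = ["gene\t" + "\t".join(subjects)]
    for query in sorted(matrix):
        cells = [f"{matrix[query].get(s, 0.0):.1f}" for s in subjects]
        rows.append(query + "\t" + "\t".join(cells))
    return "\n".join(rows)


matrix = percent_identity_matrix(BLAST_TABULAR)
print(format_matrix(matrix))
```

Keeping the best hit per gene and plasmid pair and reporting 0.0 for missing pairs are design assumptions of this sketch; a real pipeline might also filter on alignment length or e-value.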