Methods for Efficient Processing and Comprehensive Analysis of Single Cell Sequencing Data

dc.contributor.advisor: Patro, Rob R.P. (en_US)
dc.contributor.author: He, Dongze (en_US)
dc.contributor.department: Cell Biology & Molecular Genetics (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2024-06-28T05:51:29Z
dc.date.available: 2024-06-28T05:51:29Z
dc.date.issued: 2024 (en_US)
dc.description.abstract: Over the past decade, the rapid development of single-cell RNA-sequencing (scRNA-seq) technology has revolutionized the understanding of cellular differentiation, heterogeneity, transcriptional dynamics, and many other biological processes. Despite the explosive growth of data analysis methods that aid in biological discovery, there are still many unsolved questions in raw data processing (also known as preprocessing) of scRNA-seq data, that is, the procedure for analyzing the raw sequenced fragments to generate quantitative measurements of gene expression. In this dissertation, we first describe a computational ecosystem we developed that provides an end-to-end pipeline for accurately and efficiently processing single-cell sequencing data. We then discuss the computational and analytical challenges we found during the development of alevin-fry and the solutions we provided for tackling these challenges. Chapters 2 and 3 demonstrate the computational successes we achieved for single-cell data processing. In Chapter 2, we present a novel computational framework, alevin-fry, for rapid, accurate, and memory-frugal quantification of single-cell sequencing data. In Chapter 3, we discuss simpleaf, an augmented execution context for alevin-fry that not only provides a simplified user interface to the alevin-fry framework, but also offers many high-level simplifications for single-cell data processing and assists with data provenance propagation and reproducible analyses. Our results demonstrate that, with the help of alevin-fry and simpleaf, we are able to process single-cell data from both "standard" chemistries and more advanced and complex data types, and achieve the same level of accuracy as existing best-in-class methods while being substantially faster and more memory efficient. Chapter 4 introduces Forseti, a mechanistic model to probabilistically assign a splicing status to scRNA-seq reads.
As the first probabilistic and mechanistic model for resolving the ambiguity of splicing status in tagged-end, short-read scRNA-seq data, Forseti can be used, as we show, to accurately and efficiently infer the splicing status of scRNA-seq reads and to help identify the correct gene of origin for multigene-mapped reads. In Chapter 5, we describe the results of a comprehensive analysis of "off-target" reads (reads whose mappings cannot be accounted for under the presumed and intended components of the underlying protocol) in scRNA-seq. Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that utilize off-target reads to extend the reach and accuracy of single-cell data analysis pipelines. (en_US)
dc.identifier: https://doi.org/10.13016/g18x-vyhd
dc.identifier.uri: http://hdl.handle.net/1903/32817
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Biology (en_US)
dc.subject.pqcontrolled: Bioinformatics (en_US)
dc.subject.pquncontrolled: off target read analysis (en_US)
dc.subject.pquncontrolled: single-cell data analysis (en_US)
dc.subject.pquncontrolled: single-cell RNA-sequencing (en_US)
dc.subject.pquncontrolled: splicing (en_US)
dc.title: Methods for Efficient Processing and Comprehensive Analysis of Single Cell Sequencing Data (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: He_umd_0117E_24079.pdf
Size: 29.04 MB
Format: Adobe Portable Document Format