Characterization of the Unannotated Transcriptome: Investigating the Evolution of sORFs and Proto/De Novo Genes in the Drosophila pseudoobscura Subgroup using Comparative Transcriptome and Genome Data

dc.contributor.advisor: Machado, Carlos A [en_US]
dc.contributor.author: Clark, Ronald James [en_US]
dc.contributor.department: Biology [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2025-08-08T12:11:49Z
dc.date.issued: 2025 [en_US]
dc.description.abstract: The discovery of small open reading frames (sORFs) and de novo genes has reshaped our understanding of genome functionality, revealing that a much larger portion of the genome is transcribed than previously recognized. However, identifying and validating these unannotated transcripts remains challenging due to their short sequence lengths, low expression levels, and lineage-specific divergence. Here, we employ a multifaceted approach to refine the unannotated transcriptome of Drosophila pseudoobscura and Drosophila persimilis using a large RNA-seq dataset. By assessing expression consistency across multiple strains, developmental stages, and tissues, we identify a subset of small ORF-encoded peptides (SEPs) and proto/de novo genes (p/DNGs) with potential biological relevance. Our analysis reduced the unannotated transcriptome to a refined set of 2,864 consistently expressed transcripts in D. pseudoobscura, of which 1,260 exhibit conserved consistent expression in D. persimilis. Clustering these transcripts into distinct expression profiles revealed 764 transcripts with coordinated developmental regulation, and among them, 85 contained ORFs supported by mass spectrometry in D. pseudoobscura. Further validation through Kozak sequence analysis strengthened the case for their potential translation. To contextualize these transcripts, we examined attributes of the genes that encode them, including their genomic distribution, chromosomal localization, and enrichment within Topologically Associated Domains (TADs). Our findings reveal significant differences in transcript expression relative to TAD boundaries, suggesting stable gene expression patterns and potential shared regulatory mechanisms. Additionally, we investigated their co-localization with transposable elements (TEs), uncovering striking differences in TE composition between transcript subsets based on biological support and expression profiles. These results suggest that TE activity may influence the regulatory landscape of unannotated transcripts. By integrating expression conservation, genomic architecture, and evolutionary dynamics, this study refines the strategies for identifying functional sORFs and p/DNGs. This work provides novel insights into the transcriptional landscape of D. pseudoobscura and D. persimilis, offering a framework for future studies on the evolutionary constraints and functional relevance of unannotated genes. [en_US]
dc.identifier: https://doi.org/10.13016/yg7w-ma9r
dc.identifier.uri: http://hdl.handle.net/1903/34250
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Biology [en_US]
dc.subject.pqcontrolled: Evolution & development [en_US]
dc.subject.pqcontrolled: Genetics [en_US]
dc.subject.pquncontrolled: Bioinformatics [en_US]
dc.subject.pquncontrolled: Drosophila pseudoobscura [en_US]
dc.subject.pquncontrolled: evolution [en_US]
dc.subject.pquncontrolled: molecular-genetics [en_US]
dc.subject.pquncontrolled: sORF [en_US]
dc.subject.pquncontrolled: transcriptomics [en_US]
dc.title: Characterization of the Unannotated Transcriptome: Investigating the Evolution of sORFs and Proto/De Novo Genes in the Drosophila pseudoobscura Subgroup using Comparative Transcriptome and Genome Data [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Clark_umd_0117E_25113.pdf
Size: 12.75 MB
Format: Adobe Portable Document Format
Name: Supplementary_Files-20250414T162044Z-001.zip
Size: 2.28 MB
Format: Unknown data format