Cell Biology & Molecular Genetics

Permanent URI for this community: http://hdl.handle.net/1903/11811

  • Rosaceae fruit transcriptome database (ROFT)—a useful genomic resource for comparing fruits of apple, peach, strawberry, and raspberry
    (Wiley, 2023-11-14) Li, Muzi; Mount, Stephen M.; Liu, Zhongchi
    Rosaceae is a large plant family consisting of many economically important fruit crops including peach, apple, pear, strawberry, raspberry, plum, and others. Investigations into their growth and development will promote both basic understanding and progress toward increasing fruit yield and quality. With the ever-increasing high-throughput sequencing data of Rosaceae, comparative studies are hindered by inconsistency of sample collection with regard to tissue, stage, growth conditions, and by vastly different handling of the data. Therefore, databases that enable easy access and effective utilization of directly comparable transcript data are highly desirable. Here, we describe a database for comparative analysis, ROsaceae Fruit Transcriptome database (ROFT), based on RNA-seq data generated from the same laboratory using similarly dissected and staged fruit tissues of four important Rosaceae fruit crops: apple, peach, strawberry, and red raspberry. Hence, the database is unique in allowing easy and robust comparisons among fruit gene expression across the four species. ROFT enables researchers to query orthologous genes and their expression patterns during different fruit developmental stages in the four species, identify tissue-specific and tissue-/stage-specific genes, visualize and compare ortholog expression in different fruit types, explore consensus co-expression networks, and download different data types. The database provides users access to vast amounts of RNA-seq data across the four economically important fruits, enables investigations of fruit type specification and evolution, and facilitates the selection of genes with critical roles in fruit development for further studies.
  • Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm
    (Springer Nature, 2015-07-10) Gibbons, Theodore R.; Mount, Stephen M.; Cooper, Endymion D.; Delwiche, Charles F.
    Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE.
    The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with 25M+ edges connecting the 60K+ KOG sequences in half a minute using less than half a gigabyte of memory.
  • Transcriptome analyses reveal SR45 to be a neutral splicing regulator and a suppressor of innate immunity in Arabidopsis thaliana
    (Springer Nature, 2017-10-11) Zhang, Xiao-Ning; Shi, Yifei; Powers, Jordan J.; Gowda, Nikhil B.; Zhang, Chong; Ibrahim, Heba M. M.; Ball, Hannah B.; Chen, Samuel L.; Lu, Hua; Mount, Stephen M.
    Regulation of pre-mRNA splicing diversifies protein products and affects many biological processes. Arabidopsis thaliana Serine/Arginine-rich 45 (SR45) regulates pre-mRNA splicing by interacting with other regulatory proteins and spliceosomal subunits. Although SR45 has orthologs in diverse eukaryotes, including human RNPS1, the sr45-1 null mutant is viable. Narrow flower petals and reduced seed formation suggest that SR45 regulates genes involved in diverse processes, including reproduction. To understand how SR45 is involved in the regulation of reproductive processes, we studied mRNA from the wild-type and sr45-1 inflorescences using RNA-seq, and identified SR45-bound RNAs by immunoprecipitation. Using a variety of bioinformatics tools, we identified a total of 358 SR45 differentially regulated (SDR) genes, 542 SR45-dependent alternative splicing (SAS) events, and 1812 SR45-associated RNAs (SARs). There is little overlap between SDR genes and SAS genes, and neither set of genes is enriched for flower or seed development. However, transcripts from reproductive process genes are significantly overrepresented in SARs. In exploring the fate of SARs, we found that a total of 81 SARs are subject to alternative splicing, while 14 of them are known Nonsense-Mediated Decay (NMD) targets. Motifs related to GGNGG are enriched both in SARs and near different types of SAS events, suggesting that SR45 recognizes this motif directly. Genes involved in plant defense are significantly overrepresented among genes whose expression is suppressed by SR45, and sr45-1 plants do indeed show enhanced immunity. We find that SR45 is a suppressor of innate immunity. We find that a single motif (GGNGG) is highly enriched both in RNAs bound by SR45 and in sequences near SR45-dependent alternative splicing events in inflorescence tissue.
    We find that the alternative splicing events regulated by SR45 are enriched for this motif whether the effect of SR45 is activation or repression of the particular event. Thus, our data suggest that SR45 acts to control splice site choice in a way that defies simple categorization as an activator or repressor of splicing.
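
The edge-weighting metrics compared in the Gibbons et al. entry above can be illustrated with a short Python sketch. This is not the authors' Porthos code. The hit tuples, the function names, the use of the smaller self-hit score as the BSR denominator, the E-value floor, and the rule of keeping the larger weight of two reciprocal hits are all assumptions made for illustration. BAL is left out because the abstract does not define the anchored length.

```python
import math

# Hypothetical BLAST hits: (query, subject, bit_score, e_value).
# Self-hits (query == subject) supply the reference scores for BSR.
HITS = [
    ("A", "A", 200.0, 1e-100),
    ("B", "B", 100.0, 1e-50),
    ("A", "B", 80.0, 1e-20),
    ("B", "A", 78.0, 2e-20),
]


def nle(e_value, floor=1e-200):
    """Negative common log of the E-value (NLE).

    E-values reported as 0 are clamped to `floor` (an assumed value)
    so that the logarithm stays finite.
    """
    return -math.log10(max(e_value, floor))


def edge_weights(hits, metric="bit"):
    """Return {(node_a, node_b): weight} for an undirected graph.

    metric: "bit" (raw bit score), "bsr" (bit score ratio) or "nle".
    """
    self_score = {q: s for q, t, s, _ in hits if q == t}
    edges = {}
    for q, t, score, e_value in hits:
        if q == t:
            continue  # self-hits are not edges
        if metric == "bit":
            weight = score
        elif metric == "bsr":
            # Assumption: normalize by the smaller of the two self-hit scores.
            weight = score / min(self_score[q], self_score[t])
        elif metric == "nle":
            weight = nle(e_value)
        else:
            raise ValueError(f"unknown metric: {metric}")
        key = tuple(sorted((q, t)))  # undirected: A-B equals B-A
        # Assumption: reciprocal hits collapse to the larger weight.
        if key not in edges or weight > edges[key]:
            edges[key] = weight
    return edges


if __name__ == "__main__":
    for name in ("bit", "bsr", "nle"):
        print(name, edge_weights(HITS, name))
```

Each resulting pair and weight could then be written out as a label, label, weight triple for a graph clustering tool such as MCL.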