Animal & Avian Sciences Research Works

Permanent URI for this collectionhttp://hdl.handle.net/1903/1600

Browse

Search Results

Now showing 1 - 3 of 3
  • Thumbnail Image
    Item
    Term-tissue specific models for prediction of gene ontology biological processes using transcriptional profiles of aging in drosophila melanogaster
    (Springer Nature, 2008-02-28) Zhang, Wensheng; Zou, Sige; Song, Jiuzhou
    Predictive classification on the base of gene expression profiles appeared recently as an attractive strategy for identifying the biological functions of genes. Gene Ontology (GO) provides a valuable source of knowledge for model training and validation. The increasing collection of microarray data represents a valuable source for generating functional hypotheses of uncharacterized genes. This study focused on using support vector machines (SVM) to predict GO biological processes from individual or multiple-tissue transcriptional profiles of aging in Drosophila melanogaster. Ten-fold cross validation was implemented to evaluate the prediction. One-tail Fisher's exact test was conducted on each cross validation and multiple testing was addressed using BH FDR procedure. The results showed that, of the 148 pursued GO biological processes, fifteen terms each had at least one model with FDR-adjusted p-value (Adj.p) <0.05 and six had the values between 0.05 and 0.25. Furthermore, all these models had the prediction sensitivity (SN) over 30% and specificity (SP) over 80%. We proposed the concept of term-tissue specific models indicating the fact that the major part of the optimized prediction models was trained from individual tissue data. Furthermore, we observed that the memberships of the genes involved in all the three pursued children biological processes on mitochondrial electron transport could be predicted from the transcriptional profiles of aging (Adj.p < 0.01). This finding may be important in biology because the genes of mitochondria play a critical role in the longevity of C. elegans and D. melanogaster.
  • Thumbnail Image
    Item
    Principal component tests: applied to temporal gene expression data
    (Springer Nature, 2009-01-30) Zhang, Wensheng; Fang, Hong-Bin; Song, Jiuzhou
    Clustering analysis is a common statistical tool for knowledge discovery. It is mainly conducted when a project still is in the exploratory phase without any priori hypotheses. However, the statistical significance testing between the clusters can be meaningful in helping the researchers to assess if the classification results from implementing a clustering algorithm need to be improved, even after the cluster number has been determined by a well-established criterion. This is important when we want to identify highly-specific patterns through classification. We proposed to use a principal component (PC) test, which is an implementation of an exact F statistic for the measures at multiple endpoints based on elliptical distribution theory, to assess the statistical significance between clusters. A challenge in the implementation is the choice of the number (q) of principal components to be considered, which can severely influence the statistical power of the method. We optimized the determination via validation according to a permutation test based on the clustering to be evaluated. The method was applied to a public dataset in classifying genes according to their temporal gene expression profiles. The results demonstrated that the PC testing were useful for determining the optimal number of clusters.
  • Thumbnail Image
    Item
    Analysis of recent segmental duplications in the bovine genome
    (Springer Nature, 2009-12-01) Liu, George E; Ventura, Mario; Cellamare, Angelo; Chen, Lin; Cheng, Ze; Zhu, Bin; Li, Congjun; Song, Jiuzhou; Eichler, Evan E
    Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. We performed the first systematic and genome-wide analysis of segmental duplications in the modern domesticated cattle (Bos taurus). Using two distinct computational analyses, we estimated that 3.1% (94.4 Mb) of the bovine genome consists of recently duplicated sequences (≥ 1 kb in length, ≥ 90% sequence identity). Similar to other mammalian draft assemblies, almost half (47% of 94.4 Mb) of these sequences have not been assigned to cattle chromosomes. In this study, we provide the first experimental validation large duplications and briefly compared their distribution on two independent bovine genome assemblies using fluorescent in situ hybridization (FISH). Our analyses suggest that the (75-90%) of segmental duplications are organized into local tandem duplication clusters. Along with rodents and carnivores, these results now confidently establish tandem duplications as the most likely mammalian archetypical organization, in contrast to humans and great ape species which show a preponderance of interspersed duplications. A cross-species survey of duplicated genes and gene families indicated that duplication, positive selection and gene conversion have shaped primates, rodents, carnivores and ruminants to different degrees for their speciation and adaptation. We identified that bovine segmental duplications corresponding to genes are significantly enriched for specific biological functions such as immunity, digestion, lactation and reproduction. Our results suggest that in most mammalian lineages segmental duplications are organized in a tandem configuration. Segmental duplications remain problematic for genome and assembly and we highlight genic regions that require higher quality sequence characterization. This study provides insights into mammalian genome evolution and generates a valuable resource for cattle genomics research.