Patterns and Complexity in Biological Systems: A Study of Sequence Structure and Ontology-based Networks

Thumbnail Image


Publication or External Link






Biological information can be explored at many different levels, with the most basic information encoded in patterns within the DNA sequence. Through molecular level processes, these patterns are capable of controlling the states of genes, resulting in a complex network of interactions between genes. Key features of biological systems can be determined by evaluating properties of this gene regulatory network. More specifically, a network-based approach helps us to understand how the collective behavior of genes corresponds to patterns in genetic function.

We combine Chromatin-Immunoprecipitation microarray (ChIP-chip) data with genomic sequence data to determine how DNA sequence works to recruit various proteins. We quantify this information using a value termed "nmer-association.'' "Nmer-association'' measures how strongly individual DNA sequences are associated with a protein in a given ChIP-chip experiment. We also develop the "split-motif'' algorithm to study the underlying structural properties of DNA sequence independent of wet-lab data. The "split-motif'' algorithm finds pairs of DNA motifs which preferentially localize relative to one another. These pairs are primarily composed of known transcription factor binding sites and their co-occurrence is indicative of higher-order structure. This kind of structure has largely been missed in standard motif-finding algorithms despite emerging evidence of the importance of complex regulation.

In both simple and complex regulation, two genes that are connected in a regulatory fashion are likely to have shared functions. The Gene Ontology (GO) provides biologists with a controlled terminology with which to describe how genes are associated with function and how those functional terms are related to each other. We introduce a method for processing functional information in GO to produce a gene network. We find that the edges in this network are correlated with known regulatory interactions and that the strength of the functional relationship between two genes can be used as an indicator of how informationally important that link is in the regulatory network. We also investigate the network structure of gene-term annotations found in GO and use these associations to establish an alternate natural way to group the functional terms. These groups of terms are drastically different from the hierarchical structure established by the Gene Ontology and provide an alternative framework with which to describe and predict the functions of experimentally identified groups of genes.