Describing and Modeling Repetitive Sequences in DNA

dc.contributor.advisor: Yorke, James A (en_US)
dc.contributor.author: Sindi, Suzanne Soraya (en_US)
dc.contributor.department: Applied Mathematics and Scientific Computation (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2006-09-12T05:46:39Z
dc.date.available: 2006-09-12T05:46:39Z
dc.date.issued: 2006-07-19 (en_US)
dc.description.abstract: A significant fraction of the genome, i.e. the complete DNA sequence, of most organisms consists of sequences for which there are similar copies elsewhere in the genome. While most repetitive DNA was originally thought to have no function, a growing body of literature suggests that repetitive sequences are vital to the genome. The goal of this dissertation is to analyze statistical properties of repetitive sequences in the genomes of a variety of organisms. We find a variety of striking features of repetitive sequence in the human genome and the genomes of C. elegans (worm), A. thaliana (mustard seed) and D. melanogaster (fruit fly), with some comparison to S. cerevisiae (yeast) and E. coli (a bacterium). We find that the number of times each 40-mer (sequence of 40 bases) occurs in a genome is approximated by a power law distribution. We analyze in detail the separation between copies of 40-mers that occur exactly twice in a chromosome and observe that a significant portion of these pairs, which we call "proximal", have extremely small separations, while the remaining "distant" pairs have a distribution more consistent with being uniformly distributed throughout the chromosome. We introduce a type of exactly repetitive region, which we call a "repeat string," and find that the distribution of lengths of repeat strings is roughly a power law. Since these properties have been verified for the genomes of a variety of organisms, there may be a common explanation of their origin. When possible, we suggest evolutionary mechanisms that could cause the emergence of such statistical properties. In particular, we developed a model of the evolution of repeat strings in a genome. We find that, under quite general conditions, the stationary distribution of our evolutionary model is the Pareto distribution, a close relative of the power law distribution. (en_US)
dc.format.extent: 1044703 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/3796
dc.language.iso: en_US
dc.subject.pqcontrolled: Mathematics (en_US)
dc.subject.pqcontrolled: Biology, Genetics (en_US)
dc.subject.pquncontrolled: power law (en_US)
dc.subject.pquncontrolled: DNA word counts (en_US)
dc.subject.pquncontrolled: sequence evolution (en_US)
dc.subject.pquncontrolled: repetitive DNA (en_US)
dc.title: Describing and Modeling Repetitive Sequences in DNA (en_US)
dc.type: Dissertation (en_US)
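
The abstract describes two measurements: how many times each 40-mer occurs, and the separation between the two copies of 40-mers that occur exactly twice. The following is a minimal Python sketch of how such counts might be computed, not the dissertation's actual procedure. The function names, the toy sequence, and the choice to scan a single strand (ignoring reverse complements) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of start positions where it occurs."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def occurrence_histogram(positions):
    """Count how many distinct k-mers occur once, twice, three times, ..."""
    return Counter(len(p) for p in positions.values())


def pair_separations(positions):
    """Separations between the two copies of k-mers that occur exactly twice."""
    return sorted(p[1] - p[0] for p in positions.values() if len(p) == 2)


# Toy input (hypothetical); the abstract uses k = 40 on whole chromosomes.
seq = "ACGTACGTTTGACCA"
pos = kmer_positions(seq, k=4)
hist = occurrence_histogram(pos)   # occurrence count -> number of distinct k-mers
seps = pair_separations(pos)       # one separation per k-mer seen exactly twice
```

Plotting the histogram on log-log axes would be one way to check by eye for the approximate power law the abstract reports. A dictionary of positions is fine for a toy sequence, but a whole chromosome would likely need a more memory-efficient index.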

Files

Original bundle
Name: umi-umd-3639.pdf
Size: 1020.22 KB
Format: Adobe Portable Document Format