University of Maryland Libraries, Digital Repository at the University of Maryland

    Describing and Modeling Repetitive Sequences in DNA

    umi-umd-3639.pdf (1020.Kb)
    No. of downloads: 684

    Date
    2006-07-19
    Author
    Sindi, Suzanne Soraya
    Advisor
    Yorke, James A
    Abstract
    A significant fraction of the genome, i.e. the complete DNA sequence, of most organisms consists of sequences that have similar copies elsewhere in the genome. While most repetitive DNA was originally thought to have no function, a growing body of literature suggests that repetitive sequences are vital to the genome. The goal of this dissertation is to analyze statistical properties of repetitive sequences in the genomes of a variety of organisms. We find a variety of striking features of repetitive sequence in the human genome and the genomes of C. elegans (worm), A. thaliana (thale cress, a mustard-family plant) and D. melanogaster (fruit fly), with some comparison to S. cerevisiae (yeast) and E. coli (a bacterium). We find that the number of times each 40-mer (sequence of 40 bases) occurs in a genome is approximated by a power law distribution. We analyze in detail the separation between copies of 40-mers that occur exactly twice in a chromosome and observe that a significant portion of these pairs, which we call "proximal", have extremely small separations, while the remaining "distant" pairs have separations more consistent with copies placed uniformly at random throughout the chromosome. We introduce a type of exactly repetitive region, which we call a "repeat string", and find that the distribution of lengths of repeat strings is roughly a power law. Since these properties have been verified for the genomes of a variety of organisms, there may be a common explanation of their origin. When possible, we suggest evolutionary mechanisms that could cause the emergence of such statistical properties. In particular, we develop a model of the evolution of repeat strings in a genome. We find that, under quite general conditions, the stationary distribution of our evolutionary model is the Pareto distribution, a close relative of the power law distribution.
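The two k-mer statistics described in the abstract (how often each k-mer occurs, and how far apart the two copies are for k-mers that occur exactly twice) can be illustrated with a minimal Python sketch. This is not the dissertation's code. The function names, the toy sequence, and the choice of k = 3 are hypothetical, and the sketch assumes that only the forward strand is scanned and that overlapping occurrences are counted; the abstract does not state either convention.

```python
from collections import Counter, defaultdict


def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of start positions where it occurs."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def occurrence_histogram(positions):
    """Return {n: number of distinct k-mers that occur exactly n times}."""
    return Counter(len(starts) for starts in positions.values())


def pair_separations(positions):
    """Sorted separations between the two copies of each k-mer seen exactly twice."""
    return sorted(s[1] - s[0] for s in positions.values() if len(s) == 2)


# Toy input (hypothetical); the abstract uses k = 40 on whole chromosomes.
toy = "ACGTACGTTTGACCA"
pos = kmer_positions(toy, 3)
hist = occurrence_histogram(pos)
seps = pair_separations(pos)
print(dict(hist))
print(seps)
```

On real data, one would plot `hist` on log-log axes to look for the approximate power law, and inspect the distribution of `seps` to separate small ("proximal") separations from the rest. For whole chromosomes a dictionary of 40-mers can be memory-hungry, so a sorted suffix structure or a dedicated k-mer counter would likely be a better fit.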
    URI
    http://hdl.handle.net/1903/3796
    Collections
    • Computer Science Theses and Dissertations
    • Mathematics Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.