Improving Genome Assembly

dc.contributor.advisor: Yorke, James A [en_US]
dc.contributor.author: Ustun, Cevat [en_US]
dc.contributor.department: Physics [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2005-10-11T10:37:26Z
dc.date.available: 2005-10-11T10:37:26Z
dc.date.issued: 2005-08-15 [en_US]
dc.description.abstract: We present a reliable, easy to implement algorithm to generate a set of highly reliable overlaps based on identifying repeat k-mers. Our method is coverage independent. Whereas traditionally reads have been trimmed to have expected error rates of 2%, we find our error correction allows extending usable sequence in reads to 16% trimming. We use a version of the Phrap assembly program that uses only overlaps computed by the UMD overlapper, called PhrapUMD. We integrate the UMD algorithms with Baylor's ATLAS assembler applied to Rattus norvegicus. Starting with the same data as the Nov. 2002 ATLAS assembly, we compare our results to 4.5 Mbp of rat sequence in 21 BACs that have been finished. We find that after extension and error correction, (i) the reads are 30% longer than reads trimmed to 2%; (ii) the average error rate across the extended reads is about 3 in 10,000 bases, with 88% of the extended reads matching finished sequence exactly across their entire length; and (iii) PhrapUMD with these reads and our reliable overlaps produces a draft assembly of the rat which has no misassemblies and increases the coverage of finished sequence from 92.2% to 95.7%, while simultaneously reducing the base error rate for quality 20 or higher bases from 1.50 to 0.87 errors per 10,000. [en_US]
dc.format.extent: 3482709 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/2957
dc.language.iso: en_US
dc.subject.pqcontrolled: Biology, Genetics [en_US]
dc.subject.pquncontrolled: genome assembly [en_US]
dc.subject.pquncontrolled: phrap [en_US]
dc.subject.pquncontrolled: BAC [en_US]
dc.subject.pquncontrolled: overlaps [en_US]
dc.title: Improving Genome Assembly [en_US]
dc.type: Dissertation [en_US]
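
Illustrative note: the abstract says the reliable overlaps are built by identifying repeat k-mers, but this record does not describe how. The sketch below is not the UMD overlapper. It only shows the general idea of counting k-mers across reads and flagging those that occur unusually often. The function names, the toy reads, and the fixed cutoff max_count are all assumptions made for illustration. A fixed cutoff is in fact coverage dependent, unlike the method the abstract describes, and reverse complements are ignored here.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def repeat_kmers(reads, k, max_count):
    """Return k-mers seen more than max_count times.

    max_count is a hypothetical fixed cutoff chosen for this sketch;
    it is not a parameter taken from the dissertation.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n > max_count}


# Toy reads, invented for the example.
reads = ["ACGTACGTAC", "CGTACGTTTT", "GGGACGTAAA"]
flagged = repeat_kmers(reads, k=4, max_count=3)
```

In a real overlapper, k-mers flagged this way would typically be excluded from seeding overlaps, since a shared repeat k-mer is weak evidence that two reads come from the same genomic location.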

Files

Original bundle
Name: umi-umd-2750.pdf
Size: 3.32 MB
Format: Adobe Portable Document Format