High Performance Computing for DNA Sequence Alignment and Assembly

dc.contributor.advisor: Salzberg, Steven L [en_US]
dc.contributor.author: Schatz, Michael Christopher [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2010-07-02T05:58:59Z
dc.date.available: 2010-07-02T05:58:59Z
dc.date.issued: 2010 [en_US]
dc.description.abstract: Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyses, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine, but these analyses are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation across large compute grids. This dissertation demonstrates how these technologies can accelerate these computations by orders of magnitude and have the potential to make otherwise infeasible computations practical. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/10351
dc.subject.pqcontrolled: Computer Science [en_US]
dc.subject.pqcontrolled: Biology, Bioinformatics [en_US]
dc.subject.pquncontrolled: DNA Sequence Analysis [en_US]
dc.subject.pquncontrolled: Genome Assembly [en_US]
dc.subject.pquncontrolled: Genomics [en_US]
dc.subject.pquncontrolled: GPGPU [en_US]
dc.subject.pquncontrolled: MapReduce [en_US]
dc.subject.pquncontrolled: Parallel Algorithms [en_US]
dc.title: High Performance Computing for DNA Sequence Alignment and Assembly [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Schatz_umd_0117E_11191.pdf
Size: 7.19 MB
Format: Adobe Portable Document Format