Highly Scalable Short Read Alignment with the Burrows-Wheeler Transform and Cloud Computing
Langmead, Benjamin Thomas
Salzberg, Steven L
MetadataShow full item record
Improvements in DNA sequencing have both broadened its utility and dramatically increased the size of sequencing datasets. Sequencing instruments are now used regularly as sources of high-resolution evidence for genotyping, methylation profiling, DNA-protein interaction mapping, and characterizing gene expression in the human genome and in other species. With existing methods, the computational cost of aligning short reads from the Illumina instrument to a mammalian genome can be very large: on the order of many CPU months for one human genotyping project. This thesis presents a novel application of the Burrows-Wheeler Transform that enables the alignment of short DNA sequences to mammalian genomes at a rate much faster than existing hashtable-based methods. The thesis also presents an extension of the technique that exploits the scalability of Cloud Computing to perform the equivalent of one human genotyping project in hours.