Using Many-Core Computing to Speed Up De Novo Transcriptome Assembly

The central dogma of molecular biology implies that DNA holds the blueprint that determines an organism's structure and functioning. However, this blueprint can be read in different ways to accommodate various needs, depending on a cell's location in the body, its environment, or other external factors. This is accomplished by first transcribing DNA into messenger RNA (mRNA), and then translating mRNA into proteins. The cell regulates how much each gene is transcribed into mRNA, and even which parts of each gene are transcribed. A single gene may be transcribed in different ways by splicing out different parts of the sequence. Thus, one gene may be transcribed into many different mRNA sequences, which are in turn translated into different proteins.
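As a rough illustration of how splicing lets one gene yield several mRNA sequences, the sketch below joins different subsets of exons from a toy gene. The sequence, the exon coordinates, and the helper name `splice` are all made up for this example, and the gene is treated as the coding strand, so transcription reduces to replacing T with U.

```python
from typing import List, Tuple

# Hypothetical toy gene: three exons separated by two introns (invented sequence).
GENE = "ATGGCC" + "GTAAGT" + "AAACCC" + "GTTTAG" + "GGGTGA"
EXONS: List[Tuple[int, int]] = [(0, 6), (12, 18), (24, 30)]  # half-open (start, end)


def splice(gene: str, exons: List[Tuple[int, int]]) -> str:
    """Join the selected exon intervals and transcribe the result (T -> U)."""
    return "".join(gene[start:end] for start, end in exons).replace("T", "U")


# Two isoforms of the same gene: all exons kept, and the middle exon skipped.
full_isoform = splice(GENE, EXONS)
skipped_isoform = splice(GENE, [EXONS[0], EXONS[2]])
```

Both isoforms come from the same DNA, yet they differ in length and content, which is why a transcriptome cannot be read off the genome alone.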

The set of mRNA sequences found in a cell is known as its transcriptome, and it differs between tissues and over time. The transcriptome gives a biologist a snapshot of the cell's state and can help them, for example, track the progression of a disease. Some modern methods of transcriptome sequencing give only short reads of the mRNA, up to 100 nucleotides long. To reconstruct the mRNA sequences, one must use an assembly algorithm to stitch these short reads back into full-length transcripts.
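To make the idea of stitching reads together concrete, here is a minimal sketch of a greedy overlap-merge assembler. It is a textbook simplification, not the algorithm of the assembler presented here: it ignores sequencing errors, reverse complements, and multiple isoforms, and the names `overlap`, `greedy_assemble`, and `min_len` are chosen for this example only.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`, or 0 if below `min_len`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads sampled from the made-up transcript "ATGGCTAACGTT".
reads = ["CTAACG", "ATGGCT", "AACGTT", "GGCTAA"]
contigs = greedy_assemble(reads)
```

Every round compares all pairs of sequences, so the cost grows quickly with the number of reads. That all-pairs work is one reason assembly is computationally intensive and a natural target for parallel hardware.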

De novo transcriptome assemblers are an important family of transcriptome assemblers. Such assemblers reconstruct the transcriptome without aligning reads to a reference genome and are, therefore, computationally intensive. We present here a de novo transcriptome assembler designed for a parallel computer architecture, the XMT architecture. With this assembler we achieve speedups over existing de novo transcriptome assemblers without sacrificing performance on traditional quality metrics.