A comparative evaluation of sequence classification programs

Bazinet, Adam L.; Cummings, Michael P.

Publication or External Link

https://doi.org/10.1186/1471-2105-13-92

Date

2012-05-10

Authors

Bazinet, Adam L.

Cummings, Michael P.

Citation

Bazinet, A.L., Cummings, M.P. A comparative evaluation of sequence classification programs. BMC Bioinformatics 13, 92 (2012).

Abstract

Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis.

Results: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known.

Conclusions: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.

URI (handle)

http://hdl.handle.net/1903/13346

Collections

Computer Science Research Works
