De novo likelihood-based measures for comparing genome assemblies

Ghodsi, Mohammadreza; Hill, Christopher M; Astrovskaya, Irina; Lin, Henry; Sommer, Dan D; Koren, Sergey; Pop, Mihai

dc.contributor.author	Ghodsi, Mohammadreza
dc.contributor.author	Hill, Christopher M
dc.contributor.author	Astrovskaya, Irina
dc.contributor.author	Lin, Henry
dc.contributor.author	Sommer, Dan D
dc.contributor.author	Koren, Sergey
dc.contributor.author	Pop, Mihai
dc.date.accessioned	2021-09-27T16:51:51Z
dc.date.available	2021-09-27T16:51:51Z
dc.date.issued	2013-08-22
dc.description.abstract	The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments “read” by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still relies on the availability of independently determined standards, such as manually curated genome sequences, or independently produced mapping data. These “gold standards” can be expensive to produce and may only cover a small fraction of the genome, which limits their applicability to newly generated genome sequences. Here we introduce a de novo probabilistic measure of assembly quality which allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics. We demonstrate that our de novo score can be computed quickly and accurately in a practical setting even for large datasets, by estimating the score from a relatively small sample of the reads. To demonstrate the benefits of our score, we measure the quality of the assemblies generated in the GAGE and Assemblathon 1 assembly “bake-offs” with our metric. Even without knowledge of the true reference sequence, our de novo metric closely matches the reference-based evaluation metrics used in the studies and outperforms other de novo metrics traditionally used to measure assembly quality (such as N50). Finally, we highlight the application of our score to optimize assembly parameters used in genome assemblers, which enables better assemblies to be produced, even without prior knowledge of the genome being assembled. 
Likelihood-based measures, such as the one proposed here, will become the new standard for de novo assembly evaluation.	en_US
dc.description.uri	https://doi.org/10.1186/1756-0500-6-334
dc.identifier	https://doi.org/10.13016/noim-3q0k
dc.identifier.citation	Ghodsi, M., Hill, C.M., Astrovskaya, I. et al. De novo likelihood-based measures for comparing genome assemblies. BMC Res Notes 6, 334 (2013).	en_US
dc.identifier.uri	http://hdl.handle.net/1903/28018
dc.language.iso	en_US	en_US
dc.publisher	Springer Nature	en_US
dc.relation.isAvailableAt	College of Computer, Mathematical & Physical Sciences	en_us
dc.relation.isAvailableAt	Digital Repository at the University of Maryland	en_us
dc.relation.isAvailableAt	Biology	en_us
dc.relation.isAvailableAt	University of Maryland (College Park, MD)	en_us
dc.subject	Sequencing Process	en_US
dc.subject	Dynamic Programming Algorithm	en_US
dc.subject	Likelihood Score	en_US
dc.subject	Assembly Quality	en_US
dc.subject	Substitution Error	en_US
dc.title	De novo likelihood-based measures for comparing genome assemblies	en_US
dc.type	Article	en_US
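
The abstract defines assembly quality as the conditional probability of observing the sequenced reads given the assembled sequence, and notes that the score can be estimated from a small sample of the reads. The Python sketch below illustrates that idea only; it is not the authors' implementation. It assumes a toy model that the abstract does not specify: reads start at a uniformly random position on a single strand, errors are substitutions only at a fixed rate, and there are no indels, mate pairs, or reverse complements. All function names, the `error_rate` and `floor` parameters, and the example sequences are hypothetical.

```python
import math
import random


def read_probability(read, assembly, error_rate=0.01):
    """P(read | assembly) under the toy model (assumption, see lead-in):
    uniform start position, independent substitution errors, no indels."""
    n_starts = len(assembly) - len(read) + 1
    if n_starts <= 0:
        return 0.0
    total = 0.0
    for start in range(n_starts):
        p = 1.0
        for i, base in enumerate(read):
            if assembly[start + i] == base:
                p *= 1.0 - error_rate
            else:
                # A substitution produces one of the three other bases.
                p *= error_rate / 3.0
        total += p
    # Average over all equally likely start positions.
    return total / n_starts


def log_likelihood(reads, assembly, error_rate=0.01, floor=1e-300):
    """Sum of log P(read | assembly), treating reads as independent.
    `floor` (hypothetical) avoids log(0) for reads that cannot be placed."""
    return sum(
        math.log(max(read_probability(r, assembly, error_rate), floor))
        for r in reads
    )


def sampled_mean_log_likelihood(reads, assembly, sample_size,
                                error_rate=0.01, seed=0):
    """Estimate the per-read log-likelihood from a random sample of reads."""
    rng = random.Random(seed)
    sample = rng.sample(reads, min(sample_size, len(reads)))
    return log_likelihood(sample, assembly, error_rate) / len(sample)


if __name__ == "__main__":
    # Hypothetical example: error-free 5-base reads from a short "genome".
    genome = "ACGTTGCAAGCTTAGGCTA"
    reads = [genome[i:i + 5] for i in range(len(genome) - 4)]
    truncated = genome[:10]  # an assembly missing half of the sequence
    print("complete :", log_likelihood(reads, genome))
    print("truncated:", log_likelihood(reads, truncated))
    print("sampled  :", sampled_mean_log_likelihood(reads, genome, 5))
```

Under these assumptions, an assembly that omits part of the genome is penalized because the reads drawn from the missing region can only be explained as heavily mismatched alignments. The sampled estimate is a per-read average, so it stays comparable across assemblies as long as the same sample of reads is used for each.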

Files

Original bundle

Name: 1756-0500-6-334.pdf
Size: 657.34 KB
Format: Adobe Portable Document Format

Collections

Biology Research Works