Analysis and correction of compositional bias in sparse sequencing count data

dc.contributor.author: Kumar, M. Senthil
dc.contributor.author: Slud, Eric V.
dc.contributor.author: Okrah, Kwame
dc.contributor.author: Hicks, Stephanie C.
dc.contributor.author: Hannenhalli, Sridhar
dc.contributor.author: Bravo, Héctor Corrada
dc.date.accessioned: 2021-06-15T14:50:33Z
dc.date.available: 2021-06-15T14:50:33Z
dc.date.issued: 2018-11-06
dc.description.abstract: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. [en_US]
dc.description.uri: https://doi.org/10.1186/s12864-018-5160-5
dc.identifier: https://doi.org/10.13016/u8ji-0tnw
dc.identifier.citation: Kumar, M., Slud, E., Okrah, K. et al. Analysis and correction of compositional bias in sparse sequencing count data. BMC Genomics 19, 799 (2018). [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/27166
dc.language.iso: en_US [en_US]
dc.publisher: Springer Nature [en_US]
dc.relation.isAvailableAt: College of Computer, Mathematical & Natural Sciences [en_us]
dc.relation.isAvailableAt: Mathematics [en_us]
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_us]
dc.relation.isAvailableAt: University of Maryland (College Park, MD) [en_us]
dc.subject: Compositional bias [en_US]
dc.subject: Normalization [en_US]
dc.subject: Empirical Bayes [en_US]
dc.subject: Data integration [en_US]
dc.subject: Count data [en_US]
dc.subject: Metagenomics [en_US]
dc.subject: Absolute abundance [en_US]
dc.subject: scRNAseq [en_US]
dc.subject: Spike-in [en_US]
dc.title: Analysis and correction of compositional bias in sparse sequencing count data [en_US]
dc.type: Article [en_US]

Files

Original bundle
Name: s12864-018-5160-5.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.57 KB
Format: Item-specific license agreed upon to submission