Thousands of missed genes found in bacterial genomes and their analysis with COMBREX

dc.contributor.author: Wood, Derrick E
dc.contributor.author: Lin, Henry
dc.contributor.author: Levy-Moonshine, Ami
dc.contributor.author: Swaminathan, Rajiswari
dc.contributor.author: Chang, Yi-Chien
dc.contributor.author: Anton, Brian P
dc.contributor.author: Osmani, Lais
dc.contributor.author: Steffen, Martin
dc.contributor.author: Kasif, Simon
dc.contributor.author: Salzberg, Steven L
dc.date.accessioned: 2021-09-28T18:41:24Z
dc.date.available: 2021-09-28T18:41:24Z
dc.date.issued: 2012-10-30
dc.description.abstract: The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST. By analyzing 1,474 prokaryotic genome annotations in GenBank, we identify 13,602 likely missed genes that are homologs to non-hypothetical proteins, and 11,792 likely missed genes that are homologs only to hypothetical proteins, yet have supporting evidence of their protein-coding nature from COMBREX, a newly created gene function database. We also estimate the likelihood that each potential missing gene found is a genuine protein-coding gene using COMBREX. Our analysis of the causes of missed genes suggests that larger annotation centers tend to produce annotations with fewer missed genes than smaller centers, and many of the missed genes are short genes <300 bp. Over 1,000 of the likely missed genes could be associated with phenotype information available in COMBREX. 359 of these genes, found in pathogenic organisms, may be potential targets for pharmaceutical research. The newly identified genes are available on COMBREX’s website. [en_US]
dc.description.uri: https://doi.org/10.1186/1745-6150-7-37
dc.identifier: https://doi.org/10.13016/i3dm-vkvw
dc.identifier.citation: Wood, D.E., Lin, H., Levy-Moonshine, A. et al. Thousands of missed genes found in bacterial genomes and their analysis with COMBREX. Biol Direct 7, 37 (2012). [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/28038
dc.language.iso: en_US [en_US]
dc.publisher: Springer Nature [en_US]
dc.relation.isAvailableAt: College of Computer, Mathematical & Natural Sciences [en_us]
dc.relation.isAvailableAt: Computer Science [en_us]
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_us]
dc.relation.isAvailableAt: University of Maryland (College Park, MD) [en_us]
dc.subject: Genome Annotation [en_US]
dc.subject: Annotate Gene [en_US]
dc.subject: Prokaryotic Genome [en_US]
dc.subject: True Gene [en_US]
dc.subject: Significant Sequence Similarity [en_US]
dc.title: Thousands of missed genes found in bacterial genomes and their analysis with COMBREX [en_US]
dc.type: Article [en_US]

Files

Original bundle
Name: 1745-6150-7-37.pdf
Size: 789.28 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.57 KB
Format: Item-specific license agreed upon to submission