JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions

External Link(s)
https://doi.org/10.1186/gb-2006-7-s1-s9
Date
2006-08-07
Author
Allen, Jonathan E.
Majoros, William H.
Pertea, Mihaela
Salzberg, Steven L.
Citation
Allen, J.E., Majoros, W.H., Pertea, M. et al. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biol 7, S9 (2006).
Abstract
Background: Predicting complete protein-coding genes in human DNA remains a significant
challenge. Though a number of promising approaches have been investigated, an ideal suite of
tools has yet to emerge that can provide near perfect levels of sensitivity and specificity at the
level of whole genes. As an incremental step in this direction, it is hoped that controlled gene
finding experiments in the ENCODE regions will provide a more accurate view of the relative
benefits of different strategies for modeling and predicting gene structures.
Results: Here we describe our general-purpose eukaryotic gene finding pipeline and its major
components, as well as the methodological adaptations that we found necessary in
accommodating human DNA in our pipeline, noting that a similar level of effort may be necessary
by ourselves and others with similar pipelines whenever a new class of genomes is presented to
the community for analysis. We also describe a number of controlled experiments involving the
differential inclusion of various types of evidence and feature states into our models and the
resulting impact these variations have had on predictive accuracy.
Conclusions: While in the case of the non-comparative gene finders we found that adding
model states to represent specific biological features did little to enhance predictive accuracy, for
our evidence-based ‘combiner’ program the incorporation of additional evidence tracks tended
to produce significant gains in accuracy for most evidence types, suggesting that improved
modeling efforts at the hidden Markov model level are of relatively little value. We relate these
findings to our current plans for future research.