An empirical analysis of training protocols for probabilistic gene finders

Majoros, William H.; Salzberg, Steven L.

dc.contributor.author	Majoros, William H.
dc.contributor.author	Salzberg, Steven L.
dc.date.accessioned	2008-06-13T13:14:48Z
dc.date.available	2008-06-13T13:14:48Z
dc.date.issued	2004-12-21
dc.description.abstract	Background: Generalized hidden Markov models (GHMMs) appear to be approaching acceptance as a de facto standard for state-of-the-art ab initio gene finding, as evidenced by the recent proliferation of GHMM implementations. While prevailing methods for modeling and parsing genes using GHMMs have been described in the literature, little attention has so far been paid to their proper training. The few hints available in the literature, together with anecdotal observations, suggest that most practitioners perform maximum likelihood parameter estimation only at the local submodel level, and then optimize the global parameter structure through some form of ad hoc manual tuning of individual parameters. Results: We investigated the utility of a more systematic approach to tuning global parameter structure by implementing a global discriminative training procedure for our GHMM-based gene finder. Our results show that significant improvement in prediction accuracy can be achieved by this method. Conclusions: We conclude that training of GHMM-based gene finders is best performed using some form of discriminative training rather than simple maximum likelihood estimation at the submodel level, and that generalized gradient ascent methods are suitable for this task. We also conclude that partitioning the training data for the twin purposes of maximum likelihood initialization and gradient ascent optimization appears to be unnecessary, but that strict segregation of test data must be enforced during final gene finder evaluation to avoid artificially inflated accuracy measurements.	en
dc.format.extent	349552 bytes
dc.format.mimetype	application/pdf
dc.identifier.citation	An empirical analysis of training protocols for probabilistic gene finders. W.H. Majoros and S.L. Salzberg. BMC Bioinformatics 5 (2004), 206.	en
dc.identifier.uri	http://hdl.handle.net/1903/8001
dc.language.iso	en_US	en
dc.publisher	BMC Bioinformatics	en
dc.relation.isAvailableAt	College of Computer, Mathematical & Physical Sciences	en_us
dc.relation.isAvailableAt	Computer Science	en_us
dc.relation.isAvailableAt	Digital Repository at the University of Maryland	en_us
dc.relation.isAvailableAt	University of Maryland (College Park, MD)	en_us
dc.subject	Generalized hidden Markov models (GHMMs)	en
dc.subject	ab initio gene finding	en
dc.subject	gene finder	en
dc.title	An empirical analysis of training protocols for probabilistic gene finders	en
dc.type	Article	en

Files

Original bundle

Name: An empirical.pdf
Size: 341.36 KB
Format: Adobe Portable Document Format

Collections

Computer Science Research Works