Abstract
The computational genome-wide annotation of gene functions requires the prediction of hierarchically structured functional classes and can be formalized as a multiclass, multilabel, multipath hierarchical classification problem characterized by very unbalanced classes. We recently proposed two hierarchical protein function prediction methods: the Hierarchical Bayes (hbayes) and True Path Rule (tpr) ensemble methods. Both can reconcile the predictions of component classifiers trained locally at each term of the ontology, and both can control the overall precision-recall trade-off. In this contribution, we focus on the experimental comparison of the hbayes and tpr hierarchical gene function prediction methods and their cost-sensitive variants, using the model organism S. cerevisiae and the FunCat taxonomy. The results show that the cost-sensitive variants of these methods achieve comparable results and significantly outperform both flat methods and their non-cost-sensitive hierarchical counterparts.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Re, M., Valentini, G. (2010). An Experimental Comparison of Hierarchical Bayes and True Path Rule Ensembles for Protein Function Prediction. In: El Gayar, N., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2010. Lecture Notes in Computer Science, vol 5997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12127-2_30
DOI: https://doi.org/10.1007/978-3-642-12127-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12126-5
Online ISBN: 978-3-642-12127-2
eBook Packages: Computer Science, Computer Science (R0)