Abstract
Negative examples, which most machine learning methods require in order to infer new predictions, are rarely recorded directly in several real-world databases for classification problems. A variety of heuristics for choosing negative examples have been proposed, ranging from simple under-sampling of non-positive instances to the analysis of class taxonomy structures. Here we propose an efficient strategy for selecting negative examples, designed for Hopfield networks, that exploits the clustering properties of positive instances. The method has been validated on the prediction of protein functions of a model organism.
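The abstract does not spell out the selection procedure, so the following is only an illustrative sketch of the general idea of using the cluster structure of positive instances to pick negatives. It is not the authors' algorithm. It assumes a feature matrix `X`, clusters the positives with a minimal fuzzy c-means, and then treats the non-positive points farthest from every positive centroid as negatives. The function names `fuzzy_c_means` and `select_negatives`, the parameters `c`, `m` and `n_negatives`, the "farthest from the centroids" criterion, and the synthetic data are all assumptions made for this example; the Hopfield network step is omitted.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means; returns (centroids, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)        # each row sums to 1
    for _ in range(n_iter):
        W = U ** m                           # fuzzified memberships
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)             # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centroids, U


def select_negatives(X, is_positive, n_negatives, c=2):
    """Hypothetical heuristic: choose the non-positive points that lie
    farthest from the nearest centroid of the positive clusters."""
    centroids, _ = fuzzy_c_means(X[is_positive], c)
    candidates = np.flatnonzero(~is_positive)
    d = np.linalg.norm(X[candidates][:, None, :] - centroids[None, :, :], axis=2)
    score = d.min(axis=1)                    # distance to nearest positive centroid
    order = np.argsort(-score)               # farthest candidates first
    return candidates[order[:n_negatives]]


# Synthetic data (an assumption for the demo): two positive clusters,
# ten unlabeled points close to the positives, ten far away from them.
rng = np.random.default_rng(42)
positives = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
                       rng.normal(5.0, 0.5, (20, 2))])
near = rng.normal(0.5, 0.5, (10, 2))
far = rng.normal([20.0, -20.0], 0.5, (10, 2))
X = np.vstack([positives, near, far])
is_positive = np.zeros(len(X), dtype=bool)
is_positive[:40] = True

negatives = select_negatives(X, is_positive, n_negatives=10)
print(sorted(negatives.tolist()))
```

In this toy setting the unlabeled points that sit close to the positive clusters are left out of the negative set, which reflects the intuition that such points may be unrecorded positives. A real application would replace the Euclidean features with whatever representation the data provide.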
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Frasca, M., Malchiodi, D. (2016). Selection of Negative Examples for Node Label Prediction Through Fuzzy Clustering Techniques. In: Bassis, S., Esposito, A., Morabito, F., Pasero, E. (eds) Advances in Neural Networks. WIRN 2015. Smart Innovation, Systems and Technologies, vol 54. Springer, Cham. https://doi.org/10.1007/978-3-319-33747-0_7
DOI: https://doi.org/10.1007/978-3-319-33747-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33746-3
Online ISBN: 978-3-319-33747-0
eBook Packages: Engineering, Engineering (R0)