Abstract
Combining models learned from multiple batches of data provides an alternative to the common practice of learning one model from all the available data (i.e., the data combination approach). This paper empirically examines the baseline behaviour of the model combination approach in this multiple-data-batches scenario. We find that model combination can lead to better performance even if the disjoint batches of data are drawn randomly from a larger sample, and we relate the relative performance of the two approaches to the learning curve of the classifier used.
The practical implication of our results is that one should consider using model combination rather than data combination, especially when multiple batches of data for the same task are readily available.
We also show empirically that the near-asymptotic performance of a single model on some classification tasks can be significantly improved by combining multiple models (derived from the same algorithm), provided the constituent models are substantially different and there is some regularity in the models for the combination method to exploit. Comparisons with known theoretical results are also provided.
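The contrast between the two approaches can be sketched in a few lines of code. The sketch below is illustrative only: it uses a toy one-dimensional nearest-class-mean classifier (not the learning algorithms evaluated in the paper) to compare data combination (one model trained on all the data) against model combination (one model per disjoint batch, combined by majority vote).

```python
import random
from collections import Counter

def train_nearest_mean(batch):
    """Toy classifier: store the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in batch:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    # Assign x to the class whose mean is closest.
    return min(model, key=lambda y: abs(model[y] - x))

random.seed(0)
# Synthetic two-class data: class means 0 and 1, Gaussian noise.
data = [(random.gauss(c, 0.5), c) for c in (0, 1) for _ in range(300)]
random.shuffle(data)
train, test = data[:400], data[400:]

# Data combination: a single model learned from all training data.
single = train_nearest_mean(train)

# Model combination: one model per disjoint batch, majority vote.
batches = [train[i::4] for i in range(4)]
models = [train_nearest_mean(b) for b in batches]

def vote(x):
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]

acc_single = sum(predict(single, x) == y for x, y in test) / len(test)
acc_vote = sum(vote(x) == y for x, y in test) / len(test)
print(f"data combination: {acc_single:.2f}  model combination: {acc_vote:.2f}")
```

For this simple classifier the two accuracies are close; the paper's point is that the gap between them depends on where the batch size falls on the classifier's learning curve, and that voting over sufficiently diverse models can exceed the single-model accuracy.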
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Ting, K.M., Low, B.T. (1997). Model combination in the multiple-data-batches scenario. In: van Someren, M., Widmer, G. (eds) Machine Learning: ECML-97. ECML 1997. Lecture Notes in Computer Science, vol 1224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62858-4_90
Print ISBN: 978-3-540-62858-3
Online ISBN: 978-3-540-68708-5