Database Implementation of a Model-Free Classifier

Morfonios, Konstantinos

doi:10.1007/978-3-540-75185-4_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4690)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

Most methods proposed so far for classification of high-dimensional data are memory-based and obtain a model of the data classes through training before actually performing any classification. As a result, these methods are ineffective on (a) very large datasets stored in databases or data warehouses, (b) data whose partitioning into classes cannot be captured by global models and is sensitive to local characteristics, and (c) data that arrives continuously at the system, with pre-classified and unclassified instances interleaved, and whose successful classification depends on using the most complete and/or most up-to-date information. In this paper, we propose LOCUS, a scalable model-free classifier that overcomes these problems. LOCUS is based on ideas from pattern recognition and is shown to converge to the optimal Bayes classifier as the size of the datasets involved increases. Moreover, LOCUS is data-scalable and can be implemented using standard SQL over arbitrary database tables. To the best of our knowledge, LOCUS is the first classifier that combines all the characteristics above. We demonstrate the effectiveness of LOCUS through experiments over both real-world and synthetic datasets, comparing it against memory-based decision trees. The results indicate an overall superiority of LOCUS over decision trees in both classification accuracy and the data sizes it can handle.
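
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of a model-free classifier expressed in standard SQL; it is not the LOCUS method itself. It assumes a hypothetical table `train(x, y, label)`, a hypothetical helper `classify`, and a simple rule chosen for illustration: take a majority vote among the labelled rows that fall within a fixed `radius` of the query point on every attribute. No model is trained in advance, and each classification is a single aggregate query over whatever rows are in the table at that moment.

```python
import sqlite3


def classify(conn, point, radius):
    """Return the majority label among rows near `point`, or None if none are near.

    Illustrative rule (an assumption, not taken from the paper): a row counts as
    "near" when each attribute lies within `radius` of the query value.
    """
    x, y = point
    row = conn.execute(
        """
        SELECT label, COUNT(*) AS votes
        FROM train
        WHERE x BETWEEN ? AND ?
          AND y BETWEEN ? AND ?
        GROUP BY label
        ORDER BY votes DESC, label ASC  -- ties broken alphabetically
        LIMIT 1
        """,
        (x - radius, x + radius, y - radius, y + radius),
    ).fetchone()
    return row[0] if row else None


# Hypothetical toy data: two attributes and a class label.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (x REAL, y REAL, label TEXT)")
conn.executemany(
    "INSERT INTO train VALUES (?, ?, ?)",
    [
        (0.1, 0.2, "a"), (0.2, 0.1, "a"), (0.3, 0.3, "b"),
        (0.9, 0.8, "b"), (0.8, 0.9, "b"), (0.7, 0.7, "a"),
    ],
)
```

Because the query reads the table directly, newly inserted pre-classified rows influence the very next call to `classify` without any retraining step, which is the property the abstract highlights for continuously arriving data.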

The project is co-financed within Op. Education by the ESF (European Social Fund) and National Resources.

Author information

Authors and Affiliations

Department of Informatics and Telecommunications, University of Athens
Konstantinos Morfonios

Editor information

Yannis Ioannidis, Boris Novikov, Boris Rachev

Cite this paper

Morfonios, K. (2007). Database Implementation of a Model-Free Classifier. In: Ioannidis, Y., Novikov, B., Rachev, B. (eds) Advances in Databases and Information Systems. ADBIS 2007. Lecture Notes in Computer Science, vol 4690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75185-4_8

DOI: https://doi.org/10.1007/978-3-540-75185-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75184-7
Online ISBN: 978-3-540-75185-4
eBook Packages: Computer Science, Computer Science (R0)