Data Mining in Large Databases Using Domain Generalization Graphs

Hilderman, Robert J.; Hamilton, Howard J.; Cercone, Nick

doi:10.1023/A:1008769516670

Published: November 1999

Volume 13, pages 195–234, (1999)

Journal of Intelligent Information Systems

Robert J. Hilderman¹, Howard J. Hamilton¹ & Nick Cercone²

Abstract

Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the DGGs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the DGGs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
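The generate-and-test idea in the abstract can be illustrated with a small Python sketch. It is not the authors' Multi-Attribute Generalization algorithm: each attribute's generalization graph is reduced to a single chain of levels, the toy rows and concept hierarchies are invented, and the helper names (`summarize`, `all_summaries`, `variance`, `relative_entropy`) are hypothetical. The two measures compare each summary's tuple distribution with a uniform one, which is one plausible reading of "variance" and "relative entropy"; the paper's exact definitions may differ.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical concept hierarchy for a "city" attribute (an assumption).
PROVINCE = {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
            "Waterloo": "Ontario", "Toronto": "Ontario"}

# One chain of generalization levels per attribute (a simplification of a
# generalization graph): level 0 keeps the raw value, the last level is "ANY".
LEVELS = [
    [lambda city: city, lambda city: PROVINCE[city], lambda city: "ANY"],
    [lambda age: age,
     lambda age: "under 40" if age < 40 else "40 and over",
     lambda age: "ANY"],
]

def summarize(rows, levels, choice):
    """Generalize attribute i to level choice[i], then count identical tuples."""
    return Counter(
        tuple(levels[i][level](value)
              for i, (value, level) in enumerate(zip(row, choice)))
        for row in rows
    )

def all_summaries(rows, levels):
    """Generate one summary for every combination of levels (generate-and-test)."""
    combos = product(*(range(len(chain)) for chain in levels))
    return {choice: summarize(rows, levels, choice) for choice in combos}

def variance(summary):
    """Sample variance of tuple probabilities around the uniform value 1/m."""
    m, total = len(summary), sum(summary.values())
    if m < 2:
        return 0.0
    q = 1.0 / m
    return sum((count / total - q) ** 2 for count in summary.values()) / (m - 1)

def relative_entropy(summary):
    """Kullback-Leibler distance (in bits) from the uniform distribution."""
    m, total = len(summary), sum(summary.values())
    q = 1.0 / m
    return sum((c / total) * log2((c / total) / q) for c in summary.values())

# Invented toy relation: (city, age).
ROWS = [("Regina", 25), ("Saskatoon", 30), ("Waterloo", 55),
        ("Toronto", 61), ("Regina", 45)]

summaries = all_summaries(ROWS, LEVELS)
ranked = sorted(summaries.items(), key=lambda kv: variance(kv[1]), reverse=True)
for choice, summary in ranked:
    print(choice, len(summary), round(variance(summary), 4),
          round(relative_entropy(summary), 4))
```

Because every level combination is independent of the others, the dictionary comprehension in `all_summaries` is the natural place to split work across processors, which is the kind of partitioning the abstract reports speedups for.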

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Robert J. Hilderman & Howard J. Hamilton
Department of Computer Science, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1
Nick Cercone

About this article

Cite this article

Hilderman, R.J., Hamilton, H.J. & Cercone, N. Data Mining in Large Databases Using Domain Generalization Graphs. Journal of Intelligent Information Systems 13, 195–234 (1999). https://doi.org/10.1023/A:1008769516670

Issue Date: November 1999
DOI: https://doi.org/10.1023/A:1008769516670
