
Incremental generalization for mining in a data warehousing environment

  • Conference paper
Advances in Database Technology — EDBT'98 (EDBT 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1377)

Abstract

On a data warehouse, analyses can be performed either manually, supported by appropriate visualization tools, or by (semi-)automatic data mining, e.g. clustering, classification and summarization. Attribute-oriented generalization is a common method for summarization. Typically, update operations are collected and applied to the data warehouse periodically, after which all derived information has to be updated as well. Because the base relations are very large, it is highly desirable to perform these updates incrementally. In this paper, we present algorithms for incremental attribute-oriented generalization with the conflicting goals of good efficiency and minimal overgeneralization. The algorithms for incremental insertions and deletions are based on materializing a relation at an intermediate generalization level, the anchor relation. Our experiments demonstrate that incremental generalization can be performed efficiently with a low degree of overgeneralization. Furthermore, the cardinality of the update sets that yields the best efficiency can be determined experimentally.
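The idea of keeping a materialized relation at an intermediate generalization level can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes a single attribute, a hypothetical concept hierarchy (`CITY_PARENT`), and a hypothetical class `AnchorRelation` that stores only tuple counts, and it ignores generalization thresholds and the handling of overgeneralization. It shows only how batched insertions and deletions can update the anchor counts without rescanning the base relation, and how a more general summary can then be derived from the anchor.

```python
from collections import Counter

# Hypothetical concept hierarchy for one attribute: child -> parent.
CITY_PARENT = {
    "Munich": "Bavaria",
    "Nuremberg": "Bavaria",
    "Stuttgart": "Baden-Wuerttemberg",
    "Bavaria": "Germany",
    "Baden-Wuerttemberg": "Germany",
}


def climb(value, parent, steps):
    """Replace a value by its ancestor `steps` levels up (stops at the root)."""
    for _ in range(steps):
        value = parent.get(value, value)
    return value


class AnchorRelation:
    """Counts of tuples generalized to an intermediate (anchor) level."""

    def __init__(self, parent, anchor_steps):
        self.parent = parent
        self.anchor_steps = anchor_steps  # levels between base data and anchor
        self.counts = Counter()

    def insert(self, values):
        # Apply a batch of insertions to the anchor counts only.
        for v in values:
            self.counts[climb(v, self.parent, self.anchor_steps)] += 1

    def delete(self, values):
        # Apply a batch of deletions; drop anchor tuples whose count reaches 0.
        for v in values:
            key = climb(v, self.parent, self.anchor_steps)
            self.counts[key] -= 1
            if self.counts[key] <= 0:
                del self.counts[key]

    def generalize(self, extra_steps):
        # Derive a more general summary from the anchor, not from the base data.
        result = Counter()
        for key, n in self.counts.items():
            result[climb(key, self.parent, extra_steps)] += n
        return result


anchor = AnchorRelation(CITY_PARENT, anchor_steps=1)
anchor.insert(["Munich", "Nuremberg", "Stuttgart"])
anchor.delete(["Stuttgart"])
summary = anchor.generalize(extra_steps=1)
```

In this sketch the anchor level is a fixed parameter. Choosing it trades update cost against how far the derived summary may drift from one computed from scratch, which is the tension between efficiency and overgeneralization that the abstract describes.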



Editor information

Hans-Jörg Schek, Gustavo Alonso, Felix Saltor, Isidro Ramos

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ester, M., Wittmann, R. (1998). Incremental generalization for mining in a data warehousing environment. In: Schek, HJ., Alonso, G., Saltor, F., Ramos, I. (eds) Advances in Database Technology — EDBT'98. EDBT 1998. Lecture Notes in Computer Science, vol 1377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100982

  • DOI: https://doi.org/10.1007/BFb0100982

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64264-0

  • Online ISBN: 978-3-540-69709-1

  • eBook Packages: Springer Book Archive
