Abstract
The need for sophisticated analysis of textual documents is becoming more apparent as more data is placed on the Web and digital libraries emerge. This paper presents an algorithm for generating constrained association rules from textual documents. The user specifies a set of constraints in the form of concepts and/or structured values. Our algorithm creates matrices and lists based on these prespecified constraints and uses them to generate large itemsets. Because these matrices are small and sparse, we can quickly generate higher-order large itemsets. Further, since we maintain concept relationship information in a concept library, we can also generate rulesets involving concepts related to the initial set of constraints.
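To make the idea of constraint-driven itemset generation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each document has already been reduced to a set of concept labels, models the concept library as a plain dictionary from a concept to its related concepts, and counts candidate itemsets by brute-force enumeration instead of the matrices and lists the abstract describes. All function names, parameters, and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def expand_constraints(constraints, concept_library):
    """Add concepts that the (hypothetical) concept library lists as related."""
    expanded = set(constraints)
    for concept in constraints:
        expanded.update(concept_library.get(concept, ()))
    return expanded


def constrained_large_itemsets(docs, constraints, min_support, max_size=2):
    """Return {itemset: count} for itemsets that contain a constraint concept
    and occur in at least min_support documents (absolute count)."""
    constraints = set(constraints)
    # Documents without any constraint concept cannot contribute an itemset.
    relevant = [set(d) for d in docs if constraints & set(d)]
    counts = defaultdict(int)
    for doc in relevant:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(doc), size):
                if constraints & set(combo):
                    counts[frozenset(combo)] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


def generate_rules(large, docs, min_confidence):
    """Derive (lhs, rhs, confidence) rules from the large itemsets."""
    doc_sets = [set(d) for d in docs]
    rules = []
    for itemset, count in large.items():
        for k in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), k):
                lhs = frozenset(lhs_items)
                # Confidence = support(itemset) / support(lhs).
                lhs_count = sum(1 for d in doc_sets if lhs <= d)
                confidence = count / lhs_count
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    # Toy corpus: each document is a set of extracted concepts (made-up data).
    docs = [
        {"merger", "bank", "stock"},
        {"merger", "bank"},
        {"acquisition", "bank", "stock"},
        {"weather", "rain"},
    ]
    library = {"merger": {"acquisition"}}
    constraints = expand_constraints({"merger"}, library)
    large = constrained_large_itemsets(docs, constraints, min_support=2)
    for lhs, rhs, conf in generate_rules(large, docs, min_confidence=0.6):
        print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

Restricting the count to itemsets that contain a constraint concept keeps the candidate space small, which is the intuition behind the small, sparse structures mentioned in the abstract. A real implementation would grow itemsets level by level and would not enumerate every combination.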
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, L., Chen, B., Haight, R., Scheuermann, P. (1999). An Algorithm for Constrained Association Rule Mining in Semi-structured Data. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_21
DOI: https://doi.org/10.1007/3-540-48912-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive