Mining Indirect Associations in Web Data

Tan, Pang-Ning; Kumar, Vipin

doi:10.1007/3-540-45640-6_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2356)

Included in the following conference series:

International Workshop on Mining Web Log Data Across All Customers Touch Points

Abstract

Web associations are valuable patterns because they provide useful insights into the browsing behavior of Web users. However, current techniques for mining Web association patterns have two major drawbacks: their inability to detect interesting negative associations in data and their failure to account for the impact of site structure on the support of a pattern. To address these issues, a new data mining technique called indirect association is applied to Web click-stream data. The idea is to find pairs of pages that are negatively associated with each other, but are positively associated with another set of pages called the mediator. These pairs of pages are said to be indirectly associated via their common mediator. Indirect associations are interesting patterns because they represent the diverse interests of Web users who share a similar traversal path. These patterns are not easily found using existing data mining techniques unless the groups of users are known a priori. The effectiveness of indirect association is demonstrated using Web data from an academic institution and an online Web store.
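The idea of an indirect association can be illustrated with a minimal sketch. The abstract does not give the exact conditions, thresholds, or dependence measure, so everything below is an assumption made for illustration: sessions are treated as sets of pages, "negatively associated" is approximated by a low joint support for the page pair, "positively associated with the mediator" is approximated by a minimum joint support plus a cosine-style dependence score, and mediators are restricted to single pages. The function and parameter names (`indirect_pairs`, `pair_max_sup`, `mediator_min_sup`, `min_dep`) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    pages = set(pages)
    return sum(1 for s in sessions if pages <= s) / len(sessions)


def indirect_pairs(sessions, pair_max_sup, mediator_min_sup, min_dep):
    """Return (a, b, mediator) triples under the assumed conditions.

    Assumptions (not taken from the abstract):
      * a and b rarely co-occur: support({a, b}) < pair_max_sup
      * each of a, b co-occurs with the mediator m often enough:
        support({x, m}) >= mediator_min_sup
      * each of a, b depends on m, scored here as
        support({x, m}) / sqrt(support({x}) * support({m})) >= min_dep
      * mediators are single pages, to keep the sketch short
    """
    sessions = [set(s) for s in sessions]
    pages = sorted(set().union(*sessions))
    found = []
    for a, b in combinations(pages, 2):
        if support(sessions, {a, b}) >= pair_max_sup:
            continue  # the pair co-occurs too often to be an indirect pair
        for m in pages:
            if m in (a, b):
                continue
            qualifies = True
            for x in (a, b):
                joint = support(sessions, {x, m})
                dep = joint / sqrt(support(sessions, {x}) * support(sessions, {m}))
                if joint < mediator_min_sup or dep < min_dep:
                    qualifies = False
            if qualifies:
                found.append((a, b, m))
    return found


# Toy click-stream: every session visits "home", then one of two branches.
sessions = [
    {"home", "grad"},
    {"home", "grad"},
    {"home", "ugrad"},
    {"home", "ugrad"},
]
print(indirect_pairs(sessions, pair_max_sup=0.1, mediator_min_sup=0.3, min_dep=0.5))
```

In this toy data, "grad" and "ugrad" never appear in the same session, yet both appear frequently with "home", so the pair is reported with "home" as its mediator. This brute-force enumeration is only meant to make the definition concrete; it is not presented as the authors' algorithm.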

This work was partially supported by NSF grant # ACI-9982274 and by Army High Performance Computing Research Center contract number DAAD19-01-2-0014. The content of this work does not necessarily reflect the position or policy of the government and no official endorsement should be inferred. Access to computing facilities was provided by AHPCRC and the Minnesota Supercomputing Institute.

Author information

Authors and Affiliations

Department of Computer Science, University of Minnesota, Minneapolis, MN, 55455
Pang-Ning Tan & Vipin Kumar

Editor information

Editors and Affiliations

Blue Martini Software, 2600 Campus Drive, San Mateo, CA, 94403, USA
Ron Kohavi
Data Miners Inc., 77 North Washington Street, Boston, MA, 02114, USA
Brij M. Masand
Leipzig Graduate School of Management, Jahnallee 59, 04109, Leipzig, Germany
Myra Spiliopoulou
University of Minnesota, 4-192 EECS Building 200 Union St SE, Minneapolis, MN, 55455
Jaideep Srivastava

About this paper

Cite this paper

Tan, PN., Kumar, V. (2002). Mining Indirect Associations in Web Data. In: Kohavi, R., Masand, B.M., Spiliopoulou, M., Srivastava, J. (eds) WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points. WebKDD 2001. Lecture Notes in Computer Science, vol 2356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45640-6_7

DOI: https://doi.org/10.1007/3-540-45640-6_7
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43969-1
Online ISBN: 978-3-540-45640-7
eBook Packages: Springer Book Archive