Mining Indirect Associations in Web Data

  • Conference paper
WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points (WebKDD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2356)

Abstract

Web associations are valuable patterns because they provide useful insights into the browsing behavior of Web users. However, there are two major drawbacks of using current techniques for mining Web association patterns, namely, their inability to detect interesting negative associations in data and their failure to account for the impact of site structure on the support of a pattern. To address these issues, a new data mining technique called indirect association is applied to the Web click-stream data. The idea here is to find pairs of pages that are negatively associated with each other, but are positively associated with another set of pages called the mediator. These pairs of pages are said to be indirectly associated via their common mediator. Indirect associations are interesting patterns because they represent the diverse interests of Web users who share a similar traversal path. These patterns are not easily found using existing data mining techniques unless the groups of users are known a priori. The effectiveness of indirect association is demonstrated using Web data from an academic institution and an online Web store.
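The definition in the abstract can be illustrated with a minimal sketch. It assumes that "negatively associated" is approximated by a low joint support for the page pair and "positively associated" by a high joint support with the mediator, and it restricts mediators to single pages for brevity. The function names, the threshold parameters, and the toy sessions are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    pages = set(pages)
    return sum(1 for s in sessions if pages <= s) / len(sessions)

def indirect_pairs(sessions, max_pair_support, min_mediator_support):
    """Return (a, b, mediator) triples where a and b rarely co-occur
    but each co-occurs frequently with the single-page mediator.
    Both thresholds are illustrative assumptions."""
    sessions = [set(s) for s in sessions]
    pages = sorted(set().union(*sessions))
    found = []
    for a, b in combinations(pages, 2):
        if support(sessions, {a, b}) > max_pair_support:
            continue  # a and b co-occur too often to be treated as negatively associated
        for m in pages:
            if m in (a, b):
                continue
            if (support(sessions, {a, m}) >= min_mediator_support
                    and support(sessions, {b, m}) >= min_mediator_support):
                found.append((a, b, m))
    return found

# Toy click-stream: every user passes through home and catalog, then diverges.
sessions = [
    {"home", "catalog", "laptops"},
    {"home", "catalog", "laptops"},
    {"home", "catalog", "cameras"},
    {"home", "catalog", "cameras"},
]
result = indirect_pairs(sessions, max_pair_support=0.0, min_mediator_support=0.5)
print(result)
```

In this toy data the "cameras" and "laptops" pages never appear in the same session, yet both are reached through "home" and "catalog", so the sketch reports them as indirectly associated via each of those shared pages. A fuller treatment would allow multi-page mediators and a dependence measure rather than raw support.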

This work was partially supported by NSF grant # ACI-9982274 and by Army High Performance Computing Research Center contract number DAAD19-01-2-0014. The content of this work does not necessarily reflect the position or policy of the government and no official endorsement should be inferred. Access to computing facilities was provided by AHPCRC and the Minnesota Supercomputing Institute.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, PN., Kumar, V. (2002). Mining Indirect Associations in Web Data. In: Kohavi, R., Masand, B.M., Spiliopoulou, M., Srivastava, J. (eds) WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points. WebKDD 2001. Lecture Notes in Computer Science, vol 2356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45640-6_7

  • DOI: https://doi.org/10.1007/3-540-45640-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43969-1

  • Online ISBN: 978-3-540-45640-7

  • eBook Packages: Springer Book Archive
