Skip to main content
Log in

A general framework for relation graph clustering

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Relation graphs, in which multi-type (or single type) nodes are related to each other, frequently arise in many important applications, such as Web mining, information retrieval, bioinformatics, and epidemiology. In this study, We propose a general framework for clustering on relation graphs. Under this framework, we derive a family of clustering algorithms including both hard and soft versions, which are capable of learning cluster patterns from relation graphs with various structures and statistical properties. A number of classic approaches on special cases of relation graphs, such as traditional graphs with singly-type nodes and bi-type relation graphs with two types of nodes, can be viewed as special cases of the proposed framework. The theoretic analysis and experiments demonstrate the great potential and effectiveness of the proposed framework and algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Banerjee A, Dhillon I, Ghosh J, Merugu S, Modha D (2004) A generalized maximum entropy approach to bregman co-clustering and matrix approximation. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 509–514

  2. Banerjee A, Merugu S, Dhillon I, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6: 1705–1749

    MATH  MathSciNet  Google Scholar 

  3. Bui T, Jones C (1993) A heuristic for reducing fill in sparse matrix factorization. In: 6th SIAM conference on parallel processing for scientific computing, pp 445–452

  4. Chan P, Schlag M, Zien J (1993) Spectral k-way ratio-cut partitioning and clustering. In: Proceedings of the 30th international conference on design automation. ACM, New York, NY, USA, pp 749–754

  5. Della Pietra S, Della Pietra V, Lafferty J (2002) Duality and auxiliary functions for Bregman distances. Technical report, Technical Report CMU-CS-01-109, School of Computer Science, Carnegie Mellon University, 2001

  6. Dhillon I (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 269–274

  7. Dhillon I, Guan Y, Kulis B (2005) A fast kernel-based multilevel algorithm for graph clustering. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 629–634

  8. Dhillon I, Guan Y, Kulis B (2005) A unified view of kernel k-means, spectral clustering and graph cuts. The University of Texas at Austin, Department of Computer Sciences, Technical Report TR-04

  9. Dhillon I, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 89–98

  10. Ding C, He X, Zha H, Gu M, Simon H (2001) A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings IEEE international conference on data mining, 2001, ICDM 2001, pp 107–114

  11. El-Yaniv R, Souroujon O (2001) Iterative double clustering for unsupervised and semi-supervised learning. In: EMCL’01: Proceedings of the 12th European conference on machine learning. Springer, London, UK, pp 121–132. ISBN:3-540-42536-5

  12. Gao B, Liu T, Zheng X, Cheng Q, Ma W (2005) Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 41–50

  13. Golub G, Van Loan C (1996) Matrix computations. Johns Hopkins University Press, Baltimore

    MATH  Google Scholar 

  14. Hendrickson B, Leland R (1995) A multilevel algorithm for partitioning graphs. In: Supercomputing ’95: proceedings of the 1995 ACM/IEEE conference on supercomputing (CDROM). ACM, New York, NY, USA, p 28

  15. Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, pp 50–57

  16. Hofmann T, Puzicha J (1999), Latent class models for collaborative filtering. In: IJCAI ’99: proceedings of the sixteenth international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, CA, USA, pp 688–693

  17. Karypis G, Kumar V (1999) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1): 359

    Article  MATH  MathSciNet  Google Scholar 

  18. Kernighan B, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49(2): 291–307

    Google Scholar 

  19. Lang K (1995) Newsweeder: learning to filter netnews. In: Proceedings of the twelfth international conference on machine learning, pp 331–339

  20. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13: 556–562

    Google Scholar 

  21. Li T (2005) A general model for clustering binary data. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 188–197

  22. Long B, Binghamton S, Wu X, Zhang Z, Yu P (2007) Community learning by graph approximation. In: Proceedings of the 2007 seventh IEEE international conference on data mining. IEEE Computer Society, Washington, DC, USA, pp 232–241

  23. Long B, Wu X, Zhang Z, Yu P (2006) Unsupervised learning on k-partite graphs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 317–326

  24. Long B, Zhang Z, Yu P (2005) Co-clustering by block value decomposition. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 635–640

  25. Long B, Zhang Z, Wu X, Yu P (2006) Spectral clustering for multi-type relational data. In: Proceedings of the 23rd international conference on machine learning. ACM, New York, NY, USA, pp 585–592

  26. Neville J, Adler M, Jensen D (2003) Clustering relational data using attribute and link information. In: Proceedings of the IJCAI text mining and link analysis workshop

  27. Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2: 849–856

    Google Scholar 

  28. Salakhutdinov R, Roweis S (2003) Adaptive overrelaxed bound optimization methods. In: Machine learning-international workshop then conference, vol 20, p 664

  29. Shental N, Zomet A, Hertz T, Weiss Y (2003) Pairwise clustering and graphical models. Adv Neural Inf Process Syst 16

  30. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8): 888–905

    Article  Google Scholar 

  31. Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617

    Article  MATH  MathSciNet  Google Scholar 

  32. Tishby N, Pereira FC, Bialek W (1999) The information bottleneck method. In: Proceedings of the 37th annual allerton conference on communication, control and computing, pp 368–377

  33. Van Dongen SM (2000) Graph clustering by flow simulation. PhD thesis, University of Utrecht

  34. Wang J, Zeng H, Chen Z, Lu H, Tao L, Ma W (2003) ReCoM: reinforcement clustering of multi-type interrelated data objects. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in informaion retrieval. ACM, New York, NY, USA, pp 274–281

  35. Yu K, Yu S, Tresp V (2006) Soft clustering on graphs. Adv Neural Inf Process Syst 18: 1553

    Google Scholar 

  36. Yu SX, Shi J (2003) Multiclass spectral clustering. In: ICCV ’03: Proceedings of the ninth IEEE international conference on computer vision. IEEE Computer Society, Washington, DC, USA, p 313

  37. Zeng H-J, Chen Z, Ma W-Y (2002) A unified framework for clustering heterogeneous web objects. In: WISE’02: Proceedings of the 3rd international conference on web information systems engineering. IEEE Computer Society, Washington, DC, USA, pp 161–172. ISBN:0-7695-1766-8

  38. Zha H, He X, Ding C, Simon H, Gu M (2001) Bipartite graph partitioning and data clustering. In: Proceedings of the tenth international conference on information and knowledge management. ACM, New York, NY, USA, pp 25–32

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bo Long.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Long, B., Zhang, Z. & Yu, P.S. A general framework for relation graph clustering. Knowl Inf Syst 24, 393–413 (2010). https://doi.org/10.1007/s10115-009-0255-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-009-0255-6

Keywords

Navigation