Abstract
Business intelligence focuses on the discovery of useful retail patterns by combining historical and prognostic data. The ultimate goal is the orchestration of more targeted sales and marketing efforts. A frequent analytic task is the discovery of associations between customers and products. Matrix co-clustering techniques are a common abstraction for solving this problem. We identify shortcomings of previous approaches, such as requiring the number of co-clusters as explicit input and the common assumption that a block-diagonal matrix form exists. We address both of these issues and present techniques for automated matrix co-clustering. We formulate the problem as a recursive bisection on Fiedler vectors in conjunction with an eigengap-driven termination criterion. Our technique does not assume a perfect block-diagonal matrix structure after reordering. We explore and identify off-diagonal cluster structures by devising a Gaussian-based density estimator. Finally, we show how to explicitly couple co-clustering with product recommendations, using real-world business intelligence data. The final outcome is a robust co-clustering algorithm that can automatically discover both disjoint and overlapping cluster structures, even in the presence of noisy observations.
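To make the idea of recursive bisection on Fiedler vectors concrete, the following is a minimal sketch and not the authors' implementation. It assumes the customer-product matrix is treated as a weighted bipartite graph, splits rows and columns jointly by the sign of the second eigenvector of the normalized Laplacian, and stops when the second-smallest eigenvalue exceeds a threshold. The names `fiedler_split`, `cocluster`, `tol`, and `min_size` are hypothetical, and the fixed threshold is only a simple stand-in for the eigengap-driven termination criterion mentioned in the abstract. The off-diagonal density estimator is not covered.

```python
import numpy as np

def fiedler_split(W):
    """Return (lambda_2, sign mask) for a symmetric weight matrix W.

    Uses the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vals[1], vecs[:, 1] >= 0  # Fiedler value and sign-based split

def cocluster(A, rows, cols, tol=0.2, min_size=2):
    """Recursively bisect rows and columns of A; returns (rows, cols) pairs.

    `tol` is an assumed stopping threshold on lambda_2, standing in for
    an eigengap-based rule: a large lambda_2 means the block is hard to cut.
    """
    m, n = len(rows), len(cols)
    if m < min_size or n < min_size:
        return [(rows, cols)]
    sub = A[np.ix_(rows, cols)]
    # Bipartite graph: rows and columns are the two sides, entries are weights.
    W = np.block([[np.zeros((m, m)), sub], [sub.T, np.zeros((n, n))]])
    lam2, mask = fiedler_split(W)
    r_mask, c_mask = mask[:m], mask[m:]
    degenerate = (r_mask.all() or not r_mask.any()
                  or c_mask.all() or not c_mask.any())
    if lam2 > tol or degenerate:
        return [(rows, cols)]
    return (cocluster(A, rows[r_mask], cols[c_mask], tol, min_size)
            + cocluster(A, rows[~r_mask], cols[~c_mask], tol, min_size))

# Toy matrix: two dense blocks plus weak background noise.
A = 0.01 * np.ones((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
clusters = cocluster(A, np.arange(6), np.arange(6))
```

On this toy matrix the sketch should separate the two dense blocks and then stop, since each block on its own is densely connected. Dense eigendecomposition is used only for brevity; for large sparse data a sparse eigensolver would be the more natural choice.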
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Zouzias, A., Vlachos, M., Freris, N.M. (2012). Unsupervised Sparse Matrix Co-clustering for Marketing and Sales Intelligence. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30217-6_49
DOI: https://doi.org/10.1007/978-3-642-30217-6_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30216-9
Online ISBN: 978-3-642-30217-6
eBook Packages: Computer Science, Computer Science (R0)