Abstract
With the exponential growth of data collected in fields such as recommender systems (user, item), text mining (document, term), and bioinformatics (individual, gene), co-clustering, the simultaneous clustering of both dimensions of a data matrix, has become a popular technique. Co-clustering aims to obtain homogeneous blocks, which makes the row clusters and column clusters easy to interpret jointly. Among the many existing approaches, this paper relies on the latent block model (LBM), which is flexible enough to model different types of data matrices. We extend its use to tensor (3D matrix) data by proposing a Tensor LBM (TLBM) that allows different relations between entities. To show the interest of TLBM, we consider continuous and binary datasets. A variational EM algorithm is developed to estimate the parameters. Its performance is evaluated on synthetic and real datasets to highlight different possible applications.
A Appendix: Update of \(\tilde{z}_{ik}\) and \(\tilde{w}_{j\ell }\) \(\forall i,k,j,\ell \)
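To obtain the expression of \(\tilde{z}_{ik}\), we maximize the above soft criterion \(F_C(\tilde{\mathbf {z}},\tilde{\mathbf {w}};\varOmega )\) with respect to \(\tilde{z}_{ik}\), subject to the constraint \(\sum _{k}\tilde{z}_{ik}=1\). The corresponding Lagrangian, up to terms which are not functions of \(\tilde{z}_{ik}\), is given by:
\[ L(\tilde{\mathbf {z}},\beta ) = \sum _{i,k}\tilde{z}_{ik}\log \pi _{k} + \sum _{i,k,j,\ell }\tilde{z}_{ik}\tilde{w}_{j\ell }\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k\ell }) - \sum _{i,k}\tilde{z}_{ik}\log \tilde{z}_{ik} + \beta \Bigl (1-\sum _{k}\tilde{z}_{ik}\Bigr ). \]
Taking derivatives with respect to \(\tilde{z}_{ik}\), we obtain:
\[ \frac{\partial L(\tilde{\mathbf {z}},\beta )}{\partial \tilde{z}_{ik}} = \log \pi _{k} + \sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k\ell }) - \log \tilde{z}_{ik} - 1 - \beta . \]
Setting this derivative to zero yields:
\[ \tilde{z}_{ik} = \frac{\pi _{k}\exp \bigl (\sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k\ell })\bigr )}{\exp (\beta +1)}. \]
Summing both sides over all \(k'\) yields \(\exp (\beta + 1)= \sum _{k'} \pi _{k'} \exp \bigl (\sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k'\ell })\bigr ).\) Plugging \(\exp (\beta + 1)\) into \(\tilde{z}_{ik}\) leads to:
\[ \tilde{z}_{ik} = \frac{\pi _{k}\exp \bigl (\sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k\ell })\bigr )}{\sum _{k'}\pi _{k'}\exp \bigl (\sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k'\ell })\bigr )}. \]
In the same way, we can estimate \(\tilde{w}_{j\ell }\) by maximizing \(F_C(\tilde{\mathbf {z}},\tilde{\mathbf {w}};\varOmega )\) with respect to \(\tilde{w}_{j\ell }\), subject to the constraint \(\sum _{\ell }\tilde{w}_{j\ell }=1\); writing \(\rho _{\ell }\) for the column-cluster proportions, which play the role of \(\pi _{k}\), we obtain
\[ \tilde{w}_{j\ell } = \frac{\rho _{\ell }\exp \bigl (\sum _{i,k}\tilde{z}_{ik}\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k\ell })\bigr )}{\sum _{\ell '}\rho _{\ell '}\exp \bigl (\sum _{i,k}\tilde{z}_{ik}\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k\ell '})\bigr )}. \]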
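As an illustration only, the two normalised updates derived in this appendix can be sketched in plain Python. This is not the authors' implementation. The sketch assumes the log-densities \(\log \varPhi (\mathbf {x}_{ij},\varvec{\lambda }_{k\ell })\) have already been computed and stored in a nested list `log_phi[i][j][k][l]`. The names `update_row_posteriors`, `update_col_posteriors`, `log_phi`, `w_tilde`, `z_tilde`, `pi` and `rho` (the column-cluster proportions) are hypothetical. Normalisation is done in log space, which is mathematically the same as dividing by \(\exp (\beta + 1)\) but avoids overflow.

```python
import math


def _softmax(log_scores):
    """Turn unnormalised log-scores into probabilities (log-sum-exp trick)."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_row_posteriors(log_phi, w_tilde, pi):
    """z_tilde[i][k] proportional to
    pi[k] * exp(sum_{j,l} w_tilde[j][l] * log_phi[i][j][k][l]).

    log_phi is assumed precomputed: log_phi[i][j][k][l] = log Phi(x_ij, lambda_kl).
    """
    n_cols, n_col_clusters = len(w_tilde), len(w_tilde[0])
    z_tilde = []
    for row in log_phi:  # row[j][k][l] for a fixed i
        scores = [
            math.log(pi[k])
            + sum(w_tilde[j][l] * row[j][k][l]
                  for j in range(n_cols) for l in range(n_col_clusters))
            for k in range(len(pi))
        ]
        z_tilde.append(_softmax(scores))  # enforces sum_k z_tilde[i][k] = 1
    return z_tilde


def update_col_posteriors(log_phi, z_tilde, rho):
    """Symmetric update: w_tilde[j][l] proportional to
    rho[l] * exp(sum_{i,k} z_tilde[i][k] * log_phi[i][j][k][l])."""
    n_rows, n_row_clusters = len(z_tilde), len(z_tilde[0])
    w_tilde = []
    for j in range(len(log_phi[0])):
        scores = [
            math.log(rho[l])
            + sum(z_tilde[i][k] * log_phi[i][j][k][l]
                  for i in range(n_rows) for k in range(n_row_clusters))
            for l in range(len(rho))
        ]
        w_tilde.append(_softmax(scores))  # enforces sum_l w_tilde[j][l] = 1
    return w_tilde


if __name__ == "__main__":
    # Toy sizes (assumed): 2 rows, 3 columns, 2 row clusters, 2 column clusters.
    n, d, g, m = 2, 3, 2, 2
    # Row 0 fits row cluster 0 better; row 1 fits row cluster 1 better.
    log_phi = [[[[0.0 if k == i else -5.0 for _ in range(m)]
                 for k in range(g)] for _ in range(d)] for i in range(n)]
    w = [[0.5, 0.5] for _ in range(d)]
    z = update_row_posteriors(log_phi, w, pi=[0.5, 0.5])
    w = update_col_posteriors(log_phi, z, rho=[0.5, 0.5])
    print(z)
    print(w)
```

In a full variational EM loop these two updates would alternate with the M-step re-estimation of the proportions and of \(\varvec{\lambda }_{k\ell }\), which is not shown here.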
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Boutalbi, R., Labiod, L., Nadif, M. (2019). Co-clustering from Tensor Data. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_29
DOI: https://doi.org/10.1007/978-3-030-16148-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer Science, Computer Science (R0)