An Approach for Clustering Semantically Heterogeneous XML Schemas

De Meo, Pasquale; Quattrone, Giovanni; Terracina, Giorgio; Ursino, Domenico

doi:10.1007/11575771_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3760)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

In this paper we illustrate an approach for clustering semantically heterogeneous XML Schemas. The approach is driven mainly by the semantics of the Schemas involved, which is defined by means of the interschema properties existing among the concepts they represent. An important feature of the approach is that it can be integrated with almost all of the clustering algorithms already proposed in the literature.
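
The abstract does not spell out the procedure, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that the interschema properties are given as a list of synonymy pairs between concept names, that each Schema is reduced to a set of concept names, and that a Jaccard-style dissimilarity is an acceptable stand-in for the paper's semantic measure. The toy data, the function names (`canonical_map`, `dissimilarity`, `cluster`) and the `threshold` parameter are all hypothetical. Single-linkage agglomerative clustering is used only as one example of an algorithm that could consume such a dissimilarity.

```python
from itertools import combinations

# Hypothetical toy input: each schema reduced to a set of concept names.
schemas = {
    "S1": {"customer", "order", "invoice"},
    "S2": {"client", "purchase", "bill"},
    "S3": {"patient", "doctor", "ward"},
    "S4": {"physician", "patient", "clinic"},
}

# Hypothetical interschema properties: synonymies between concept names.
synonymies = [
    ("customer", "client"), ("order", "purchase"),
    ("invoice", "bill"), ("doctor", "physician"),
]


def canonical_map(pairs):
    """Return a function mapping a concept to the representative of its
    synonymy class (a small union-find over the given pairs)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    return find


def dissimilarity(a, b, find):
    """Jaccard-style dissimilarity after merging synonymous concepts
    (an assumed stand-in measure, not the one defined in the paper)."""
    ca = {find(c) for c in a}
    cb = {find(c) for c in b}
    return 1.0 - len(ca & cb) / len(ca | cb)


def cluster(schemas, synonymies, threshold=0.6):
    """Single-linkage agglomerative clustering: repeatedly merge the two
    closest clusters until the closest pair is farther than `threshold`."""
    find = canonical_map(synonymies)
    clusters = [{name} for name in sorted(schemas)]

    def link(c1, c2):
        return min(dissimilarity(schemas[x], schemas[y], find)
                   for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


if __name__ == "__main__":
    print(cluster(schemas, synonymies))
```

In this toy setting the name-based overlap between S1 and S2 is empty, so they only end up together once the synonymies are taken into account. Any other algorithm that accepts a pairwise dissimilarity could be substituted for the single-linkage loop.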

Author information

Authors and Affiliations

DIMET, Università “Mediterranea” di Reggio Calabria, Via Graziella, Località Feo di Vito, 89060, Reggio Calabria, Italy
Pasquale De Meo, Giovanni Quattrone & Domenico Ursino
Dipartimento di Matematica, Università della Calabria, Via Pietro Bucci, 87036, Rende (CS), Italy
Giorgio Terracina

Editor information

Editors and Affiliations

STARLab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
School of Computer Science and Information Technology, RMIT University, Bld 10.10, 376-392 Swanston Street, 3001, Melbourne, VIC, Australia
Zahir Tari

About this paper

Cite this paper

De Meo, P., Quattrone, G., Terracina, G., Ursino, D. (2005). An Approach for Clustering Semantically Heterogeneous XML Schemas. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. OTM 2005. Lecture Notes in Computer Science, vol 3760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575771_22

DOI: https://doi.org/10.1007/11575771_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29736-9
Online ISBN: 978-3-540-32116-3
eBook Packages: Computer Science, Computer Science (R0)
