An Approach for Clustering Semantically Heterogeneous XML Schemas

  • Conference paper
On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE (OTM 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3760)

Abstract

In this paper we illustrate an approach for clustering semantically heterogeneous XML Schemas. The approach is driven mainly by the semantics of the Schemas involved, which is defined by means of the interschema properties existing among the concepts they represent. An important feature of our approach is that it can be integrated with almost all the clustering algorithms already proposed in the literature.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

De Meo, P., Quattrone, G., Terracina, G., Ursino, D. (2005). An Approach for Clustering Semantically Heterogeneous XML Schemas. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. OTM 2005. Lecture Notes in Computer Science, vol 3760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575771_22

  • DOI: https://doi.org/10.1007/11575771_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29736-9

  • Online ISBN: 978-3-540-32116-3

  • eBook Packages: Computer Science, Computer Science (R0)
