Using n-grams for the Automated Clustering of Structural Models

Babur, Önder; Cleophas, Loek

doi:10.1007/978-3-319-51963-0_40

Önder Babur & Loek Cleophas

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10139)

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Informatics


Abstract

Model comparison and clustering are important for handling large numbers of models in data analysis and exploration, e.g. in domain model recovery or model repository management. In structural models in particular, information is captured not only in the model elements themselves (e.g. in names and types) but also in their structural context, i.e. the relations of each element to the others. Some approaches handle a large number of models but ignore the structural context of model elements; others handle very few (typically two) models but apply sophisticated structural techniques. In this paper we address both aspects and extend our previous work on model clustering based on the vector space model with a technique for incorporating structural context in the form of n-grams. We compare the accuracy of n-grams on two datasets of Ecore metamodels from the AtlanMod Zoo: small random samples using up to trigrams, and a larger one (~100 models) using up to bigrams.
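
As a rough illustration of the idea summarised in the abstract, the following minimal Python sketch treats a model as a directed graph of named elements, takes paths of n connected element names as structural n-grams, and compares the resulting count vectors with cosine similarity. It is a sketch under stated assumptions and not the authors' implementation: the function names (`extract_ngrams`, `vectorize`, `cosine`), the dictionary-based graph encoding, the raw count weighting and the two toy models are all hypothetical, and the paper's actual feature extraction from Ecore metamodels, its weighting and its clustering step are not reproduced here.

```python
from collections import Counter
from math import sqrt


def extract_ngrams(edges, n):
    """Return all directed paths of n element names, as tuples.

    `edges` maps an element name to the names it points to
    (an assumed, simplified encoding of a structural model).
    """
    names = set(edges) | {t for targets in edges.values() for t in targets}
    paths = [(name,) for name in sorted(names)]  # n = 1: plain unigrams
    for _ in range(n - 1):
        # Extend every path by one outgoing edge; skip revisits to avoid cycles.
        paths = [p + (t,) for p in paths
                 for t in edges.get(p[-1], []) if t not in p]
    return paths


def vectorize(edges, max_n):
    """Count all n-grams of length 1..max_n (raw counts, no weighting)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(extract_ngrams(edges, n))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Two hypothetical toy models with the same element names but different structure.
model_a = {"Library": ["Book"], "Book": ["Author"]}
model_c = {"Author": ["Library"], "Book": ["Library"]}

sim_unigram = cosine(vectorize(model_a, 1), vectorize(model_c, 1))
sim_bigram = cosine(vectorize(model_a, 2), vectorize(model_c, 2))
print(sim_unigram, sim_bigram)
```

In this toy setting the two models are indistinguishable when only unigrams are used, because they contain the same element names, while adding bigrams lowers their similarity because none of their edges coincide. A pairwise similarity (or distance) matrix built this way could then be passed to a standard clustering algorithm.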

The research leading to these results has been funded by EU programme FP7-NMP-2013-SMALL-7 under grant agreement number 604279 (MMP).



Author information

Authors and Affiliations

Eindhoven University of Technology, 5600 MB, Eindhoven, The Netherlands
Önder Babur & Loek Cleophas
Stellenbosch University, Matieland, 7602, South Africa
Loek Cleophas

Authors

Önder Babur
Loek Cleophas

Corresponding author

Correspondence to Önder Babur.

Editor information

Editors and Affiliations

TU Dortmund, Dortmund, Germany
Bernhard Steffen
TU Dresden, Dresden, Germany
Christel Baier
Eindhoven University of Technology, Eindhoven, The Netherlands
Mark van den Brand
Alpen Adria University Klagenfurt, Klagenfurt, Austria
Johann Eder
Lero - Irish Software Research Center, Limerick, Ireland
Mike Hinchey
Lero - Irish Software Research Center, Limerick, Ireland
Tiziana Margaria


About this paper

Cite this paper

Babur, Ö., Cleophas, L. (2017). Using n-grams for the Automated Clustering of Structural Models. In: Steffen, B., Baier, C., van den Brand, M., Eder, J., Hinchey, M., Margaria, T. (eds) SOFSEM 2017: Theory and Practice of Computer Science. SOFSEM 2017. Lecture Notes in Computer Science, vol 10139. Springer, Cham. https://doi.org/10.1007/978-3-319-51963-0_40


DOI: https://doi.org/10.1007/978-3-319-51963-0_40
Published: 11 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51962-3
Online ISBN: 978-3-319-51963-0
eBook Packages: Computer Science, Computer Science (R0)
