Abstract
Model Driven Engineering (MDE), in which models are the core elements throughout the life cycle from specification to maintenance, is one of the promising techniques for providing abstraction and automation. However, model management is challenging because of the increasing number of models, their size, and their structural complexity. Modelers therefore need to organize the available models so that they can be reused, which reduces the cost and effort of developing new and more complex models. To this end, many studies have been conducted to categorize models automatically. However, most of these studies focus on either textual data or structural information, which leads to lower precision in model management activities. We therefore performed model classification using baseline machine learning approaches on a dataset of 555 Ecore metamodels, using hybrid feature vectors that include both textual and structural information. In the proposed approach, the textual information of each model is first summarized from its elements through text processing and an ontology of synonyms within a specific domain. Then, the performance of the machine learning classifiers is observed on two variants of the dataset. The first variant includes only textual features (represented in both TF-IDF and word2vec form), whereas the second variant consists of the determined structural features together with the textual features. Every machine learning algorithm tested achieved better prediction performance on the variant containing structural features. The presented model yields promising results for the model classification task, with a classification accuracy of 89.16%.
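The idea of a hybrid feature vector can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the toy metamodels, the three structural counts (`classes`, `attributes`, `references`), the max-scaling step, and the cosine comparison are all assumptions chosen for illustration, and word2vec, the synonym ontology, and the actual classifiers are left out.

```python
import math
from collections import Counter

# Hypothetical toy corpus. Each metamodel is reduced to the tokens of its
# element names plus a few structural counts. None of these values come
# from the 555-metamodel dataset described in the abstract.
METAMODELS = [
    {"label": "library", "tokens": ["book", "author", "library", "loan"],
     "structure": {"classes": 4, "attributes": 9, "references": 5}},
    {"label": "library", "tokens": ["book", "member", "library", "shelf"],
     "structure": {"classes": 5, "attributes": 11, "references": 6}},
    {"label": "statemachine", "tokens": ["state", "transition", "event", "guard"],
     "structure": {"classes": 6, "attributes": 4, "references": 10}},
]

# Assumed structural features; the abstract does not enumerate the real ones.
STRUCTURAL_KEYS = ["classes", "attributes", "references"]


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = []
        for term in vocab:
            tf = counts[term] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[term])
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors


def hybrid_vectors(metamodels):
    """Concatenate TF-IDF textual features with scaled structural counts."""
    _, text_vecs = tfidf_vectors([m["tokens"] for m in metamodels])
    # Divide each structural count by its maximum so that raw counts do not
    # dominate the much smaller TF-IDF weights.
    maxima = {k: max(m["structure"][k] for m in metamodels) or 1
              for k in STRUCTURAL_KEYS}
    hybrid = []
    for m, text_vec in zip(metamodels, text_vecs):
        struct_vec = [m["structure"][k] / maxima[k] for k in STRUCTURAL_KEYS]
        hybrid.append(text_vec + struct_vec)
    return hybrid


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vecs = hybrid_vectors(METAMODELS)
    print("same domain:     ", cosine(vecs[0], vecs[1]))
    print("different domain:", cosine(vecs[0], vecs[2]))
```

In this toy setting the two library metamodels end up closer to each other than either is to the state machine metamodel. In a real pipeline, vectors of this shape would be passed to a trained classifier rather than compared pairwise.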
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Khalilipour, A., Bozyigit, F., Utku, C., Challenger, M. (2022). Machine Learning-Based Model Categorization Using Textual and Structural Features. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_39
DOI: https://doi.org/10.1007/978-3-031-15743-1_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15742-4
Online ISBN: 978-3-031-15743-1
eBook Packages: Computer Science, Computer Science (R0)