Machine Learning-Based Model Categorization Using Textual and Structural Features

  • Conference paper
New Trends in Database and Information Systems (ADBIS 2022)

Abstract

Model-Driven Engineering (MDE), in which models are the core artifacts throughout the life cycle from specification to maintenance, is a promising technique for providing abstraction and automation. However, model management becomes challenging as the number of models, their size, and their structural complexity increase. Available models should therefore be organized so that modelers can reuse them and develop new, more complex models with less cost and effort. To this end, many studies have been conducted to categorize models automatically. However, most of them focus on either textual data or structural information alone, which reduces precision in model management activities. We therefore performed model classification using baseline machine learning approaches on a dataset of 555 Ecore metamodels, using hybrid feature vectors that combine textual and structural information. In the proposed approach, the textual information of each model is first summarized from its elements through text processing and an ontology of synonyms within a specific domain. Then, the performance of the machine learning classifiers was observed on two variants of the dataset. The first variant includes only textual features (represented in both TF-IDF and word2vec forms), whereas the second variant combines the determined structural features with the textual features. Every machine learning algorithm tested achieved better prediction performance on the variant containing structural features. The presented approach yields promising results for the model classification task, with a classification accuracy of 89.16%.
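The hybrid feature idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy metamodels, the two structural features (class and reference counts), the max-scaling step, and the 1-nearest-neighbour classifier with cosine similarity are all assumptions chosen to keep the example dependency-free. The abstract states only that baseline machine learning classifiers were applied to TF-IDF or word2vec features, with and without structural features.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency learned from tokenized training docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf, vocab):
    """TF-IDF vector over a fixed vocabulary; unseen tokens are ignored."""
    counts = Counter(doc)
    total = len(doc) or 1
    return [counts[t] / total * idf[t] for t in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy data: tokens taken from metamodel element names, plus
# two assumed structural features per metamodel: (class count, reference count).
train_docs = [
    ["state", "transition", "event", "state"],
    ["state", "machine", "transition", "guard"],
    ["table", "column", "key", "schema"],
    ["table", "row", "column", "index"],
]
train_struct = [[4, 5], [5, 6], [12, 3], [10, 2]]
train_labels = ["statemachine", "statemachine", "database", "database"]

idf = fit_idf(train_docs)
vocab = sorted(idf)

# Scale each structural column by its training maximum so that raw counts
# do not dominate the TF-IDF weights (an assumed, simple normalization).
maxes = [max(col) or 1 for col in zip(*train_struct)]

def hybrid(doc, struct):
    """Concatenate the textual TF-IDF vector with the scaled structural features."""
    return tfidf(doc, idf, vocab) + [v / m for v, m in zip(struct, maxes)]

hybrid_train = [hybrid(d, s) for d, s in zip(train_docs, train_struct)]

def predict(doc, struct):
    """1-nearest-neighbour by cosine similarity (a stand-in classifier)."""
    query = hybrid(doc, struct)
    best = max(range(len(hybrid_train)),
               key=lambda i: cosine(hybrid_train[i], query))
    return train_labels[best]

predicted = predict(["state", "transition", "trigger"], [4, 4])
print(predicted)
```

Dropping the structural part from `hybrid` gives the text-only variant, so the two dataset variants compared in the abstract can be reproduced on the same toy data by switching one function.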



Author information

Corresponding author

Correspondence to Fatma Bozyigit.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Khalilipour, A., Bozyigit, F., Utku, C., Challenger, M. (2022). Machine Learning-Based Model Categorization Using Textual and Structural Features. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_39

  • DOI: https://doi.org/10.1007/978-3-031-15743-1_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15742-4

  • Online ISBN: 978-3-031-15743-1

  • eBook Packages: Computer Science, Computer Science (R0)
