Comparing Graph Similarity Measures for Semantic Representations of Documents

  • Conference paper
Advances in Computing (CCC 2018)

Abstract

Semantic representations of documents built from open Knowledge Graphs (KGs) have proven beneficial in tasks such as recommendation, user profiling, and document retrieval. Broadly speaking, a semantic representation of a document can be defined as a graph whose nodes represent concepts and whose edges represent the semantic relationships between them. Fine-grained information about the concepts found in the KGs (e.g., DBpedia, YAGO, BabelNet) can be exploited to enrich and refine the representation. Although this kind of semantic representation is a graph, most applications that compare semantic representations reduce the graph to a “flattened” concept-weight representation and apply well-known vector similarity measures. Consequently, relevant information carried by the graph structure is not exploited. In this paper, different graph-based similarity measures are adapted to semantic representation graphs, implemented, and evaluated. Experiments performed on two datasets show better results with the graph similarity measures than with vector similarity measures. This paper presents the conceptual background, the adapted measures, and their evaluation, and ends with some conclusions on the trade-off between precision and computational complexity.
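To illustrate the contrast drawn above, the following is a minimal sketch and not the measures adapted in the paper. It compares cosine similarity over a flattened concept-weight view with a simple graph overlap score in the spirit of maximum-common-subgraph similarity. The function names, the toy concept identifiers, and the choice of graph size as nodes plus edges are all assumptions made for illustration. Because concepts are unique resources of the KG, the sketch assumes node labels are unique, in which case the common subgraph reduces to the shared nodes and the shared edges between them.

```python
import math


def cosine_similarity(weights_a, weights_b):
    """Cosine similarity between two flattened concept-weight dicts."""
    shared = set(weights_a) & set(weights_b)
    dot = sum(weights_a[c] * weights_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def common_subgraph_similarity(nodes_a, edges_a, nodes_b, edges_b):
    """Overlap score: |common subgraph| / max(|G_a|, |G_b|).

    Assumption: graph size is counted as nodes + edges, and node
    labels are unique, so the common subgraph is a set intersection.
    """
    common_nodes = nodes_a & nodes_b
    common_edges = {
        (u, v) for (u, v) in edges_a & edges_b
        if u in common_nodes and v in common_nodes
    }
    size_a = len(nodes_a) + len(edges_a)
    size_b = len(nodes_b) + len(edges_b)
    largest = max(size_a, size_b)
    if largest == 0:
        return 0.0
    return (len(common_nodes) + len(common_edges)) / largest


# Hypothetical toy documents: same concepts and weights, different relations.
weights = {"Graph_theory": 1.0, "Semantic_Web": 1.0, "Information_retrieval": 1.0}
nodes = set(weights)
edges_doc_a = {("Graph_theory", "Semantic_Web"),
               ("Semantic_Web", "Information_retrieval")}
edges_doc_b = {("Graph_theory", "Information_retrieval")}

flat_score = cosine_similarity(weights, weights)
graph_score = common_subgraph_similarity(nodes, edges_doc_a, nodes, edges_doc_b)
print(f"flattened cosine: {flat_score:.2f}, graph overlap: {graph_score:.2f}")
```

In this toy case the two documents are indistinguishable once flattened, while the graph score drops because no relationship is shared, which is the kind of structural information the abstract says is lost.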

Notes

  1. http://dbpedia.org/

  2. www.yago-knowledge.org/

  3. http://babelnet.org/

  4. Hereafter, we use concept and entity interchangeably to refer to resources of the KG.

  5. http://www.dbpedia-spotlight.org/

  6. http://babelfy.org/

Acknowledgment

This work was partially supported by a COLCIENCIAS PhD scholarship (Call 647-2014).

Corresponding author

Correspondence to Rubén Manrique.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Manrique, R., Cueto-Ramirez, F., Mariño, O. (2018). Comparing Graph Similarity Measures for Semantic Representations of Documents. In: Serrano C., J., Martínez-Santos, J. (eds) Advances in Computing. CCC 2018. Communications in Computer and Information Science, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-319-98998-3_13

  • DOI: https://doi.org/10.1007/978-3-319-98998-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98997-6

  • Online ISBN: 978-3-319-98998-3

  • eBook Packages: Computer Science (R0)
