A Graph-Based Approach to Topic Clustering of Tourist Attraction Reviews

  • Conference paper
Information and Software Technologies (ICIST 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1078)

Abstract

A large volume of user reviews on tourist attractions can prevent travel businesses from grasping consumers' overall expectations, and consumers themselves from seeing the big picture and making thoughtful trip-planning decisions. Summarizing the reviews allows both parties to catch the main themes and underlying tones of the attractions. In this paper, we address the task of topic clustering by applying a graph-based approach to group the reviews into clusters. To interpret the resulting review clusters, WordNet and Inverse Document Frequency (IDF) are used to extract keywords that represent the topic of each cluster. We evaluate the graph-based clustering approach against gold standard data annotated by humans, and compare the results against Latent Dirichlet Allocation (LDA), a widely used algorithm for topic discovery. The approach is shown to be competitive with LDA in clustering user reviews on tourist attractions. Unlike LDA, which requires the number of clusters as an input, the graph-based approach clusters the reviews into groups dynamically and thereby reveals the number of clusters.
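The IDF-based keyword step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`idf_scores`, `top_keywords`), the toy reviews, the choice to compute IDF over individual reviews, and the ranking by term frequency times IDF are all assumptions made for illustration, and the WordNet step is omitted entirely.

```python
import math
from collections import Counter


def idf_scores(documents):
    """Return IDF = log(N / df) for every term in a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document, however often it occurs there.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def top_keywords(cluster_reviews, idf, k=3):
    """Rank the terms of one cluster by (term frequency in the cluster) * IDF."""
    tf = Counter(tok for review in cluster_reviews for tok in review)
    scored = {term: count * idf.get(term, 0.0) for term, count in tf.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scored, key=lambda t: (-scored[t], t))[:k]


# Hypothetical, already tokenized reviews (illustrative data only).
reviews = [
    ["temple", "beautiful", "quiet"],
    ["temple", "crowded"],
    ["beach", "beautiful", "sand"],
    ["beach", "sand", "crowded"],
]
# Hypothetical cluster assignment, as a clustering step might produce it.
clusters = {0: reviews[:2], 1: reviews[2:]}

idf = idf_scores(reviews)
for cluster_id, members in clusters.items():
    print(cluster_id, top_keywords(members, idf, k=2))
```

Terms that occur in few reviews overall but often inside one cluster score highest, which is why they are plausible candidates for describing that cluster's topic.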

Notes

  1. A lexical database that can be used to measure the relatedness of terms.

  2. https://www.tripadvisor.com/

  3. https://github.com/koadman/proxigenomics

  4. Path similarity is computed from the shortest path between two word senses. Word senses are more similar when their path similarity score is closer to 1.

  5. BCubed is an evaluation metric that compares the clusters generated by an algorithm with the gold standard clusters.

  6. The difference between the two results is statistically significant if the p-value is less than 0.05 (p < 0.05).

  7. https://cytoscape.org/
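As an illustration of the BCubed metric described in the notes, the sketch below computes BCubed precision, recall and F1 for a hard (non-overlapping) clustering. It follows the standard per-item definition and is not taken from the paper; the function name `bcubed`, the dictionary-based input format and the toy labels are assumptions.

```python
from collections import defaultdict


def bcubed(predicted, gold):
    """BCubed precision, recall and F1 for a hard clustering.

    predicted: dict mapping item -> cluster id assigned by the algorithm
    gold:      dict mapping item -> gold standard class (same keys)
    """
    by_cluster = defaultdict(set)
    by_class = defaultdict(set)
    for item in gold:
        by_cluster[predicted[item]].add(item)
        by_class[gold[item]].add(item)

    precision = recall = 0.0
    for item in gold:
        same_cluster = by_cluster[predicted[item]]
        same_class = by_class[gold[item]]
        # Items that share both the cluster and the gold class with `item`.
        correct = len(same_cluster & same_class)
        precision += correct / len(same_cluster)
        recall += correct / len(same_class)

    n = len(gold)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reviews a-d with gold topics and an imperfect clustering.
gold = {"a": "food", "b": "food", "c": "view", "d": "view"}
predicted = {"a": 0, "b": 0, "c": 0, "d": 1}
print(bcubed(predicted, gold))
```

Precision penalizes clusters that mix gold classes, while recall penalizes gold classes that are split across clusters, so the two are usually reported together with their harmonic mean.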

Acknowledgments

This research project was supported by the Faculty of Information and Communication Technology, Mahidol University. The study was carried out under the research framework of Mahidol University.

Author information

Corresponding author

Correspondence to Nuttha Sirilertworakul.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sirilertworakul, N., Yimwadsana, B. (2019). A Graph-Based Approach to Topic Clustering of Tourist Attraction Reviews. In: Damaševičius, R., Vasiljevienė, G. (eds) Information and Software Technologies. ICIST 2019. Communications in Computer and Information Science, vol 1078. Springer, Cham. https://doi.org/10.1007/978-3-030-30275-7_26

  • DOI: https://doi.org/10.1007/978-3-030-30275-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30274-0

  • Online ISBN: 978-3-030-30275-7

  • eBook Packages: Computer Science, Computer Science (R0)
