Skip to main content

Comparative Analysis of Scientific Papers Collections via Topic Modeling and Co-authorship Networks

  • Conference paper
  • First Online:
Artificial Intelligence and Natural Language (AINL 2019)

Abstract

In this paper, the authors present an approach to benchmarking the collections of scientific journals based on the analysis of co-authorship graphs and a text models. The main methodical result is Comparative Topic Modeling (CTM) technique. The application of time series to the metrics of co-authorship graphs allowed trends in the development of author collaborations in scientific journals to be analyzed. A text model was created using machine learning methods. The content of journals was classified to determine the degree of authenticity both in various journals and their issues. Experiments was conducted on the archives of two journals in the field of Rheumatology. The authors used public data sets from the SNAP research laboratory at Stanford University to benchmark the co-authorship network metrics. The application of the research results is improving editorial strategies for development of co-authorship collaborations and scientific content excellence.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://snap.stanford.edu/data/ca-GrQc.html.

  2. 2.

    https://snap.stanford.edu/data/ca-HepTh.html.

References

  1. Aizawa, A.: An information-theoretic perspective of TF-IDF measures. Inf. Process. Manage. 39(1), 45–65 (2003)

    Article  MATH  Google Scholar 

  2. Alba, R.D.: A graph-theoretic definition of a sociometric clique. J. Math. Sociol. 3(1), 113–126 (1973)

    Article  MathSciNet  MATH  Google Scholar 

  3. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)

    Google Scholar 

  4. Bholowalia, P., Kumar, A.: EBK-means: a clustering technique based on elbow method and K-means in WSN. Int. J. Comput. Appl. 105(9), 17–24 (2014)

    Google Scholar 

  5. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media Inc., Sebastopol (2009)

    MATH  Google Scholar 

  6. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)

    MATH  Google Scholar 

  7. Bondy, J.A., Murty, U.S.R., et al.: Graph Theory with Applications, vol. 290. Citeseer (1976)

    Google Scholar 

  8. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

    Article  MATH  Google Scholar 

  9. Cunningham, S.J., Dillon, S.M.: Authorship patterns in information systems. Scientometrics 39(1), 19 (1997)

    Article  Google Scholar 

  10. Egghe, L., Rousseau, R., Van Hooydonk, G.: Methods for accrediting publications to authors or countries: consequences for evaluation studies. J. Am. Soc. Inf. Sci. 51(2), 145–157 (2000)

    Article  Google Scholar 

  11. Farkas, I., Derényi, I., Jeong, H., Neda, Z., Oltvai, Z., Ravasz, E., Schubert, A., Barabási, A.L., Vicsek, T.: Networks in life: scaling properties and eigenvalue spectra. Physica A: Stat. Mech. Appl. 314(1–4), 25–34 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  12. Garfield, E.: Is citation analysis a legitimate evaluation tool? Scientometrics 1(4), 359–375 (1979)

    Article  Google Scholar 

  13. Hofmann, T.: Probabilistic latent semantic indexing. In: ACM SIGIR Forum, vol. 51, pp. 211–218. ACM (2017)

    Google Scholar 

  14. Kleene, S.C.: Representation of events in nerve nets and finite automata. Technical report, RAND PROJECT AIR FORCE SANTA MONICA CA (1951)

    Google Scholar 

  15. Korobov, M.: Morphological analyzer and generator for Russian and Ukrainian languages. In: Khachay, M.Y., Konstantinova, N., Panchenko, A., Ignatov, D.I., Labunets, V.G. (eds.) AIST 2015. CCIS, vol. 542, pp. 320–332. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26123-2_31

    Chapter  Google Scholar 

  16. Krasnov, F., Sen, A.: The number of topics optimization: clustering approach. Mach. Learn. Knowl. Extr. 1(1), 416–426 (2019)

    Article  Google Scholar 

  17. Krasnov, F., Ushmaev, O.: Exploration of hidden research directions in oil and gas industry via full text analysis of OnePetro digital library. Int. J. Open Inf. Technol. 6(5), 7–14 (2018)

    Google Scholar 

  18. Kucera, H., Francis, W.N.: Computational Analysis of Present - Day American English. Dartmouth Publishing Group, Hanover (1967)

    Google Scholar 

  19. Law, J., Zhuo, H.H., He, J.H., Rong, E.: LTSG: latent topical skip-gram for mutually improving topic model and vector representations. In: Lai, J.-H., et al. (eds.) PRCV 2018. LNCS, vol. 11258, pp. 375–387. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03338-5_32

    Chapter  Google Scholar 

  20. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discovery Data (TKDD) 1(1), 2 (2007)

    Article  Google Scholar 

  21. Lovins, J.B.: Development of a stemming algorithm. Mech. Translat. Comp. Linguist. 11(2), 22–31 (1968)

    Google Scholar 

  22. Lu, X., Zheng, X., Li, X.: Latent semantic minimal hashing for image retrieval. IEEE Trans. Image Process. 26(1), 355–368 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  23. Lucas, C., Nielsen, R.A., Roberts, M.E., Stewart, B.M., Storer, A., Tingley, D.: Computer-assisted text analysis for comparative politics. Polit. Anal. 23(2), 254–277 (2015)

    Article  Google Scholar 

  24. Naik, R.R., Landge, M.B., Mahender, C.N.: A review on plagiarism detection tools. Int. J. Comput. Appl. 125(11) (2015)

    Google Scholar 

  25. Newman, M.E.: Scientific collaboration networks. i. Network construction and fundamental results. Phys. Rev. E 64(1), 016131 (2001)

    Article  MathSciNet  Google Scholar 

  26. Newman, M.E.: Analysis of weighted networks. Phys. Rev. E 70(5), 056131 (2004)

    Article  Google Scholar 

  27. Packard, D.: Computer-assisted morphological analysis of ancient Greek. In: COLING 1973 Volume 2: Computational And Mathematical Linguistics: Proceedings of the International Conference on Computational Linguistics, vol. 2 (1973)

    Google Scholar 

  28. Porter, M.F.: Snowball: a language for stemming algorithms (2001)

    Google Scholar 

  29. Schwenk, H., Gauvain, J.L.: Connectionist language modeling for large vocabulary continuous speech recognition. In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, p. I-765. IEEE (2002)

    Google Scholar 

  30. Segalovich, I.: A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine. In: MLMTA, pp. 273–280. Citeseer (2003)

    Google Scholar 

  31. Sharoff, S., Nivre, J.: The proper place of men and machines in language technology: processing Russian without any linguistic knowledge. In: Proceedings of Dialogue 2011, Russian Conference on Computational Linguistics (2011)

    Google Scholar 

  32. Smeaton, A.F., Keogh, G., Gurrin, C., McDonald, K., Sødring, T.: Analysis of papers from twenty-five years of SIGIR conferences: what have we been doing for the last quarter of a century? In: ACM SIGIR Forum, vol. 37, pp. 49–53. ACM (2003)

    Article  Google Scholar 

  33. Teahan, W.J., Cleary, J.G.: The entropy of English using PPM-based models. In: DCC, p. 53. IEEE (1996)

    Google Scholar 

  34. Teahan, W., Cleary, J.G.: Models of English text. In: 1997 Proceedings of Data Compression Conference, DCC’97, pp. 12–21. IEEE (1997)

    Google Scholar 

  35. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)

    Article  MATH  Google Scholar 

  36. Vorontsov, K., Potapenko, A.: Additive regularization of topic models. Mach. Learn. 101(1–3), 303–323 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  37. Wang, X., Ren, J., Zhang, Y., Zhu, D., Qiu, P., Huang, M.: China’s patterns of international technological collaboration 1976–2010: a patent analysis study. Technol. Anal. Strateg. Manag. 26(5), 531–546 (2014)

    Article  Google Scholar 

  38. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cambridge University Press, Cambridge (1994)

    Book  MATH  Google Scholar 

  39. Weizenbaum, J.: Eliza–a computer program for the study of natural language communication between man and machine. Commun. ACM 9(1), 36–45 (1966)

    Article  Google Scholar 

  40. Wiederhold, G.: Intelligent integration of information. In: ACM SIGMOD Record, vol. 22, pp. 434–437. ACM (1993)

    Google Scholar 

  41. Willett, P.: The porter stemming algorithm: then and now. Program 40(3), 219–223 (2006)

    Article  Google Scholar 

  42. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington (2016)

    Google Scholar 

  43. Zhao, W.X., et al.: Comparing Twitter and traditional media using topic models. In: Clough, P., et al. (eds.) ECIR 2011. LNCS, vol. 6611, pp. 338–349. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20161-5_34

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fedor Krasnov .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Krasnov, F., Dimentov, A., Shvartsman, M. (2019). Comparative Analysis of Scientific Papers Collections via Topic Modeling and Co-authorship Networks. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-34518-1_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34517-4

  • Online ISBN: 978-3-030-34518-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics