Comparative Analysis of Scientific Papers Collections via Topic Modeling and Co-authorship Networks

Krasnov, Fedor; Dimentov, Alexander; Shvartsman, Mikhail

doi:10.1007/978-3-030-34518-1_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1119)

Included in the following conference series:

Conference on Artificial Intelligence and Natural Language

Abstract

In this paper, the authors present an approach to benchmarking collections of scientific journals based on the analysis of co-authorship graphs and text models. The main methodological result is the Comparative Topic Modeling (CTM) technique. Applying time series analysis to the metrics of co-authorship graphs made it possible to analyze trends in the development of author collaborations in scientific journals. A text model was created using machine learning methods. The content of the journals was classified to determine the degree of authenticity both across journals and across their issues. Experiments were conducted on the archives of two journals in the field of rheumatology. The authors used public data sets from the SNAP research laboratory at Stanford University to benchmark the co-authorship network metrics. The research results can be applied to improve editorial strategies for developing co-authorship collaborations and the excellence of scientific content.
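
As a rough illustration of what a time series of co-authorship graph metrics could look like, the sketch below builds one undirected co-authorship graph per publication year and reports a few basic metrics. It is not the authors' implementation: the function name `coauthorship_metrics_by_year`, the input layout (pairs of year and author list), the choice of metrics, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_metrics_by_year(papers):
    """Compute simple co-authorship graph metrics for each year.

    papers: iterable of (year, [author names]) pairs (hypothetical layout).
    Returns a dict mapping year -> dict of metrics.
    """
    authors_by_year = defaultdict(set)
    edges_by_year = defaultdict(set)

    for year, authors in papers:
        unique = sorted(set(authors))
        authors_by_year[year].update(unique)
        # Every pair of co-authors on one paper is an undirected edge.
        for a, b in combinations(unique, 2):
            edges_by_year[year].add((a, b))

    metrics = {}
    for year in sorted(authors_by_year):
        n = len(authors_by_year[year])   # number of nodes (authors)
        m = len(edges_by_year[year])     # number of edges (co-author links)
        possible = n * (n - 1) / 2       # edges in a complete graph on n nodes
        metrics[year] = {
            "authors": n,
            "links": m,
            "density": m / possible if possible else 0.0,
            "mean_degree": 2 * m / n if n else 0.0,
        }
    return metrics


# Toy data, invented for illustration only.
papers = [
    (2017, ["A", "B"]),
    (2017, ["B", "C"]),
    (2018, ["A", "B", "C"]),
    (2018, ["D"]),
]

for year, row in coauthorship_metrics_by_year(papers).items():
    print(year, row)
```

Reading the per-year rows in order gives a small time series for each metric, which is the kind of input a trend analysis of author collaborations would start from.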

Author information

Authors and Affiliations

Gazpromneft STC, 75-79 Moika River emb., 190000, Saint-Petersburg, Russia
Fedor Krasnov
NEICON, b.4/5 Letnikovskaia st., 115114, Moscow, Russia
Alexander Dimentov & Mikhail Shvartsman

Corresponding author

Correspondence to Fedor Krasnov.

Editor information

Editors and Affiliations

Krasovskii Institute of Mathematics and Mechanics, Yekaterinburg, Russia
Dmitry Ustalov
ITMO University, St. Petersburg, Russia
Andrey Filchenkov
Computer Science, University of Helsinki, Helsinki, Finland
Lidia Pivovarova

About this paper

Cite this paper

Krasnov, F., Dimentov, A., Shvartsman, M. (2019). Comparative Analysis of Scientific Papers Collections via Topic Modeling and Co-authorship Networks. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_6

DOI: https://doi.org/10.1007/978-3-030-34518-1_6
Published: 13 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34517-4
Online ISBN: 978-3-030-34518-1
eBook Packages: Computer Science, Computer Science (R0)