Abstract
The structure of scientific collaboration networks provides insight into the relationships between people and disciplines. In this paper, we study a bipartite graph connecting authors to publications and extract from it clusters of authors and articles, interpreting the author clusters as research groups and the article clusters as research topics. We propose visualisations that ease the interpretation of such clusters, for example by revealing leaders, activity levels, and other semantic aspects. We discuss the process of obtaining and preprocessing the information from scientific publications, the formulation and implementation of the clustering algorithm, and the creation of the visualisations. Experiments on a test data set are presented, using an initial prototype implementation of the proposed modules.
Notes
As the graph is bipartite, necessarily \(\varGamma (v) \subseteq T\) as well as \(\varGamma (w) \subseteq T\).
Available at http://arnetminer.org.
Available at http://tartarus.org/martin/PorterStemmer/.
In the symmetric mode, once a cluster is computed, the included vertices are no longer available for inclusion in future cluster computations.
The weight of an edge w(v, u) is computed as the multiplicity of that edge; for purposes of the clustering phase, the edges are treated as directed and the weight is normalised by the degree of vertex v, making the directed edge weight asymmetric.
Available at http://www.mathiasbader.de/studium/bioinformatics/.
Available at http://neoformix.com/2008/ClusteredWordClouds.html.
For example, the Wordle tool (http://www.wordle.net).
Available at http://dblp.uni-trier.de/ in XML format.
In a systematic sample, every k-th element is chosen, where k results from dividing the total number of elements by the desired sample size.
Only iterations where the cluster order was above the threshold were considered.
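The edge-weight note above (multiplicity normalised by the degree of the source vertex) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `directed_weights`, the toy edge list, and in particular the choice to count the degree of v as the sum of the multiplicities of its incident edges are assumptions made here for illustration.

```python
from collections import Counter, defaultdict


def directed_weights(edges):
    """Return {(v, u): weight} for both directions of every undirected edge.

    edges: iterable of (a, b) pairs; a repeated pair counts as multiplicity.
    ASSUMPTION: deg(v) is the sum of multiplicities of the edges at v,
    so the outgoing weights of each vertex sum to 1.
    """
    multiplicity = Counter()
    degree = defaultdict(int)
    for a, b in edges:
        # Record the edge in both directions; the graph itself is undirected.
        multiplicity[(a, b)] += 1
        multiplicity[(b, a)] += 1
        degree[a] += 1
        degree[b] += 1
    # Normalise each directed copy by the degree of its source vertex.
    return {(v, u): m / degree[v] for (v, u), m in multiplicity.items()}


# Hypothetical toy data: two author vertices and two other-side vertices.
edges = [("alice", "t1"), ("alice", "t1"), ("alice", "t2"), ("bob", "t1")]
weights = directed_weights(edges)
```

In this toy graph the weight from `alice` to `t2` is 1/3, whereas the weight from `t2` to `alice` is 1, which shows the asymmetry that the note describes.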
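The systematic sampling described in the notes above can be sketched in the same way. The function name `systematic_sample`, the use of integer division for k, and the choice to start at the first element are assumptions made here; the note itself fixes neither the rounding nor the starting offset.

```python
def systematic_sample(items, sample_size):
    """Pick every k-th element, with k = len(items) // sample_size.

    ASSUMPTIONS: k is rounded down, and sampling starts at index 0.
    """
    items = list(items)
    if not 0 < sample_size <= len(items):
        raise ValueError("sample_size must be between 1 and len(items)")
    k = len(items) // sample_size  # step between chosen elements
    # Slicing with step k may yield one extra element; trim to the target size.
    return items[::k][:sample_size]


# Hypothetical toy data: ten elements, desired sample size three, so k = 3.
sample = systematic_sample(range(10), 3)
```

With ten elements and a desired sample size of three, k is 3 and the elements at indices 0, 3 and 6 are selected.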
Acknowledgments
The first author was supported by SEP-PROMEP Grant No. 103.5/12/7884. We thank the anonymous reviewers for their useful suggestions that helped improve the manuscript.
Cite this article
Villarreal, S.E.G., Schaeffer, S.E. Local bilateral clustering for identifying research topics and groups from bibliographical data. Knowl Inf Syst 48, 179–199 (2016). https://doi.org/10.1007/s10115-015-0867-y
DOI: https://doi.org/10.1007/s10115-015-0867-y