Detection of Sociolinguistic Features in Digital Social Networks for the Detection of Communities

Puertas, Edwin; Moreno-Sandoval, Luis Gabriel; Redondo, Javier; Alvarado-Valencia, Jorge Andres; Pomares-Quimbaya, Alexandra

doi:10.1007/s12559-021-09818-9

Published: 26 January 2021

Volume 13, pages 518–537 (2021)
Cognitive Computation

Edwin Puertas¹,² (ORCID: orcid.org/0000-0002-0758-1851),
Luis Gabriel Moreno-Sandoval²,
Javier Redondo³,
Jorge Andres Alvarado-Valencia² &
Alexandra Pomares-Quimbaya²

Abstract

The emergence of digital social networks has transformed how society, social groups, and institutions communicate and express their opinions. Determining how language variations allow the detection of communities, together with the relevance of specific vocabulary in digital assemblages, could lead to a better understanding of their dynamics and social foundations, and thus to better communication policies and intervention where necessary. The vocabulary considered here is the one proposed by the National Council of Accreditation of Colombia (Consejo Nacional de Acreditación, CNA) to determine the quality evaluation parameters for universities in Colombia. The approach presented in this paper aims to determine the semantic spaces (sociolinguistic features) shared by social groups in digital social networks. It comprises five layers based on Design Science Research, which are integrated with techniques from Natural Language Processing (NLP), Computational Linguistics (CL), and Artificial Intelligence (AI). The approach is validated through a case study in which the semantic values of a set of institutional Twitter accounts belonging to Colombian universities are analyzed in terms of the 12 quality factors established by the CNA. The topics and the sociolect used by different actors in the university communities are also analyzed. The approach makes it possible to determine the sociolinguistic features of social groups in digital social networks and to detect the words or concepts to which each actor of a social group (here, a university) gives the most importance in terms of vocabulary.

Acknowledgements

We would like to thank the Center for Excellence and Appropriation in Big Data and Data Analytics (CAOBA), Pontificia Universidad Javeriana, and the Ministry of Information Technologies and Telecommunications of the Republic of Colombia (MinTIC). The models and results presented in this challenge contributed to building the research capabilities of CAOBA. Edwin Puertas also thanks the Universidad Tecnológica de Bolívar.

Author information

Authors and Affiliations

Faculty of Engineering, Department of Engineering, Universidad Tecnologica de Bolivar, Cartagena, Colombia
Edwin Puertas
Faculty of Engineering, Engineering School, Pontificia Universidad Javeriana, Carrera 7 No. 40-62, Bogota, Colombia
Edwin Puertas, Luis Gabriel Moreno-Sandoval, Jorge Andres Alvarado-Valencia & Alexandra Pomares-Quimbaya
Department of Communication and Language, Pontificia Universidad Javeriana, Cartagena, Colombia
Javier Redondo

Corresponding author

Correspondence to Edwin Puertas.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Puertas, E., Moreno-Sandoval, L.G., Redondo, J. et al. Detection of Sociolinguistic Features in Digital Social Networks for the Detection of Communities. Cogn Comput 13, 518–537 (2021). https://doi.org/10.1007/s12559-021-09818-9

Received: 13 March 2020
Accepted: 05 January 2021
Published: 26 January 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s12559-021-09818-9
