A Model for Age and Gender Profiling of Social Media Accounts Based on Post Contents

  • Conference paper
Neural Information Processing (ICONIP 2018)

Abstract

The growth of social networking platforms such as Facebook and Twitter has opened communication channels through which people share their thoughts and sentiments. However, the rapid growth of the Internet has also brought anonymity, which makes user identities easy to falsify or hide. This makes it difficult for businesses to target advertisements accurately at specific account demographics. This study therefore searched for the best model for identifying the gender and age group of Filipino social media accounts by analyzing post contents. Two model structures for the classifier were tested: the stacked (combined) structure and the parallel structure. Several types of features were considered, including socio-linguistic, grammar-based, character-based, and word-based features. The results show that age classification and gender classification call for different model structures, features, feature reduction methods, and classification algorithms. For age classification, the best model on both platforms was a Support Vector Classifier (SVC) with the least absolute shrinkage and selection operator (Lasso), on a parallel model structure for Facebook and a combined model structure for Twitter. For gender classification, the best model for Facebook used a Ridge Classifier (RC), while the best model for Twitter used SVC, both with Lasso on a parallel model structure. The dominant features in age classification on both Facebook and Twitter were word-based features, socio-linguistic features, and post time, while socio-linguistic features, specifically netspeak, were important in gender classification on both platforms. Because different features drive model performance on each platform, Facebook and Twitter data must be analyzed separately, as the posts on the two platforms differ significantly.
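As a rough illustration of the kind of classifier the abstract names, the following is a minimal sketch, assuming scikit-learn, of word-based features passed through an L1 (Lasso-style) feature selection step and then into an SVC or a Ridge Classifier. It is not the authors' implementation. The toy posts, the label names, the `build_pipeline` helper, the use of TF-IDF, and the choice of an L1-penalized `LinearSVC` as the selector are all assumptions made for illustration. The sketch also does not attempt to reproduce the parallel or combined model structures, which the abstract does not define.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC


def build_pipeline(final_classifier):
    """Hypothetical helper: word features -> L1 selection -> classifier."""
    return make_pipeline(
        # Word-based features only; the paper also uses other feature types.
        TfidfVectorizer(),
        # Assumption: an L1-penalized linear model stands in for "Lasso".
        # Features whose coefficient is driven to ~0 are dropped.
        SelectFromModel(LinearSVC(penalty="l1", dual=False, C=10.0)),
        final_classifier,
    )


# Invented placeholder posts and labels, not data from the study.
posts = [
    "lol omg homework tmrw",
    "omg exam lol so tired",
    "lol class was boring omg",
    "meeting with the client today",
    "office deadline and meeting",
    "client report due at the office",
]
age_labels = ["younger", "younger", "younger", "older", "older", "older"]
gender_labels = ["female", "male", "female", "male", "female", "male"]

# SVC for age and Ridge Classifier for gender, mirroring the algorithm
# names reported for Facebook in the abstract.
age_model = build_pipeline(SVC()).fit(posts, age_labels)
gender_model = build_pipeline(RidgeClassifier()).fit(posts, gender_labels)

new_posts = ["omg lol exam tmrw", "client meeting at the office"]
age_pred = age_model.predict(new_posts)
gender_pred = gender_model.predict(new_posts)
print(list(age_pred), list(gender_pred))
```

With only six toy posts the predictions carry no meaning; the point is the shape of the pipeline. In practice each stage would be tuned and evaluated separately per platform, which is consistent with the abstract's conclusion that Facebook and Twitter data should be analyzed separately.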

Corresponding author

Correspondence to Jan Kristoffer Cheng.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Cheng, J.K., Fernandez, A., Quindoza, R.G.M., Tan, S., Cheng, C. (2018). A Model for Age and Gender Profiling of Social Media Accounts Based on Post Contents. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11302. Springer, Cham. https://doi.org/10.1007/978-3-030-04179-3_10

  • DOI: https://doi.org/10.1007/978-3-030-04179-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04178-6

  • Online ISBN: 978-3-030-04179-3

  • eBook Packages: Computer Science, Computer Science (R0)
