Abstract
Accurate identification of hidden demographic attributes from social media is useful for applications such as advertising and personalized recommendation. We investigate the effectiveness of two classification models for gender identification based on different attributes of Sina Weibo users. To improve classification accuracy, we propose a novel feature selection algorithm and a retrained multi-attribute model. Experimental results show that our approach achieves an accuracy of 89.01%, higher than any previous work on this problem.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, X., Ding, X., Liu, T. (2014). Gender Identification on Social Media. In: Huang, H., Liu, T., Zhang, HP., Tang, J. (eds) Social Media Processing. SMP 2014. Communications in Computer and Information Science, vol 489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45558-6_9
DOI: https://doi.org/10.1007/978-3-662-45558-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45557-9
Online ISBN: 978-3-662-45558-6
eBook Packages: Computer Science, Computer Science (R0)