Fine-grained Multi-label Sexism Classification Using Semi-supervised Learning

Abburi, Harika; Parikh, Pulkit; Chhaya, Niyati; Varma, Vasudeva

doi:10.1007/978-3-030-62008-0_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12343)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the rising number of experiences of sexism reported online, categorizing these recollections automatically can aid the fight against sexism, as it can facilitate effective analyses by gender studies researchers and government officials involved in policy making. In this paper, we explore the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we consider substantially more categories of sexism than any related prior work through our 23-class problem formulation. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism wherein the approach goes beyond merely fine-tuning pre-trained models using unlabeled data. We devise self-training based techniques tailor-made for the multi-label nature of the problem to utilize unlabeled samples for augmenting the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, independently and in conjunction with the method involving diversity. Several proposed methods outperform a variety of baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.
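
The abstract does not give the selection criteria used in the self-training step, so the following is only a minimal sketch of the general idea under stated assumptions: an unlabeled sample is added to the labeled set when every label is predicted confidently (on or off) and the sample is textually dissimilar from the existing labeled set. The function names, the thresholds `hi`, `lo`, and `min_diversity`, and the use of one minus the maximum cosine similarity as the diversity score are all hypothetical choices for illustration, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def diversity(candidate: Sequence[float],
              labeled_vectors: Sequence[Sequence[float]]) -> float:
    """Assumed diversity score: 1 minus the similarity to the closest labeled sample."""
    return 1.0 - max(cosine(candidate, x) for x in labeled_vectors)


def select_pseudo_labeled(
    probs: Sequence[Sequence[float]],          # per-sample, per-label probabilities
    vectors: Sequence[Sequence[float]],        # text embeddings of unlabeled samples
    labeled_vectors: Sequence[Sequence[float]],
    hi: float = 0.9,                           # hypothetical "label on" threshold
    lo: float = 0.1,                           # hypothetical "label off" threshold
    min_diversity: float = 0.2,                # hypothetical diversity cutoff
) -> List[Tuple[int, List[int]]]:
    """Return (sample index, pseudo-labels) for samples worth adding to the labeled set."""
    selected = []
    for i, (p, v) in enumerate(zip(probs, vectors)):
        # Multi-label confidence: every label must be clearly on or clearly off.
        confident = all(q >= hi or q <= lo for q in p)
        labels = [k for k, q in enumerate(p) if q >= hi]
        # Require at least one positive label and enough textual diversity.
        if confident and labels and diversity(v, labeled_vectors) >= min_diversity:
            selected.append((i, labels))
    return selected
```

In a full self-training loop, the selected samples and their pseudo-labels would be appended to the labeled set and the classifier retrained, repeating until no further candidates qualify. A class imbalance adjustment, which the abstract also mentions, could plausibly be added by relaxing the thresholds for rare labels, though that too is an assumption.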

Author information

Authors and Affiliations

IIIT-Hyderabad, Hyderabad, India
Harika Abburi, Pulkit Parikh & Vasudeva Varma
Adobe Research, Bangalore, India
Niyati Chhaya

Corresponding author

Correspondence to Harika Abburi.

Editor information

Editors and Affiliations

VU Amsterdam, Amsterdam, The Netherlands
Zhisheng Huang
VU Amsterdam, Amsterdam, The Netherlands
Wouter Beek
Victoria University, Melbourne, VIC, Australia
Hua Wang
Swinburne University of Technology, Hawthorn, VIC, Australia
Rui Zhou
Victoria University, Melbourne, VIC, Australia
Yanchun Zhang

About this paper

Cite this paper

Abburi, H., Parikh, P., Chhaya, N., Varma, V. (2020). Fine-grained Multi-label Sexism Classification Using Semi-supervised Learning. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_37

DOI: https://doi.org/10.1007/978-3-030-62008-0_37
Published: 21 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62007-3
Online ISBN: 978-3-030-62008-0
eBook Packages: Computer Science, Computer Science (R0)