BERT-based Classifiers for Fake News Detection on Short and Long Texts with Noisy Data: A Comparative Analysis

Shushkevich, Elena; Alexandrov, Mikhail; Cardiff, John

doi:10.1007/978-3-031-16270-1_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13502)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Free, uncontrolled access to the Internet is the main reason fake news propagates online, both in social media and in regular Internet publications. In this paper we study the potential of several BERT-based models to detect fake news related to politics. Our contribution to the area consists of testing the BERT, RoBERTa and MNLI RoBERTa models with (a) short and long texts; (b) ensembling with the best models; (c) noisy texts. To improve ensembling, we introduce an additional class, ‘Doubtful news’. To create noisy data we use cross-translation. For the experiments we consider the well-known FRN (Fake vs. Real News, long texts) and LIAR (short texts) datasets. The results obtained on the long-text dataset are higher than those obtained on the short-text dataset. The proposed approach to ensembling significantly improved the results. The experiments with noisy data demonstrated high noise immunity of the BERT model on long news and of the RoBERTa model on short news.
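
The abstract does not say how the ‘Doubtful news’ class enters the ensemble, so the following is only a minimal sketch of one possible reading: each model outputs a probability that an item is fake, the probabilities are averaged, and items whose average falls in a middle band are labelled doubtful. The function name `ensemble_label`, the label strings, and the thresholds `low` and `high` are illustrative assumptions, not the authors' method or values.

```python
from typing import Sequence

# Hypothetical label names; the paper's own class encoding is not given in the abstract.
FAKE, REAL, DOUBTFUL = "fake", "real", "doubtful"


def ensemble_label(p_fake_scores: Sequence[float],
                   low: float = 0.4,
                   high: float = 0.6) -> str:
    """Average per-model 'fake' probabilities and map the mean to three labels.

    `low` and `high` are assumed thresholds: a mean at or above `high` is
    'fake', a mean at or below `low` is 'real', anything in between is
    'doubtful'.
    """
    if not p_fake_scores:
        raise ValueError("at least one model score is required")
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")

    mean = sum(p_fake_scores) / len(p_fake_scores)
    if mean >= high:
        return FAKE
    if mean <= low:
        return REAL
    return DOUBTFUL


if __name__ == "__main__":
    # Two models agree the item is fake.
    print(ensemble_label([0.9, 0.8]))
    # Two models disagree strongly, so the item lands in the middle band.
    print(ensemble_label([0.9, 0.1]))
```

Under this reading, the doubtful band absorbs the cases where the models disagree or are both unsure, which is one way an extra class could help an ensemble; the actual scheme should be taken from the full paper.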

Notes

1. https://metatext.io/datasets/liar-dataset
2. https://www.kaggle.com/c/fake-news
3. https://www.uvic.ca/ecs/ece/isot/datasets/fake-news/
4. textattack.readthedocs.io/en/latest/
5. https://www.mpi-inf.mpg.de/dl-cred-analysis/
6. https://www.kaggle.com/mrisdal/fake-news
7. https://cims.nyu.edu/~sbowman/multinli/

Author information

Authors and Affiliations

Technological University Dublin, Dublin, Ireland
Elena Shushkevich & John Cardiff
Autonomous University of Barcelona, Barcelona, Spain
Mikhail Alexandrov

Corresponding author

Correspondence to Elena Shushkevich.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala

Cite this paper

Shushkevich, E., Alexandrov, M., Cardiff, J. (2022). BERT-based Classifiers for Fake News Detection on Short and Long Texts with Noisy Data: A Comparative Analysis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_22

DOI: https://doi.org/10.1007/978-3-031-16270-1_22
Published: 16 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1
eBook Packages: Computer Science, Computer Science (R0)
