BERT-based Classifiers for Fake News Detection on Short and Long Texts with Noisy Data: A Comparative Analysis

  • Conference paper
Text, Speech, and Dialogue (TSD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13502)

Abstract

Free, uncontrolled access to the Internet is the main reason fake news propagates online, both in social media and in regular Internet publications. In this paper we study the potential of several BERT-based models to detect fake news related to politics. Our contribution consists of testing the BERT, RoBERTa and MNLI RoBERTa models with (a) short and long texts; (b) ensembling with the best models; (c) noisy texts. To improve ensembling, we introduce an additional class, ‘Doubtful news’. To create noisy data we use cross-translation. For the experiments we consider the well-known FRN (Fake vs. Real News, long texts) and LIAR (short texts) datasets. The results obtained on the long-text dataset are higher than those obtained on the short-text dataset. The proposed approach to ensembling provided a significant improvement in the results. The experiments with noisy data demonstrated high noise immunity of the BERT model on long news and of the RoBERTa model on short news.
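
The abstract does not say how the additional ‘Doubtful news’ class enters the ensemble. As a rough illustration only, the sketch below shows one plausible way to combine the fake-news probabilities of several classifiers into three labels. The function name `ensemble_label`, the disagreement rule, and the thresholds `low` and `high` are all assumptions made for this example and are not taken from the paper.

```python
from statistics import mean


def ensemble_label(fake_probs, low=0.4, high=0.6):
    """Combine per-model fake-news probabilities into one of three labels.

    fake_probs: probabilities in [0, 1], one per model (hypothetical inputs,
                e.g. softmax outputs of fine-tuned BERT and RoBERTa classifiers).
    low, high:  illustrative bounds of the "uncertain" band; these values are
                assumptions, not settings reported by the authors.
    """
    if not fake_probs:
        raise ValueError("need at least one model probability")

    # Each model's hard vote: True means "fake".
    votes = [p >= 0.5 for p in fake_probs]
    avg = mean(fake_probs)

    # Assumed rule: if the models disagree, or the averaged score falls in
    # the uncertain band, the item is labeled doubtful.
    if len(set(votes)) > 1 or low <= avg <= high:
        return "doubtful"
    return "fake" if avg > high else "real"


if __name__ == "__main__":
    print(ensemble_label([0.90, 0.80, 0.95]))  # all models confident
    print(ensemble_label([0.90, 0.20]))        # models disagree
    print(ensemble_label([0.10, 0.20]))        # all models confident
```

Routing disagreements to a third label keeps the binary decisions for the cases where the models agree, which is one common motivation for an abstention-style class; whether the paper uses it this way is not stated in the abstract.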

Author information

Corresponding author

Correspondence to Elena Shushkevich.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Shushkevich, E., Alexandrov, M., Cardiff, J. (2022). BERT-based Classifiers for Fake News Detection on Short and Long Texts with Noisy Data: A Comparative Analysis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_22

  • DOI: https://doi.org/10.1007/978-3-031-16270-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science (R0)
