Integrating Machine Learning Techniques in Semantic Fake News Detection

Neural Processing Letters

Abstract

The nuances of language, as well as the varying degrees of truth observed in news items, make fake news detection a difficult problem to solve. A news item is never launched without a purpose; to understand its motivation, it is best to analyze the relations between the speaker and its subject, as well as different credibility metrics. Inferring details about the various actors involved in a news item requires a hybrid approach that mixes machine learning, semantics and natural language processing. This article discusses a semantic fake news detection method built around relational features such as sentiment, entities or facts extracted directly from text. Our experiments focus on short texts with different degrees of truth and show that adding semantic features improves accuracy significantly.

Notes

  1. https://github.com/sloria/textblob.

  2. https://github.com/aolieman/pyspotlight.

  3. https://github.com/dbpedia-spotlight/dbpedia-spotlight-model.

  4. https://spacy.io/.

  5. https://www.w3.org/TR/vocab-data-cube/.

Author information

Corresponding author

Correspondence to Adrian M. P. Braşoveanu.

About this article

Cite this article

Braşoveanu, A.M.P., Andonie, R. Integrating Machine Learning Techniques in Semantic Fake News Detection. Neural Process Lett 53, 3055–3072 (2021). https://doi.org/10.1007/s11063-020-10365-x

  • DOI: https://doi.org/10.1007/s11063-020-10365-x
