A Heterogeneous Network-Based Positive and Unlabeled Learning Approach to Detect Fake News

de Souza, Mariana C.; Nogueira, Bruno M.; Rossi, Rafael G.; Marcacini, Ricardo M.; Rezende, Solange O.

doi:10.1007/978-3-030-91699-2_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13074)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

The dynamism of fake news evolution and dissemination plays a crucial role in influencing and confirming personal beliefs. To minimize the spread of disinformation, approaches proposed in the literature for automatic fake news detection generally learn models through binary supervised algorithms that consider textual and contextual information. However, labeling significant amounts of real news to build accurate classifiers is difficult and time-consuming because real news covers a broad spectrum. Positive and unlabeled learning (PUL) can be a good alternative in this scenario. PUL algorithms learn models from a small amount of labeled data of the class of interest and use unlabeled data to increase classification performance. This paper proposes a heterogeneous network variant of the PU-LP algorithm, a PUL algorithm based on similarity networks. Our network incorporates different linguistic features to characterize fake news, such as representative terms, emotiveness, pausality, and average sentence size. We also considered two representations of the news to compute similarity: term frequency-inverse document frequency (TF-IDF), and Doc2Vec, which creates fixed-size document representations regardless of document length. We evaluated our approach on six datasets written in Portuguese or English, comparing its performance with a binary semi-supervised baseline algorithm, using two well-established label propagation algorithms: LPHN and GNetMine. The results indicate that PU-LP with heterogeneous networks can be competitive with binary semi-supervised learning. In addition, linguistic features such as representative terms and pausality improved the classification performance, especially when only a small amount of labeled news is available.
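
To make the general idea concrete, here is a minimal sketch of positive and unlabeled learning on a similarity network. It is not the PU-LP algorithm or the heterogeneous network proposed in the paper. It assumes a fully connected, homogeneous document network weighted by TF-IDF cosine similarity, a simplified rule that treats the unlabeled documents least similar to the labeled fake news as reliable negatives, and a basic clamped label propagation loop. All function names and the toy headlines are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_network(vectors):
    """Dense weight matrix of a document network (no self-loops)."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def pick_reliable_negatives(weights, positives, n_negatives):
    """ASSUMPTION: the unlabeled documents least similar to the labeled
    positives are treated as reliable negatives (ties keep index order)."""
    unlabeled = [i for i in range(len(weights)) if i not in positives]

    def mean_similarity(i):
        return sum(weights[i][p] for p in positives) / len(positives)

    return sorted(unlabeled, key=mean_similarity)[:n_negatives]


def propagate(weights, positives, negatives, n_iter=100):
    """Clamped label propagation: +1 for fake seeds, -1 for reliable negatives.
    Every other node repeatedly takes the weighted mean of its neighbours."""
    scores = [0.0] * len(weights)
    for p in positives:
        scores[p] = 1.0
    for q in negatives:
        scores[q] = -1.0
    clamped = set(positives) | set(negatives)
    for _ in range(n_iter):
        for i, row in enumerate(weights):
            total = sum(row)
            if i in clamped or total == 0:
                continue
            scores[i] = sum(w * s for w, s in zip(row, scores)) / total
    return scores


# Toy corpus (invented): documents 0-2 read like fake news, 3-5 like real news.
docs = [
    "shocking miracle cure secret doctors hate",
    "shocking secret miracle cure revealed",
    "miracle cure secret shocking truth",
    "senate approves budget bill after vote",
    "senate vote delays budget bill",
    "budget bill passes senate vote",
]
positives = [0]  # the only labeled example, from the class of interest (fake)

weights = similarity_network(tfidf_vectors(docs))
negatives = pick_reliable_negatives(weights, positives, n_negatives=1)
scores = propagate(weights, positives, negatives)
predicted_fake = [i for i, s in enumerate(scores) if s > 0]
print(negatives, predicted_fake)
```

With a single labeled fake headline, the two unlabeled headlines that share its vocabulary end up with positive scores, while the remaining ones follow the inferred negative seed. A heterogeneous network in the sense of the abstract would add further node types (for example, term nodes or linguistic feature nodes) connected to the news nodes, which this sketch leaves out.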

Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior [10662147/D], Fundação de Amparo à Pesquisa do Estado de São Paulo [2019/25010-5, 2019/07665-4], and Conselho Nacional de Desenvolvimento Científico e Tecnológico [426663/2018-7, 433082/2018-6, and 438017/2018-8].

Notes

1. All datasets and source code used in this paper are available in our public repository: https://github.com/marianacaravanti/A-Heterogeneous-Network-based-Positive-and-Unlabeled-Learning-Approach-to-Detecting-Fake-News.
2. https://github.com/several27/FakeNewsCorpus.
3. We also evaluate the average of \(F_1\) for both classes (macro-averaging \(F_{1}\)). Due to space limitations, the complete results are available in our public repository: https://github.com/marianacaravanti/A-Heterogeneous-Network-based-Positive-and-Unlabeled-Learning-Approach-to-Detecting-Fake-News/tree/main/Results.
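
For readers unfamiliar with the measure mentioned in note 3, the following is a minimal sketch of macro-averaged \(F_1\): the per-class \(F_1\) is computed separately for each class and the results are averaged with equal weight. The function names and the toy labels are hypothetical and are not taken from the paper or its repository.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 of a single class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels (invented): 3 fake and 5 real news items.
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "real"]
y_pred = ["fake", "fake", "real", "real", "real", "real", "real", "fake"]

f1_fake = f1_for_class(y_true, y_pred, "fake")   # 2*2 / (2*2 + 1 + 1) = 4/6
f1_real = f1_for_class(y_true, y_pred, "real")   # 2*4 / (2*4 + 1 + 1) = 8/10
macro = macro_f1(y_true, y_pred, ["fake", "real"])
print(round(f1_fake, 4), round(f1_real, 4), round(macro, 4))
```

Because each class contributes equally, the macro average does not let the larger class dominate the score the way plain accuracy can.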

Author information

Authors and Affiliations

Department of Computer Science, University of São Paulo, São Carlos, Brazil
Mariana C. de Souza, Ricardo M. Marcacini & Solange O. Rezende
Federal University of Mato Grosso do Sul, Campo Grande, Brazil
Bruno M. Nogueira
Federal University of Mato Grosso do Sul, Três Lagoas, Brazil
Rafael G. Rossi

Corresponding author

Correspondence to Mariana C. de Souza.

Editor information

Editors and Affiliations

Universidade Federal de Sergipe, São Cristóvão, Brazil
André Britto
Universidade de São Paulo, São Paulo, Brazil
Karina Valdivia Delgado

About this paper

Cite this paper

de Souza, M.C., Nogueira, B.M., Rossi, R.G., Marcacini, R.M., Rezende, S.O. (2021). A Heterogeneous Network-Based Positive and Unlabeled Learning Approach to Detect Fake News. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_1

DOI: https://doi.org/10.1007/978-3-030-91699-2_1
Published: 28 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science, Computer Science (R0)
