Abstract
The dynamism of fake news evolution and dissemination plays a crucial role in influencing and confirming personal beliefs. To minimize the spread of disinformation, approaches proposed in the literature for automatic fake news detection generally learn models through binary supervised algorithms that consider textual and contextual information. However, labeling significant amounts of real news to build accurate classifiers is difficult and time-consuming because real news covers such a broad spectrum. Positive and unlabeled learning (PUL) can be a good alternative in this scenario. PUL algorithms learn models from a small amount of labeled data of the class of interest and use unlabeled data to increase classification performance. This paper proposes a heterogeneous network variant of PU-LP, a PUL algorithm based on similarity networks. Our network incorporates different linguistic features to characterize fake news, such as representative terms, emotiveness, pausality, and average sentence size. We also considered two representations of the news to compute similarity: term frequency-inverse document frequency (TF-IDF) and Doc2Vec, which creates fixed-size document representations regardless of document length. We evaluated our approach on six datasets written in Portuguese or English, comparing its performance with a binary semi-supervised baseline algorithm, using two well-established label propagation algorithms: LPHN and GNetMine. The results indicate that PU-LP with heterogeneous networks can be competitive with binary semi-supervised learning. In addition, linguistic features such as representative terms and pausality improved classification performance, especially when only a small amount of labeled news is available.
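As a rough illustration of two of the linguistic features named in the abstract, the sketch below computes average sentence size and pausality for a piece of text. The definitions are assumptions borrowed from common usage in the deception-detection literature (words per sentence, and punctuation marks per sentence); the paper's exact formulas may differ. The function names and the naive regex sentence splitter are hypothetical and are not taken from the authors' repository.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def average_sentence_size(text):
    """Mean number of word tokens per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    n_words = sum(len(re.findall(r"\w+", s)) for s in sentences)
    return n_words / len(sentences)


def pausality(text):
    """Punctuation marks per sentence (assumed definition)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    n_punct = len(re.findall(r"[.,;:!?]", text))
    return n_punct / len(sentences)


if __name__ == "__main__":
    sample = "Breaking news! The mayor, they say, resigned. Is it true?"
    print(average_sentence_size(sample), pausality(sample))
```

In a heterogeneous network, values like these could be discretized and attached to news nodes as feature nodes; emotiveness is omitted here because it would require a part-of-speech tagger.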
Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior [10662147/D], Fundação de Amparo à Pesquisa do Estado de São Paulo [2019/25010-5, 2019/07665-4], and Conselho Nacional de Desenvolvimento Científico e Tecnológico [426663/2018-7, 433082/2018-6, and 438017/2018-8].
Notes
- 1. All datasets and source code used in this paper are available in our public repository: https://github.com/marianacaravanti/A-Heterogeneous-Network-based-Positive-and-Unlabeled-Learning-Approach-to-Detecting-Fake-News.
- 2.
- 3. We also evaluate the average of \(F_1\) for both classes (macro-averaging \(F_{1}\)). Due to space limitations, the complete results are available in our public repository: https://github.com/marianacaravanti/A-Heterogeneous-Network-based-Positive-and-Unlabeled-Learning-Approach-to-Detecting-Fake-News/tree/main/Results.
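For readers unfamiliar with macro-averaged \(F_1\), the following minimal sketch computes the per-class \(F_1\) and averages it over both classes. The label names "fake" and "real" and the function names are hypothetical, chosen only for illustration; this is not the authors' evaluation code.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 of one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=("fake", "real")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    y_true = ["fake", "fake", "real", "real"]
    y_pred = ["fake", "real", "real", "real"]
    print(macro_f1(y_true, y_pred))
```

Because each class contributes equally, macro-averaging does not let a large class dominate the score, which matters when fake and real news are imbalanced.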
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de Souza, M.C., Nogueira, B.M., Rossi, R.G., Marcacini, R.M., Rezende, S.O. (2021). A Heterogeneous Network-Based Positive and Unlabeled Learning Approach to Detect Fake News. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_1
DOI: https://doi.org/10.1007/978-3-030-91699-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science (R0)