Efficient Multivariate Data Fusion for Misinformation Detection During High Impact Events

Damasceno, Lucas P.; Shafer, Allison; Japkowicz, Nathalie; Cavalcante, Charles C.; Boukouvalas, Zois

doi:10.1007/978-3-031-18840-4_19


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13601)

Included in the following conference series:

International Conference on Discovery Science


Abstract

With the evolution of social media, cyberspace has become the de facto medium for users to communicate during high-impact events such as natural disasters, terrorist attacks, and periods of political unrest. However, during such events, misinformation on social media can spread rapidly, affecting decision-making and creating social unrest. Identifying the spread of misinformation during high-impact events is a significant data challenge, given the variety of data associated with social media posts. Recent machine learning advances have shown promise for detecting misinformation; however, key limitations remain. These include the effective and efficient modeling of the underlying non-linear associations in multi-modal data, as well as the explainability of a system geared toward misinformation detection. This paper presents a novel multivariate data fusion framework that can reliably address these limitations. It combines pre-trained deep learning features with independent vector analysis (IVA), a well-structured and parameter-free joint blind source separation method. We present the mathematical formulation of the new data fusion algorithm, demonstrate its effectiveness, and present multiple explainability case studies using a popular multi-modal dataset of tweets posted during several high-impact events.
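The abstract describes fusing several feature sets with independent vector analysis (IVA). As a rough illustration only, the sketch below evaluates the standard IVA objective J(W) = sum_n H(y_n) - sum_k log|det W[k]| for K datasets. Here y_n is the n-th source component vector (SCV), formed by taking component n from every dataset. The sketch uses a multivariate Gaussian source model (IVA-G, as in the Anderson et al. reference below). This is a swapped-in simplification, not the paper's algorithm, which is not reproduced here and need not use a Gaussian model. The function name `iva_g_cost`, the (K, N, T) array layout, and the toy data are hypothetical assumptions.

```python
import numpy as np


def iva_g_cost(X, W):
    """Hypothetical sketch: IVA cost under a multivariate Gaussian (IVA-G) source model.

    X: (K, N, T) array, K datasets of N zero-mean signals with T samples each.
    W: (K, N, N) array, one demixing matrix per dataset.
    Returns sum_n H(y_n) - sum_k log|det W[k]|, where each SCV entropy is the
    Gaussian entropy computed from its K x K sample covariance.
    """
    K, N, T = X.shape
    # Estimated sources for every dataset: Y[k] = W[k] @ X[k]
    Y = np.einsum("kij,kjt->kit", W, X)
    cost = 0.0
    for n in range(N):
        # n-th SCV: component n taken from every dataset, shape (K, T)
        scv = Y[:, n, :]
        sigma = scv @ scv.T / T  # K x K covariance across datasets (zero-mean assumed)
        _, logdet = np.linalg.slogdet(sigma)
        # Differential entropy of a K-dimensional Gaussian
        cost += 0.5 * logdet + 0.5 * K * np.log(2 * np.pi * np.e)
    for k in range(K):
        # Volume term: subtract log|det W[k]| for each dataset
        _, logabsdet = np.linalg.slogdet(W[k])
        cost -= logabsdet
    return cost


# Toy example (hypothetical data): K=2 datasets, N=2 sources, T=5000 samples.
rng = np.random.default_rng(0)
K, N, T = 2, 2, 5000
cov = np.array([[1.0, 0.9], [0.9, 1.0]])  # each source is correlated across the 2 datasets
S = np.stack([rng.multivariate_normal(np.zeros(K), cov, size=T).T for _ in range(N)], axis=1)
A = np.array([[[1.0, 0.8], [0.6, 1.0]],
              [[1.0, -0.7], [0.5, 1.0]]])  # one mixing matrix per dataset
X = np.einsum("kij,kjt->kit", A, S)

W_true = np.linalg.inv(A)         # demixing matrices that recover the sources
W_id = np.stack([np.eye(N)] * K)  # no unmixing at all
print(iva_g_cost(X, W_true) < iva_g_cost(X, W_id))  # True
```

For fixed data, this Gaussian cost equals a mutual-information term among the SCVs plus a constant. It is therefore lower when the SCVs are mutually independent, as at the true demixing matrices, and it is unchanged if any W[k] is rescaled by a scalar. Because IVA-G uses only second-order statistics, a richer (for example non-Gaussian) density model would replace the entropy line in a more faithful implementation.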


Notes

1. https://github.com/MKLab-ITI/image-verification-corpus.
2. We also evaluated features created using Bidirectional Encoder Representations from Transformers (BERT) [18].
3. https://code.google.com/archive/p/word2vec/.
4. Additionally, we evaluated Word2Vec trained on our own data.
5. We also analyzed features from the 'avgpool' layer of a pre-trained ResNet-18 model.
6. We consider continuous-valued random variables and, in the sequel, refer to differential entropy simply as entropy.
7. https://www.independent.co.uk/news/world/asia/lahore-attack-photo-showing-eiffel-tower-lit-up-in-colours-of-pakistan-flag-is-from-2007-rugby-world-cup-a6959231.html.

References

The Washington Post (2018). https://rebrand.ly/ieeovv
Newsweek (2019). https://rebrand.ly/z6t52a
Hateful memes challenge and data set for research on harmful multimodal content. https://ai.facebook.com/blog/hateful-memes-challenge-and-data-set/
Adalı, T., Anderson, M., Fu, G.S.: Diversity in Independent Component and Vector Analyses: Identifiability, algorithms, and applications in medical imaging. IEEE Sig. Process. Mag. 31(3), 18–33 (2014)
Anderson, M., Adalı, T., Li, X.L.: Joint blind source separation with multivariate Gaussian model: algorithms and performance analysis. IEEE Trans. Sig. Process. 60(4), 1672–1683 (2012). https://doi.org/10.1109/TSP.2011.2181836
Baltrušaitis, T., Ahuja, C., Morency, L.P.: Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 423–443 (2018)
BBC: Social media firms fail to act on covid-19 fake news. www.bbc.com/news/technology-52903680, June 2020
Boididou, C., Papadopoulos, S., Zampoglou, M., Apostolidis, L., Papadopoulou, O., Kompatsiaris, I.: Detection and visualization of misleading content on twitter. Int. J. Multimedia Inf. Retrieval 7 (2018). https://doi.org/10.1007/s13735-017-0143-x
Boididou, C., Papadopoulos, S., Zampoglou, M., Apostolidis, L., Papadopoulou, O., Kompatsiaris, Y.: Detection and visualization of misleading content on twitter. Int. J. Multimedia Inf. Retrieval 7(1), 71–86 (2018). https://doi.org/10.1007/s13735-017-0143-x
Boukouvalas, Z., Fu, G.S., Adalı, T.: An efficient multivariate generalized Gaussian distribution estimator: Application to IVA. In: 2015 49th Annual Conference on Information Sciences and Systems (CISS), pp. 1–4. IEEE (2015)
Boukouvalas, Z., Levin-Schwartz, Y., Mowakeaa, R., Fu, G.S., Adalı, T.: Independent component analysis using semi-parametric density estimation via entropy maximization. In: 2018 IEEE Statistical Signal Processing Workshop (SSP), pp. 403–407. IEEE (2018)
Boukouvalas, Z., Puerto, M., Elton, D.C., Chung, P.W., Fuge, M.D.: Independent vector analysis for molecular data fusion: Application to property prediction and knowledge discovery of energetic materials. In: 2020 28th European Signal Processing Conference (EUSIPCO), pp. 1030–1034. IEEE (2021)
Cao, J., Qi, P., Sheng, Q., Yang, T., Guo, J., Li, J.: Exploring the role of visual content in fake news detection. In: Shu, K., Wang, S., Lee, D., Liu, H. (eds.) Disinformation, Misinformation, and Fake News in Social Media. LNSN, pp. 141–161. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-42699-6_8
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: Advances in Neural Information Processing Systems, pp. 288–296 (2009)
Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, Cambridge (2010)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1023/A:1022627411411
Damasceno, L.P., Cavalcante, C.C., Adalı, T., Boukouvalas, Z.: Independent vector analysis using semi-parametric density estimation via multivariate entropy maximization. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3715–3719. IEEE (2021)
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018). arxiv.org/abs/1810.04805
Dick, J., Kuo, F.Y., Sloan, I.H.: High-dimensional integration: the quasi-Monte Carlo way. Acta Numerica 22, 133–288 (2013). https://doi.org/10.1017/S0962492913000044
Fu, G., Boukouvalas, Z., Adalı, T.: Density estimation by entropy maximization with kernels. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1896–1900, April 2015. https://doi.org/10.1109/ICASSP.2015.7178300
Hansen, L.K., Rieger, L.: Interpretability in intelligent systems – a new concept? In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 41–49. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_3
Hardoon, D.R., Szedmak, S., Shawe-Taylor, J.: Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639–2664 (2004)
Hiten Patel, M.: Fake news about covid-19 is spreading faster than virus. https://wexnermedical.osu.edu/blog/fake-news-about-covid-19, April 2020
Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis, vol. 46. Wiley, Hoboken (2004)
Kim, T., Eltoft, T., Lee, T.-W.: Independent vector analysis: an extension of ICA to multivariate components. In: Rosca, J., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 165–172. Springer, Heidelberg (2006). https://doi.org/10.1007/11679363_21
Linardatos, P., Papastefanopoulos, V., Kotsiantis, S.: Explainable AI: a review of machine learning interpretability methods. Entropy 23(1), 18 (2020)
Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
Moroney, C., et al.: The case for latent variable vs deep learning methods in misinformation detection: an application to covid-19. In: Soares, C., Torgo, L. (eds.) DS 2021. LNCS (LNAI), vol. 12986, pp. 422–432. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88942-5_33
Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. Society for Industrial and Applied Mathematics, USA (1992)
Ramachandram, D., Taylor, G.W.: Deep multimodal learning: a survey on recent advances and trends. IEEE Sig. Process. Mag. 34(6), 96–108 (2017)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: Explaining the predictions of any classifier. CoRR abs/1602.04938 (2016). arxiv.org/abs/1602.04938
Sharma, K., Qian, F., Jiang, H., Ruchansky, N., Zhang, M., Liu, Y.: Combating fake news: a survey on identification and mitigation techniques. ACM Trans. Intell. Syst. Technol. (TIST) 10(3), 1–42 (2019)
Suciu, P.: Covid-19 conspiracy theories continue to spread and thrive on social media. www.forbes.com/sites/petersuciu/2020/04/24/covid-19-conspiracy-theories-continue-to-spread-and-thrive-on-social-media/#e1a9e8b10076, April 2020


Author information

Authors and Affiliations

Federal University of Ceará, Fortaleza, CE, 60455-760, Brazil
Lucas P. Damasceno & Charles C. Cavalcante
American University, Washington, DC, 20016, USA
Allison Shafer, Nathalie Japkowicz & Zois Boukouvalas


Corresponding author

Correspondence to Lucas P. Damasceno.

Editor information

Editors and Affiliations

University of Montpellier, Montpellier, France
Poncelet Pascal
INRAE, Montpellier, France
Dino Ienco


About this paper

Cite this paper

Damasceno, L.P., Shafer, A., Japkowicz, N., Cavalcante, C.C., Boukouvalas, Z. (2022). Efficient Multivariate Data Fusion for Misinformation Detection During High Impact Events. In: Pascal, P., Ienco, D. (eds) Discovery Science. DS 2022. Lecture Notes in Computer Science, vol 13601. Springer, Cham. https://doi.org/10.1007/978-3-031-18840-4_19


DOI: https://doi.org/10.1007/978-3-031-18840-4_19
Published: 06 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18839-8
Online ISBN: 978-3-031-18840-4
eBook Packages: Computer Science, Computer Science (R0)
