ConTheModel: Can We Modify Tweets to Confuse Classifier Models?

  • Conference paper
Silicon Valley Cybersecurity Conference (SVCC 2020)

Abstract

News on social media can significantly influence users and can be used to manipulate them for political or economic gain. Adversarial manipulations of text have been shown to expose vulnerabilities in classifiers, and current research aims to find classifier models that are not susceptible to such manipulations. In this paper, we present a novel technique called ConTheModel, which slightly modifies social media news to confuse machine learning (ML)-based classifiers in a black-box setting. ConTheModel replaces a word in the original tweet with its synonym or antonym to generate tweets that confuse classifiers. We evaluate our technique on three different dataset scenarios and compare five well-known machine learning algorithms, Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), to demonstrate how the classifiers perform on the modifications made by ConTheModel. Our results show that the classifiers are confused after modification, with a maximum drop in accuracy of 16.36%. We additionally conducted a human study with 25 participants to validate the effectiveness of ConTheModel; the majority of participants (65%) found it challenging to classify the modified tweets correctly. We hope our work will help in building ML models that are robust against adversarial examples.
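
The page does not include the authors' implementation. As a rough illustration of the kind of word substitution ConTheModel describes, the following minimal Python sketch replaces one word of a tweet with a WordNet synonym or antonym. The use of NLTK's WordNet interface, the function names (candidate_replacements, perturb_tweet), and the random choice of which word to replace are assumptions made for illustration, not the paper's actual selection strategy.

# Hypothetical sketch of ConTheModel-style word substitution (not the authors' code).
# Assumes NLTK is installed and the WordNet corpus has been downloaded:
#   pip install nltk && python -c "import nltk; nltk.download('wordnet')"
import random

from nltk.corpus import wordnet


def candidate_replacements(word):
    """Collect WordNet synonyms and antonyms of a word."""
    candidates = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ").lower()
            if name != word.lower():
                candidates.add(name)  # synonym
            for antonym in lemma.antonyms():
                candidates.add(antonym.name().replace("_", " ").lower())  # antonym
    return list(candidates)


def perturb_tweet(tweet, rng=random):
    """Replace one randomly chosen word that has a WordNet synonym or antonym."""
    words = tweet.split()
    indices = list(range(len(words)))
    rng.shuffle(indices)
    for i in indices:
        replacements = candidate_replacements(words[i])
        if replacements:
            words[i] = rng.choice(replacements)
            return " ".join(words)
    return tweet  # no replaceable word found; tweet is left unchanged


print(perturb_tweet("officials confirm the report is accurate"))

In a black-box evaluation like the one described in the abstract, such perturbed tweets would then be fed to the already-trained classifiers (SVM, NB, RF, XGBoost, MLP) and the drop in accuracy measured.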



Acknowledgments

This work was supported by the ICT R&D Programs (no. 2017-0-00545) and the ITRC Support Program (IITP-2019-2015-0-00403).

Author information

Corresponding author

Correspondence to Hyoungshick Kim.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Ram Vinay, A., Alawami, M.A., Kim, H. (2021). ConTheModel: Can We Modify Tweets to Confuse Classifier Models?. In: Park, Y., Jadav, D., Austin, T. (eds) Silicon Valley Cybersecurity Conference. SVCC 2020. Communications in Computer and Information Science, vol 1383. Springer, Cham. https://doi.org/10.1007/978-3-030-72725-3_15

  • DOI: https://doi.org/10.1007/978-3-030-72725-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72724-6

  • Online ISBN: 978-3-030-72725-3

  • eBook Packages: Computer Science, Computer Science (R0)
