Collaborative Learning Based Effective Malware Detection System

  • Conference paper
ECML PKDD 2020 Workshops (ECML PKDD 2020)

Abstract

Malware is growing rapidly and causing severe losses to institutions. Existing techniques such as static and dynamic analysis fail to mitigate newly generated malware, and signature-, behavior-, and anomaly-based defense mechanisms are susceptible to obfuscation and polymorphism attacks. With machine learning in practice, several authors have proposed classification and visualization techniques for malware detection. Image representations have proved useful for analyzing malware behavior, since deep neural networks can extract much information from them without expert domain knowledge. However, the scarcity of diverse malware data available to individual clients, together with their privacy concerns about sharing data with a centralized curator, makes it challenging to build a more reliable model. This paper proposes a lightweight Convolutional Neural Network (CNN) based model that extracts relevant features using call graph, n-gram, and image transformations. Further, an Auxiliary Classifier Generative Adversarial Network (AC-GAN) is used to generate unseen data for training. The model is extended to a federated setup to build an effective malware detection system. We use the Microsoft malware dataset for training and evaluation. The results show that the federated approach achieves accuracy close to that of centralized training while preserving the data privacy of each individual organization.
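The federated setup mentioned above can be illustrated with a minimal sketch of one aggregation round. The abstract does not state which aggregation rule the authors use, so this assumes the commonly used federated averaging rule, in which a server combines client model parameters weighted by each client's number of local samples. The function name `federated_average`, the layer names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Assumption: each client's model is a mapping from layer name to a
# flattened list of parameters. Real models would use tensors instead.
Weights = Dict[str, List[float]]


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Return the sample-size-weighted mean of the clients' parameters."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Weights = {}
    for name, first in client_weights[0].items():
        acc = [0.0] * len(first)
        for weights, n in zip(client_weights, client_sizes):
            share = n / total  # this client's contribution to the global model
            for i, value in enumerate(weights[name]):
                acc[i] += share * value
        averaged[name] = acc
    return averaged


# Hypothetical round: three organizations with different amounts of local data.
clients = [
    {"conv": [0.0, 1.0], "dense": [2.0]},
    {"conv": [1.0, 1.0], "dense": [4.0]},
    {"conv": [2.0, 1.0], "dense": [6.0]},
]
sizes = [100, 100, 200]
global_weights = federated_average(clients, sizes)
```

Only model parameters and sample counts reach the server in this sketch. The raw malware samples stay with each organization, which is the privacy property the abstract refers to.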

Notes

  1. https://www.kaggle.com/c/malware-classification/data.

  2. https://github.com/tensorflow/federated/blob/master/docs/install.md.

  3. TensorFlow Federated. https://www.tensorflow.org/federated.

Acknowledgement

We acknowledge the Ministry of Human Resource Development, Government of India, for providing a fellowship to complete this work.

Author information

Corresponding author

Correspondence to Harsh Kasyap.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, N., Kasyap, H., Tripathy, S. (2020). Collaborative Learning Based Effective Malware Detection System. In: Koprinska, I., et al. ECML PKDD 2020 Workshops. ECML PKDD 2020. Communications in Computer and Information Science, vol 1323. Springer, Cham. https://doi.org/10.1007/978-3-030-65965-3_13

  • DOI: https://doi.org/10.1007/978-3-030-65965-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65964-6

  • Online ISBN: 978-3-030-65965-3

  • eBook Packages: Computer Science, Computer Science (R0)
