Data Modelling for Predicting Exploits

  • Conference paper
Secure IT Systems (NordSec 2018)

Abstract

Modern society is becoming increasingly reliant on secure computer systems. Predicting which vulnerabilities are more likely to be exploited by malicious actors is therefore an important task in helping to prevent cyber attacks. Researchers have tried making such predictions using machine learning. However, recent research has shown that the evaluation of such models requires special sampling of training and test sets, and that previous models would have had limited utility in real-world settings. This study further develops the results of recent research by using their sampling technique for evaluation in combination with a novel data model. Moreover, contrary to recent research, we find that using open web data can help in making better predictions about exploits, and that zero-day exploits are detrimental to the predictive power of the model. Finally, we find that the initial days of vulnerability information are sufficient to build the best possible model. Given our findings, we suggest that more research should be devoted to developing refined techniques for building predictive models for exploits. Gaining more knowledge in this domain would not only help prevent cyber attacks but could also yield fruitful insights into the nature of exploit development.

Notes

  1. This percentage is estimated from Fig. 5 in their report [3].

  2. The \(\varDelta\) was computed from the class percentage they report for their test set, which was \(16.7\%\) in their random split experiment and \(9.3\%\) in their temporally split model.

References

  1. Allodi, L., Massacci, F.: Comparing vulnerability severity and exploits using case-control studies. ACM Trans. Inf. Syst. Secur. 17(1), 1:1–1:20 (2014). https://doi.org/10.1145/2630069

  2. Bozorgi, M., Saul, L.K., Savage, S., Voelker, G.M.: Beyond heuristics: learning to classify vulnerabilities and predict exploits. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, pp. 105–114. ACM, New York (2010). http://doi.acm.org/10.1145/1835804.1835821

  3. Bullough, B.L., Yanchenko, A.K., Smith, C.L., Zipkin, J.R.: Predicting exploitation of disclosed software vulnerabilities using open-source data. In: Proceedings of the 3rd ACM on International Workshop on Security and Privacy Analytics, IWSPA 2017, pp. 45–53. ACM, New York (2017). http://doi.acm.org/10.1145/3041008.3041009

  4. Chen, T., He, T., Benesty, M., et al.: Xgboost: extreme gradient boosting. R package version 0.4-2, pp. 1–4 (2015)

  5. Edkrantz, M., Said, A.: Predicting cyber vulnerability exploits with machine learning. In: SCAI (2015)

  6. Exploit-DB: Offensive Security's Exploit Database Archive. https://www.exploit-db.com/. Accessed 24 Aug 2017

  7. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001). http://www.jstor.org/stable/2699986

  8. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. (CSUR) 46(4), 44 (2014)

  9. National Vulnerability Database Computer Security Resource Center. https://nvd.nist.gov/. Accessed 24 Aug 2017

  10. Recorded Future’s threat intelligence platform

  11. Roytman, M.: Quick Look: Predicting Exploitability, Forecasts for Vulnerability Management (2018). https://www.rsaconference.com/videos/quick-look-predicting-exploitabilityforecasts-for-vulnerability-management

  12. Sabottke, C., Suciu, O., Dumitras, T.: Vulnerability disclosure in the age of social media: exploiting twitter for predicting real-world exploits. In: 24th USENIX Security Symposium. USENIX Association, Washington, D.C. (2015)

  13. Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23(1), 69–101 (1996)

Acknowledgements

The research leading to these results has been partially supported by the Swedish Civil Contingencies Agency (MSB) through the project “RICS” and by the European Community’s Horizon 2020 Framework Programme through the UNITED-GRID project under grant agreement 773717.

We would also like to thank Staffan Truvé and Michel Edkrantz at Recorded Future for inspiration, access to data and the environment to perform the current study.

Author information

Corresponding author

Correspondence to Magnus Almgren.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Reinthal, A., Filippakis, E.L., Almgren, M. (2018). Data Modelling for Predicting Exploits. In: Gruschka, N. (ed.) Secure IT Systems. NordSec 2018. Lecture Notes in Computer Science, vol 11252. Springer, Cham. https://doi.org/10.1007/978-3-030-03638-6_21

  • DOI: https://doi.org/10.1007/978-3-030-03638-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03637-9

  • Online ISBN: 978-3-030-03638-6

  • eBook Packages: Computer Science (R0)
