A Bayesian Multi-armed Bandit Approach for Identifying Human Vulnerabilities

  • Conference paper
Decision and Game Theory for Security (GameSec 2018)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11199)

Abstract

We consider the problem of identifying the set of users in an organization’s network that are most susceptible to falling victim to social engineering attacks. To achieve this goal, we propose a testing strategy, based on the theory of multi-armed bandits, that involves a system administrator sending fake malicious messages to users in a sequence of unannounced tests and recording their responses. To accurately model the administrator’s testing problem, we propose a new bandit setting, termed the structured combinatorial multi-bandit model, that allows one to impose combinatorial constraints on the space of allowable queries. The model captures the diversity in attack types and user responses by considering multiple multi-armed bandits, where each bandit problem represents an attack (message) type and each arm represents a user. Users respond to test messages according to a response model with unknown statistics. The response model associates a Bernoulli distribution with an unknown mean with each message-user pair, dictating the likelihood that a user will respond to a given message. The administrator’s problem of identifying the most susceptible users can then be expressed as identifying the set of message-user pairs with means that exceed a given threshold. We adopt a Bayesian approach to solving the problem, associating a (beta) prior distribution with each unknown mean. In a given trial, the system administrator queries a selection of users with test messages, generating query responses which are then used to update posterior distributions on the means. By defining a state as the parameters of the posteriors, we show that the optimal testing strategy can be characterized as the solution of a Markov decision process (MDP). Unfortunately, solving the MDP is computationally intractable. 
As a result, we propose a heuristic testing strategy, based on Thompson sampling, that focuses queries on message-user pairs that are estimated to have means close to the threshold. The heuristic testing strategy is shown to yield accurate identifications.
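
The Bayesian bookkeeping described above can be sketched in a few lines. The following is a minimal illustration rather than the authors' algorithm: it assumes uniform Beta(1, 1) priors, a single query per trial with no combinatorial constraints, a selection rule that draws one posterior sample per message-user pair and queries the pair whose sample lies closest to the threshold, and a final identification by posterior mean. All names (`update`, `select_pair`, `run_trials`, `identify`, `true_means`) are hypothetical.

```python
import random


def update(posterior, response):
    """Beta-Bernoulli conjugate update.

    A response (1) increments alpha; a non-response (0) increments beta.
    """
    alpha, beta = posterior
    return (alpha + response, beta + (1 - response))


def select_pair(posteriors, tau, rng):
    """Sample each posterior once and pick the pair whose sample is nearest tau.

    This selection rule is an assumption made for illustration only.
    """
    samples = {pair: rng.betavariate(a, b) for pair, (a, b) in posteriors.items()}
    return min(samples, key=lambda pair: abs(samples[pair] - tau))


def run_trials(true_means, tau, n_trials, seed=0):
    """Simulate n_trials single-pair queries against hypothetical true means."""
    rng = random.Random(seed)
    # Assumed prior: uniform Beta(1, 1) on every message-user pair.
    posteriors = {pair: (1, 1) for pair in true_means}
    for _ in range(n_trials):
        pair = select_pair(posteriors, tau, rng)
        # Simulated Bernoulli response of the queried user to the test message.
        response = 1 if rng.random() < true_means[pair] else 0
        posteriors[pair] = update(posteriors[pair], response)
    return posteriors


def identify(posteriors, tau):
    """Declare a pair susceptible when its posterior mean exceeds tau (assumed rule)."""
    return {pair for pair, (a, b) in posteriors.items() if a / (a + b) > tau}
```

For example, `run_trials({("m1", "alice"): 0.8, ("m1", "bob"): 0.2}, tau=0.5, n_trials=200)` returns the posterior parameters after 200 simulated queries, and `identify` applied to that result returns the pairs currently estimated to lie above the threshold. The state of the decision problem in this sketch is exactly the dictionary of posterior parameters, which is why it grows too quickly for exact dynamic programming.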

This research was supported by the U.S. Office of Naval Research (ONR) MURI grant N00014-16-1-2710.

Notes

  1. Note that thresholds can depend on the specific message-user pair \((m,k)\); however, for ease of presentation, we assume identical thresholds \(\tau \) across all \((m,k)\).

Author information

Corresponding author

Correspondence to Erik Miehling.

A Proof of Lemma 1

Denoting \(\mathbb {E}_n^\pi [J(\varTheta ,P;\tau )]:=\mathbb {E}_{\varTheta \sim f_n(\theta _{mk})}[J(\varTheta ,P;\tau )\mid P=P^\pi ]\) as the expectation of the reward with respect to the posteriors \(f_n(\theta _{mk})\), application of the law of iterated expectations allows one to write the expected reward as \(\mathbb {E}_0^\pi [J(\varTheta ,P;\tau )] = \mathbb {E}_0^\pi [\mathbb {E}_n^\pi [J(\varTheta ,P;\tau )]]\), where

$$\begin{aligned} \mathbb {E}_n^\pi \big [J(\varTheta ,P;\tau )\big ]&=\mathbb {P}_n\bigg (\bigcap _{(m,k)\in P^\pi }\{\varTheta _{mk}>\tau \}\cap \bigcap _{(m,k)\in \bar{P}^\pi }\{\varTheta _{mk}\le \tau \}\bigg )\\&= \prod _{(m,k)\in P^\pi }\mathbb {P}_n(\{\varTheta _{mk}>\tau \})\prod _{(m,k)\in \bar{P}^\pi }\mathbb {P}_n(\{\varTheta _{mk}\le \tau \})\\&= \prod _{(m,k)\in P^\pi }I_{1-\tau }(\beta _{mk,n},\alpha _{mk,n})\prod _{(m,k)\in \bar{P}^\pi }I_{\tau }(\alpha _{mk,n},\beta _{mk,n}) =: J^\pi (P;\tau ) \end{aligned}$$

where \(I_{\tau }(\alpha ,\beta )\) is the normalized incomplete beta function. The second equality uses the independence of the posteriors across message-user pairs, and the third uses the identity \(1-I_{\tau }(\alpha ,\beta ) \equiv I_{1-\tau }(\beta ,\alpha )\). The dependency of the identification set \(P\) on the testing strategy \(\pi \) is made explicit by writing \(P^\pi \).
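
The closed-form product can be evaluated numerically. The sketch below assumes integer posterior parameters (as arise from integer-valued beta priors updated with Bernoulli responses), in which case the normalized incomplete beta function reduces to a binomial tail sum and only the standard library is needed. The helper names `reg_inc_beta` and `identification_reward`, and the dictionary layout of the posteriors, are hypothetical.

```python
from math import comb, prod


def reg_inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b.

    Uses the binomial tail-sum identity
    I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def identification_reward(posteriors, P, tau):
    """Product of posterior probabilities from the last line of the derivation.

    posteriors maps each pair (m, k) to its integer (alpha, beta) parameters;
    P is the set of pairs declared to exceed tau. Pairs in P contribute
    I_{1-tau}(beta, alpha); all other pairs contribute I_tau(alpha, beta).
    """
    return prod(
        reg_inc_beta(1 - tau, b, a) if pair in P else reg_inc_beta(tau, a, b)
        for pair, (a, b) in posteriors.items()
    )
```

As a sanity check, with two pairs that both still hold the uniform Beta(1, 1) posterior and \(\tau = 0.5\), each factor equals 0.5, so any choice of identification set gives a value of 0.25. For non-integer parameters a library routine for the regularized incomplete beta function would be needed instead of the tail sum.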

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Miehling, E., Xiao, B., Poovendran, R., Başar, T. (2018). A Bayesian Multi-armed Bandit Approach for Identifying Human Vulnerabilities. In: Bushnell, L., Poovendran, R., Başar, T. (eds) Decision and Game Theory for Security. GameSec 2018. Lecture Notes in Computer Science, vol 11199. Springer, Cham. https://doi.org/10.1007/978-3-030-01554-1_30

  • DOI: https://doi.org/10.1007/978-3-030-01554-1_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01553-4

  • Online ISBN: 978-3-030-01554-1

  • eBook Packages: Computer Science, Computer Science (R0)
