A Bayesian Multi-armed Bandit Approach for Identifying Human Vulnerabilities

Miehling, Erik; Xiao, Baicen; Poovendran, Radha; Başar, Tamer

doi:10.1007/978-3-030-01554-1_30

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11199)

Included in the following conference series:

International Conference on Decision and Game Theory for Security

Abstract

We consider the problem of identifying the set of users in an organization’s network that are most susceptible to falling victim to social engineering attacks. To achieve this goal, we propose a testing strategy, based on the theory of multi-armed bandits, that involves a system administrator sending fake malicious messages to users in a sequence of unannounced tests and recording their responses. To accurately model the administrator’s testing problem, we propose a new bandit setting, termed the structured combinatorial multi-bandit model, that allows one to impose combinatorial constraints on the space of allowable queries. The model captures the diversity in attack types and user responses by considering multiple multi-armed bandits, where each bandit problem represents an attack (message) type and each arm represents a user. Users respond to test messages according to a response model with unknown statistics. The response model associates a Bernoulli distribution with an unknown mean with each message-user pair, dictating the likelihood that a user will respond to a given message. The administrator’s problem of identifying the most susceptible users can then be expressed as identifying the set of message-user pairs with means that exceed a given threshold. We adopt a Bayesian approach to solving the problem, associating a (beta) prior distribution with each unknown mean. In a given trial, the system administrator queries a selection of users with test messages, generating query responses which are then used to update posterior distributions on the means. By defining a state as the parameters of the posteriors, we show that the optimal testing strategy can be characterized as the solution of a Markov decision process (MDP). Unfortunately, solving the MDP is computationally intractable. 
As a result, we propose a heuristic testing strategy, based on Thompson sampling, that focuses queries on message-user pairs that are estimated to have means close to the threshold. The heuristic testing strategy is shown to yield accurate identifications.
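The testing loop described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the heuristic in detail, so the selection rule here (sample one mean per posterior, then query the pairs whose samples lie closest to the threshold) is an assumed reading of "Thompson sampling focused near the threshold". The per-trial `budget` is a simple stand-in for the combinatorial query constraints, the posterior-mean identification rule in `identify` is likewise an assumption, and all function names are hypothetical.

```python
import random

def update_posterior(alpha, beta, responded):
    """Conjugate Beta-Bernoulli update for one message-user pair."""
    return (alpha + 1, beta) if responded else (alpha, beta + 1)

def select_queries(posteriors, tau, budget, rng):
    """Assumed Thompson-style rule: draw one sample per posterior and pick
    the `budget` pairs whose sampled means are closest to the threshold."""
    samples = {pair: rng.betavariate(a, b) for pair, (a, b) in posteriors.items()}
    ranked = sorted(samples, key=lambda pair: abs(samples[pair] - tau))
    return ranked[:budget]

def identify(posteriors, tau):
    """Assumed identification rule: posterior mean above the threshold."""
    return {pair for pair, (a, b) in posteriors.items() if a / (a + b) > tau}

def run_simulation(true_means, tau=0.5, budget=2, trials=400, seed=0):
    """Simulate unannounced tests against hypothetical response rates."""
    rng = random.Random(seed)
    posteriors = {pair: (1, 1) for pair in true_means}  # uniform Beta(1, 1) priors
    for _ in range(trials):
        for pair in select_queries(posteriors, tau, budget, rng):
            responded = rng.random() < true_means[pair]  # simulated Bernoulli response
            posteriors[pair] = update_posterior(*posteriors[pair], responded)
    return posteriors, identify(posteriors, tau)

if __name__ == "__main__":
    # Hypothetical (message type, user) pairs with made-up response rates.
    means = {("phish", "u1"): 0.9, ("phish", "u2"): 0.1,
             ("usb", "u1"): 0.2, ("usb", "u2"): 0.8}
    posteriors, susceptible = run_simulation(means)
    print(sorted(susceptible))
```

Ranking by distance to the threshold spends queries on the pairs whose classification is still uncertain, which is the intuition the abstract gives for the heuristic; pairs whose posteriors sit far from $\tau$ are queried less often.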

This research was supported by the U.S. Office of Naval Research (ONR) MURI grant N00014-16-1-2710.

Notes

1. Note that thresholds can depend on the specific message-user pair (m, k); however, for ease of presentation, we assume identical thresholds $\tau$ across all (m, k).

Author information

Authors and Affiliations

Coordinated Science Lab, University of Illinois at Urbana–Champaign, Urbana, IL, 61801, USA
Erik Miehling & Tamer Başar
Department of Electrical Engineering, University of Washington, Seattle, WA, 98195, USA
Baicen Xiao & Radha Poovendran

Corresponding author

Correspondence to Erik Miehling.

Editor information

Editors and Affiliations

University of Washington, Seattle, WA, USA
Linda Bushnell
University of Washington, Seattle, WA, USA
Radha Poovendran
University of Illinois at Urbana–Champaign, Urbana, IL, USA
Tamer Başar

A Proof of Lemma 1

Denoting $\mathbb {E}_n^\pi [J(\varTheta ,P;\tau )]:=\mathbb {E}_{\varTheta \sim f_n(\theta _{mk})}[J(\varTheta ,P;\tau )\mid P=P^\pi ]$ as the expectation of the reward with respect to the posteriors $f_n(\theta _{mk})$, application of the law of iterated expectations allows one to write the expected reward as $\mathbb {E}_0^\pi [J(\varTheta ,P;\tau )] = \mathbb {E}_0^\pi [\mathbb {E}_n^\pi [J(\varTheta ,P;\tau )]]$, where

$$\begin{aligned} \mathbb {E}_n^\pi \big [J(\varTheta ,P;\tau )\big ]&=\mathbb {P}_n\bigg (\bigcap _{(m,k)\in P^\pi }\{\varTheta _{mk}>\tau \}\cap \bigcap _{(m,k)\in \bar{P}^\pi }\{\varTheta _{mk}\le \tau \}\bigg )\\&= \prod _{(m,k)\in P^\pi }\mathbb {P}_n(\{\varTheta _{mk}>\tau \})\prod _{(m,k)\in \bar{P}^\pi }\mathbb {P}_n(\{\varTheta _{mk}\le \tau \})\\&= \prod _{(m,k)\in P^\pi }I_{1-\tau }(\beta _{mk,n},\alpha _{mk,n})\prod _{(m,k)\in \bar{P}^\pi }I_{\tau }(\alpha _{mk,n},\beta _{mk,n}) =: J^\pi (P;\tau ) \end{aligned}$$

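where $I_{\tau }(\alpha ,\beta )$ is the normalized incomplete beta function, and the last equality uses the identity $1-I_{\tau }(\alpha ,\beta ) \equiv I_{1-\tau }(\beta ,\alpha )$. The second equality requires the posteriors of distinct message-user pairs to be independent, so that the joint probability factors into a product. The dependency of the identification set P on the testing strategy $\pi $ is made explicit by writing $P^\pi $.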
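As a numerical illustration of the product form in the last line of the derivation, the following Python sketch evaluates $J^\pi (P;\tau )$ for a small set of posteriors. It assumes integer posterior parameters (as arise from Beta(1, 1) priors updated with Bernoulli responses), which lets the normalized incomplete beta function be computed from the standard binomial-tail identity using only the standard library. The function names and the example posteriors are hypothetical.

```python
from math import comb, prod

def reg_inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b, using the identity
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def expected_reward(posteriors, identified, tau):
    """Posterior probability that exactly the pairs in `identified`
    have means above tau, assuming independent Beta posteriors."""
    factors = []
    for pair, (alpha, beta) in posteriors.items():
        if pair in identified:
            # P(theta > tau) = 1 - I_tau(alpha, beta) = I_{1 - tau}(beta, alpha)
            factors.append(reg_inc_beta(1 - tau, beta, alpha))
        else:
            # P(theta <= tau) = I_tau(alpha, beta)
            factors.append(reg_inc_beta(tau, alpha, beta))
    return prod(factors)

if __name__ == "__main__":
    # Hypothetical posteriors: one pair that mostly responded, one that mostly did not.
    posteriors = {("m1", "u1"): (8, 2), ("m1", "u2"): (2, 8)}
    value = expected_reward(posteriors, {("m1", "u1")}, tau=0.5)
    print(value)  # (502/512)**2, about 0.9613
```

For non-integer parameters the binomial-tail identity no longer applies, and a numerical routine for the regularized incomplete beta function would be needed instead.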

About this paper

Cite this paper

Miehling, E., Xiao, B., Poovendran, R., Başar, T. (2018). A Bayesian Multi-armed Bandit Approach for Identifying Human Vulnerabilities. In: Bushnell, L., Poovendran, R., Başar, T. (eds) Decision and Game Theory for Security. GameSec 2018. Lecture Notes in Computer Science, vol 11199. Springer, Cham. https://doi.org/10.1007/978-3-030-01554-1_30

DOI: https://doi.org/10.1007/978-3-030-01554-1_30
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01553-4
Online ISBN: 978-3-030-01554-1
eBook Packages: Computer Science, Computer Science (R0)
