1 Introduction

Since Joseph Weizenbaum introduced ELIZA, the first computer program to interact with users in natural language, in 1966 [27], humanlike communication with machines has been of growing interest, leading to successive improvements (see, e.g., [14, 26]) and finally to chatbots, which rely on artificial intelligence to emulate natural conversation with humans. Whereas some chatbots realize conversation using pre-specified patterns, others make use of machine learning techniques [19, 23]. Such intelligent chatbots provide the user with a more personalized conversation by remembering and reusing specific information from previous conversations. Chatbots have also gained considerable interest from industry: the evolution of these systems has been analyzed, and the chatbot market is predicted to grow in the coming years [5, 7].

Like any other technology, chatbots do not come without drawbacks. Despite their apparent novelty, chatbots are built upon existing technology. They are often integrated into websites and therefore rely on HTTP(S) and other established communication protocols. Smart chatbots are connected to databases and thus perform SQL queries. Data integrity and privacy, as well as user authentication and authorization, must be ensured for clients, especially by personalized chatbots. If a chatbot fails at this task, data leaks can compromise the user’s privacy and may lead to financial losses. For these reasons, it is very likely that chatbots, too, will become a target for attackers, who can exploit known vulnerabilities and attacks such as cross-site scripting (XSS) and SQL injection (SQLI). Security issues must therefore be covered when testing chatbots as well.

In this paper, we address security testing for chatbots and describe an automated approach for detecting inherent software leaks in order to prevent their exploitation. We test neither the chatbots’ performance nor their functionality, e.g., natural language processing, nor do we ask what the underlying machinery should be allowed to do [21]. We focus solely on security testing. The result is an offensive testing approach targeting two very common exploits, namely XSS and SQLI, which has, to the best of our knowledge, not been considered before in the context of chatbots.

The paper is organized as follows. Section 2 gives an overview of the overall testing approach. Section 3 then explains a concrete example and discusses the outcome. Section 4 surveys related work, and Section 5 concludes the paper.

2 Overview of the approach

When designing chatbots, the primary focus lies on the processing of natural language: developers must ensure that the user’s inquiries are correctly understood and answered. In addition, the system should handle errors and unexpected inputs appropriately [6]. Existing tools [1, 2, 11] primarily target the system’s functionality but do not guarantee security; a chatbot can fulfil its functional requirements and still remain vulnerable to malicious actions. The open challenge is to test chatbots regarding their resistance to unintended and malicious user inputs. Figure 1 depicts the overall structure of an online system comprising a chatbot.

In this example, a chatbot is deployed online and the communication proceeds according to the standard HTTP(S) protocol. We further assume that the chatbot is connected to a database containing client-related private information. A smart chatbot would be able to accumulate information about a user during communication. Therefore, user authentication must be guaranteed, as well as the integrity of all stored information. Previous work [9, 15, 17] showed that several web vulnerabilities can be exploited due to security leaks in systems. For example, SQLI and XSS can be triggered by insufficient input sanitization. The consequence can be unauthorized database access or malicious script execution on the client side, both of which must be avoided.
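To illustrate the root cause, the following minimal Java sketch (our own, with hypothetical table and column names) contrasts an injectable query built by string concatenation with a parameterized query that treats the input strictly as data:

import java.sql.*;

public class LoginCheck {
    // VULNERABLE: user input is concatenated into the SQL string, so an
    // input such as  ' OR '1'='1  changes the meaning of the query.
    static ResultSet unsafe(Connection c, String user) throws SQLException {
        String sql = "SELECT * FROM users WHERE name = '" + user + "'";
        return c.createStatement().executeQuery(sql);
    }

    // SAFER: a prepared statement passes the input as a bound parameter.
    static ResultSet safe(Connection c, String user) throws SQLException {
        PreparedStatement ps =
                c.prepareStatement("SELECT * FROM users WHERE name = ?");
        ps.setString(1, user);
        return ps.executeQuery();
    }
}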

Fig. 1. Communication flow in the chatbot system

For our approach to chatbot security testing, we rely on an adapted execution framework and test oracles for both types of vulnerabilities from previous work [12]. The framework comprises two test sets, one for XSS and one for SQLI. Each test set consists of a list of individual malicious inputs, called attack vectors. For XSS, the list encompasses JavaScript code; for SQLI, a list of SQL statements is used in the tests. The test inputs are sent sequentially to the chatbot, i.e., the system under test (SUT). The resulting outputs from the chatbot are read and checked against the test oracles, and a test verdict is returned as the result. Figure 2 depicts the overall approach.

Fig. 2. Security testing approach for chatbots
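As a minimal sketch of this flow (the interface and class names below are ours, not the actual API of the framework from [12]), the executor iterates over a test set, sends each attack vector to the SUT, and derives a verdict from the oracle:

import java.util.List;

// Hypothetical interfaces illustrating the flow of Fig. 2.
interface Sut { String send(String attackVector); }   // one HTTP round trip
interface Oracle { boolean vulnerable(String vector, String response); }

class Executor {
    void run(Sut sut, Oracle oracle, List<String> testSet) {
        for (String vector : testSet) {                // sequential execution
            String response = sut.send(vector);        // chatbot's answer
            String verdict = oracle.vulnerable(vector, response)
                    ? "FAIL (vulnerability triggered)"
                    : "PASS (input filtered or escaped)";
            System.out.println(vector + " -> " + verdict);
        }
    }
}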

The framework is implemented in Java and comprises several components. Both test sets are attached to an executor, which embeds every attack vector individually into a generated HTTP request and sends it to the SUT. An HTML parser [8] then reads the corresponding response in search of the critical content needed by the test oracle. The testing procedure terminates when both test sets have been exhausted. Section 3 describes the test scenario in more detail using an example.
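For illustration, one possible realization of the send-and-parse step could use Java’s built-in HTTP client together with the jsoup HTML parser (whether [8] refers to jsoup is our assumption, and the endpoint URL and parameter name are hypothetical):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.*;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class HttpSut {
    private final HttpClient client = HttpClient.newHttpClient();

    Document send(String attackVector) throws Exception {
        // Hypothetical chatbot endpoint and POST parameter.
        String body = "say=" + URLEncoder.encode(attackVector, StandardCharsets.UTF_8);
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost/chatbot/index.php"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> resp =
                client.send(req, HttpResponse.BodyHandlers.ofString());
        return Jsoup.parse(resp.body());   // parsed HTML handed to the oracle
    }
}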

3 Case Study

In this case study, we applied the described approach to security testing a chatbot. Although many chatbots are developed and used by private companies, some are publicly available, e.g., [3, 4]. For this case study, we selected Program O, which is written in PHP and uses a MySQL database [10], because it fits the structure shown in Figure 1 well. Program O makes use of conversational patterns that are specified in the Artificial Intelligence Markup Language (AIML) [25]. Based on the given patterns, the chatbot formulates its responses by analyzing the keywords provided by the user.
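As a hypothetical illustration (Program O’s actual AIML files are not reproduced here), an AIML category maps a user pattern to a response template:

<category>
  <pattern>WHAT IS YOUR NAME</pattern>
  <template>My name is Program O.</template>
</category>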

For the security testing of Program O, we developed test suites for XSS and SQLI. The XSS scripts have the following form:

[Listing (a): XSS attack vector, not reproduced]
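Since the original listing is not reproduced here, the following vectors are our own illustrative reconstruction; filtering the <script> tags from the first one leaves exactly the fragment alert(document.cookie)"> quoted below:

// Illustrative XSS attack vectors (our reconstruction, not the paper's
// original listing), expressed as Java string literals:
List<String> xssVectors = List.of(
    "<script>alert(document.cookie)</script>\">",
    "\"><img src=x onerror=alert(1)>"
);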

For SQLI, we add SQL queries containing, for example, the following code:

[Listing (b): SQLI attack vector, not reproduced]
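Again, the original listing is not reproduced; the following vectors are illustrative reconstructions that rely on an unescaped apostrophe (the table and column names are hypothetical):

// Illustrative SQLI attack vectors (our reconstruction); the second one
// attempts to retrieve data from the database via a UNION query.
List<String> sqliVectors = List.of(
    "' OR '1'='1",
    "' UNION SELECT username, password FROM users -- "
);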

When testing the chatbot, we make use of the input field provided for communication with humans. The test verdicts are derived according to the test oracles from [12].

When testing with the XSS test suite, the parsed response from the SUT indicates that the script was not triggered. Unfortunately (for the attacker), the critical parts, namely the <script> elements, were filtered out of the input string, thus preventing execution. The response HTML contains only a fragment of the original script, e.g., alert(document.cookie)">.
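A simple oracle capturing this check could, for instance, report a failure only if the injected payload survives inside an executable <script> element of the response (a sketch under the jsoup assumption from Section 2):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class XssOracle {
    // Vulnerable iff the JavaScript payload appears unfiltered inside a
    // <script> element of the parsed response.
    static boolean vulnerable(String payload, String responseHtml) {
        Document doc = Jsoup.parse(responseHtml);
        for (Element script : doc.select("script")) {
            if (script.data().contains(payload)) {
                return true;   // the script would execute on the client
            }
        }
        return false;          // only harmless fragments remain
    }
}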

We obtained a similar result for the SQLI-related test inputs, where the depicted attack vector is meant to retrieve data from the database. The chatbot’s HTTP response body shows evidence that this input was filtered as well: the apostrophe ’ is escaped, which is enough to prevent execution.

Although we were not able to trigger a vulnerability in Program O, the testing framework at least provided evidence that Program O contains no trivial security bugs. In addition, we showed that the overall challenge of security testing online chatbots can be reduced to general security testing of web applications. In both cases, the same challenges exist, namely how to define attack vectors and how to construct efficient detection mechanisms.

4 Related Work

To the best of our knowledge, there are no papers dealing with the security testing of chatbots. There are papers describing methods and tools for testing functionality and usability (e.g., [24]), and others considering the testing of AI systems in general [18, 22]. In the broader context of security testing, there have been some publications dealing with testing against specific attacks such as XSS and SQLI.

In [20], the authors present QED, a system based on goal-directed model checking for testing against XSS and SQLI. It uses a definition of the vulnerability to be tested and a set of input values for test case generation. QED targets the automated testing of Java web applications; a model checker generates attack vectors for the SUT by searching for candidates that are likely to expose a vulnerability.

Duchene et al. [15] present a testing tool for XSS that relies on fuzzing and model inference. The underlying method is a black-box fuzzer that makes use of a genetic algorithm guided by an attack grammar. The work focuses on the generation of XSS attack vectors by applying mutation and crossover operators. A fitness function guides the selection of inputs for test case generation, and the resulting attack vectors are then executed against web applications.

In our previous work [12], we used visual depictions of attacks against web applications, specifying attack patterns for XSS and SQLI that guide test execution. The result is an abstract state machine that offers a high degree of configurability and extensibility for black-box security testing purposes.

Other works that cover XSS and SQLI include [16] and [13].

5 Conclusion and Future Work

In this paper, we introduced a first version of a security testing approach for chatbots. We claim that this topic poses a challenge of growing interest and importance, owing to the growing industrial interest in chatbots (see [7]). Interestingly, the scientific literature lacks solutions for the challenge of testing chatbots for vulnerabilities. We briefly introduced a testing framework for the security testing of chatbots and discussed first results obtained with a publicly available chatbot implementation.

Although we were not able to trigger vulnerabilities, we could show that the framework fits its purpose well. In addition, the tests provide evidence that the tested chatbot is resistant to some common attack vectors. It is worth noting that the current test execution framework is not limited to XSS and SQLI attacks. In the future, we will extend the approach to test chatbots against other vulnerabilities [9]. In addition, we want to further investigate automated test case generation for the security testing of chatbots.