1 Introduction

Clustering is one of the most popular techniques in data analysis, but it is also inherently subjective [3]: different users might prefer very different clusterings, depending on their goals and background knowledge. Semi-supervised methods deal with this by allowing the user to define constraints that express their subjective interests [4]. Often, these constraints are obtained by querying the user with questions of the following type: Should these two instances be in the same cluster? Answering “yes” results in a must-link constraint, answering “no” in a cannot-link constraint.
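As a minimal sketch of how such answers can be stored, the Python snippet below records each yes/no answer as a must-link or cannot-link pair and lists the constraints that a given clustering violates. The function names `record_answer` and `violated` are hypothetical and chosen for illustration only; they are not part of the system described in this paper.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def record_answer(i: int, j: int, same_cluster: bool,
                  must_link: List[Pair], cannot_link: List[Pair]) -> None:
    """Store the answer to 'Should instances i and j be in the same cluster?'."""
    pair = (min(i, j), max(i, j))  # order the pair so (i, j) and (j, i) coincide
    if same_cluster:
        must_link.append(pair)     # "yes" gives a must-link constraint
    else:
        cannot_link.append(pair)   # "no" gives a cannot-link constraint


def violated(labels: List[int], must_link: List[Pair],
             cannot_link: List[Pair]) -> List[Pair]:
    """Return the constraints that the clustering `labels` does not satisfy."""
    bad = [(a, b) for a, b in must_link if labels[a] != labels[b]]
    bad += [(a, b) for a, b in cannot_link if labels[a] == labels[b]]
    return bad


# Hypothetical usage: two answers about three instances.
ml: List[Pair] = []
cl: List[Pair] = []
record_answer(2, 0, True, ml, cl)    # user answers "yes"
record_answer(1, 2, False, ml, cl)   # user answers "no"
print(violated([0, 0, 0], ml, cl))   # constraints broken by a single cluster
```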

In this paper we present an interactive clustering system that exploits such constraints. The system is based on COBRASTS [2], a recently proposed method for semi-supervised clustering of time series. COBRASTS is suitable for interactive clustering as it combines the following three characteristics: (1) it can present the best clustering obtained so far at any time, allowing the user to inspect intermediate results; (2) it is query-efficient, which means that a good clustering is obtained with only a small number of queries; and (3) it is time-efficient, so the user does not have to wait long between queries. Given small amounts of supervision, COBRASTS has been shown to produce clusterings of much better quality than those obtained with unsupervised alternatives [2].

By making our tool readily available and easy to use, we offer any practitioner interested in analyzing time series data the opportunity to exploit the benefits of interactive time series clustering.

Fig. 1. Screenshot of the web application.

2 System Description

The graphical user interface of the system is shown in Fig. 1. On the top left the full dataset is shown, as a plot with all time series stacked on top of each other. On the top right, the system shows the querying interface. It presents the instances for which the pairwise relation is being queried, and two buttons that the user can click to indicate that these two instances should (not) be in the same cluster. On the bottom, the system shows the intermediate clustering. This clustering is updated after every couple of queries. The main loop that is executed is illustrated in Fig. 2(a): the system repeatedly queries several pairwise relations and uses the resulting constraints to improve the clustering, until the user is satisfied with the produced clustering.
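The query-update-present loop described above can be sketched in Python as follows. This is a toy stand-in under stated assumptions, not the COBRASTS procedure: it starts from a single cluster, compares each instance with one representative per cluster, and opens a new cluster when an instance cannot be linked to any existing representative. The names `interactive_loop`, `ask_user`, `satisfied` and `batch_size` are hypothetical, and the user is simulated from ground-truth labels.

```python
from typing import Callable, List, Tuple


def interactive_loop(
    n: int,
    ask_user: Callable[[int, int], bool],
    satisfied: Callable[[List[int]], bool],
    batch_size: int = 2,
) -> Tuple[List[int], int]:
    """Toy loop: ask a batch of queries, update the clustering, repeat."""
    labels = [0] * n                  # start from one cluster with all instances
    reps = [0]                        # one representative instance per cluster
    # (instance, index of the representative it is compared with next)
    pending = [(i, 0) for i in range(1, n)]
    n_queries = 0
    # `satisfied` stands for the user inspecting the current clustering.
    while pending and not satisfied(labels):
        batch, pending = pending[:batch_size], pending[batch_size:]
        for i, r in batch:
            n_queries += 1
            if ask_user(i, reps[r]):            # must-link: join cluster r
                labels[i] = r
            elif r + 1 < len(reps):             # cannot-link: try the next one
                pending.append((i, r + 1))
            else:                               # no cluster fits: open a new one
                reps.append(i)
                labels[i] = len(reps) - 1
    return labels, n_queries


# Simulated user who answers according to hypothetical ground-truth labels.
truth = [0, 0, 1, 1, 2, 0]
labels, n_queries = interactive_loop(
    len(truth),
    ask_user=lambda i, j: truth[i] == truth[j],
    satisfied=lambda current: False,  # never stops early in this demo
)
print(labels, n_queries)
```

In this sketch an instance that has not yet been linked to a representative simply keeps its current label, so intermediate clusterings are only approximate; the design merely illustrates that a clustering is available for inspection after every batch of queries.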

Each time an updated clustering is presented, the user can optionally indicate that a cluster is either pure, or pure and complete. If a cluster is indicated as being pure, the system will no longer try to refine this cluster. It is still possible, however, that other instances will be added to it. If a cluster is indicated as being pure and complete, the system will no longer consider this cluster in the querying process: it will not be refined, and other instances can no longer be added to it. This form of interaction between the user and the clustering system (i.e. indicating the purity and/or completeness of a cluster) was not considered in the original COBRASTS method, but experimentation with the graphical user interface showed that it helps to reduce the number of pairwise queries that are needed to obtain a satisfactory clustering.
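One way to represent these two kinds of feedback is a per-cluster status flag, as in the following Python sketch. The types `Status` and `Cluster` and the helpers `can_refine` and `can_grow` are hypothetical names introduced for illustration; the actual implementation may organize this differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    OPEN = "open"                            # no feedback given yet
    PURE = "pure"                            # do not refine, may still grow
    PURE_AND_COMPLETE = "pure and complete"  # excluded from further querying


@dataclass
class Cluster:
    members: List[int] = field(default_factory=list)
    status: Status = Status.OPEN


def can_refine(cluster: Cluster) -> bool:
    """Only clusters without purity feedback are candidates for refinement."""
    return cluster.status is Status.OPEN


def can_grow(cluster: Cluster) -> bool:
    """Instances may be added unless the cluster was marked complete."""
    return cluster.status is not Status.PURE_AND_COMPLETE


# Hypothetical usage: the user marks one cluster as pure, another as both.
clusters = [
    Cluster([0, 1, 5]),
    Cluster([2, 3], Status.PURE),
    Cluster([4], Status.PURE_AND_COMPLETE),
]
refinable = [c.members for c in clusters if can_refine(c)]
growable = [c.members for c in clusters if can_grow(c)]
print(refinable, growable)
```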

Fig. 2. (a) The interactive clustering loop. (b) Demonstration of clustering improvement as queries are answered.

The COBRASTS system is implemented as a web application that is run locally. It is open source and available online (footnote 1). It is also available on PyPI, allowing installation with a single command (footnote 2).

3 Example Run

Figure 2(b) shows the sequence of clusterings that is generated by the application for a sample of the CBF dataset [1]. It starts from a single cluster that contains all instances. After the user has answered two pairwise queries, the system presents an updated clustering containing two clusters. The first cluster contains mainly upward patterns, whereas the second cluster contains a mixture of downward and horizontal patterns. As this clustering is not yet satisfactory, more pairwise queries are answered. After eight more queries, the system again presents an improved clustering. This time, the clustering clearly separates three distinct patterns (upward, horizontal and downward). While distinguishing between these three types of patterns is easy for a user, it is difficult for most existing clustering systems; none of COBRASTS’s competitors is able to produce a clustering that clearly separates these patterns [2].

4 Conclusion

The proposed demo will present a readily available and easy-to-use web application for interactive time series clustering. Internally, it makes use of the recently developed COBRASTS approach. The application enables users to exploit minimal supervision to get clusterings that are significantly better than those obtained with traditional approaches.