1 Introduction

In contrast to many data analysis methods, whose goal is to describe all the data with a single model, pattern mining focuses on information describing only parts of the data. However, in practice, the number of discovered patterns is huge and patterns have to be filtered or ranked according to additional quality criteria in order to be usable by a data analyst. As surveyed by Vreeken and Tatti [26], there exist numerous methods for evaluating the interestingness of extracted patterns, e.g. based on simple measures or on statistical testing. However, it remains difficult to clearly identify the advantages and limitations of each approach.

How can one determine whether a data mining method extracts interesting patterns? How can one know and evaluate if a data mining method is better than another for a given task? Our work addresses these core questions. Completely answering those questions is clearly out of the scope of this (or any single) paper but we propose major improvements in these directions in the context of unsupervised problems with binary data.

Our goal is to propose an interestingness theory independent of any assumption about the data such as a model of the data or an expectation using statistical tests. Our key principle to assess the quality of a data mining method extracting a pattern X is to study the relationships between X and the other patterns when X is selected. Roughly speaking, the higher the number of comparisons between X and the other patterns required to select X, the higher the quality of the method. As an example, let us consider correlation measures such as the lift [26] and the productive itemset [27]. While calculating lift involves only the individual items contained in X, the productive itemset involves all subsets of X. A pattern selected by the productive itemset constraint must therefore satisfy more tests, which makes it a more effective selector for mining correlated itemsets. Our framework addresses methods to select patterns such as interestingness measures [25], constraint-based pattern mining imposing constraints on a single pattern [20] or on several patterns such as condensed representations of patterns [5] or \({top}\text {-}k\) patterns [9]. We call selector a data mining method providing patterns. The goal of our framework is to evaluate the quality of a selector and therefore the interestingness of the patterns extracted by the selector. For that purpose, we introduce the notions of supporter and opponent. A supporter Y of X is a pattern which increases the interestingness of X when only the support of Y increases while all other patterns’ support remains unchanged. In other words, when the support of Y increases, it raises the likelihood of X to be selected and therefore Y supports the selection of X. Analogously, an opponent Y of X is a pattern which decreases the interestingness of X when only the support of Y increases.
We show that the number of supporters and opponents and their relations with X (Y is a generalization or a specialization of X, Y and X are incomparable) provide meaningful information about the quality of the selector at hand. Notably, this approach evaluates the quality of a selector only based on the relationships between the patterns from the data without assuming a model or any hypothesis on the data.

This paper formalizes the relationships between patterns to evaluate the quality of a selector through the new notions of supporters and opponents. We present a typology of selectors defined by formal properties and based on two complementary criteria to evaluate and interpret the quality of a selector. This typology offers a global picture of selectors and clarifies their advantages and limitations. Highlighting the kinds of relationships between patterns required by a selector helps to compare selectors to each other. We quantify the quality of a selector via an evaluation complexity analysis based on its number of supporters and opponents. This analysis enables us to contrast the quality of a selector with its computing cost. Finally, we conduct an experimental study in the context of correlation measures to evaluate the quality of selectors according to their complexity.

This paper is structured as follows. Section 2 discusses related work. Section 3 introduces preliminaries and defines what a selector is. We present the key notions of supporters and opponents in Sect. 4 and the typology of selectors in Sect. 5. We continue with the analysis of the complexity of the selectors in Sect. 6. Section 7 provides an experimental study on the evaluation of the quality of a few selectors. We round up with discussion and conclusion in Sect. 8.

2 Related Work

As we focus on formal approaches to interestingness, experimental protocols to evaluate the quality of a method, like rediscovery [29] or randomization [14], are out of the scope of this related work. The proposal of a general theory of interestingness has already been identified as one of the key challenges of the past decade [8, 16].

Several approaches have been proposed in the literature to analyze pattern discovery methods. Regarding condensed representations of patterns, the size of a condensed representation is often used as an objective measure to assess its quality [5]. As a condensed representation based on closed patterns is always more compact than one based on free (or key) patterns, closed patterns are deemed more interesting. However, one of the most compact condensed representations – non-derivable itemsets (NDI) [4] – is rarely used. The seemingly complex semantics of NDI may explain this unpopularity. In this paper, we propose a measure to formally identify the complexity of a selector (cf. Sect. 6). The survey of Vreeken and Tatti [26] presents interestingness measures on patterns by dividing them into two categories, absolute measures and advanced ones. An absolute measure is informally defined as follows: “score patterns using only the data at hand, without contrasting their calculations over the data to any expectation using statistical tests”. Advanced measures were introduced to limit redundancy in the results. They are based on statistical models (independence model, partition models, MaxEnt models) of differing complexity. Our formalization of complexity classes based on relationships between patterns clarifies the distinction between absolute and advanced measures.

Many works [12, 18, 23, 24] propose axioms that an interestingness measure for association rules should satisfy in order to be considered relevant. These methods state what the expected variations of a well-behaved measure should be under certain conditions (e.g., when the frequency of the body or the head of the association rule increases). More recently, these works were extended to itemsets [15, 27], but only by considering their subsets. Our proposal systematizes this approach by taking into account all patterns of the lattice. Besides, the axioms previously introduced in the literature mainly focus on correlation measures, and there are no such axioms for constraints. In this paper, we generalize these principles to constraints (cf. Sect. 5).

There are very few attempts to define interactions between patterns when evaluating an interestingness measure. The concept of global constraints has been informally defined in [6]. This notion has been formalized in [13] by defining a relational algebra extended to pattern discovery. Our framework provides a broader and more precise formal definition, especially to better analyze the interrelationships between the patterns.

3 Preliminaries

Let \(\mathcal{I} \) be a set of distinct literals called items, an itemset (or pattern) is a subset of \(\mathcal{I} \). The language of itemsets corresponds to \(\mathcal{L} = 2^{\mathcal{I}}\). A transactional dataset is a multi-set of itemsets of \(\mathcal{L} \). Each itemset, usually called transaction, is a dataset entry. For instance, Table 2 gives three transactional datasets with 5 transactions \(t_1, \dots , t_5\) each described by 3 items A, B and C. Note that the transaction \(t_5\) in \(\mathcal{D} _1\) is empty. \(\mathcal{D} \) denotes a dataset and \(\varDelta \) the set of all datasets. The frequency of an itemset X, denoted by \({supp} (X,\mathcal{D})\), is the number of transactions of \(\mathcal{D} \) containing X. For simplicity, we write \({supp} (X)\) when there is no ambiguity.

Constraint-based pattern mining [19] aims at enumerating all patterns occurring at least once in a dataset \(\mathcal{D} \) and satisfying a user-defined selection predicate q. A well-known example is the minimal support constraint, based on the frequency measure, which provides the patterns having a support greater than a given minimal threshold. Despite the filtering performed by a constraint, the collection of mined patterns is often too large to be managed and interestingness measures are additionally used to rank patterns and focus on the most relevant ones. There are numerous measures [24], several of which (support, bond, lift, all-confidence) are given in Table 1. In this paper, we consider constraint-based pattern mining imposing constraints on a single pattern or several patterns such as condensed representations of patterns or top-k patterns. Table 1 depicts several examples of constraints. The productive itemset is here defined as a constraint.
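To make these definitions concrete, the sketch below implements the support, all-confidence and lift measures in their usual formulations (an illustration only: the function names are ours, and Table 1 remains the authoritative source for the definitions):

```python
def supp(pattern, dataset):
    """Absolute frequency: number of transactions containing `pattern`."""
    p = frozenset(pattern)
    return sum(1 for t in dataset if p <= t)

def all_confidence(pattern, dataset):
    """supp(X) divided by the largest support of a single item of X."""
    return supp(pattern, dataset) / max(supp({i}, dataset) for i in pattern)

def lift(pattern, dataset):
    """Relative support of X divided by the product of the relative
    supports of its items (independence baseline)."""
    n = len(dataset)
    r = supp(pattern, dataset) / n
    for i in pattern:
        r /= supp({i}, dataset) / n
    return r

# Five transactions over the items A, B, C
D = [frozenset('ABC'), frozenset('AB'), frozenset('AC'),
     frozenset('BC'), frozenset()]
print(all_confidence('ABC', D))   # 1/3
print(lift('ABC', D))             # 0.2 / 0.6^3 ≈ 0.926
```

A pattern here is any iterable of items, and a dataset is a list of `frozenset` transactions, so a multi-set is represented by repeated list entries.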

Table 1. Itemset mining approaches based on frequency

Let \(\mathbb {S}\) be a poset. We formally define the notion of selector as follows.

Definition 1

(Interestingness Selector). An interestingness selector \(s\) is a function defined from \(\mathcal{L} \times \varDelta \) to \(\mathbb {S}\) whose value increases when the assessed pattern X is more interesting.

\(\mathbb {S}\) is the set of reals \(\mathbb {R}\) if the selector is an interestingness measure and booleans \(\mathbb {B}\) (i.e. true or false) with the order \(false < true\) if the selector is a constraint. Clearly, selectors define very different views on what should be a relevant pattern. Relevance may highlight correlations between items (regularity), correlations with a class of the dataset (contrast), removing redundancy (condensed representation), complementarity between several patterns (top-k), outlier detection such as the FPOF measure (cf. Table 1).

4 Framework of Supporters and Opponents

4.1 Fundamental Definitions and Notations

Deciding if a pattern is interesting (and why) generally depends on its frequency, but also on the frequencies of some other patterns. In our framework, we show how the knowledge of those patterns for a given selector makes it possible to qualify this selector and evaluate its quality.

More precisely, in order to isolate the impact of the change in frequency of a pattern Y on the evaluation of the interestingness of a pattern X, we propose to compare the interestingness of the assessed pattern X with respect to two very similar datasets \(\mathcal{D} \) and \(\mathcal{D} '\), where only the frequency of itemset Y varies. Therefore, we introduce the following definition.

Definition 2

(Increasing at a Point). Compared to \(\mathcal{D} \), a dataset \(\mathcal{D} '\) is increasing at a point Y, denoted by \(\mathcal{D} <_Y \mathcal{D} '\), iff \({supp} (Y,\mathcal{D} ') > {supp} (Y,\mathcal{D})\) and \({supp} (X,\mathcal{D} ') = {supp} (X,\mathcal{D})\) for all patterns \(X \ne Y\).

For instance, the first two datasets provided by Table 2 satisfy \(\mathcal{D} _1 <_{ABC} \mathcal{D} _2\). It means that patterns \(\emptyset \), A, B, C, AB, AC, BC have the same frequency in both datasets, while the frequency of ABC is greater in \(\mathcal{D} _2\). Indeed, we have \({supp} (ABC,\mathcal{D} _2) = 2\), whereas \({supp} (ABC,\mathcal{D} _1) = 1\). In the same way, we can easily see that \(\mathcal{D} _1 <_{A} \mathcal{D} _3\) due to the addition in \(\mathcal{D} _3\) (compared to \(\mathcal{D} _1\)) of an item A in the fifth transaction \(t_5\). Thus, we have \({supp} (A,\mathcal{D} _3) = 4\), whereas \({supp} (A,\mathcal{D} _1) = 3\), and \({supp} (X,\mathcal{D} _3) = {supp} (X,\mathcal{D} _1)\) for all other patterns \(X \ne A\).
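This relation can be checked mechanically. The sketch below verifies \(\mathcal{D} _1 <_{ABC} \mathcal{D} _2\) and \(\mathcal{D} _1 <_{A} \mathcal{D} _3\), assuming the transactions of Table 2 are as reconstructed from the constructions of Sect. 4.2 (an assumption, since we restate the table's content; the code is ours):

```python
from itertools import combinations

def supp(pattern, dataset):
    """Number of transactions containing `pattern`."""
    p = frozenset(pattern)
    return sum(1 for t in dataset if p <= t)

def increasing_at(Y, D, D2, items):
    """D <_Y D2: the support of Y strictly increases while every
    other pattern over `items` keeps its support."""
    Y = frozenset(Y)
    patterns = [frozenset(c) for r in range(len(items) + 1)
                for c in combinations(items, r)]
    return (supp(Y, D2) > supp(Y, D) and
            all(supp(X, D2) == supp(X, D) for X in patterns if X != Y))

D1 = [frozenset('ABC'), frozenset('AB'), frozenset('AC'), frozenset('BC'), frozenset()]
D2 = [frozenset('ABC'), frozenset('ABC'), frozenset('A'), frozenset('B'), frozenset('C')]
D3 = [frozenset('ABC'), frozenset('AB'), frozenset('AC'), frozenset('BC'), frozenset('A')]

print(increasing_at('ABC', D1, D2, 'ABC'))   # True: D1 <_ABC D2
print(increasing_at('A',   D1, D3, 'ABC'))   # True: D1 <_A D3
```

Note that the check only compares pattern supports, as in Definition 2, so the exact multiset of transactions is irrelevant beyond the frequencies it induces.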

Table 2. Three toy datasets with slight variations

Intuitively, given a selector \(s \), a supporter Y of an assessed pattern X is a pattern that increases the interestingness of X when only the support of Y increases (while all other patterns’ support remains unchanged). In other words, when the support of Y increases, it raises the likelihood of X to be selected using \(s \). Conversely, if Y is an opponent of X, when the support of Y increases, it reduces the likelihood of X to be selected. Using Definition 2, the following definition formalizes these notions of supporter and opponent.

Definition 3

(Supporters and Opponents). Given a selector \(s \), let X be a pattern in \(\mathcal{L} \). Y is a supporter of X for \(s \), denoted by \(Y \in s ^+(X)\), iff there exist two datasets \(\mathcal{D} \) and \(\mathcal{D} '\) such that \(\mathcal{D} ' >_Y \mathcal{D} \) and \(s (X,\mathcal{D} ') > s (X,\mathcal{D})\).

Conversely, Y is an opponent of X for \(s \), denoted by \(Y \in s ^-(X)\), iff there exist two datasets \(\mathcal{D} \) and \(\mathcal{D} '\) such that \(\mathcal{D} ' >_Y \mathcal{D} \) and \(s (X,\mathcal{D} ') < s (X,\mathcal{D})\).

Given a selector, the strength of the notions of supporter and opponent is to clearly identify the patterns actually involved in the evaluation of an assessed pattern. Moreover, it is important to note that the set of supporters and opponents of a pattern (given a selector) is not dependent on a specific dataset. They are a property of a given selector.

Considering the datasets given in Table 2, let us illustrate Definition 3 with the all-confidence selector. We already noted that \(\mathcal{D} _1 <_{ABC} \mathcal{D} _2\). Additionally, we have \({all\text{-}conf} (ABC,\mathcal{D} _2) = 2/3\), whereas \({all\text{-}conf} (ABC,\mathcal{D} _1) = 1/3\). Therefore, we have \(\mathcal{D} _1 <_{ABC} \mathcal{D} _2\) and \({all\text{-}conf} (ABC,\mathcal{D} _2) > {all\text{-}conf} (ABC,\mathcal{D} _1)\), which means that ABC is a supporter of itself for the all-confidence measure. On the other hand, we have \(\mathcal{D} _1 <_{A} \mathcal{D} _3\) and \({all\text{-}conf} (ABC,\mathcal{D} _3) = 1/4 < {all\text{-}conf} (ABC,\mathcal{D} _1) = 1/3\). Thus, by Definition 3, A is an opponent of ABC for the all-confidence measure.

More generally, it is possible to show that for all patterns X, \({all\text{-}conf} ^+(X)\) is equal to \(\{X\}\), i.e. X has no supporters other than itself, and that \({all\text{-}conf} ^-(X) = \{ \{ i \} :i \in X \}\), i.e. the opponents of X are exactly its items. In the following Sect. 4.2, we give the set of supporters and opponents for a representative set of usual selectors.

4.2 Supporters and Opponents of Usual Selectors

In this section, we give the sets of supporters \(s ^+\) and opponents \(s ^-\) for a representative set of selectors \(s \). These sets are presented in Table 3 both for interestingness measures (support, all-confidence, bond, lift, etc.) and boolean constraints (productive, free, closed itemset, etc.). Due to lack of space, we do not present a proof for every selector considered in Table 3. Nevertheless, we provide a proof for two examples: the lift measure (see Property 1) and the free constraint (see Property 2). Note that the schema of these proofs could be easily adapted to identify the supporters and opponents of other correlation measures (support, all-confidence, bond, etc.) and other condensed representation constraints (maximal, closed, etc.).

Table 3. Analysis of methods based on supporters and opponents

Before detailing the proofs of Properties 1 and 2, given an itemset X, Lemma 1 stresses that it is always possible to build two transactional datasets \(\mathcal{D} \) and \(\mathcal{D} '\) such that \(\mathcal{D} '\) is increasing at X in comparison to \(\mathcal{D} \), i.e. \(\mathcal{D} ' >_X \mathcal{D} \). Note that the sets of transactions \(\mathcal{D}^-_{X} \) and \(\mathcal{D}^+_{X} \) introduced in this lemma are crucial to identify the sets of supporters and opponents of a selector.

Lemma 1

Given an itemset \(X \subseteq \mathcal{I} \), let \(\mathcal{D}^-_{X} \) and \(\mathcal{D}^+_{X} \) be the datasets defined by \(\mathcal{D}^+_{X} = \{ Y \subseteq X :|{X \setminus Y}| ~is~even \}\) and \(\mathcal{D}^-_{X} = 2^X \setminus \mathcal{D}^+_{X} \). We have \(\mathcal{D}^-_{X} <_X \mathcal{D}^+_{X} \).

For instance, using datasets shown in Table 2, it is easy to see that \(\mathcal{D} _2 = \{ ABC \} \cup \mathcal{D}^+_{ABC} \) with \(\mathcal{D}^+_{ABC} = \{ ABC, A, B, C \}\), and \(\mathcal{D} _1 = \{ ABC \} \cup \mathcal{D}^-_{ABC} \) with \(\mathcal{D}^-_{ABC} = \{ AB, AC, BC, \emptyset \}\). Thus, Lemma 1 implies that \(\mathcal{D} _1 <_{ABC} \mathcal{D} _2\). Given \(\mathcal{D} _0 = \{ ABC, AB, AC, BC \}\), we can also check that \(\mathcal{D} _1 = \mathcal{D} _0 \cup \mathcal{D}^-_{A} \) with \(\mathcal{D}^-_{A} = \emptyset \), and \(\mathcal{D} _3 = \mathcal{D} _0 \cup \mathcal{D}^+_{A} \) with \(\mathcal{D}^+_{A} = \{ A \}\). Thus, Lemma 1 implies that \(\mathcal{D} _1 <_{A} \mathcal{D} _3\).

Proof

First, it is easy to see that \({supp} (X,\mathcal{D}^+_{X}) = 1\) and \({supp} (X,\mathcal{D}^-_{X}) = 0\), which shows that \({supp} (X,\mathcal{D}^+_{X}) > {supp} (X,\mathcal{D}^-_{X})\). Then, for all itemsets \(Y \ne X\), if \(Y \not \subseteq X\), we have \({supp} (Y,\mathcal{D}^+_{X}) = {supp} (Y,\mathcal{D}^-_{X}) = 0\). Otherwise, if \(Y \subseteq X\), we can see that \({supp} (Y,\mathcal{D}^+_{X}) = {supp} (Y,\mathcal{D}^-_{X}) = 2^{m-1}\) where \(m = |{X \setminus Y}|\). Thus, for all \(Y \ne X\), \({supp} (Y,\mathcal{D}^+_{X}) = {supp} (Y,\mathcal{D}^-_{X})\), which completes the proof that \(\mathcal{D}^-_{X} <_X \mathcal{D}^+_{X} \). \(\square \)
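The lemma and its proof can be replayed computationally; the following sketch (with our own helper names) builds \(\mathcal{D}^-_{X} \) and \(\mathcal{D}^+_{X} \) for \(X = ABC\) and checks both halves of the argument:

```python
from itertools import combinations

def supp(pattern, dataset):
    """Number of transactions containing `pattern`."""
    p = frozenset(pattern)
    return sum(1 for t in dataset if p <= t)

def powerset(items):
    s = list(items)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def lemma1_pair(X):
    """D+_X: subsets of X whose complement in X has even size; D-_X: the rest."""
    X = frozenset(X)
    d_plus  = [Y for Y in powerset(X) if len(X - Y) % 2 == 0]
    d_minus = [Y for Y in powerset(X) if len(X - Y) % 2 == 1]
    return d_minus, d_plus

X = frozenset('ABC')
d_minus, d_plus = lemma1_pair(X)
# The support of X itself increases ...
assert supp(X, d_plus) == 1 and supp(X, d_minus) == 0
# ... while every other subset keeps its support (2^(m-1) with m = |X \ Y|)
for Y in powerset(X):
    if Y != X:
        assert supp(Y, d_plus) == supp(Y, d_minus) == 2 ** (len(X - Y) - 1)
print("D-_X <_X D+_X verified for X = ABC")
```

Patterns not included in X have support 0 in both datasets, since every transaction is a subset of X.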

Using Lemma 1, we now prove Property 1, which defines the supporters and opponents of the lift measure.

Property 1

For all itemsets X such that \(|{X}| > 1\), \({lift} ^+(X) = \{ X \}\) and \({lift} ^-(X) = \{ \{ i \} :i \in X \}\).

Proof

Given an itemset X such that \(|{X}| > 1\), we distinguish three cases:

1. Let \(\mathbf{Y = X}\) and two datasets \(\mathcal{D} '\) and \(\mathcal{D} \) such that \(\mathcal{D} ' >_Y \mathcal{D} \). By definition we have: \({supp} (X,\mathcal{D} ') > {supp} (X,\mathcal{D})\) and \({supp} (Z,\mathcal{D} ') = {supp} (Z,\mathcal{D})\) for all \(Z \ne (Y=X)\). Because \(|{X}| > 1\), we also have \(\{ i \} \ne X\) for all \(i \in X\). Therefore, \({supp} (\{ i \},\mathcal{D} ') = {supp} (\{ i \},\mathcal{D})\) for all \(i \in X\), which implies that the denominators of \({lift} (X,\mathcal{D} ')\) and \({lift} (X,\mathcal{D})\) are equal. Finally, we have \({supp} (X,\mathcal{D} ') > {supp} (X,\mathcal{D})\), which shows that \({lift} (X,\mathcal{D} ') > {lift} (X,\mathcal{D})\), whereas \({lift} (X,\mathcal{D} ') < {lift} (X,\mathcal{D})\) is impossible. Hence \(X \in {lift} ^+(X)\) and \(X \notin {lift} ^-(X)\).

2. Let Y be an itemset such that \(\mathbf{Y \ne X}\) and \(\mathbf{|{Y}| > 1}\), and two datasets \(\mathcal{D} '\) and \(\mathcal{D} \) such that \(\mathcal{D} ' >_Y \mathcal{D} \). By definition, we have: \({supp} (X,\mathcal{D} ') = {supp} (X,\mathcal{D})\) since \(Y \ne X\), and \({supp} (\{ i \},\mathcal{D} ') = {supp} (\{ i \},\mathcal{D})\) for all \(i \in X\) since \(Y \ne \{ i \}\) (indeed, we assume that \(|{Y}| > 1\)). Thus, we necessarily have \({lift} (X,\mathcal{D} ') = {lift} (X,\mathcal{D})\) for all datasets \(\mathcal{D} '\) and \(\mathcal{D} \) such that \(\mathcal{D} ' >_Y \mathcal{D} \). It implies that for all itemsets Y such that \(Y \ne X\) and \(|{Y}| > 1\), \(Y \notin {lift} ^+(X) \cup {lift} ^-(X)\).

3. Let Y be an itemset such that \(\mathbf{Y \ne X}\) and \(\mathbf{|{Y}| = 1}\), and two datasets \(\mathcal{D} '\) and \(\mathcal{D} \) such that \(\mathcal{D} ' >_Y \mathcal{D} \). Using the same reasoning as before, it is easy to see that if \(Y \not \subset X\), we necessarily have \({lift} (X,\mathcal{D} ') = {lift} (X,\mathcal{D})\), which implies that \(Y \notin {lift} ^+(X)\) and \(Y \notin {lift} ^-(X)\). Dually, if \(Y \subset X\), because \(|{Y}| =1\), there exists \(j \in X\) such that \(Y = \{ j \}\). Since \(\mathcal{D} ' >_Y \mathcal{D} \) and \(X \ne Y\), we have \({supp} (X,\mathcal{D} ') = {supp} (X,\mathcal{D})\) and \(\prod _{i \in X} {supp} (\{ i\},\mathcal{D} ') > \) \(\prod _{i \in X} {supp} (\{ i\},\mathcal{D})\) because \({supp} (\{ j\},\mathcal{D} ') > {supp} (\{ j\},\mathcal{D})\) and \(j \in X\). Thus, we have \({lift} (X,\mathcal{D} ') < {lift} (X,\mathcal{D})\), which shows that \(Y = \{ j \} \subset X\) is an opponent of X for the lift measure (and not a supporter). \(\square \)
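Property 1 can also be observed on concrete data. Reusing the Table 2 datasets as reconstructed above (an assumption on their exact content; the code is our sketch), raising the support of ABC itself raises its lift, while raising the support of the single item A lowers it:

```python
def supp(pattern, dataset):
    """Number of transactions containing `pattern`."""
    p = frozenset(pattern)
    return sum(1 for t in dataset if p <= t)

def lift(pattern, dataset):
    """Relative support of X over the product of its items' relative supports."""
    n = len(dataset)
    r = supp(pattern, dataset) / n
    for i in pattern:
        r /= supp({i}, dataset) / n
    return r

D1 = [frozenset('ABC'), frozenset('AB'), frozenset('AC'), frozenset('BC'), frozenset()]
D2 = [frozenset('ABC'), frozenset('ABC'), frozenset('A'), frozenset('B'), frozenset('C')]  # D1 <_ABC D2
D3 = [frozenset('ABC'), frozenset('AB'), frozenset('AC'), frozenset('BC'), frozenset('A')]  # D1 <_A D3

assert lift('ABC', D2) > lift('ABC', D1)   # case 1: ABC is its own supporter
assert lift('ABC', D3) < lift('ABC', D1)   # case 3: the singleton A is an opponent
```

Note that the number of transactions is the support of \(\emptyset \), which Definition 2 keeps fixed, so the normalization by n does not interfere with the comparison.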

We now consider the case of a condensed representation selector, and prove Property 2, which defines the supporters and opponents of the free constraint.

Property 2

For all itemsets X such that \(|{X}| > 1\), \({free} ^+(X) = \{ X \setminus \{i\} :i \in X \}\) and \({free} ^-(X) = \{ X \}\).

Proof

Let X be an itemset such that \(|{X}| > 1\). We first show that \({X}^{\underline{\downarrow }} \subseteq {free} ^+(X)\), i.e. that for all \(k \in X\), \(Y = X \setminus \{ k \} \in {free} ^+(X)\). By definition, we have to find two datasets \(\mathcal{D} \) and \(\mathcal{D} '\) such that \(\mathcal{D} ' >_{Y} \mathcal{D} \), \({free} (X,\mathcal{D}) = false\), whereas \({free} (X,\mathcal{D} ') = true\). Let \(\mathcal{D} = \{ X \} \cup \{ X \setminus \{ i \} :i \in Y \} \cup \mathcal{D}^-_{Y} \) and \(\mathcal{D} ' = \{ X \} \cup \{ X \setminus \{ i \} :i \in Y \} \cup \mathcal{D}^+_{Y} \). First, it is easy to see that \(\mathcal{D} ' >_{Y} \mathcal{D} \). Moreover, we have \({supp} (X,\mathcal{D}) = 1\) and \({supp} (Y,\mathcal{D}) = 1\) since \(Y \subseteq X\) and \(Y \not \in \mathcal{D}^-_{Y} \). Thus, X is not a free itemset in \(\mathcal{D} \), i.e. \({free} (X,\mathcal{D}) = false\). Then, we can see that \({supp} (X,\mathcal{D} ') = 1\) and \({supp} (X \setminus \{ i \},\mathcal{D} ') \ge 2\) for all \(i \in X\) (in particular, note that \(Y = X \setminus \{ k\} \in \mathcal{D}^+_{Y} \)). Thus, X is a free itemset in \(\mathcal{D} '\), i.e. \({free} (X,\mathcal{D} ') = true\), which completes the proof that \(Y = (X \setminus \{k\}) \in {free} ^+(X)\).

We now show that \(X \in {free} ^-(X)\). We have to find two datasets \(\mathcal{D} \) and \(\mathcal{D} '\) such that \(\mathcal{D} ' >_{X} \mathcal{D} \), \({free} (X,\mathcal{D}) = true\), whereas \({free} (X,\mathcal{D} ') = false\). Let \(\mathcal{D} = \{ X \} \cup \mathcal{D}^-_{X} \) and \(\mathcal{D} ' = \{ X \} \cup \mathcal{D}^+_{X} \). By construction (see the definitions of \(\mathcal{D}^-_{X} \) and \(\mathcal{D}^+_{X} \) in the proof of Lemma 1), it is clear that \(\mathcal{D} ' >_{X } \mathcal{D} \). Moreover, we have \({supp} (X,\mathcal{D}) = 1\) and \({supp} (X \setminus \{k\},\mathcal{D}) = 2\) for all \(k \in X\) (because \(X \setminus \{k\} \subseteq X \in \mathcal{D} \), and \(X \setminus \{k\} \in \mathcal{D}^-_{X} \)). Therefore, X is a free itemset in \(\mathcal{D} \), i.e. \({free} (X,\mathcal{D})=true\). Then, we can also check that \({supp} (X,\mathcal{D} ') = 2\) and \({supp} (X \setminus \{k\},\mathcal{D} ') = 2\) for all \(k \in X\). Thus, X is not a free itemset in \(\mathcal{D} '\), which completes the proof that \(X \in {free} ^-(X)\).

To complete the proof, we have to show that any other pattern \(Y \not \in \{ X \} \cup {X}^{\underline{\downarrow }} \) cannot be a supporter or an opponent of X. In particular, we have to show that for all \(Y \subset X\), if \(Y \not \in {X}^{\underline{\downarrow }} \), then \(Y \not \in {free} ^+(X)\), i.e. that for all databases \(\mathcal{D} \) and \(\mathcal{D} '\) such that \(\mathcal{D} ' >_Y \mathcal{D} \), it is impossible to have \({free} (X,\mathcal{D}) = false\) and \({free} (X,\mathcal{D} ') = true\). If \({free} (X,\mathcal{D}) = false\), it means that there exists \(k \in X\) such that \({supp} (X \setminus \{k\},\mathcal{D}) = {supp} (X,\mathcal{D})\). Moreover, because \(\mathcal{D} ' >_Y \mathcal{D} \) and \(Y \not \in {X}^{\underline{\downarrow }} \), we have \({supp} (X \setminus \{k\},\mathcal{D} ') = {supp} (X \setminus \{k\},\mathcal{D}) = {supp} (X,\mathcal{D}) = {supp} (X,\mathcal{D} ')\), which shows that X cannot be free in \(\mathcal{D} '\), i.e. \({free} (X,\mathcal{D} ') = false\), and contradicts the hypothesis. Thus, we have shown that only the direct subsets of X are supporters of X for the free constraint. The rest of the proof is omitted for lack of space.
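The two constructions of the proof can be replayed for \(X = ABC\) and \(Y = X \setminus \{C\} = AB\); the datasets below follow the proof, while the code itself is our sketch:

```python
def supp(pattern, dataset):
    """Number of transactions containing `pattern`."""
    p = frozenset(pattern)
    return sum(1 for t in dataset if p <= t)

def free(X, dataset):
    """X is free iff every strict subset has strictly larger support;
    checking the direct subsets X \\ {k} suffices since support is
    anti-monotone with respect to set inclusion."""
    X = frozenset(X)
    return all(supp(X - {k}, dataset) > supp(X, dataset) for k in X)

# D  = {X} ∪ {X\{i} : i ∈ AB} ∪ D-_AB   and   D' = {X} ∪ {X\{i} : i ∈ AB} ∪ D+_AB
D  = [frozenset('ABC'), frozenset('BC'), frozenset('AC'), frozenset('A'), frozenset('B')]
Dp = [frozenset('ABC'), frozenset('BC'), frozenset('AC'), frozenset('AB'), frozenset()]

assert not free('ABC', D)   # supp(AB) = supp(ABC) = 1, so ABC is not free
assert free('ABC', Dp)      # raising supp(AB) makes ABC free: AB ∈ free^+(ABC)
```

The shortcut of checking only direct subsets in `free` mirrors the observation below that only direct subsets are supporters of the free constraint.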

To conclude this section, we stress that the strength of the concept of supporters and opponents is to clearly identify the patterns actually involved in the evaluation of a selector. For instance, whereas the definition of free itemsets given in Table 1 involves all strict subsets of X (with \(\forall Y \subset X\)), we can see that only direct subsets of X are supporters. In the following sections, we show how supporters and opponents can be used to compare selectors (see Sect. 5), and how the number of supporters and opponents of a selector is related to its effectiveness to select interesting patterns (see Sect. 6).

5 Typology of Interestingness Selectors

5.1 Polarity of Interestingness Selectors

We distinguish two broad categories of selectors according to whether they aim at discovering over-represented phenomena in the data (e.g., positive correlation) or under-represented phenomena (e.g., outlier detection). Naturally, the characterization of these categories is related to the evaluation of the frequency of the pattern to assess. For instance, it is well known that the interestingness of a pattern X increases with its frequency when looking for correlations between items. For an interestingness selector \(s\) to be sensitive to this variation, it is essential that \(s\) increases with the frequency of X. This principle was first proposed for association rules [23] (Property P2) and later extended to correlated itemsets [15, 27]. Conversely, a selector for outlier detection will favor patterns whose frequency decreases. Indeed, a pattern is more likely to be abnormal when it is not representative of the dataset, i.e., its frequency is low.

We formalize these two types of patterns thanks to a reflexivity property:

Definition 4

(Positive and Negative Reflexive). An interestingness selector \(s\) is positive (resp. negative) reflexive iff any pattern is its own supporter i.e., \((\forall X \in \mathcal{L}) (X \in {s}^{+} (X))\) (resp. opponent i.e., \((\forall X \in \mathcal{L}) (X \in {s}^{-} (X))\)).

As \({all\text{-}conf} ^{+}(X) = \{X\}\), the all-confidence selector is positive reflexive. Conversely, the free selector is negative reflexive because \({{free}}^{-} (X) = \{X\}\) (when the frequency of X increases, X is less likely to be free because its frequency becomes closer to that of its subsets).

This clear separation based on the reflexivity property constitutes the first axis of analysis of our selector typology. Table 4 schematizes this typology, where polarity is the vertical axis of analysis. The horizontal axis (semantics) will be described in the next section. Note that the correlation measures and the closed itemsets are in the same column. Several works in the literature have shown that closed itemsets maximize classification measures [11] and correlation measures [10]. For instance, the lift of a closed pattern has the highest value of its equivalence class because the frequency of X remains the same (numerator) while the denominator decreases.

Table 4. Typology of interestingness selectors

Of course, it should not be possible for an interestingness selector to both isolate over-represented phenomena (i.e., positive) and under-represented phenomena (i.e., negative). For this reason, a selector should never be both positive and negative. Besides, the behavior of an interestingness selector is easier to understand for the end user if a change in the frequency of a pattern Y always impacts \(s (X)\) in the same way. In other words, the increase of \({supp} (Y)\) should not decrease \(s (X)\) in some cases and increase it in others.

Quality Criterion 1

(Soundness). An interestingness selector \(s\) is sound iff no pattern is at the same time a supporter and an opponent of another pattern: \(\forall X \in \mathcal{L}, {s}^{+} (X) \cap {s}^{-} (X) = \emptyset \).

When Quality Criterion 1 is violated, it becomes difficult to interpret a mined pattern. For instance, frequent free itemset mining is not sound. There are two opposite reasons explaining that a pattern is not extracted: its frequency is too low (non-frequent rejection), or its frequency is too high (non-free rejection). Conversely, for frequent closed patterns, a pattern is not extracted if and only if its frequency is too low (whatever the underlying cause: the pattern is not frequent or non-closed). It means that frequent closed pattern mining is sound. We therefore think that the violation of Quality Criterion 1 (where \({s}^{+} (X) = {s}^{-} (X) = {X}^{\downarrow } \)) could partly explain the failure of NDI (non-derivable itemsets) even though they form an extremely compact condensed representation.

Recommendation. A well-behaving pattern mining method should not mix interestingness selectors with opposite polarities or make possible the existence of patterns that are supporters and opponents of the same pattern.

Before describing the semantics axis of our typology, we note that Table 4 classifies all the selectors presented in Table 1. As expected, all selectors seeking to isolate over-represented phenomena appear in the Positive column.

5.2 Semantics of Interestingness Selectors

This section presents three complementary criteria to identify the nature of an interestingness selector. The key idea is to focus on the relationships between patterns to qualify the semantics of the selector. More precisely, the meaning of a positive selector (whose primary objective is to find over-represented patterns) depends strongly on the set of opponents that can lead to the rejection of the assessed pattern. Conversely, a negative reflexive selector often relies on supporters to better isolate under-represented phenomena. For this reason, the positive (resp. negative) column of Table 4 involves opponents \({s}^{-} (X)\) (resp. supporters \({s}^{+} (X)\)).

Furthermore, for two selectors of the same polarity, it is possible to distinguish their goals (e.g., correlation or condensed representation) according to the opponents/supporters that they involve. Thus, we break down the semantics axis into three parts: subsets \({X}^{\downarrow } = \{Y \subset X\}\), supersets \({X}^{\uparrow } = \{Y \supset X\}\) and incomparable sets \({X}^{\leftrightarrow } = \{Y \in \mathcal{L}: Y \not \subseteq X \wedge Y \not \supseteq X\}\). This decomposition of the lattice of the opponents and the lattice of the supporters is useful to redefine coherent classes of usual selectors (these classes are indicated in Table 4):

Definition 5

(Selector Classes). An interestingness selector \(s\) belongs to:

  • C1 (Positive correlation) iff \((\forall X \in \mathcal{L})({X}^{\downarrow } \cap {s}^{-} (X) \ne \emptyset )\)

  • C2 (Minimal condensed representation) iff \((\forall X \in \mathcal{L})({X}^{\downarrow } \cap {s}^{+} (X) \ne \emptyset )\)

  • C3 (Maximal condensed representation) iff \((\forall X \in \mathcal{L})({X}^{\uparrow } \cap {s}^{-} (X) \ne \emptyset )\)

Intuitively, a pattern is a set of correlated items (or correlated, in brief) when its frequency is higher than expected considering the frequency of some of its subsets (this set of opponents varies depending on the statistical model). This means that the increase of the frequency of one of these subsets may lead to the rejection of the assessed pattern. In other words, a correlation measure is based on subsets as opponents. This observation has already been made in the literature for association rules [23] (with Property P3) and itemsets [15, 27]. Table 4 shows that most correlation measures in the literature satisfy \((\forall X \in \mathcal{L})({X}^{\downarrow } \cap {s}^{-} (X) \ne \emptyset )\). The extraction of NDI, classified as a condensed representation, also meets this criterion. This is intriguing, since the NDI selector is not usually used as a correlation measure.

A condensed representation is a reduced collection of patterns that can regenerate some properties of the full collection of patterns. Typically, frequent closed patterns make it possible to retrieve the exact frequency of any frequent pattern. Most approaches are based on the notion of equivalence class, where two patterns are equivalent if they have the same value for a function f and if they are comparable. The equality for f and the comparability result in an interrelation between the assessed pattern and its subsets/supersets. Class C3 (i.e., maximal condensed representations) includes the measures that remove the assessed pattern when a more specific pattern provides more information. Closed patterns and maximal patterns satisfy this criterion: \((\forall X \in \mathcal{L})({X}^{\uparrow } \cap {s}^{-} (X) \ne \emptyset )\). Minimal condensed representations are in the dual class (i.e., Class C2).

Unlike the polarity that opposes two types of irreconcilable patterns, the three parts of the semantics axis (i.e., subsets, supersets and incomparable sets) are simultaneously satisfiable. We think that an ideal pattern extraction method should always belong to these three parts:

Quality Criterion 2

(Completeness). A selector \(s\) is complete iff every pattern is either a supporter or an opponent: \(\forall X \in \mathcal{L}, {s}^{+} (X) \cup {s}^{-} (X) = \mathcal{L} \).

Let us illustrate the principle behind this quality criterion by considering an ideal pattern mining method that isolates correlations. Of course, this method relies on a selector \(s\) that belongs to the class of correlations (for example, the lift). At equal frequency, the longer pattern will be preferred because it will maximize lift. This property corresponds to the criterion \({X}^{\uparrow } \cap {s}^{-} (X) \ne \emptyset \). At this stage, two incomparable patterns can cover the same set of transactions. To retain only one, we must add a new selection criterion that verifies the criterion \({X}^{\leftrightarrow } \cap {s}^{-} (X) \ne \emptyset \). This approach is at the heart of many proposals in the literature [3, 10, 28]: (i) use of a correlation measure, (ii) elimination of non-closed patterns, (iii) elimination of incomparable redundant patterns.

Recommendation. All patterns should be either supporters or opponents in a well-behaved pattern mining method. It is often necessary to combine a measure with local and global redundancy reduction techniques.

6 Evaluation Complexity of Interestingness Selectors

As Quality Criterion 2 is often violated, we propose to measure its degree of satisfaction in order to evaluate and compare interestingness selectors. More precisely, we measure the quality of an interestingness selector by its degree of satisfaction of the semantics criterion. Considering the correlation family, it is clear that, for detecting correlations, support is a poorer measure than lift, which is itself less effective than productivity. Whatever the part of the lattice, the more numerous the opponents/supporters of a selector, the better its quality. In other words, a selector is more effective at assessing the interestingness of a pattern X when the number of supporters and opponents of X is high.

Definition 6

(Evaluation Complexity). The evaluation complexity of an interestingness selector \(s \) is the asymptotic behavior of the cardinality of its supporters/opponents.

The evaluation complexity of a selector usually depends on the cardinality of the assessed pattern (denoted by \(k = |{X}| \)) and the cardinality of the set of items \(\mathcal{I}\) (denoted by \( n = |{\mathcal{I}}| \)). For instance, the all-confidence of a pattern X involves only the k items composing X. Therefore, the number of evaluations of the all-confidence grows linearly with the itemset size. Similarly, the evaluation complexity of productive itemsets is exponential in the size of the assessed pattern, since all subsets are involved in the evaluation of this constraint. According to the evaluation complexity, the quality of the constraint of productive itemsets is better than that of the all-confidence because its opponents are more numerous. More generally, this complexity makes it possible to compare interestingness selectors with each other. The column \(|{{s}^{\pm } (X)}| \) (where \({s}^{\pm } (X)\) is the total number of supporters and opponents of X) in Table 3 indicates the evaluation complexity of each measure or constraint defined in Table 1. Three main complexity classes emerge: constant, linear and exponential. Although Table 1 is an extremely small sample of measures, we observe that the evaluation complexity of pattern mining methods has increased over the past decades. Interestingly, we also note that the evaluation complexity of global constraints [6, 13] (or advanced measures [26]) is greater than that of local constraints (or absolute measures).
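The contrast between the linear and exponential classes can be made explicit in code (a sketch with our own toy data and helper names; the productivity test follows the usual definition in which every binary partition of X must beat the independence product):

```python
from itertools import combinations

transactions = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}, {"c", "d"}]
n = len(transactions)

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

def all_confidence(X):
    """Only the k individual items of X are consulted: O(k) evaluations."""
    return support(X) / max(support({i}) for i in X)

def is_productive(X):
    """Every binary partition {Y, X \\ Y} of X is tested against the
    independence product: O(2^k) evaluations."""
    X = frozenset(X)
    for r in range(1, len(X)):
        for Y in map(frozenset, combinations(X, r)):
            Z = X - Y
            if support(X) / n <= (support(Y) / n) * (support(Z) / n):
                return False
    return True
```

For a pattern of size k, `all_confidence` issues k support calls while `is_productive` issues on the order of 2^k, which is precisely the gap between the two complexity classes discussed above.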

For Classes C2 and C3, the most condensed representations (among those that enable the frequency of each pattern to be regenerated) are also those with the greatest evaluation complexity. Indeed, the free itemsets are more numerous than the closed ones, themselves more numerous than the NDIs. For Class C1, it is clear that measures based on more sophisticated statistical models require more relationships [26]; they therefore have a higher evaluation complexity. We experimentally verify this hypothesis in the next section.

7 Experimental Study

Our goal is to verify whether the quality of correlated pattern selectors follows their evaluation complexity. In other words, if a correlation measure has a greater evaluation complexity than another, it is expected to be more effective.

To verify this hypothesis, we rely on an experimental protocol inspired by [14]. The idea is to compare the patterns extracted from an original dataset \(\mathcal{D}\) with those extracted from a randomized version \(\mathcal{D} ^*\) of the same dataset. Specifically, in the randomized dataset \(\mathcal{D} ^*\), a large number of items are randomly swapped two by two in order to destroy any correlation. Nevertheless, the dataset \(\mathcal{D} ^*\) retains the same characteristics (transaction lengths and frequency of each item). So, if a pattern X extracted from the original dataset \(\mathcal{D}\) is also extracted from the randomized dataset \(\mathcal{D} ^*\), X is said to be a false positive (FP): its presence in \(\mathcal{D} ^*\) is not due to a correlation between items but to the distribution of items in the data. We then evaluate how many false positive patterns each selector extracts on average by repeating the protocol on 10 randomized datasets. Experiments were conducted on datasets from the UCI ML Repository [7]. Given a minimum support threshold, we compare 4 selectors: Support (all frequent patterns); All-confidence (all frequent patterns having at least 5 as all-confidence); Lift (all frequent patterns having at least 1.5 as lift); and Productivity (all frequent patterns having at least 1.5 as productivity).

Even though arbitrary thresholds are used for the last three selectors, the results are approximately the same with other thresholds because we use the FP rate as the evaluation measure. This normalized measure is a ratio: it returns the proportion of FP patterns among all the mined patterns.
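The protocol can be sketched as follows (our own minimal implementation of pairwise item swapping and the FP rate; function names are illustrative, and the real experiments use far more swaps and full mining runs):

```python
import random
from collections import Counter

def swap_randomize(transactions, n_swaps, seed=0):
    """Repeatedly exchange one item between two transactions so that each
    transaction keeps its length and each item keeps its frequency."""
    rng = random.Random(seed)
    data = [set(t) for t in transactions]
    for _ in range(n_swaps):
        t1, t2 = rng.sample(range(len(data)), 2)
        only1, only2 = data[t1] - data[t2], data[t2] - data[t1]
        if not only1 or not only2:
            continue  # no valid swap between these two transactions
        i1, i2 = rng.choice(sorted(only1)), rng.choice(sorted(only2))
        data[t1].remove(i1); data[t1].add(i2)
        data[t2].remove(i2); data[t2].add(i1)
    return data

def fp_rate(patterns_original, patterns_randomized):
    """Proportion of patterns mined in D that are also mined in D*."""
    if not patterns_original:
        return 0.0
    return len(patterns_original & patterns_randomized) / len(patterns_original)

original = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"c", "d"}]
randomized = swap_randomize(original, n_swaps=100)
# Margins are preserved: same transaction lengths, same item frequencies.
assert [len(t) for t in randomized] == [len(t) for t in original]
assert (Counter(i for t in randomized for i in t)
        == Counter(i for t in original for i in t))
```

For example, if four patterns are mined in \(\mathcal{D}\) and two of them survive in \(\mathcal{D}^*\), `fp_rate` returns 0.5.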

Fig. 1. FP rate with minimum support threshold

Figure 1 plots the FP rate of each selector on abalone, cmc and glass. Since the lift and productivity measures sometimes return no FP patterns, some points are missing because the scale is logarithmic. For each dataset, we observe that the FP rate increases when the minimum support threshold decreases, regardless of the measure. The evolution of the FP rate for the all-confidence is very similar to that of the support, even though the all-confidence has a greater evaluation complexity. For the other selectors, there is a clear ranking from worst to best: support, lift and productivity. This ranking also corresponds to the complexity classes from worst to best: constant, linear, exponential. Our framework could be refined to consider a set of patterns as an opponent (or a supporter); the relation between a pattern and its supporters (or opponents) would then become a relation between a pattern and a set of supporters (or opponents). That would make it possible to capture refinements between selectors as follows: the all-confidence depends on only one item at a time (due to the maximum in its denominator), whereas the lift can vary according to a set of items (due to the multiplication). Nevertheless, our experiments show that, on the whole, the correlation measures with the highest evaluation complexity are also the best ones according to the FP rate.

8 Conclusion and Discussion

In this paper, we have addressed the question of the quality of a data mining method in the context of unsupervised problems with binary data. A key concept is to study the relationships between a pattern X and the other patterns when X is selected by a method. These relationships are formalized through the notions of supporters and opponents. We have presented a typology of methods defined by formal properties and based on two complementary criteria. This typology offers a global picture and a methodology helping to compare methods to each other. Besides, if a new method is proposed, its quality can be immediately compared to the quality of the other methods according to our framework. Finally, the quality of a method is quantified via an evaluation complexity analysis based on the number of supporters and opponents of a pattern extracted by the method.

Two recommendations can be drawn from this work. We think that the result of a data mining operation should be understandable by the user. So, our first recommendation is that a data mining method should not simultaneously extract over-represented and under-represented phenomena, because mixing these two kinds of phenomena obstructs the understandability of the extracted patterns. This recommendation is formally defined by our soundness criterion. Most methods satisfy this property, but there are a few exceptions, such as the constraints extracting NDIs and frequent free patterns. The violation of this recommendation might explain why these patterns are of little use.

Our second recommendation is that a data mining method should extract patterns such that all patterns contribute to the quality of each extracted pattern. This recommendation is formalized by our completeness criterion, which states that all patterns must be either supporters or opponents of a pattern extracted by the method. In practice, many methods do not satisfy this recommendation. However, a few methods are endowed with this behavior, such as [3, 10, 28] in the context of correlations. We think that a goal of pattern mining should be to design methods following this recommendation, which is more often achieved by pattern sets [3], as illustrated by the previous examples [3, 10, 28].

A perspective of this work is to study an interestingness theory for methods producing pattern sets. Pattern sets are a promising avenue since the interestingness of a pattern then also depends on the interestingness of the other patterns of the set, thus providing a global quality measure for the method. Finally, it is important to note that our framework can be generalized to other pattern languages (sequences, graphs, etc.) and other basic functions; for example, observing the variation of support in a target class would extend our approach to the supervised context.