
1 Introduction

Block ciphers are used as basic building blocks in symmetric cryptography for encryption, authentication, the construction of hash functions and so on. The evaluation of their practical security has been an active research topic over the decades, giving rise to different analysis techniques. Statistical attacks exploit non-uniform behaviors of the plaintext-ciphertext data to extract information about the key. One of the most prominent statistical attacks is linear cryptanalysis. Early works assumed that linear trails behave identically for every key [3, 4, 17, 20]. Later, once many trails were considered within one approximation [24, 25], the linear hull effect raised interesting questions about the fixed-key behavior of single linear approximations [21, 22]. Daemen et al. gave a fixed-key probability distribution for single linear correlations [13], leading to subsequent works on, e.g., fundamental assumptions [9], the effect of key schedules [1] and measures for data complexity [19], all for single linear attacks. However, we still do not understand the situation in multidimensional linear cryptanalysis.

A collection of linear approximations has a capacity, which measures the deviation of their joint distribution from the uniform distribution. One important open problem in multidimensional linear cryptanalysis is to estimate the capacity and data complexity when a large number of different keys are considered. In previous work, the capacity was assumed to take its average value for most of the keys, and the data complexity was usually measured by the reciprocal of the average capacity. However, neither is correct. As we know, the key equivalence hypothesis has been questioned for single linear approximations and differential trails [5, 9, 12]. We show that this hypothesis also requires adjustment in the multidimensional linear setting.

It has also always been difficult to compute the average data complexity over the keys in linear cryptanalysis. Using Jensen's inequality, Murphy [22] points out that the Fundamental Theorem [24] can only give a lower bound on the average data complexity when a collection of linear trails in a linear approximation is used. Leander shows that in single linear attacks we should focus on the median complexity instead of the average complexity, since the latter is usually infinite [19]. Neither Murphy's nor Leander's concern has been addressed yet in the scenario of multidimensional linear attacks.

As one of the most powerful variants of linear attacks, multidimensional linear attacks notably reduce the data complexity, both in theory and in practice [10, 11, 15, 16, 23]. Moreover, the multidimensional linear distinguisher has been shown to have connections with other statistical distinguishers, e.g., truncated differential distinguishers [6], statistical saturation distinguishers [19], and integral distinguishers [8]. All of the above suggests the importance of multidimensional linear cryptanalysis; hence, the lack of knowledge on fundamental aspects of this attack is especially surprising and deserves more attention.

Our Contributions. In this paper, we point out that under a reasonable assumption, the distribution of the key-dependent capacity can be explicitly formulated as a Gamma distribution, parametrized by the average linear probability and the dimension (Sect. 3). This distribution is verified experimentally on the round-reduced PRESENT cipher. Then, we derive the distribution of the data complexity, an Inverse Gamma distribution based on the same parameters (Sect. 4). Our results allow a more accurate measurement of multidimensional linear attacks.

With these distributions, in Sect. 5 we discuss three well-known measures of the data complexity of multidimensional linear attacks: the reciprocal of the average capacity, the average complexity and the (general) median complexity. The following fundamental questions in single linear attacks are then generalized to multidimensional linear attacks and solved.

Firstly, we consider the standard key equivalence hypothesis. We discover that instead of holding for a majority of keys, the average capacity is actually reached by less than half of the keys, no matter how many linear approximations are used. Hence, we modify the hypothesis in a way which is more in line with the practical situation.

Secondly, as we know, the average data complexity of single linear attacks is difficult to calculate, since the linear hull effect may result in zero correlation for some keys. However, we show that the situation changes when multiple linear approximations are involved: in this case the average data complexity can be easily calculated from the Inverse Gamma distribution. Then, by generalizing Murphy's idea from the case of linear hulls to the case of multiple linear approximations, the reciprocal of the average capacity is proved to be only a lower bound on the average data complexity. We also derive the exact difference between this lower bound and the average data complexity.

Thirdly, we solve the open problem proposed by Leander in [19] by extending the use of the median complexity to multidimensional linear attacks. Finally, all measures of data complexity are compared under different dimensions. An interesting observation is that the median complexity approaches the average one arbitrarily closely as the dimension increases.

In Sect. 6, we revisit Cho's multidimensional linear attack on 25-round PRESENT [10], which targets the most rounds of PRESENT with a data complexity less than the whole codebook. As an application of our theoretical analysis, we can directly estimate the average capacity, instead of carrying out a complex proof as in [10]. Our results are very close to Cho's. Moreover, the exact knowledge of the capacity distribution allows us to compute the ratio of weak keys precisely. Using Cho's attack method with adjusted parameters, \(2^{123.24}\) weak keys of 26-round PRESENT can be recovered with no more than \(2^{62.5}\) plaintext-ciphertext pairs.

2 Preliminaries

2.1 Block Ciphers and Linear Cryptanalysis

Let \(\mathbb {F}_2\) be the binary field with two elements and \(\mathbb {F}_2^n\) be the n-dimensional vector space over \(\mathbb {F}_2\). The inner product on \(\mathbb {F}_2^n\) is defined by \(a \cdot b = \sum _{i=1}^{n}a_ib_i\), where a, b \(\in \mathbb {F}_2^n\).

A block cipher is a mapping \(E : \mathbb {F}_2^n \times \mathbb {F}_2^\kappa \rightarrow \mathbb {F}_2^n\) with \(E_k(\cdot ) \overset{def}{=} E(k,\cdot )\) for each \(k \in \mathbb {F}_2^\kappa \). If \(y = E_k(x)\), x, y and k are referred to as the plaintext, the ciphertext and the master key, respectively. A key-alternating cipher is a block cipher consisting of an alternating sequence of unkeyed rounds and simple bitwise key additions.

Linear cryptanalysis uses a linear relation between bits of x, y and k. A linear approximation (u, v) is a probabilistic linear relation expressed as a boolean function of these bits, i.e.,

$$\begin{aligned} B(k) \overset{def}{=} u \cdot x \oplus v \cdot E_k(x), \end{aligned}$$
(1)

where (u, v) is called the text mask. B(k) is a boolean random variable characterized by its fixed-key probability \(p(k) \overset{def}{=} \Pr _x[B(k) = 0]\), taken over uniformly random plaintexts x.

We call \(c(k) = 2p(k) - 1\) the fixed-key correlation of the linear approximation (u, v). The linear probability (LP) of the approximation (u, v) is defined as \(LP(k) = c(k)^2\). Both c(k) and LP(k) vary over different keys, and can be regarded as real-valued random variables over the whole key space.

In a linear approximation (u, v), there may be many paths with different intermediate masks, but sharing the same input and output masks (u, v). A path that specifies the linear relation round by round is called a linear trail (or linear characteristic). Note that in a key-alternating cipher, the LP of a linear trail is independent of the subkeys.

2.2 Multidimensional Linear Approximations and Data Complexity

Multidimensional linear attacks use m approximations with linearly independent text masks, called base approximations, to construct an m-dimensional vectorial boolean function f. Let \(p = (p_0, p_1, \dots , p_{2^m-1})\) be the probability distribution of f. It can be computed by the following lemma.

Lemma 1

([15, Corollary 1]) Let \(f: \mathbb {F}_2^n \mapsto \mathbb {F}_2^m\) be a vectorial boolean function with the probability distribution p. Then, we have

$$c_a = \sum _{\eta \in \mathbb {F}_2^m}(-1)^{a \cdot \eta }p_{\eta },\ for \ all \ a \in \mathbb {F}_2^m$$

and

$$p_{\eta } = 2^{-m}\sum _{a \in \mathbb {F}_2^m}(-1)^{a \cdot \eta }c_a,\ for \ all \ \eta \in \mathbb {F}_2^m.$$

Here, \(c_a\) is the correlation of the boolean function \(a \cdot f\), \(a \in \mathbb {F}_2^m\).

In a multidimensional linear attack, \(c_a\) is indeed the correlation of the approximation obtained by combining the base approximations linearly.

Let \(q = (q_0,...,q_{2^m-1})\) be another discrete probability distribution of an m-bit random variable. Then, the capacity of p and q is defined as follows.

Definition 1

The capacity between two probability distributions p and q is defined by

$$C(p,q) = \sum _{\eta =0}^{2^m-1}(p_{\eta }-q_{\eta })^2q_{\eta }^{-1}.$$

The capacity of multidimensional linear approximations with probability distribution p is \(C(p) = C(p,\theta )\), where \(\theta \) is the uniform distribution.

Lemma 2

([15, Corollary 2]) Given an m-dimensional vectorial boolean function f with the probability distribution p, the capacity is

$$\begin{aligned} C(p)= \sum _{a \in \mathbb {F}_2^m, a \ne 0}c_a^2. \end{aligned}$$

Thus, the capacity of multidimensional linear approximations is computed from the m base approximations and the other \(2^m-1-m\) approximations that are XOR sums of the base approximations. These \(2^m-1-m\) approximations, referred to as combined approximations, are linearly spanned by the m base approximations.
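To make the relation between the probability distribution and the correlations concrete, the following sketch (in Python with numpy, which we assume available; the toy function is hypothetical) derives the correlations of a random vectorial boolean function via Lemma 1 and checks that Definition 1 and Lemma 2 give the same capacity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 3

f = rng.integers(0, 2**m, size=2**n)           # a random f: F_2^n -> F_2^m
p = np.bincount(f, minlength=2**m) / 2**n      # its probability distribution

# c_a = sum_eta (-1)^{a.eta} p_eta for all a in F_2^m  (Lemma 1)
a = np.arange(2**m)
sign = (-1.0) ** np.array([[bin(ai & eta).count("1") for eta in a] for ai in a])
c = sign @ p                                   # c[0] = 1 since p sums to one

capacity_def = np.sum((p - 2.0**-m) ** 2 / 2.0**-m)  # Definition 1, q uniform
capacity_lem = np.sum(c[1:] ** 2)                    # Lemma 2, masks a != 0
assert np.isclose(capacity_def, capacity_lem)
```

The equality is an instance of Parseval's relation for the transform of Lemma 1.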

To estimate the data complexity of multidimensional linear cryptanalysis, the Chernoff information \(D^*\) can be considered [2].

Theorem 1

([2, Theorem 1]) Let \(BestAdv_N(p,q)\) be the best advantage for distinguishing probability distribution p from probability distribution q, using N samples. We have

$$1-BestAdv_N(p,q) = 2^{-ND^*(p,q)+o(N)}.$$

Hence, the data complexity is \(N \approx \frac{1}{D^*(p,q)}\). When q is the uniform distribution and p is close to q, the Chernoff information can be approximated by the capacity C(p) [2, Theorem 7]:

$$D^*(p,q) \simeq \frac{C(p)}{8\ln 2}.$$

In this case, when the optimal distinguisher based on LLR-statistic (or \(\chi ^2\)-statistic) is used, the data complexity is given as \(\frac{\lambda }{C(p)}\), where \(\lambda \) depends on the success probability of the distinguisher.

The probability distribution p of an m-dimensional linear approximation actually varies over different keys, and so does the capacity (as we will show later). Hereafter, instead of C(p(k)), we write C(k) for the key-dependent capacity.

2.3 Related Distributions and Assumptions

Note 3

Let \(\mathcal {N}(\mu ,\sigma ^2)\) be the normal distribution with mean \(\mu \) and variance \(\sigma ^2\). Let \(\varGamma (\alpha ,\theta )\) be the Gamma distribution under the shape-scale parametrization, with mean \(\alpha \theta \), the probability density function g and the cumulative distribution function \(\mathcal {G}\). If \(X \sim \mathcal {N}(0,\sigma ^2)\), then \(X^2 \sim \varGamma (1/2,2\sigma ^2)\). Inv-Gamma(\(\alpha \),\(\beta \)) denotes the inverse-Gamma distribution with mean \(\frac{\beta }{\alpha -1}\) for \(\alpha > 1\). If \(X \sim \varGamma (\alpha ,\theta )\), then \(\frac{1}{X} \sim \) Inv-Gamma(\(\alpha ,\theta ^{-1})\).
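As a quick sanity check of these facts (an illustrative sketch, assuming numpy and scipy; the variance is arbitrary), one can sample from \(\mathcal {N}(0,\sigma ^2)\) and compare the squared samples against \(\varGamma (\frac{1}{2},2\sigma ^2)\), and their reciprocals against Inv-Gamma\((\frac{1}{2},(2\sigma ^2)^{-1})\):

```python
import numpy as np
from scipy import stats

sigma2 = 2.0**-16                                  # an arbitrary variance
x = np.random.default_rng(1).normal(0.0, np.sqrt(sigma2), 200_000)

# X ~ N(0, sigma^2)  =>  X^2 ~ Gamma(1/2, 2*sigma^2): compare the means
print((x**2).mean(), stats.gamma(a=0.5, scale=2*sigma2).mean())

# 1/X^2 ~ Inv-Gamma(1/2, 1/(2*sigma^2)); its mean is infinite (alpha <= 1),
# so we compare the medians instead
print(np.median(1/x**2), stats.invgamma(a=0.5, scale=1/(2*sigma2)).ppf(0.5))
```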

Daemen et al. give the distribution of the fixed-key LP of linear approximations when linear hull effect is considered [13].

Approximation 4

[13, Theorem 22] Given a key-alternating cipher with independent round keys, when the number of linear trails of (u, v) is large enough and their LP are small compared to ELP(u, v), the fixed-key correlation of (u, v), c(k), which is a real-valued random variable, follows

$$c(k) \sim \mathcal {N}(0,ELP(u,v)).$$

The fixed-key LP(k) follows the distribution \(\varGamma (\frac{1}{2},2ELP(u,v))\), with mean ELP(u, v) and variance \(2ELP(u,v)^2\), where \(ELP(\cdot )\) is the average linear probability of the approximation over all keys.

ELP(u, v) can be denoted as \(\overline{c^2}\) and computed by the following proposition for key-alternating ciphers.

Proposition 1

[12, 24] Let E be a key-alternating block cipher and assume that all subkeys are independent. The average LP of a linear approximation is the sum of the LP of all linear trails \(t_j\), denoted LPT(\(t_j\)), between the input and output masks of this approximation, i.e.,

$$ELP(u,v) = \sum _{t_j \in (u,v)}LPT(t_j).$$

3 Key-Dependent Capacity in Multidimensional Linear Approximations

In this section, we study the distribution of the key-dependent capacity. Let c(k) (resp. LP(k)) be a real-valued random variable representing the fixed-key correlation (resp. linear probability) of a linear approximation; their distributions are given by Approximation 4. When multiple linear approximations are used, we index them by a subscript i, e.g., \(c_{i}(k)\) denotes the fixed-key correlation of the ith linear approximation. W.l.o.g., we use \(i = 1,\dots ,m\) as the indices of the m base approximations.

In [16], the authors claim that in practical experiments the probability distributions vary a lot with the keys while the capacity remains rather constant. However, in this section we point out, from the theoretical point of view, that the capacity also varies over different keys, and we give experimental verification. We focus on two cases, both of which occur in practical block ciphers. These two cases are treated in Propositions 2 and 3, respectively.

Proposition 2

Consider an m-dimensional linear attack using m base approximations whose correlations \(c_i(k)\) are i.i.d. as \(\mathcal {N}(0,\overline{c^2})\) over the keys, where \(\overline{c^2}\) is the average LP. If, for each fixed key, the binary random variables associated with the base approximations are statistically independent, then the fixed-key capacity of this m-dimensional linear approximation, C(k), approximately follows the Gamma distribution \(\varGamma (\frac{m}{2},2\overline{c^2})\).

Proof

Let \(f_1(k),\dots ,f_m(k)\) be the m linearly independent base approximations used to construct the m-dimensional approximation, so that \(f(k) = (f_1(k),\dots ,f_m(k))\) is an m-dimensional vectorial boolean function with probability distribution \(p(k)=\{p_\eta (k)\}\), where \(\eta \in \mathbb {F}_2^m\) and \(p_\eta (k)\) is the probability that \(f(k) = \eta \). Indeed, \(f_i(k)\) is a binary random variable with correlation \(c_i(k)\). Since the \(f_i(k)\) are statistically independent of each other for each fixed key k,

$$p_\eta (k) = \prod _{i = 1}^{m}\Big (\frac{1}{2}+(-1)^{\eta _i}\frac{c_i(k)}{2}\Big ), \quad \eta \in \mathbb {F}_2^m,$$

where \(\eta _i\) denotes the ith bit of \(\eta \).

According to Definition 1,

$$\begin{aligned} C(k)&= \sum _{\eta \in \mathbb {F}_2^m}(p_\eta (k)-2^{-m})^2/2^{-m} = 2^m\sum _{\eta \in \mathbb {F}_2^m}(p_\eta (k)-2^{-m})^2\\&= 2^m\sum _{\eta \in \mathbb {F}_2^m}\Big (\prod _{i = 1}^{m}\big (\frac{1}{2}+(-1)^{\eta _i}\frac{c_i(k)}{2}\big )-2^{-m}\Big )^2 \end{aligned}$$

For each fixed key, the correlations are small, so products of two or more of them are negligible: \(|c_i(k) \cdot c_j(k)| \ll |c_i(k)|\). Keeping only the terms linear in the \(c_i(k)\),

$$\begin{aligned} C(k)&= 2^m\sum _{\eta \in \mathbb {F}_2^m}\Big [\sum _{i = 1}^{m}(-1)^{\eta _i}\frac{c_i(k)}{2\cdot 2^{m-1}}\Big ]^2\\&= 2^m\sum _{\eta \in \mathbb {F}_2^m}\frac{1}{2^{2m-2}}\Big [\sum _{i = 1}^{m}\Big (\frac{c_i(k)}{2}\Big )^2 + 2\sum _{i < j}(-1)^{\eta _i+\eta _j}\frac{c_i(k)}{2}\frac{c_j(k)}{2}\Big ] \end{aligned}$$

Since \(\sum _{\eta \in \mathbb {F}_2^m}\sum _{i < j}(-1)^{\eta _i+\eta _j}\frac{c_i(k)}{2}\frac{c_j(k)}{2} = 0\),

$$C(k) = \frac{2^m}{2^{2m-2}}\sum _{\eta \in \mathbb {F}_2^m}\sum _{i = 1}^{m}(\frac{c_i(k)}{2})^2 = \sum _{i = 1}^{m}c_i(k)^2 = \sum _{i = 1}^{m}LP_i(k)$$

Since the \(c_i(k)\) are i.i.d. as \(\mathcal {N}(0,\overline{c^2})\), the \(LP_i(k)\) are i.i.d. as \(\varGamma (\frac{1}{2},2\overline{c^2})\), \(i = 1,\dots ,m\). Thus, C(k) is the sum of m independent \(\varGamma (\frac{1}{2},2\overline{c^2})\) variables. Hence, \(C(k) \sim \varGamma (\frac{m}{2},2\overline{c^2})\). \(\square \)
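The proposition is easy to check by simulation. The following sketch (hypothetical parameters; numpy and scipy assumed) draws the m base correlations per key as in the hypothesis, forms \(C(k) = \sum _i c_i(k)^2\), and compares the sample with \(\varGamma (\frac{m}{2},2\overline{c^2})\):

```python
import numpy as np
from scipy import stats

m, cbar2, keys = 4, 2.0**-16, 50_000
c = np.random.default_rng(2).normal(0.0, np.sqrt(cbar2), size=(keys, m))
C = (c**2).sum(axis=1)                    # C(k) = sum_i c_i(k)^2 per key

ref = stats.gamma(a=m/2, scale=2*cbar2)   # the claimed distribution
print(C.mean(), ref.mean())               # both close to m * cbar2
print(stats.kstest(C, ref.cdf).pvalue)    # large p-value: no rejection
```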

Recall that for one-dimensional linear approximations, \(\overline{c^2}\) can be calculated by Proposition 1 when the dominant trails in a linear approximation are known.

Proposition 2 considers the scenario where the LP of the base approximations are dominant. In this case, we approximate the capacity by summing the LP of the base approximations and ignoring the LP of the combined approximations (see Lemma 2). To show the reasonableness of this approximation, we also bound its error; for this part of the analysis, please see Appendix B.

On the other hand, Proposition 3 considers the case where not only the m base approximations but also the \(2^m-1-m\) combined approximations make a non-negligible contribution to the capacity. In this case, the correlations of the \(2^m-1-m\) combined approximations are no longer independent. Thus, we derive the capacity in this case under another hypothesis.

Proposition 3

In an m-dimensional linear attack using an m-dimensional linear approximation whose probabilities \(p_{\eta }(k)\) are i.i.d. as the normal distribution \(\mathcal {N}(2^{-m},\sigma ^2)\), \(\eta \in \mathbb {F}_2^m\), the fixed-key capacity of this m-dimensional linear approximation, C(k), follows the Gamma distribution \(\varGamma (\frac{2^m-1}{2},2 \cdot 2^m\sigma ^2)\).

Proof

Since the \(p_{\eta }(k)\) are i.i.d. as \(\mathcal {N}(2^{-m},\sigma ^2)\), subject to the normalization constraint \(\sum _{\eta }p_{\eta }(k) = 1\) which removes one degree of freedom,

$$Q = \sum _{\eta =0}^{2^m-1}\frac{(p_{\eta }(k)-2^{-m})^2}{\sigma ^2} \sim \chi ^2(2^m-1) = \varGamma (\frac{2^m-1}{2},2)$$

According to the definition of capacity,

$$C(k) = \sum _{\eta =0}^{2^m-1}\frac{(p_{\eta }(k)-2^{-m})^2}{2^{-m}}= 2^m \sigma ^2 Q \sim \varGamma (\frac{2^m-1}{2},2 \cdot 2^m\sigma ^2)$$

\(\square \)

Compared with Proposition 2, which considers only the m base approximations with equally dominant correlations, Proposition 3 addresses the situation where the correlations \(c_a(k)\) of all \(2^m-1\) approximations are identically distributed (for the proof please refer to Appendix A). Thus, the average LP of the \(2^m-1\) approximations are equal, denoted again as \(\overline{c^2}\). Since the average capacity is the sum of the average LP of the involved approximations, i.e., \((2^m-1)\cdot 2^m\sigma ^2 = (2^m-1)\overline{c^2}\), the capacity distribution in Proposition 3 can also be written as \(\varGamma (\frac{2^m-1}{2},2\overline{c^2})\).
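Proposition 3 can be checked the same way. In the sketch below (hypothetical parameters; numpy and scipy assumed), the probabilities \(p_\eta (k)\) are drawn i.i.d. from \(\mathcal {N}(2^{-m},\sigma ^2)\) and then re-centered so that they sum to one, which is the constraint costing the one degree of freedom in the proof:

```python
import numpy as np
from scipy import stats

m, sigma2, keys = 4, 2.0**-30, 50_000
rng = np.random.default_rng(3)
p = rng.normal(2.0**-m, np.sqrt(sigma2), size=(keys, 2**m))
p += 2.0**-m - p.mean(axis=1, keepdims=True)  # enforce sum_eta p_eta(k) = 1

C = ((p - 2.0**-m)**2).sum(axis=1) / 2.0**-m  # Definition 1 with q uniform

ref = stats.gamma(a=(2**m - 1)/2, scale=2 * 2**m * sigma2)
print(C.mean(), ref.mean())     # both close to (2^m - 1) * 2^m * sigma2
print(stats.kstest(C, ref.cdf).pvalue)
```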

Experimental Verification. In order to verify that the above analysis reflects reality with reasonable accuracy, we have experimentally computed the capacity distributions sampled from 5000 randomly chosen keys for 5-round PRESENT. A set of usable one-dimensional linear approximations is given in [26], with theoretical average LP computed as \(2^{-16.83}\). Thus, the correlation distribution of these approximations is \(\mathcal {N}(0,2^{-16.83})\), and the LP distribution is \(\varGamma (\frac{1}{2},2^{-15.83})\).

We can select linearly independent approximations from this set as the base approximations. Here we examine the 2-dimensional and 4-dimensional linear approximations for the case of Proposition 2.

In this case, the base approximations are chosen with input masks from different S-boxes in the first round and output masks from different S-boxes in the last round. According to Proposition 2, the theoretical distribution of the 2-dimensional capacity is \(\varGamma (1,2^{-15.83})\) and that of the 4-dimensional capacity is \(\varGamma (2,2^{-15.83})\). The experimental distributions of the 2-dimensional and 4-dimensional capacity sampled over 5000 keys are shown in (a) and (b) of Fig. 1, respectively.

Fig. 1. Experimental (black) and theoretical (red) distributions of the capacity for the 2- and 4-dimensional approximations of the first case (Color figure online)

As illustrated in Fig. 1, the experimental distribution of the capacity follows the theoretical estimate closely. The scattering of the data points occurs because we plot a raw histogram of the samples rather than averaged values.

4 Distribution of Data Complexity

With the knowledge of the capacity distribution, the distribution of the data complexity, which is approximately \(\lambda \) times the reciprocal of the capacity, can be obtained formally. Hereafter we focus on the case of Proposition 2; the case of Proposition 3 can be treated in a similar way.

Corollary 1

If the fixed-key capacity of the multidimensional linear approximation follows \(C(k) \sim \varGamma (\frac{m}{2},2\overline{c^2})\), then the fixed-key data complexity of the corresponding multidimensional attack follows \(N(k) \sim \) Inv-Gamma(\(\frac{m}{2}\),\(\frac{\lambda }{2\overline{c^2}})\).

Corollary 1 is derived directly from Proposition 2 (also refer to Note 3), and addresses the case where the m correlations of the base approximations play the prominent role in the capacity. Since \(\lambda \) is a constant for any fixed success probability in an attack, w.l.o.g. hereafter we study the above data complexity distribution as Inv-Gamma(\(\frac{m}{2},\frac{1}{2\overline{c^2}})\). For each key k, N(k) is asymptotically inversely proportional to C(k). The average data complexity over all keys is denoted by N, \(N = E_k[N(k)]\), which is proportional to

$$E_k \bigg [\frac{1}{C(k)}\bigg ] = \frac{1}{|\mathcal {K}|}\sum _{k \in \mathcal {K}}\frac{1}{C(k)},$$

where \(\mathcal {K}\) denotes the whole key space, and \(E_k(\cdot )\) denotes an expected value taken over the whole key space. According to Corollary 1 and the mean of the inverse Gamma distribution (see Note 3), \(E_k[\frac{1}{C(k)}] = \frac{1}{2\overline{c^2}(m/2-1)} = \frac{1}{(m-2)\overline{c^2}}\).

Remark. The data complexity distribution in Corollary 1 also holds for single linear attacks, where \(m = 1\). In the case of \(m = 1\), the average data complexity is infinite, as pointed out by [19], which corresponds to the fact that the mean of the distribution Inv-Gamma(\(\frac{1}{2},\frac{1}{2\overline{c^2}}\)) does not exist. When m is equal to 2, the mean of the inverse Gamma distribution does not exist either, since the mean of Inv-Gamma(\(\alpha ,\beta \)) is only finite for \(\alpha > 1\).

Similarly, the average capacity over the keys

$$E_k[C(k)] = \frac{1}{|\mathcal {K}|}\sum _{k \in \mathcal {K}}C(k)$$

is equal to \(m\overline{c^2}\), derived from the mean of the Gamma distribution in Proposition 2 (see Note 3).
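Both closed-form means are one-liners with scipy (a sketch under the parameters of Example 5 below; \(\lambda \) is dropped as before). Note that scipy reports an infinite mean for the inverse Gamma when the shape is at most 1, matching the remark above:

```python
import numpy as np
from scipy import stats

cbar2 = 2.0**-40
for m in (2, 4, 6, 8, 20):
    EC    = stats.gamma(a=m/2, scale=2*cbar2).mean()         # E_k[C(k)] = m*cbar2
    EinvC = stats.invgamma(a=m/2, scale=1/(2*cbar2)).mean()  # inf for m <= 2
    print(m, np.log2(EC), np.log2(EinvC) if np.isfinite(EinvC) else "inf")
```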

Fig. 2. Distributions of the data complexity for m = 2, 4, 6, 8, 20.

Fig. 3. Distributions of the capacity for m = 2, 4, 6, 8, 20.

Example 5

For clearer exposition, a simple example which closely matches real situations in practical ciphers is used in the rest of our analysis. We take \(\overline{c^2}\) as \(2^{-40}\), which roughly corresponds to the case of 15-round PRESENT, and take m as 2, 4, 6, 8 and 20, respectively. In this example, the distribution functions of the data complexity are shown in Fig. 2, and the distribution functions of the capacity are shown in Fig. 3.

5 Evaluation of the Data Complexity

In practical attacks, \(E_k[\frac{1}{C(k)}]\) and \(\frac{1}{E_k[C(k)]}\) are closely related to the evaluation of the data complexity. Since \(E_k[\frac{1}{C(k)}]\) is hard to estimate, the complexity is usually measured by \(\frac{1}{E_k[C(k)]}\). In this section, we first propose a refined key equivalence hypothesis for \(E_k[C(k)]\) (Sect. 5.1). With the exact description of the data complexity distribution, the difficulty of evaluating \(E_k[\frac{1}{C(k)}]\) is overcome, and a basic question about the relation between the average capacity and the average data complexity is studied (Sect. 5.2). We also extend Leander's idea of exploiting median data complexities [19] to multidimensional linear attacks (Sect. 5.3). Finally, all measures are compared.

5.1 Adjusted Key Equivalence Hypothesis

Regarding the connection between the fixed-key capacity and the average capacity in a multidimensional linear system, the traditional key equivalence hypothesis states that the fixed-key capacity does not deviate significantly from its average value [14, 18]. This hypothesis can be interpreted as follows: \(C(k) \approx E_k[C(k)]\), for almost all keys k. As we have shown, the capacity is actually Gamma distributed, so this hypothesis does not hold. Thus, two questions arise: which value is suitable for evaluating the attack complexity, and is the average value adequate? We start with the following conjecture to show that the average capacity is far from representative of the majority of keys.

Conjecture 1

There are always less than half of the keys having a capacity larger than the average capacity, that is, \(|\{k^{*} \in \mathcal {K} \mid C(k^{*}) \ge E_k[C(k)]\}| < \frac{1}{2}|\mathcal {K}|\), where \(\mathcal {K}\) is the whole key space. Hence, with a data complexity of \(\frac{\lambda }{E_k[C(k)]}\), the right key can be recovered for less than half of the keys.

Table 1. The ratio of keys that have a capacity larger than the average capacity

This conjecture is illustrated in Table 1 with Example 5. As m increases, the ratio of keys that have a capacity larger than the average capacity approaches \(\frac{1}{2}\), but never reaches it. This is because, for a right-skewed Gamma distribution such as that of Proposition 2, the median is always smaller than the mean. It can be concluded that, using a number of ciphertexts equal to \(\frac{\lambda }{E_k[C(k)]}\), more than half of the keys cannot be recovered with a reasonable success probability. Thus, the average capacity does not give a sound estimate of the attack complexity for most keys, especially when m is small.
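The ratios in Table 1 are simply the survival function of the Gamma distribution evaluated at its mean, which the following sketch reproduces (scipy assumed; note that the ratio does not depend on \(\overline{c^2}\), since the scale cancels):

```python
from scipy import stats

for m in (2, 4, 6, 8, 20):
    # P[C(k) >= E_k[C(k)]] for C(k) ~ Gamma(m/2, 2*cbar2); the scale cancels,
    # so evaluate Gamma(m/2, 2) at its mean m
    print(m, stats.gamma(a=m/2, scale=2).sf(m))   # e.g. exp(-1) ~ 0.368 for m = 2
```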

Since the capacity is highly dependent on the choice of the key, we are concerned with how much data a multidimensional attack needs in order to succeed for a majority of keys. A natural way to adjust the hypothesis is to consider an upper bound on the data complexity that holds for, e.g., 90 % of the keys, meaning that for these 90 % of keys this amount of data guarantees a successful attack with high probability, even if for some of these keys the data complexity is overestimated.

Hypothesis 6

(Adjusted Key Equivalence Hypothesis) If the capacity distribution of an m-dimensional linear attack satisfies Proposition 2, then \(90\,\%\) of the keys in the key space have a capacity no smaller than \(\mathcal {G}^{-1}(0.1)\), where \(\mathcal {G}\) is the cumulative distribution function of \(\varGamma (\frac{m}{2},2\overline{c^2})\). Using \(\frac{\lambda }{\mathcal {G}^{-1}(0.1)}\) data is enough for recovering \(90\,\%\) of the keys in the key space.
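Concretely, the data bound of Hypothesis 6 is the 10 % quantile of the capacity distribution, e.g. (a sketch with the parameters of Example 5 and \(\lambda = 1\), both assumptions):

```python
import numpy as np
from scipy import stats

cbar2, lam = 2.0**-40, 1.0
for m in (2, 4, 6, 8, 20):
    c_low = stats.gamma(a=m/2, scale=2*cbar2).ppf(0.1)  # G^{-1}(0.1)
    print(m, np.log2(lam / c_low))   # log2 of the data sufficient for 90% of keys
```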

5.2 On Average Data Complexity

Why is the Average Data Complexity Calculable? It is known that in classical single linear attacks considering the linear hull effect, the average data complexity is hard to derive and usually infinite because of the existence of zero correlations. This difficulty can now be resolved in the setting of m-dimensional linear attacks, since for m larger than 2 the average value can be derived directly from the exact distribution of the data complexity. From the point of view of the capacity distribution, we can also understand why the average data complexity becomes calculable in multidimensional attacks.

In the single linear setting, the keys with zero C(k) may make the average complexity infinite, so this set of keys deserves attention. Here, we point out that by taking multiple linear approximations into consideration simultaneously instead of only one, the fraction of keys with capacity close to zero becomes very small, so that the average complexity turns out to be computable.

We consider the ratio of keys yielding C(k) between zero and \(\epsilon \), where \(\epsilon \) is a fixed value very close to zero. From (b) of Fig. 3, it is evident that the ratio of keys with capacity close to zero decreases as m increases. This ratio is shown for several fixed \(\epsilon \) in Table 2, and it drops dramatically as m grows. This is because, as the number of approximations grows, for each key there is a higher probability that at least one approximation yields a non-zero LP, and hence a non-zero capacity. Hence, for a fixed \(\epsilon \), the more base approximations are used, the fewer keys cause infinite data complexities. When \(\epsilon \) is small enough and m has a reasonable size, this ratio becomes negligible in the whole key space. In this case it is sound to assume that no key causes zero capacity, so that the average data complexity is computable.
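The ratios behind Table 2 are Gamma cdf values at \(\epsilon \), which can be reproduced as follows (a sketch; scipy assumed, with \(\overline{c^2} = 2^{-40}\) and \(\epsilon = 2^{-45}\) chosen for illustration):

```python
from scipy import stats

cbar2, eps = 2.0**-40, 2.0**-45
for m in (1, 2, 4, 8, 20):
    # P[C(k) <= eps] for C(k) ~ Gamma(m/2, 2*cbar2): drops sharply with m
    print(m, stats.gamma(a=m/2, scale=2*cbar2).cdf(eps))
```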

A Difference Between \(E_k[\frac{1}{C(k)}]\) and \(\frac{1}{E_k[C(k)]}\). The problem discussed here was first identified in the context of the linear hull effect by Murphy [22]. We extend it to multidimensional linear attacks and investigate it further.

In some attack analyses, e.g. [10], the reduction in data complexity given by multiple approximations is based on the assertion that the data complexity N is proportional to \(\frac{1}{E_k[C(k)]}\). As with the effectiveness issue of the linear hull effect studied in [22], there is also a difference between \(\frac{1}{E_k[C(k)]}\) and the actual average data complexity. By Jensen's Inequality and the fact that the reciprocal is a convex function on the positive reals, we have

$$E_k\bigg [\frac{1}{C(k)}\bigg ] \ge \frac{1}{E_k[C(k)]}.$$

Thus, \(\frac{1}{E_k[C(k)]}\) can only serve as a lower bound on the average data complexity.

Jensen’s Inequality gives a general comparison without considering the details of the variables. When the distributions of both C(k) and \(\frac{1}{C(k)}\) are known, \(E_k[\frac{1}{C(k)}]\) and \(\frac{1}{E_k[C(k)]}\) can be derived as in Sect. 4. Their difference is \(\frac{1}{(m-2)\overline{c^2}}-\frac{1}{m\overline{c^2}} = \frac{2}{m(m-2)\overline{c^2}}\). Therefore, equality never holds for m larger than 2, i.e., \(E_k[\frac{1}{C(k)}]\) is always larger than \(\frac{1}{E_k[C(k)]}\). For instance, with \(m = 4\) and \(\overline{c^2} = 2^{-40}\) as in Example 5, \(E_k[\frac{1}{C(k)}] = 2^{39}\) while \(\frac{1}{E_k[C(k)]} = 2^{38}\): the true average is twice the commonly used lower bound. The difference can be ignored only when m is large enough. Figure 4 shows the difference for \(m = 4\) and \(m = 20\). For small m the difference is far from negligible, and \(\frac{1}{E_k[C(k)]}\) does not reflect the real average data complexity. As more approximations are involved, the difference shrinks more quickly. For a fixed m, the smaller the average LP is, the larger the difference becomes. That is, as \(\overline{c^2}\) decreases, which is typical since cryptanalysts always try to break as many rounds of the cipher as possible, the difference between \(E_k[\frac{1}{C(k)}]\) and \(\frac{1}{E_k[C(k)]}\) becomes huge.

Table 2. The ratio of keys with capacity close to zero for different m and \(\epsilon \)

5.3 On Median Data Complexity

Leander proposed a way to overcome the problem of infinite data complexities for single linear attacks [19]. Namely, instead of studying the average complexity, he studied the median complexity \(\widetilde{N}\) such that for half of the keys the data complexity of an attack is less than or equal to \(\widetilde{N}\). So far, the use of the median complexity in multidimensional linear attacks has remained unexplored; we discuss it in this section. A general definition of \(N_{p}\) is as follows, where \(\widetilde{N} = N_{1/2}\).

Definition 2

([19, Definition 1]) \(N_p\) is defined as the complexity such that the probability that for a given key the attack complexity is lower than \(N_p\), is p.

Although Leander gave this general definition, he focused on the case of \(N_{1/2}\) in single linear attacks. With the knowledge of the exact distribution of the data complexity, we generalize Leander's Theorem 2 in [19] not only to the multidimensional linear model but also from \(N_{1/2}\) to \(N_p\).

Theorem 2

Assume independent subkeys in an m-dimensional linear attack using m base approximations whose LP are i.i.d. as \(\varGamma (\frac{1}{2},2\overline{c^2})\). Then a fraction p of the keys yields a capacity of at least \(\mathcal {G}^{-1}(1-p)\), where \(\mathcal {G}\) is the cumulative distribution function of \(\varGamma (\frac{m}{2},2\overline{c^2})\). Thus, the complexity of this m-dimensional linear attack is less than \(\frac{\lambda }{\mathcal {G}^{-1}(1-p)}\) with probability p.

Fig. 4. The difference between \(E_k[\frac{1}{C(k)}]\) and \(\frac{1}{E_k[C(k)]}\) with \(\overline{c^2}\) ranging from \(2^{-60}\) to \(2^{-40}\).

Leander’s Theorem 2 is the special case of Theorem 2 with m = 1 and p = \(\frac{1}{2}\), when the noisy linear trails in the linear hull effect are ignored (if the noisy trails are considered, the ratio of keys is reduced by a factor of 2). Explaining Leander's Theorem 2 in our context, we use the fact that \(F^{-1}(1/2) = 0.46\overline{c^2}\), where F is the cumulative distribution function of \(\varGamma (\frac{1}{2},2\overline{c^2})\) (see [19] for more details).

As illustrated in (b) of Fig. 3, at the value 1/2 on the Y-axis, the median capacity increases with m. That is, when the LP of the base approximations are i.i.d., the more approximations we use, the lower the data complexity required for the same ratio of weak keys. Given a fixed capacity (and hence a fixed data complexity), the ratio of keys yielding a larger capacity than the fixed one increases when more base approximations are used. Thus, the ratio of weak keys resulting in a data complexity lower than the fixed one also increases.

Considering Example 5 again, we take different values of p, and fix the same \(\lambda \) (as 1, w.l.o.g.) for each m. The highest data complexity required by the m-dimensional linear attacks for a fraction p of the keys is shown in Table 3.

When the general median complexity \(N_p\) is applied, a question arises: which p is most suitable for measuring and comparing the strength of a linear attack? Obviously, it is meaningless to compare \(N_{1/3}\) and \(N_{2/3}\) directly. A natural and simple way is to consider the value of \(\frac{N_p}{p}\), because dividing by p normalizes the comparison across different \(N_p\) to a reasonably great extent. For example, if the attack complexity is lower than \(N_{1/3}\) with probability 1/3, then the attack needs to be repeated 3 times for a sufficiently sound success rate; this should be compared with, say, an attack with complexity lower than \(N_{1/2}\) that has to be repeated twice. By confirming the existence of the minimal \(\frac{N_p}{p}\), we can evaluate different multidimensional linear attacks by the value of \(\min _{p}\frac{N_p}{p}\), as in the sketch below. The results are shown in Table 4.
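The minimization is straightforward numerically; a sketch follows (Example 5 parameters, \(\lambda = 1\), and a simple grid over p, all of which are assumptions):

```python
import numpy as np
from scipy import stats

cbar2, lam = 2.0**-40, 1.0
ps = np.linspace(0.01, 0.99, 981)           # grid over the probability p
for m in (2, 4, 6, 8, 20):
    G = stats.gamma(a=m/2, scale=2*cbar2)
    ratio = lam / (G.ppf(1 - ps) * ps)      # N_p / p with N_p = lam / G^{-1}(1-p)
    i = np.argmin(ratio)
    print(m, ps[i], np.log2(ratio[i]))      # the minimizing p and min_p N_p / p
```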

Table 3. The highest data complexity for different m and different ratios of keys
Table 4. Comparison of the average data complexity, the median data complexity, the reciprocal of average capacity, and \(\min _{p}\frac{N_p}{p}\).

Moreover, comparing \(E_k[\frac{1}{C(k)}]\), \(\frac{1}{E_k[C(k)]}\) and the median complexity, we observe that the average complexity is always larger than the median one, and the median complexity is always larger than the reciprocal of the average capacity. As m increases, the differences between these three values decrease. When m is large enough, the values are approximately equal (see Table 4), since the Gamma and Inverse Gamma distributions tend to normal distributions.

6 Application to Cho’s Multidimensional Attack on PRESENT

6.1 Cho’s Attack on 25-Round PRESENT

The structure of PRESENT [7] makes it vulnerable to multidimensional attacks: there are several strong one-dimensional approximations. The linear hull of each such approximation with non-negligible correlation consists of several equally strong single-bit trails, whose intermediate masks have Hamming weight one. The average LP \(\overline{c^2}\) of all such approximations is \(2^{-4r}L(r)\) [26], where L(r) is the number of r-round trails in each approximation. The best result on PRESENT so far is Cho's attack on 25 rounds [10]. Nine 23-round multidimensional linear approximations are used simultaneously, each of dimension \(m = 8\), starting at one of the S-boxes \(S_i\), i = 5, 9 or 13, and ending at one of the S-boxes \(S_j\), j = 5, 6 or 7. The attack recovers 16 bits of key in the first round and 16 bits of key in the last round; please refer to [10] for more details. Cho proved that the average capacity is \(2^{-52.77}\), and gave the formula for the data complexity as in [10]:

$$\begin{aligned} N = (\sqrt{advantage\cdot 4\cdot M}+4(\varPhi ^{-1}(2P_s-1))^2)/C = \lambda /C \end{aligned}$$
(2)

where \(\varPhi \) is the cumulative distribution function of the standard normal distribution, \(P_s\) is the success probability, C is the capacity, and M is the number of linear approximations used in the attack. In Eq. (2), if the advantage is equal to a bits, then the right key candidate is ranked within the top \(2^{\ell -a}\) candidates, where \(\ell \) is the number of targeted key bits. Cho chose \(\lambda = 2^{9.08}\) (advantage of 32 bits, \(M = 9\cdot (2^8-1)\), \(P_s = 0.95\)), and estimated the average data complexity as about \(2^{61.85}\).
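For concreteness, the multiplier \(\lambda \) of Eq. (2) can be recomputed directly (a sketch; scipy assumed). With Cho's parameters it comes out near \(2^{9.1}\), consistent with the \(2^{9.08}\) used in [10]:

```python
import numpy as np
from scipy.stats import norm

advantage, M, Ps = 32, 9 * (2**8 - 1), 0.95   # Cho's parameters from [10]
lam = np.sqrt(advantage * 4 * M) + 4 * norm.ppf(2*Ps - 1)**2
print(np.log2(lam))                           # ~9.10, close to 2^9.08
```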

6.2 Our Investigation on Cho’s Attack

We give a simpler but close estimation of the capacity and data complexity of Cho's attack. The authors of [16] claimed that Cho observed in practical experiments that the probability distribution of multidimensional linear approximations varies a lot with the keys, while the capacity remains rather constant. We have shown, from both theoretical and experimental viewpoints, that the capacity also varies for different keys.

In order to attack 25-round PRESENT, 23-round approximations are used, thus \(r = 23\). According to [26], \(L(23) = 367261713\), thus \(\overline{c^2} = 2^{-63.55}\). By Propositions 2 and 3, the fixed-key capacity of the nine 8-dimensional approximations is estimated as \(\varGamma (9 \cdot \frac{2^8-1}{2},2^{-62.55})\). Hence, the average capacity is \(2^{-52.39}\). With the same \(\lambda \) as Cho, we obtain the data complexity \(N = \frac{2^{9.08}}{C(k)} \sim \) Inv-Gamma\((9 \cdot \frac{2^8-1}{2},2^{71.63})\). The average data complexity is \(2^{61.47}\). This result is very close to the estimate in Cho's attack, but easier to compute.

In the same way, we compute the capacity distribution for 26-round PRESENT, which approximates \(\varGamma (9 \cdot \frac{2^8-1}{2},2^{-65.16})\). With the knowledge of the distributions, we can derive the exact number of weak keys corresponding to different attack scenarios. Using Cho's attack method with \(\lambda = 2^{7.58}\) (advantage of 4 bits, \(P_s=0.8\)), there are \(2^{123.24}\) weak keys (3.7 % of the whole key space) with capacity larger than \(2^{-54.92}\). That means that for \(2^{123.24}\) keys out of \(2^{128}\), 26-round PRESENT can be attacked using less than \(2^{62.5}\) plaintext-ciphertext pairs, with success probability 0.8.
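These weak-key figures follow directly from the Gamma survival function (a sketch; scipy assumed). The threshold capacity is \(\lambda /N = 2^{7.58-62.5} = 2^{-54.92}\):

```python
import numpy as np
from scipy import stats

alpha, theta = 9 * (2**8 - 1) / 2, 2.0**-65.16  # capacity distribution, 26 rounds
threshold = 2.0**7.58 / 2.0**62.5               # = 2^-54.92

ratio = stats.gamma(a=alpha, scale=theta).sf(threshold)
print(ratio, 128 + np.log2(ratio))   # ~0.037 of keys, i.e. ~2^123.2 weak keys
```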

7 Conclusion and Further Work

In this paper, we deal with multidimensional linear attacks using m base approximations with i.i.d. correlations (linear probabilities). We focus on the case where the base linear approximations can be regarded as statistically independent. In this case, we point out that the capacity of multidimensional linear approximations follows a Gamma distribution, which in turn yields an exact Inverse Gamma distribution for the data complexity. Both distributions are parametrized by the dimension and the average linear probability of each approximation. These theoretical results have been verified by experiments on PRESENT. We establish an explicit connection between the fixed-key behaviour and the average behaviour. Based on the distributions, several fundamental issues are discussed in more detail. Multidimensional linear attacks not only benefit the data complexity, but also make the average data complexity easier to measure, because the ratio of keys with capacity close to zero decreases as the dimension increases. The relations between the median and average data complexity, as well as the reciprocal of the average capacity, are derived; when the dimension is large enough, these three values are arbitrarily close. We also propose a modified key equivalence hypothesis that is more suitable for practical situations. Finally, the multidimensional linear attacks on 25- and 26-round PRESENT are analyzed based on our theoretical results.

In future work, more complicated cases of the relations between LP distributions should be studied, which may bring a more precise evaluation of multidimensional attacks. The measure \(\frac{N_p}{p}\) can be extended to single linear attacks. Moreover, given the close relation between statistical saturation attacks and multidimensional linear attacks, our results may allow a clearer understanding of the capacity of statistical saturation attacks, whose key-dependent performance still lacks an accurate measurement.