
1 Introduction

Take any n-bit hash function h. Assuming that this hash function can be modelled as a random function, the probability that the outputs of h collide given \(q \ll 2^{n/2}\) distinct inputs is about \(q^2/2^n\): the well-known birthday attack.

Now let us consider another hash function g, defined as the r-th iterate of h, i.e. \(g(m) = h(h(\ldots h(m)))\), where h is applied r times. For the same number of queries \(q \ll 2^{n/2}\), the birthday attack is about r times more likely to succeed against g than against h (see e.g. Preneel and van Oorschot [18, Lemma 2]).
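A small Monte Carlo experiment can make this factor r visible. The sketch below is our own illustration, not taken from [18]: it models h as a lazily sampled random function on a 16-bit domain, and the helper names `lazy_random_function`, `iterate` and `collision_rate` are hypothetical.

```python
import random


def lazy_random_function(n_bits, rng):
    """Lazily sampled uniformly random function on {0, ..., 2**n_bits - 1}."""
    size = 1 << n_bits
    table = {}

    def h(x):
        if x not in table:
            table[x] = rng.randrange(size)  # fresh uniform output on first use
        return table[x]

    return h


def iterate(h, r):
    """Return g = h^r, i.e. h applied r times."""
    def g(x):
        for _ in range(r):
            x = h(x)
        return x
    return g


def collision_rate(n_bits, q, r, trials, seed=0):
    """Fraction of trials in which h^r collides on the q distinct inputs 0..q-1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = iterate(lazy_random_function(n_bits, rng), r)  # new random h every trial
        outputs = [g(x) for x in range(q)]
        if len(set(outputs)) < q:
            hits += 1
    return hits / trials
```

With \(n=16\), \(q=32\) and 2000 trials, `collision_rate(16, 32, 8, 2000)` should come out several times larger than `collision_rate(16, 32, 1, 2000)`; the exact values depend on the seed.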

Iteration is of fundamental importance in many cryptographic constructions. For example, a “possibly weak” function may be iterated to improve its resistance against various cryptanalysis attacks, or a password hashing function may be iterated to slow down dictionary attacks. But quite surprisingly, the security of iterating a random function is not yet a well-understood problem.

In the aforementioned (non-adaptive) birthday attack, the distinguishing advantage between a random function and an iterated random function increases by a factor of about r. But what happens if we consider adaptive collision-finding attacks as well? Or in general, what if we want to consider any adaptive attack, not necessarily a collision-finding attack? Could there be more efficient attacks that have not yet been discovered?

Recently at CRYPTO 2015, Minaud and Seurin [15] put this possibility to rest for the iterated random permutation problem. They proved that the advantage to distinguish an iterated random permutation from a random permutation using q queries is bounded by O(qr / N), where N is the size of the domain, and showed that their bound is almost tight by providing a matching attack.

In this paper, we will do the same for the iterated random function problem. Whereas the best bound in previous work is \(O(q^2r^2/N)\), we will prove a bound of \(O(q^2r(\log r)^3/N)\), where \(\log \) is the logarithm to the base e. Our bound is tight up to a factor of about \((\log r)^3\), and thereby rules out the possibility of better attacks.

Note. We will focus on asymptotic bounds for large r, as this is the parameter range where large improvements over the currently best-known bounds can be achieved. Although our bounds hold for any \(r\ge 2\), we will apply generous relaxations to derive an easy-to-read bound that only improves on the currently known bounds for larger, but nevertheless practically relevant, values of r. Also, we will only consider the iteration of a uniformly random function in an information-theoretic setting. A simple hybrid argument can be used to extend this result to the pseudorandom function (prf) advantage in a computational setting, as shown by Minaud and Seurin [15, Theorem 1] for the iterated random permutation problem.

Applications. In spite of the frequent use of iterated random functions in practice, this paper is the first to study this problem without relying on the trivial CBC-MAC bound. The most obvious application of iterated random functions is in password hashing, where a hash function is iterated in order to slow down brute force attacks. This idea is used in PKCS #5’s PBKDF1 and PBKDF2. In typical password-based key derivation functions, the iteration count is often quite high, ranging from several hundred thousand [9] to as many as ten million [19], as suggested by NIST for critical keys. To analyse the effect of iteration in these constructions, it is common to model the secret low-entropy password as a random-but-known key [11], or even as an adversarially chosen input [20]. But small values of r, such as \(r=2\), also appear in practical applications. In the book “Practical cryptography” [13], Ferguson and Schneier suggest using SHA-256(SHA-256(m)) to avoid length-extension attacks. They use this construction in their RSA encryption implementation, as well as in their Fortuna random number generator. Interestingly, about \(2^{64}\) evaluations of SHA-256(SHA-256(m)) are performed every second as part of bitcoin mining [21].
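For concreteness, plain iteration of SHA-256 can be sketched with Python's standard `hashlib` module. This is only the bare r-fold composition discussed here (with \(r=2\) giving SHA-256(SHA-256(m))); it is not PBKDF1 or PBKDF2, which additionally involve a salt (and, for PBKDF2, HMAC). The function names are our own.

```python
import hashlib


def iterated_sha256(message: bytes, r: int) -> bytes:
    """Return SHA-256 applied r times to message (r >= 1)."""
    if r < 1:
        raise ValueError("r must be at least 1")
    digest = message
    for _ in range(r):
        # After the first round every input is a 32-byte digest, so rounds
        # 2..r iterate one fixed function on 256-bit strings.
        digest = hashlib.sha256(digest).digest()
    return digest


def double_sha256(message: bytes) -> bytes:
    """SHA-256(SHA-256(m)), the r = 2 case."""
    return iterated_sha256(message, 2)
```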

Related Work. The security of an iterated random function was first analysed by Yao and Yin [22, 23], when they analysed the security of the password-based key derivation functions PBKDF1 and PBKDF2. Their work is parallel to that of Wagner and Goldberg [20], who analysed the security of an iterated random permutation in the context of the Unix password hashing algorithm. Bellare et al. [4] extended these results, and also pointed out some problems in the proofs of Yao and Yin.

As Wagner and Goldberg explain in [20], it is possible to interpret the iterated random permutation problem as a special case of CBC-MAC where the iteration count r equals the number of message blocks, and all message blocks except for the first one are all-zero. The same holds for the iterated random function problem, except that a random function instead of a random permutation is used inside the CBC-MAC construction.
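This reduction can be checked directly on a toy example. In the sketch below (our own, assuming a zero IV and XOR chaining, with a random function on 10-bit blocks drawn from a seeded table), CBC-MAC of the r-block message \((m_1,0,\ldots ,0)\) coincides with \(f^r(m_1)\), because XOR with an all-zero block leaves the chaining value unchanged.

```python
import random


def cbc_mac(f, blocks, iv=0):
    """CBC-MAC over integer blocks: s_0 = iv, s_i = f(s_{i-1} XOR m_i); returns the last s_i."""
    state = iv
    for block in blocks:
        state = f(state ^ block)
    return state


def iterate(f, r, x):
    """Compute f^r(x)."""
    for _ in range(r):
        x = f(x)
    return x


def random_function_table(n_bits, seed):
    """Table of a random function on {0, ..., 2**n_bits - 1}, reproducible via seed."""
    rng = random.Random(seed)
    size = 1 << n_bits
    return [rng.randrange(size) for _ in range(size)]
```

For instance, with `f = random_function_table(10, 1).__getitem__`, the value `cbc_mac(f, [m1] + [0] * 4)` equals `iterate(f, 5, m1)` for every 10-bit `m1`.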

A first proof of the security of CBC-MAC was given by Bellare et al. in [1, 2]. For CBC-MAC with a random function, they prove that the advantage of an information-theoretic adversary that makes at most q queries is upper bounded by \(1.5r^2q^2/N\). Using the well-known prp-prf switching lemma [5], they derive from this an upper bound of \(2r^2q^2/N\) for CBC-MAC with a random permutation. The simplicity of CBC-MAC makes it a good test case for various proof techniques. Of particular interest is the short proof of CBC-MAC by Bernstein [7]. For a more detailed proof using the same technique, we refer to Nandi [16].

In [3], Bellare et al. proved a security bound that is linear in r, instead of quadratic in r as in previous proofs. They point out that their analysis only applies to CBC-MAC with a random permutation, and not with a random function: such a bound is ruled out by an attack by Berke [6]. However, Berke’s attack cannot be translated to the iterated random function problem, as the number of message blocks for each of the queries in the attack is not constant.

The iterated random function problem is similar to the nested iterated (NI) construction that Gaži et al. [14] analysed at CRYPTO 2014. However, the analysis of the NI construction critically relies on the use of two different random functions, or more precisely on the use of a pseudo-random function (prf) with two different keys. Our analysis applies to the case where only one random function is iterated. As we will show, the iterated random function problem will require a more complicated analysis of collision probabilities, in order to avoid ending up with a bound that is quadratic in r.

Main Results. The main results of this paper are the proofs of two theorems. Theorem 1 bounds the success probability of a common class of collision adversaries, and Theorem 2 bounds the advantage of distinguishing an iterated random function from a random function. In these theorems, the function \(\phi (q,r)\) is defined as

Theorem 1

Let f be a random function, and let \(\mathcal {A}\) be a collision-finding adversary that makes q queries to \(f^r\) as follows: every query is either chosen from a set (of size \(m\le q\)) of predetermined points, or is the response of a previous query. Under the assumption that \(N\log r>90\), the following bound holds for the success probability of \(\mathcal {A}\):

Theorem 2

Let f be a random function, and let \(\mathcal {A}\) be an adversary trying to distinguish \(f^r\) from f through q queries. Then, under the assumption that \(N\log r>90\), we have

A Note on the Setting. We should point out that our results are in an indistinguishability setting. Our goal is to distinguish, in a black-box way, between an iterated random function and a random function. In the indifferentiability setting, the adversary also has access to the underlying random function, or to a simulator that tries to mimic its behaviour. Dodis et al. [12] proved that indifferentiability for an iterated random function holds only with poor concrete security bounds, as they provide a lower bound on the complexity of any successful simulator.

Outline. Notation and preliminaries are introduced in Sect. 2. We study the probabilities to find various types of collisions in a random function in Sect. 3. These results are used in Sect. 4 to bound the probabilities of single-trail attacks and two-trail collision attacks, and eventually to also bound a more general collision attack on an iterated random function. The advantage of distinguishing an iterated random function from a random function is bounded in Sect. 5. For readability, we defer the technical proof of Lemma 7 of Sect. 4 to Sect. 6. We conclude the paper in Sect. 7.

2 Notation and Preliminaries

In this section, we will state some simple lemmas without proof. The proofs of these lemmas can be found in the full version of this paper [8].

Functions. Let \(f: \mathcal {D}\rightarrow \mathcal {D}\) be a function over a domain \(\mathcal {D}\) of size N. A collision for a function f is defined as a pair \((x,x')\in \mathcal {D}^2\) with \(x\ne x'\) such that \(f(x)=f(x')\). A three-way collision is a triple \((x,x',x'')\) such that \(f(x)=f(x')=f(x'')\) for distinct x, \(x'\) and \(x''\). For a positive integer r, the r-th iterate \(f^r\) of a function f is defined inductively as follows:

$$\begin{aligned} f^1&=f,\\ f^r&=f\circ f^{r-1},r>1. \end{aligned}$$

By convention, let \(f^0\) be the identity function. In the remainder of this paper, we will assume that \(r\ge 2\). Let a random function denote a function that is drawn uniformly at random from the set of all functions of the same domain and range.
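These definitions translate directly into code. The following sketch (helper names are ours) builds \(f^r\) by the inductive rule, with \(f^0\) the identity, and lists the collisions (each unordered pair once) and three-way collisions of a function given as a lookup table on a small domain.

```python
from itertools import combinations


def iterate(f, r):
    """f^r by the inductive definition: f^0 = id, f^r = f o f^(r-1)."""
    if r == 0:
        return lambda x: x
    previous = iterate(f, r - 1)
    return lambda x: f(previous(x))


def collisions(f, domain):
    """All pairs (x, x') with x != x' and f(x) == f(x')."""
    return [(x, y) for x, y in combinations(domain, 2) if f(x) == f(y)]


def three_way_collisions(f, domain):
    """All triples of distinct points x, x', x'' with f(x) == f(x') == f(x'')."""
    return [(x, y, z) for x, y, z in combinations(domain, 3) if f(x) == f(y) == f(z)]
```

For the table `[1, 2, 2, 0]`, f has the single collision `(1, 2)`, while \(f^2\) maps 0, 1 and 2 to the same point and therefore has the three-way collision `(0, 1, 2)`.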

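Falling Factorial Powers and the \(\beta \) Function. We use the falling factorial powers notation, where for a non-negative integer \(i\le N\), \(N^{\underline{i}}\) is defined as

$$\begin{aligned} N^{\underline{i}}:=N(N-1)\cdots (N-i+1)=\frac{N!}{(N-i)!}. \end{aligned}$$
(1)

Note that \(N^{\underline{i}}\) denotes the number of permutations of N items taken i at a time, or the number of ways to choose an ordered sample of size i without replacement from a population of size N. When \(i>N\), we define \(N^{\underline{i}}:=0\). We also define a function \(\beta (i)\) that we will frequently encounter:

$$\begin{aligned} \beta (i):=\frac{N^{\underline{i}}}{N^i}. \end{aligned}$$
(2)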

Again, we define \(\beta (i):=0\) for \(i>N\). We derive below a simple bound on \(\beta (i)\).

Lemma 1

Let \(\alpha >0\) be a real number. Then, for \(i\ge \sqrt{2\alpha N}+1\), we have

$$\begin{aligned} \beta (i)\le e^{-\alpha }. \end{aligned}$$
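Definitions (1) and (2) and Lemma 1 can be spot-checked numerically with exact rational arithmetic. The sketch below is an illustration only (function names are ours); it verifies \(\beta (i)\le e^{-\alpha }\) for all integers i between \(\sqrt{2\alpha N}+1\) and N for given N and \(\alpha \), which does not replace the proof in [8].

```python
import math
from fractions import Fraction


def falling_factorial(N, i):
    """N (N-1) ... (N-i+1), defined as 0 for i > N."""
    if i > N:
        return 0
    result = 1
    for j in range(i):
        result *= N - j
    return result


def beta(N, i):
    """beta(i) = N^(i falling) / N^i as an exact fraction (0 for i > N)."""
    return Fraction(falling_factorial(N, i), N ** i)


def check_lemma1(N, alpha):
    """Check beta(i) <= exp(-alpha) for all integers sqrt(2 alpha N) + 1 <= i <= N.

    Background: beta(i) = prod_{j<i} (1 - j/N) <= exp(-i(i-1)/(2N)).
    """
    start = math.ceil(math.sqrt(2 * alpha * N) + 1)
    return all(beta(N, i) <= math.exp(-alpha) for i in range(start, N + 1))
```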

Partial Sums of the Harmonic Series. The divergent infinite series

$$\begin{aligned} \sum _{i=1}^{\infty }\frac{1}{i} = 1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\cdots \end{aligned}$$

is known as the harmonic series. We will be interested in partial sums of the series of the form

$$\begin{aligned} \sum _{i=a+1}^{b}\frac{1}{i} = \frac{1}{a+1}+\frac{1}{a+2}+\cdots +\frac{1}{b-1}+\frac{1}{b}. \end{aligned}$$

We will use the following simple bound for this sum. Throughout this paper, let \(\log \) denote the natural logarithm, that is the logarithm to the base e.

Lemma 2

For any two positive integers a and b with \(b\ge a\),

$$\begin{aligned} \sum _{i=a+1}^{b}\frac{1}{i}\le \log \left( \frac{b}{a}\right) \end{aligned}$$
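Lemma 2 follows from comparing the sum with \(\int _a^b \mathrm {d}x/x\). The sketch below (our own) simply checks it exhaustively for small a and b, using exact fractions for the partial sum and the natural logarithm for the bound.

```python
import math
from fractions import Fraction


def harmonic_partial_sum(a, b):
    """Exact value of 1/(a+1) + ... + 1/b for integers 1 <= a <= b."""
    return sum((Fraction(1, i) for i in range(a + 1, b + 1)), Fraction(0))


def lemma2_holds(a, b):
    """Check the partial sum against log(b / a) (natural logarithm)."""
    return harmonic_partial_sum(a, b) <= math.log(b / a)
```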

Counting Divisors. For a positive integer a and an integer b we use the notation \(a\vert b\) to denote a divides b, i.e., \(ak=b\) for some integer k. We write \(a\not \mid b\) when a does not divide b. The number of divisors of b is denoted . We will use the following simple bound on .

Lemma 3

For any positive integer b,

The \(\sigma \) Function. The function \(\sigma (b)\) defined as

$$\begin{aligned} \sigma (b):=\sum _{a\vert b}a \end{aligned}$$

denotes the sum of the divisors of b. We will use the following simple lemma about \(\sigma (b)\).

Lemma 4

For any positive integer b,

$$\begin{aligned} \sum _{a\vert b}\frac{b}{a} = \sigma (b). \end{aligned}$$

A simple bound on \(\sigma (b)\) can be obtained as follows.

Lemma 5

For any positive integer \(b\ge 2\),

$$\begin{aligned} \sigma (b) < 3b\log b. \end{aligned}$$
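Lemmas 4 and 5 are easy to test for small b. In the sketch below (helper names are ours), divisors are found by trial division up to \(\sqrt{b}\); Lemma 4 holds because \(a\mapsto b/a\) permutes the divisors of b, and Lemma 5 is checked for \(2\le b\le 2000\).

```python
import math


def divisors(b):
    """Sorted list of the positive divisors of b, by trial division up to sqrt(b)."""
    small, large = [], []
    a = 1
    while a * a <= b:
        if b % a == 0:
            small.append(a)
            if a != b // a:
                large.append(b // a)
        a += 1
    return small + large[::-1]


def sigma(b):
    """Sum of the divisors of b."""
    return sum(divisors(b))


def lemma4_holds(b):
    # b // a runs over the divisors of b as a does, so the sum equals sigma(b).
    return sum(b // a for a in divisors(b)) == sigma(b)


def lemma5_holds(b):
    return sigma(b) < 3 * b * math.log(b)
```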

3 Random Function Collisions

In this section, we look at different approaches to find collisions on a random function f. We will bound their success probabilities, and use them in Sect. 4 to get bounds on the success probabilities of collision attacks on an iterated random function \(f^r\).

3.1 Single-Trail Attack

Single-Trail Attack. Let [q] denote the set \(\{1,\ldots ,q\}\). The single-trail attack works by starting with an arbitrary initial point x and producing a trail of points, hoping to find a collision. A trail is uniquely defined by q queries \(f^{i-1}(x)\) for \(i\in [q]\), where the i-th query \(f^{i-1}(x)\) has response \(f^{i}(x)\). We assume that the attack does not stop when a collision is found, but makes q queries and then checks for collisions. If a collision is found, it will appear as a rho-shaped trail, as illustrated in Fig. 1. Therefore, a collision obtained through a single-trail attack will be called a \(\rho \)-collision.

Fig. 1. Single-trail attack starting from x, resulting in a \(\rho \)-collision with tail length t and cycle length c. We call the probability of this collision \({\textsf {cp}}_{\rho }(t,c)\).

Terminology. Suppose the q-query single-trail attack finds a collision. For some t and c, suppose it takes \(t+c\) queries to find this collision, so that

$$\begin{aligned} f^{t+c}(x)=f^t(x), \end{aligned}$$

i.e., the output of the \((t+c)\)-th query is identical to the output of the t-th query. Then, t is called the tail length of the \(\rho \)-collision, and c is called the cycle length. For fixed t and c, we want to bound the probability that a q-query single-trail attack gives a \(\rho \)-collision on f with tail length t and cycle length c. Call this probability \({\textsf {cp}}_{\rho }[q](t,c)\).
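The single-trail attack and its tail and cycle lengths can be sketched as follows, assuming f is given as a Python callable. The function name `single_trail` is ours; it stops at the first repeated response rather than after all q queries, which yields the same (t, c) because queries made after the collision do not matter.

```python
def single_trail(f, x, q):
    """Run the q-query single-trail attack from x.

    Query i (1 <= i <= q) is f^(i-1)(x) with response f^i(x). Returns (t, c)
    such that f^(t+c)(x) == f^t(x) is the first repeated response, or None if
    all q responses are distinct.
    """
    first_seen = {}          # response value -> index i of its first occurrence
    point = x
    for i in range(1, q + 1):
        point = f(point)     # point == f^i(x)
        if point in first_seen:
            t = first_seen[point]
            return t, i - t
        first_seen[point] = i
    return None
```

For the table `[1, 2, 3, 4, 2]` and \(x=0\), the responses are 1, 2, 3, 4, 2, so the attack returns \((t,c)=(2,3)\) after five queries.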

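Bounding \({\textsf {cp}}_{\rho }[q](t,c)\). To get a \(\rho \)-collision on f with tail length t and cycle length c, we need to call f at \(t+c\) distinct values. Thus, if \(q<t+c\), \({\textsf {cp}}_{\rho }[q](t,c)=0\). So suppose \(q\ge t+c\). Out of these \(t+c\) calls to f, the first \(t+c-1\) give distinct outputs, and the last coincides with the t-th output. Thus, the number of different ways this can happen is \(N^{\underline{t+c-1}}\), out of the total \(N^{t+c}\) possible outcomes for the \(t+c\) calls to f. Thus,

$$\begin{aligned} {\textsf {cp}}_{\rho }[q](t,c)=\frac{N^{\underline{t+c-1}}}{N^{t+c}}=\frac{\beta (t+c-1)}{N}. \end{aligned}$$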

This is just a function of t and c (since the queries made after the collision is found are of no consequence), so we will use the simpler notation \({\textsf {cp}}_{\rho }(t,c)\), with the implicit assumption that \(q\ge t+c\). For a fixed real \(\alpha >0\), when \(t+c\ge \sqrt{2\alpha N}+2\), Lemma 1 gives us the bound

$$\begin{aligned} {\textsf {cp}}_{\rho }(t,c)\le \frac{e^{-\alpha }}{N}. \end{aligned}$$
(3)

When \(t+c<\sqrt{2\alpha N}+2\), we will simply use the bound

$$\begin{aligned} {\textsf {cp}}_{\rho }(t,c)\le \frac{1}{N}. \end{aligned}$$
(4)
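As a sanity check (our own sketch, using exact fractions), the following code evaluates \({\textsf {cp}}_{\rho }(t,c)=\beta (t+c-1)/N\), as given by the counting argument above, and compares it with bound (3) or (4), depending on whether \(t+c\ge \sqrt{2\alpha N}+2\). The function names are hypothetical.

```python
import math
from fractions import Fraction


def beta(N, i):
    """beta(i) = N (N-1) ... (N-i+1) / N^i, and 0 when i > N."""
    if i > N:
        return Fraction(0)
    num = 1
    for j in range(i):
        num *= N - j
    return Fraction(num, N ** i)


def cp_rho(N, t, c):
    """cp_rho(t, c) = beta(t + c - 1) / N."""
    return beta(N, t + c - 1) / N


def cp_rho_bound(N, t, c, alpha):
    """Bound (3) when t + c >= sqrt(2 alpha N) + 2, and bound (4) otherwise."""
    if t + c >= math.sqrt(2 * alpha * N) + 2:
        return Fraction(math.exp(-alpha)) / N
    return Fraction(1, N)  # exact, so the comparison with beta(1) / N = 1 / N is exact
```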

3.2 Two-Trail Attack

Two-Trail Attack. In the two-trail attack, we start with two different points \(x_1\) and \(x_2\), and produce two trails: the trail \(f^{i-1}(x_1)\) for \(i\in [q_1]\), and the trail \(f^{i-1}(x_2)\) for \(i\in [q_2]\), hoping to find a collision. In total \(q_1+q_2\) queries are made, where the i-th query for \(i\in [q_1]\) is \(f^{i-1}(x_1)\), with response \(f^i(x_1)\), and the \((q_1+i)\)-th query for \(i\in [q_2]\) is \(f^{i-1}(x_2)\), with response \(f^i(x_2)\). If a collision is found, the two trails will form a lambda shape, as illustrated in Fig. 2. Therefore, a collision obtained through a two-trail attack will be called a \(\lambda \)-collision.

Fig. 2. Two-trail attack starting from \(x_1\) and \(x_2\), resulting in a \(\lambda \)-collision with foot lengths \(t_1\) and \(t_2\), respectively. We call the probability of this collision \({\textsf {cp}}_{\lambda }(t_1,t_2)\).

Terminology. Suppose the \((q_1,q_2)\)-query two-trail attack finds a \(\lambda \)-collision, regardless of whether a \(\rho \)-collision has occurred on either trail. Suppose that this \(\lambda \)-collision is found after making \(t_1\) queries along the first trail and \(t_2\) queries along the second, i.e.,

$$\begin{aligned} f^{t_1}(x_1) = f^{t_2}(x_2). \end{aligned}$$

\(t_1\) and \(t_2\) are called the foot lengths of the \(\lambda \)-collision. For fixed \(t_1,t_2\), we want to bound the probability that a \((q_1,q_2)\)-query two-trail attack finds a \(\lambda \)-collision with foot lengths \(t_1\) and \(t_2\). Denote this probability as \({\textsf {cp}}_{\lambda }[q_1,q_2](t_1,t_2)\).

Bounding \({\textsf {cp}}_{\lambda }[q_1,q_2](t_1,t_2)\). To get a \(\lambda \)-collision on f with foot lengths \(t_1\) and \(t_2\), we need to call f at \(t_1\) distinct values on the first trail and \(t_2\) distinct values on the second trail. Thus, if \(q_1<t_1\) or \(q_2<t_2\), \({\textsf {cp}}_{\lambda }[q_1,q_2](t_1,t_2)=0\). So we assume \(q_1\ge t_1\) and \(q_2\ge t_2\). Out of these \(t_1+t_2\) queries, the first \(t_1-1\) in one trail and the first \(t_2-1\) in the other trail give distinct outputs, and the last calls on the two trails coincide on a value distinct from all the earlier ones, i.e., the \(t_1+t_2\) calls lead to \(t_1+t_2-1\) distinct outputs, and one collision. Thus, the number of different ways this can happen is \(N^{\underline{t_1+t_2-1}}\), out of the total \(N^{t_1+t_2}\) possible outcomes for the \(t_1+t_2\) calls to f. Thus,

$$\begin{aligned} {\textsf {cp}}_{\lambda }[q_1,q_2](t_1,t_2)=\frac{N^{\underline{t_1+t_2-1}}}{N^{t_1+t_2}}=\frac{\beta (t_1+t_2-1)}{N}. \end{aligned}$$

Again, this is only a function of \(t_1\) and \(t_2\) (since the queries made after the collision is found are of no consequence), so we will use the simpler notation \({\textsf {cp}}_{\lambda }(t_1,t_2)\), with the implicit assumption that \(q_1\ge t_1\) and \(q_2\ge t_2\). For our purposes it will be enough to use the bound

$$\begin{aligned} {\textsf {cp}}_{\lambda }(t_1,t_2)\le \frac{1}{N}. \end{aligned}$$
(5)
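Similarly, a \((q_1,q_2)\)-query two-trail attack can be sketched as below. The function `two_trail` is a hypothetical helper of ours; it reports the foot lengths of the first point at which the second trail meets a response of the first trail, taking \(t_1\) as the first index at which that value appears on the first trail.

```python
def two_trail(f, x1, x2, q1, q2):
    """Run the (q1, q2)-query two-trail attack; return (t1, t2) or None.

    On success, f^t1(x1) == f^t2(x2), where t2 is the first index at which the
    second trail hits a response of the first trail.
    """
    first_index = {}                     # response on trail 1 -> first index
    point = x1
    for i in range(1, q1 + 1):
        point = f(point)                 # f^i(x1)
        first_index.setdefault(point, i)
    point = x2
    for j in range(1, q2 + 1):
        point = f(point)                 # f^j(x2)
        if point in first_index:
            return first_index[point], j
    return None
```

For the table `[2, 3, 4, 4, 4]`, the trails from 0 and 1 both reach 4 after two steps, giving foot lengths \((2,2)\).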

3.3 A \(\lambda \rho \)-Double-Collision on a Two-Trail Attack

When a two-trail attack leads to two collisions, a double-collision is said to occur. In Sect. 4, in addition to the above bounds, we also need a bound on the probability of two closely related double-collisions. We deal with a \(\lambda \rho \)-double-collision in this section, and a \(\rho '\)-double-collision in the next. A \(\lambda \rho \)-double-collision takes place when a two-trail attack leads to a \(\lambda \)-collision, and then the combined trail becomes the tail of a \(\rho \)-collision, as shown in Fig. 3.Footnote 1

Fig. 3. Two-trail attack starting from \(x_1\) and \(x_2\), resulting in a \(\lambda \rho \)-collision. First, there is a \(\lambda \)-collision with foot lengths \(t_1\) and \(t_2\), respectively. Then, the combined trail continues for \(\varDelta t\) queries, and completes a cycle of length c, after which a \(\rho \)-collision occurs. We call the probability of this double-collision \({\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\).

Terminology. We assign four parameters to this collision: the foot lengths \(t_1\) and \(t_2\) of the \(\lambda \), the intervening length \(\varDelta t\) between the two collisions, and the cycle length c of the \(\rho \). Note that \(\varDelta t\) can be seen as the tail length of the \(\rho \)-collision if we imagine it to have resulted from a single-trail attack beginning at the point of the \(\lambda \)-collision. For fixed \(t_1,t_2,\varDelta t,c\) we want to find the probability that a \((q_1,q_2)\)-query two-trail attack finds a \(\lambda \rho \)-double-collision with foot lengths \(t_1\) and \(t_2\), intervening length \(\varDelta t\) and cycle length c. Call this probability \({\textsf {cp}}_{\lambda \rho }[q_1,q_2](t_1,t_2,\varDelta t,c)\).

Bounding \({\textsf {cp}}_{\lambda \rho }[q_1,q_2](t_1,t_2,\varDelta t,c)\). To get a \(\lambda \)-collision on f with foot lengths \(t_1\) and \(t_2\), we need to call f at \(t_1\) distinct values on the first trail, and \(t_2\) distinct values on the second trail; and to get a \(\rho \)-collision on f with tail length \(\varDelta t\) and cycle length c, we need to call f at \(\varDelta t\) common values on each trail, and a further c points on the first trail; this adds up to \(t_1+t_2+\varDelta t+c\) distinct values in all. Thus, when \(q_1<t_1+\varDelta t+c\) or \(q_2<t_2+\varDelta t\), \({\textsf {cp}}_{\lambda \rho }[q_1,q_2](t_1,t_2,\varDelta t,c)=0\). So we assume \(q_1\ge t_1+\varDelta t+c\) and \(q_2\ge t_2+\varDelta t\). These \(t_1+t_2+\varDelta t+c\) calls lead to \(t_1+t_2+\varDelta t+c-2\) distinct outputs, and two collisions. Thus, the number of different ways this can happen is \(N^{\underline{t_1+t_2+\varDelta t+c-2}}\), out of the total \(N^{t_1+t_2+\varDelta t+c}\) possible outcomes for the \(t_1+t_2+\varDelta t+c\) calls to f. Thus,

$$\begin{aligned} {\textsf {cp}}_{\lambda \rho }[q_1,q_2](t_1,t_2,\varDelta t,c)=\frac{N^{\underline{t_1+t_2+\varDelta t+c-2}}}{N^{t_1+t_2+\varDelta t+c}}=\frac{\beta (t_1+t_2+\varDelta t+c-2)}{N^2}. \end{aligned}$$

As before, this is only a function of \(t_1,t_2,\varDelta t\) and c (since the queries made after the \(\rho \)-collision is found are of no consequence), so we use the simpler notation \({\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\), with the implicit assumption that \(q_1\ge t_1+\varDelta t+c\) and \(q_2\ge t_2+\varDelta t\). For a fixed real \(\alpha >0\), when \(t_1+t_2+\varDelta t+c\ge \sqrt{2\alpha N}+3\), Lemma 1 gives us the bound

$$\begin{aligned} {\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\le \frac{e^{-\alpha }}{N^2}. \end{aligned}$$
(6)

When \(t_1+t_2+\varDelta t+c<\sqrt{2\alpha N}+3\), we will simply use the bound

$$\begin{aligned} {\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\le \frac{1}{N^2}. \end{aligned}$$
(7)

3.4 A \(\rho '\)-Double-Collision on a Two-Trail Attack

A \(\rho '\)-double-collision takes place when a two-trail attack leads to a \(\rho \) with two tails. This is shown in Fig. 4. We will allow \(\varDelta t=0\), in which case a three-way collision occurs.

Fig. 4. Two-trail attack starting from \(x_1\) and \(x_2\), resulting in a \(\rho '\)-collision with tail lengths \(t_1\) and \(t_2\), intervening length \(\varDelta t\), and cycle length c. We will allow \(\varDelta t = 0\), in which case a three-way collision occurs. We call the probability of this double-collision \({\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c)\).

Terminology. As before, we assign four parameters to this collision: the tail lengths \(t_1\) and \(t_2\) of the \(\rho \), the intervening length \(\varDelta t\) between the two collisions, and the cycle length c of the \(\rho \). For fixed \(t_1,t_2,\varDelta t,c\) we want to find the probability that a two-trail attack with sufficiently many queries finds a \(\rho '\)-double-collision with tail lengths \(t_1\) and \(t_2\), intervening length \(\varDelta t\), and cycle length c. Call this probability \({\textsf {cp}}_{\rho '}[q_1,q_2](t_1,t_2,\varDelta t,c)\).

Bounding \({\textsf {cp}}_{\rho '}[q_1,q_2](t_1,t_2,\varDelta t,c)\). The bounding of \({\textsf {cp}}_{\rho '}[q_1,q_2](t_1,t_2,\varDelta t,c)\) is almost identical to that of \({\textsf {cp}}_{\lambda \rho }[q_1,q_2](t_1,t_2,\varDelta t,c)\). To get a \(\rho '\)-double-collision with tail lengths \(t_1\) and \(t_2\), intervening length \(\varDelta t\), and cycle length c, we need to call f at \(t_1+c-\varDelta t\) distinct values on the first trail, \(t_2\) distinct values on the second trail, and \(\varDelta t\) common values on each trail, resulting in calls at \(t_1+t_2+c\) distinct values in all. Thus, when \(q_1<t_1+c\) or \(q_2<t_2+\varDelta t\), \({\textsf {cp}}_{\rho '}[q_1,q_2](t_1,t_2,\varDelta t,c)=0\). So we assume \(q_1\ge t_1+c\) and \(q_2\ge t_2+\varDelta t\). These \(t_1+t_2+c\) calls lead to \(t_1+t_2+c-2\) distinct outputs, and two collisions. Thus, the number of different ways this can happen is \(N^{\underline{t_1+t_2+c-2}}\), out of the total \(N^{t_1+t_2+c}\) possible outcomes for the \(t_1+t_2+c\) calls to f. Thus,

$$\begin{aligned} {\textsf {cp}}_{\rho '}[q_1,q_2](t_1,t_2,\varDelta t,c)=\frac{N^{\underline{t_1+t_2+c-2}}}{N^{t_1+t_2+c}}=\frac{\beta (t_1+t_2+c-2)}{N^2}. \end{aligned}$$

As before, this is only a function of \(t_1,t_2,\varDelta t\) and c (since the queries made after the \(\rho \)-collision is found are of no consequence), so we use the simpler notation \({\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c)\), with the implicit assumption that \(q_1\ge t_1+c\) and \(q_2\ge t_2+\varDelta t\). Recalling that

$$\begin{aligned} {\textsf {cp}}_{\lambda \rho }(t_1,t_2,0,c)=\frac{\beta (t_1+t_2+c-2)}{N^2}, \end{aligned}$$

we conclude that

$$\begin{aligned} {\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c)={\textsf {cp}}_{\lambda \rho }(t_1,t_2,0,c). \end{aligned}$$
(8)
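The probabilities of Sects. 3.3 and 3.4 can be evaluated in the same way. The sketch below (our own, with exact arithmetic and hypothetical function names) assumes \({\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)=\beta (t_1+t_2+\varDelta t+c-2)/N^2\) and \({\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c)=\beta (t_1+t_2+c-2)/N^2\), as given by the counting arguments above, and checks bounds (6) and (7) as well as relation (8) on a small grid.

```python
import math
from fractions import Fraction


def beta(N, i):
    """beta(i) = N (N-1) ... (N-i+1) / N^i, and 0 when i > N."""
    if i > N:
        return Fraction(0)
    num = 1
    for j in range(i):
        num *= N - j
    return Fraction(num, N ** i)


def cp_lambda_rho(N, t1, t2, dt, c):
    return beta(N, t1 + t2 + dt + c - 2) / N ** 2


def cp_rho_prime(N, t1, t2, dt, c):
    # Only t1 + t2 + c distinct inputs are queried, so dt does not appear.
    return beta(N, t1 + t2 + c - 2) / N ** 2


def cp_lambda_rho_bound(N, t1, t2, dt, c, alpha):
    """Bound (6) if t1 + t2 + dt + c >= sqrt(2 alpha N) + 3, else bound (7)."""
    if t1 + t2 + dt + c >= math.sqrt(2 * alpha * N) + 3:
        return Fraction(math.exp(-alpha)) / N ** 2
    return Fraction(1, N ** 2)
```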

4 Iterated Random Function Collisions

In this section we revisit the two types of collision attacks described in Sect. 3, and analyse their success probabilities when applied to \(f^r\). The main proof in this paper relies heavily on the results obtained in this section.

A Cautionary Note. At first glance, this section may appear to be organised similarly to Sect. 3. It is important to keep in mind, however, that we are now interested in something entirely different. In Sect. 3, we looked at the probabilities of specific \(\rho \)- and \(\lambda \)-collisions with fixed parameters. In this section, instead, we focus on the probabilities that single-trail attacks and two-trail attacks with a specified number of queries succeed in finding collisions on \(f^r\). By reducing these collisions to collisions on f, we can apply the union bound to the bounds obtained in Sect. 3 to get the desired bounds. To distinguish them from the collision probabilities on f, which we denoted \({\textsf {cp}}\), we now use the notation \({\textsf {cp}}^r\) for the collision probabilities on \(f^r\).

4.1 Single-Trail Attack

We want to bound the probability that a q-query single-trail attack finds a collision on \(f^r\). Call this probability \({\textsf {cp}}^r_{\rho }[q]\).

Reducing to Collision on f. Suppose the q-query single-trail attack finds a \(\rho \)-collision on \(f^r\) with tail length \(t'\) and cycle length \(c'\). Observe that this collision necessarily arises out of a \(\rho \)-collision on f with some tail length t and cycle length c. This can happen in two ways:

  • Direct Collision. This happens when r divides c. Then, define k such that rk is the smallest multiple of r that is at least t, i.e.,

    $$\begin{aligned} k:=\left\lceil \frac{t}{r}\right\rceil , \end{aligned}$$

    then \(rk+c\) is also a multiple of r, and since \(f^{t+c}(x)=f^{t}(x)\), and \(rk\ge t\), we also have

    $$\begin{aligned} f^{rk+c}(x)=f^{rk}(x). \end{aligned}$$

    Writing

    $$\begin{aligned} k'=\frac{c}{r}, \end{aligned}$$

    we have

    $$\begin{aligned} (f^r)^{k+k'}(x)=(f^r)^{k}(x), \end{aligned}$$

    our \(\rho \)-collision on \(f^r\). Note that according to this notation,

    $$\begin{aligned} {t'=k=\left\lceil \frac{t}{r}\right\rceil ,c'=k'=\frac{c}{r}.} \end{aligned}$$

    Loosely speaking, in a direct collision, the first collision on f arrives in phase with r, i.e.,

    $$\begin{aligned} t\equiv t+c \pmod {r}, \end{aligned}$$

    so that this first collision on f leads immediately to a collision on \(f^r\) at the next multiple of r.

  • Delayed Collision. A delayed collision occurs when r does not divide c, i.e., the first collision arrives out of phase. Then we need to keep cycling around the \(\rho \) of f until the phase is adjusted; only then do we arrive at the next multiple of r and find a collision on \(f^r\). Suppose it cycles around \(\eta \) times. For the phase to be adjusted, \(c\eta \) should be a multiple of r. The smallest value of \(\eta \) that satisfies this is

    $$\begin{aligned} \eta =\frac{r}{d} \end{aligned}$$

    where \(d={\textsf {gcd}}(c,r)\) is the greatest common divisor of c and r. Let \(k=\left\lceil \frac{t}{r}\right\rceil \) as before, and let

    $$\begin{aligned} k'=\frac{c}{d}. \end{aligned}$$

    As before, since we have \(f^{t+c\eta }(x)=f^{t}(x)\), and \(rk\ge t\), we have

    $$\begin{aligned} f^{rk+c\eta }(x)=f^{rk}(x), \end{aligned}$$

    which gives us the \(\rho \)-collision

    $$\begin{aligned} (f^r)^{k+k'}(x)=(f^r)^{k}(x), \end{aligned}$$

    as before. Again, according to this notation,

    $$\begin{aligned} {t'=k=\left\lceil \frac{t}{r}\right\rceil ,c'=k'=\frac{c}{d}.} \end{aligned}$$

Required Conditions. Observing that a direct collision can be seen as a special case of delayed collision, where \(d={\textsf {gcd}}(c,r)=r\), we can summarise the above as follows: a \(\rho \)-collision on f with tail length t and cycle length c eventually leads to a \(\rho \)-collision on \(f^r\) with tail length \(t'\) and cycle length \(c'\) where

$$\begin{aligned} t'=k=\left\lceil \frac{t}{r}\right\rceil ,c'=k'=\frac{c}{d}, \end{aligned}$$

with \(d={\textsf {gcd}}(c,r)\) as before. Thus, for a \(\rho \)-collision on f to result in a \(\rho \)-collision on \(f^r\), the only required condition is that q is sufficiently large, i.e.,

$$\begin{aligned} t'+c'\le q. \end{aligned}$$

In terms of t and c, this becomes

$$\begin{aligned} {\left\lceil \frac{t}{r}\right\rceil +\frac{c}{d}\le q.} \end{aligned}$$
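The correspondence \(t'=\lceil t/r\rceil \), \(c'=c/{\textsf {gcd}}(c,r)\) can be tested empirically. In the following sketch (our own, with hypothetical helper names), f is a random function on a small domain given as a table, and the \(\rho \)-parameters of the f-trail and of the \(f^r\)-trail from the same starting point are computed independently; \(N+1\) responses are enough to force a repeat.

```python
import math
import random


def rho_params(step, x, q):
    """(t, c) of the first repeated response step^(t+c)(x) == step^t(x), or None."""
    first_seen = {}
    point = x
    for i in range(1, q + 1):
        point = step(point)
        if point in first_seen:
            return first_seen[point], i - first_seen[point]
        first_seen[point] = i
    return None


def predicted_iterate_params(t, c, r):
    """(t', c') = (ceil(t / r), c / gcd(c, r)) as derived in the text."""
    return -(-t // r), c // math.gcd(c, r)


def check_mapping(N, r, trials, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        table = [rng.randrange(N) for _ in range(N)]
        f = table.__getitem__

        def f_r(x):
            for _ in range(r):
                x = f(x)
            return x

        x = rng.randrange(N)
        t, c = rho_params(f, x, N + 1)          # pigeonhole: a repeat must occur
        if rho_params(f_r, x, N + 1) != predicted_iterate_params(t, c, r):
            return False
    return True
```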

Recall that we are trying to bound the probability \({\textsf {cp}}^r_{\rho }[q]\) of finding a \(\rho \)-collision on \(f^r\) in q queries. This is equivalent to the probability of finding a \(\rho \)-collision on f with parameters t and c satisfying the above condition. Recall that in Sect. 3, we bounded this probability for a fixed \((t,c)\), which we called \({\textsf {cp}}_{\rho }(t,c)\). We can now use the union bound to get a bound on \({\textsf {cp}}^r_{\rho }[q]\).

Using the Union Bound on \({\textsf {cp}}^r_{\rho }[q]\). Let \(\mathcal {S}\) be the set of \((t,c)\) values that satisfy the requirement

$$\begin{aligned} \left\lceil \frac{t}{r}\right\rceil +\frac{c}{{\textsf {gcd}}(c,r)}\le q. \end{aligned}$$

For a fixed \(\alpha >0\), we can split \(\mathcal {S}\) into two parts:

$$\begin{aligned} \mathcal {S}^+[\alpha ]&:=\left\{ (t,c)\in \mathcal {S}\mid t+c\ge \sqrt{2\alpha N}+2\right\} ,\\ \mathcal {S}^-[\alpha ]&:=\left\{ (t,c)\in \mathcal {S}\mid t+c<\sqrt{2\alpha N}+2\right\} . \end{aligned}$$

Applying the union bound with the bounds (3) and (4) obtained for \({\textsf {cp}}_{\rho }(t,c)\) gives

$$\begin{aligned} {\textsf {cp}}^r_{\rho }[q]&\le \sum _{\mathcal {S}}{\textsf {cp}}_{\rho }(t,c)\nonumber \\&= \sum _{\mathcal {S}^+ [\alpha ]}{\textsf {cp}}_{\rho }(t,c) + \sum _{\mathcal {S}^-[\alpha ]}{\textsf {cp}}_{\rho }(t,c)\nonumber \\&\le \sum _{\mathcal {S}^+ [\alpha ]}\frac{e^{-\alpha }}{N} + \sum _{\mathcal {S}^-[\alpha ]}\frac{1}{N}\nonumber \\&= \#\mathcal {S}^+[\alpha ]\cdot \frac{e^{-\alpha }}{N}+\#\mathcal {S}^-[\alpha ]\cdot \frac{1}{N} \end{aligned}$$
(9)

Bounding \(\#\mathcal {S}^-[\alpha ]\). We observe that whenever \((t,c)\in \mathcal {S}^-[\alpha ]\),

$$\begin{aligned} t<\sqrt{2\alpha N}+2, \end{aligned}$$

and

$$\begin{aligned} c<q\cdot {\textsf {gcd}}(c,r). \end{aligned}$$

If we count the number of \((t,c)\) satisfying these conditions, it will give us an upper bound on \(\#\mathcal {S}^-[\alpha ]\). There are at most \(\sqrt{2\alpha N}+2\) values of t satisfying \(t<\sqrt{2\alpha N}+2\). For a fixed \(d={\textsf {gcd}}(c,r)\), c has to be a multiple of d not exceeding qd. The number of such values of c is q. Since d must be a divisor of r, we get the total number of values of c satisfying \(c<q\cdot {\textsf {gcd}}(c,r)\) to be at most . Putting it all together we get

(10)

Bounding \(\#\mathcal {S}^+[\alpha ]\). For \((t,c)\in \mathcal {S}^+[\alpha ]\), it will be enough for our purposes to consider the bounds

$$\begin{aligned} t\le qr, \end{aligned}$$

and

$$\begin{aligned} c<q\cdot {\textsf {gcd}}(c,r). \end{aligned}$$

Using the same reasoning as before, the number of values of c that satisfy \(c<q\cdot {\textsf {gcd}}(c,r)\) is at most . For t there are now at most qr values. Thus, we obtain the bound

(11)
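The counting behind \(\#\mathcal {S}^-[\alpha ]\) and \(\#\mathcal {S}^+[\alpha ]\) can also be checked by brute force for small parameters. In the sketch below, `num_divisors` is a hypothetical name for the number of divisors of r, and the code compares the exact sizes of the two parts of \(\mathcal {S}\) (taking \(t\ge 1\)) with the counts obtained in the argument above, namely \((\sqrt{2\alpha N}+2)\cdot q\) and \(qr\cdot q\), each multiplied by the number of divisors of r.

```python
import math


def num_divisors(r):
    """Number of positive divisors of r (hypothetical helper name)."""
    return sum(1 for a in range(1, r + 1) if r % a == 0)


def split_S(N, q, r, alpha):
    """Enumerate S and return (#S^+[alpha], #S^-[alpha])."""
    threshold = math.sqrt(2 * alpha * N) + 2
    plus = minus = 0
    # ceil(t/r) <= q - 1 and c / gcd(c, r) <= q - 1 imply t, c <= q * r.
    for t in range(1, q * r + 1):
        for c in range(1, q * r + 1):
            if -(-t // r) + c // math.gcd(c, r) <= q:
                if t + c >= threshold:
                    plus += 1
                else:
                    minus += 1
    return plus, minus


def counting_bounds_hold(N, q, r, alpha):
    plus, minus = split_S(N, q, r, alpha)
    tau = num_divisors(r)
    bound_minus = (math.sqrt(2 * alpha * N) + 2) * q * tau  # t-count times c-count
    bound_plus = q * r * q * tau                             # t <= qr, same c-count
    return minus <= bound_minus and plus <= bound_plus
```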

Final Bound for \({\textsf {cp}}^r_{\rho }[q]\). We can now plug (10) and (11) into (9):

for any real \(\alpha >0\). We will simplify it by plugging in a suitable value of \(\alpha \).

Simplifying the Bound. We know from Lemma 3 that

We put \(\alpha =\log r\). Then we have

$$\begin{aligned} \sqrt{2\alpha N}=\sqrt{2N\log r}, \end{aligned}$$

and

$$\begin{aligned} e^{-\alpha }=\frac{1}{r}. \end{aligned}$$

When \(N\log r\ge 16\), we have

$$\begin{aligned} \sqrt{2\alpha N}+2&=\sqrt{2N\log r}+2\\&=2\sqrt{N\log r}-\left[ (2-\sqrt{2})\cdot \sqrt{N\log r}-2\right] \\&\le 2\sqrt{N\log r}-\left[ (2-\sqrt{2})\cdot 4-2\right] \\&=2\sqrt{N\log r}-\left[ 6-4\sqrt{2}\right] \\&<2\sqrt{N\log r}. \end{aligned}$$

Thus,

$$\begin{aligned} {\textsf {cp}}^r_{\rho }[q]\le 2\cdot \left( \frac{q^2\sqrt{r}}{N}\right) +2\cdot \sqrt{\frac{q^2r\log r}{N}}. \end{aligned}$$

This gives us a bound for the success probability of a q-query single-trail attack on \(f^r\). We state the result as a lemma.

Lemma 6

Under the assumption that \(N\log r\ge 16\), we have

$$\begin{aligned} {\textsf {cp}}^r_{\rho }[q]\le 2\cdot \left( \frac{q^2\sqrt{r}}{N}\right) +2\cdot \sqrt{\frac{q^2r\log r}{N}}. \end{aligned}$$
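A small numerical check of the simplification used for Lemma 6 is sketched below in Python, assuming \(\alpha =\log r\) as above; `lemma6_bound` and `simplification_holds` are illustrative names. It verifies on a grid that \(\sqrt{2N\log r}+2<2\sqrt{N\log r}\) whenever \(N\log r\ge 16\), and shows that the step can fail when \(N\log r\) is very small.

```python
import math

def lemma6_bound(q: int, r: int, N: float) -> float:
    """Right-hand side of Lemma 6 (stated for N*log(r) >= 16)."""
    return (2 * q * q * math.sqrt(r) / N
            + 2 * math.sqrt(q * q * r * math.log(r) / N))

def simplification_holds(N: float, r: int) -> bool:
    """Check sqrt(2*alpha*N) + 2 < 2*sqrt(N*log r) for alpha = log r."""
    x = N * math.log(r)
    return math.sqrt(2 * x) + 2 < 2 * math.sqrt(x)

# The step is valid on a grid of parameters with N*log(r) >= 16 ...
for n_bits in range(4, 65, 4):
    for r in (2, 3, 10, 1000, 2**20):
        if 2**n_bits * math.log(r) >= 16:
            assert simplification_holds(2**n_bits, r)

# ... but not when N*log(r) is tiny.
print(simplification_holds(2, 2))  # False
```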

4.2 Two-Trail Attack

We want to bound the probability that a \((q_1,q_2)\)-query two-trail attack finds a \(\lambda \)-collision on \(f^r\). Call this probability \({\textsf {cp}}^r_{\lambda }[q_1,q_2]\).

Reducing to Collision on f. Suppose the \((q_1,q_2)\)-query two-trail attack finds a \(\lambda \)-collision on \(f^r\) with foot lengths \(t'_1\) and \(t'_2\). As in the case of the \(\rho \)-collision on \(f^r\), this can only arise from a \(\lambda \)-collision on f, say with foot lengths \(t_1\) and \(t_2\), which can again happen in two ways:

  • Direct Collision. A direct collision takes place when the two f-trails collide in phase, i.e.,

    $$\begin{aligned} t_1=t_2{\textsf { mod }}r. \end{aligned}$$

    When this happens, the two trails continue until the next multiple of r, where they give a \(\lambda \)-collision on \(f^r\). This collision takes place at

    $$\begin{aligned} {t'_1=\left\lceil \frac{t_1}{r}\right\rceil ,t'_2=\left\lceil \frac{t_2}{r}\right\rceil .} \end{aligned}$$
  • Delayed Collision. A delayed collision takes place when the two f-trails collide out of phase, i.e.,

    $$\begin{aligned} t_1\ne t_2{\textsf { mod }}r. \end{aligned}$$

If one of the trails results in a \(\rho \)-collision on \(f^r\), this implies that a successful single-trail attack has been carried out on \(f^r\). Here, we will only focus on the scenario where a \(\lambda \)-collision on \(f^r\) can still happen. But then one of the two f-trails must have entered a cycle; otherwise, the two f-trails would remain out of phase. This can only happen in one of two ways:

  • After the \(\lambda \)-collision on f, the combined trail forms the tail of a \(\rho \)-collision on f, that is, they form a \(\lambda \rho \)-collision on f as in Fig. 3. One of the trails, say the one from \(x_1\), cycles around the \(\rho \) enough times to adjust the phase, and then the two f-trails continue to the next multiple of r, giving a \(\lambda \)-collision on \(f^r\);

  • After the \(\lambda \)-collision on f, one of the two f-trails, say the one from \(x_1\), continues and collides with the trail from \(x_2\), that is, they form a \(\rho '\)-collision on f as in Fig. 4. When \(\varDelta t=0\), a three-way collision on f occurs. The trail from \(x_1\) cycles around the \(\rho \) enough times to adjust the phase, giving a \(\lambda \)-collision on \(f^r\).

In our calculations, we assume that it is the trail from \(x_1\) that cycles multiple times, while the one from \(x_2\) waits for the collision on \(f^r\) to happen. We obtain a bound which is symmetric in \(q_1\) and \(q_2\), and thus also holds for the case when the two trails reverse roles. Let \(\tau _1\) and \(\tau _2\) be the respective lengths of the two trails up to the point of waiting, i.e., the point of \(\rho \)-collision of the trail from \(x_1\). Calling \(\varDelta t\) the distance between the two collision points, we simply have

$$\begin{aligned} \tau _1=t_1+\varDelta t,\tau _2=t_2+\varDelta t \end{aligned}$$

for the \(\lambda \rho \)-collision, and

$$\begin{aligned} \tau _1=t_1,\tau _2=t_2+\varDelta t \end{aligned}$$

for the \(\rho '\)-collision. Let the cycle length of this \(\rho \) be c (note that its tail length is \(\tau _1\) with respect to this trail). Suppose this trail cycles \(\eta \) times around the \(\rho \) in order to adjust the phase difference. Then \(\eta \) is the smallest non-negative integer that satisfies

$$\begin{aligned} \tau _1+c\eta =\tau _2{\textsf { mod }}r. \end{aligned}$$

Suppose k is such that

$$\begin{aligned} \tau _1+c\eta =\tau _2+rk. \end{aligned}$$

Also, let

$$\begin{aligned} k_2=\left\lceil \frac{\tau _2}{r}\right\rceil . \end{aligned}$$

From our definition of \(\tau _1\) and \(\tau _2\), we have that

$$\begin{aligned} f^{\tau _1}(x_1)=f^{\tau _2}(x_2), \end{aligned}$$

and from the \(\rho \)-collision \(f^{\tau _1+c}(x_1)=f^{\tau _1}(x_1)\), it follows that

$$\begin{aligned} f^{\tau _1+c\eta }(x_1)=f^{\tau _1}(x_1). \end{aligned}$$

From these two we get

$$\begin{aligned} f^{\tau _1+c\eta }(x_1)=f^{\tau _2}(x_2). \end{aligned}$$

From the definition of k we have

$$\begin{aligned} f^{\tau _2+rk}(x_1)=f^{\tau _2}(x_2). \end{aligned}$$

Applying f a further \(rk_2-\tau _2\ge 0\) times to both sides, we get a \(\lambda \)-collision on \(f^r\) as

$$\begin{aligned} (f^r)^{k+k_2}(x_1)=(f^r)^{k_2}(x_2). \end{aligned}$$

According to this notation we have a \(\lambda \)-collision on \(f^r\) with foot lengths \(t'_1\) and \(t'_2\), such that

$$\begin{aligned} {t'_1=k+k_2=\left\lceil \frac{\tau _1+c\eta }{r}\right\rceil ,t'_2=k_2=\left\lceil \frac{\tau _2}{r}\right\rceil .} \end{aligned}$$

When this comes from a \(\lambda \rho \)-collision, we have

$$\begin{aligned} {t'_1=\left\lceil \frac{t_1+\varDelta t+c\eta }{r}\right\rceil ,t'_2=\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil .} \end{aligned}$$

When this comes from a \(\rho '\)-collision, we have

$$\begin{aligned} {t'_1=\left\lceil \frac{t_1+c\eta }{r}\right\rceil ,t'_2=\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil .} \end{aligned}$$

We will treat these two cases separately, even though they are closely related.
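To make the phase-adjustment argument concrete, the following Python sketch computes \(\eta \), k and \(k_2\) from \(\tau _1\), \(\tau _2\), c and r, and checks the identity \((f^r)^{k+k_2}(x_1)=(f^r)^{k_2}(x_2)\) on a toy functional graph in which \(f^{\tau _1}(x_1)=f^{\tau _2}(x_2)\) lies on a cycle of length c. The toy graph and all function names are hypothetical and chosen only for illustration.

```python
import math

def phase_adjust(tau1: int, tau2: int, c: int, r: int):
    """Smallest eta >= 0 with tau1 + c*eta = tau2 (mod r), together with
    k = (tau1 + c*eta - tau2) / r and k2 = ceil(tau2 / r); None if no eta."""
    d = math.gcd(c, r)
    for eta in range(r // d):  # c*eta mod r repeats with period r/d in eta
        if (tau1 + c * eta - tau2) % r == 0:
            k = (tau1 + c * eta - tau2) // r
            k2 = -(-tau2 // r)  # integer ceiling of tau2 / r
            return eta, k, k2
    return None

def build_graph(tau1: int, tau2: int, c: int) -> dict:
    """Toy f: tails of lengths tau1 (from 'x1') and tau2 (from 'x2')
    both entering a cycle of length c at node ('C', 0)."""
    f = {('C', i): ('C', (i + 1) % c) for i in range(c)}
    for name, tau in (('x1', tau1), ('x2', tau2)):
        nodes = [(name, j) for j in range(tau)] + [('C', 0)]
        for a, b in zip(nodes, nodes[1:]):
            f[a] = b
    return f

def start_node(name: str, tau: int):
    """Starting point of a tail; a tail of length 0 starts on the cycle."""
    return (name, 0) if tau > 0 else ('C', 0)

def iterate(f: dict, x, n: int):
    """Apply f to x exactly n times."""
    for _ in range(n):
        x = f[x]
    return x

eta, k, k2 = phase_adjust(tau1=5, tau2=3, c=4, r=6)  # (1, 1, 1)
f = build_graph(5, 3, 4)
assert iterate(f, start_node('x1', 5), 6 * (k + k2)) == iterate(f, start_node('x2', 3), 6 * k2)
```

Here \(t'_1=k+k_2=2=\lceil 9/6\rceil \) and \(t'_2=k_2=1\), in line with the formulas above.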

Required Conditions. Again, we observe that the direct collision is a special case of the delayed collision with \(\varDelta t=0\) and \(\eta =0\). However, there is an important difference. For the delayed \(\lambda \)-collision, we require two collisions on f, unlike all other collisions we have seen so far, which need only one. This case corresponds to the \(\lambda \rho \)-double-collision and the \(\rho '\)-double-collision from Sect. 3, and requires some special treatment, as we will see in the course of our calculations. The condition needed here is that both trails continue long enough for the collision to happen, i.e.,

$$\begin{aligned} t'_1\le q_1,t'_2\le q_2. \end{aligned}$$

In terms of \(t_1,t_2,\varDelta t,c,\eta \), this translates to

$$\begin{aligned} {\left\lceil \frac{t_1+\varDelta t+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2} \end{aligned}$$

for the \(\lambda \rho \)-double-collision and

$$\begin{aligned} {\left\lceil \frac{t_1+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2} \end{aligned}$$

for the \(\rho '\)-double-collision. Recall that we are trying to calculate \({\textsf {cp}}^r_{\lambda }[q_1,q_2]\), the probability of getting a \(\lambda \)-collision on \(f^r\) with a \((q_1,q_2)\)-query two-trail attack starting from \(x_1\) and \(x_2\). Based on our observations above, this can happen in two ways:

  • A Direct \(\lambda \) -collision on f. This is the direct collision scenario, where the collision is in phase. The foot lengths \(t_1\) and \(t_2\) have the constraints

    $$\begin{aligned} {\left\lceil \frac{t_1}{r}\right\rceil \le q_1,\left\lceil \frac{t_2}{r}\right\rceil \le q_2,t_1=t_2{\textsf { mod }}r.} \end{aligned}$$

    For fixed \(t_1,t_2\), we recall that the probability of this collision is \({\textsf {cp}}_{\lambda }(t_1,t_2)\).

  • A \(\lambda \rho \) -double-collision on f. This is the first case of the delayed collision scenario, where the collision is out of phase. Here, \(t_1\) and \(t_2\) are the foot lengths of the \(\lambda \), \(\varDelta t\) is the distance between the two collision points, c is the cycle length of the \(\rho \), and \(\eta \) is the number of cycles necessary around the \(\rho \). Recall that one of the trails circles around the \(\rho \), while the other waits for the \(\lambda \)-collision on \(f^r\) to happen. We continue with our assumption that the one from \(x_1\) does the cycling and the one from \(x_2\) waits, since we will eventually count over all pairs of trails. Now \(t_1,t_2,\varDelta t,c,\eta \) have the constraints

    $$\begin{aligned} {\left\lceil \frac{t_1+\varDelta t+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2{\textsf { mod }}r.} \end{aligned}$$

    For fixed \(t_1,t_2,\varDelta t,c,\eta \), we recall that the probability of this \(\lambda \rho \)-double-collision is \({\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\).

  • A \(\rho '\) -double-collision on f. This is the second case of the delayed collision scenario. Here, \(t_1\) and \(t_2\) are the lengths of the two tails of the \(\rho \), \(\varDelta t\) is the distance between the two collision points, c is the cycle length of the \(\rho \), and \(\eta \) is the number of cycles necessary around the \(\rho \). Again, the trail from \(x_1\) circles around the \(\rho \), while the trail from \(x_2\) waits for the \(\lambda \)-collision on \(f^r\) to happen. Thus, \(t_1,t_2,\varDelta t,c,\eta \) have the constraints

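    $$\begin{aligned} {\left\lceil \frac{t_1+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2+\varDelta t{\textsf { mod }}r.} \end{aligned}$$

    For fixed \(t_1,t_2,\varDelta t,c,\eta \), we recall that the probability of this \(\rho '\)-double-collision is \({\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c)\).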

Our strategy for bounding \({\textsf {cp}}^r_{\lambda }[q_1,q_2]\) will be similar to the one we used for bounding \({\textsf {cp}}^r_{\rho }[q]\): to take the bounds on \({\textsf {cp}}_{\lambda }(t_1,t_2)\) for fixed \(t_1,t_2\), \({\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\) for fixed \(t_1,t_2,\varDelta t,c\) and \({\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c)\) for fixed \(t_1,t_2,\varDelta t,c\) obtained in Sect. 3, and then use the union bound over all possible values these parameters can take.

Applying the Union Bound to \({\textsf {cp}}^r_{\lambda }[q_1,q_2]\). Let \(\mathcal {S}_1\) be the set of \((t_1,t_2)\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1}{r}\right\rceil \le q_1,\left\lceil \frac{t_2}{r}\right\rceil \le q_2,t_1=t_2{\textsf { mod }}r, \end{aligned}$$

and let

$$\begin{aligned} {\textsf {p}}_1:=\sum _{\mathcal {S}_1}{\textsf {cp}}_{\lambda }(t_1,t_2). \end{aligned}$$

Let \(\mathcal {S}_2\) be the set of \((t_1,t_2,\varDelta t,c,\eta )\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1+\varDelta t+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2{\textsf { mod }}r, \end{aligned}$$

and let

$$\begin{aligned} {\textsf {p}}_2:=\sum _{\mathcal {S}_2}{\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c). \end{aligned}$$

Let \(\mathcal {S}_3\) be the set of \((t_1,t_2,\varDelta t,c,\eta )\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2+\varDelta t{\textsf { mod }}r, \end{aligned}$$

and let

$$\begin{aligned} {\textsf {p}}_3:=\sum _{\mathcal {S}_3}{\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c). \end{aligned}$$

In addition, for the case where the trails reverse roles, we define \(\mathcal {S}_4\) as the set of \((t_1,t_2,\varDelta t,c,\eta )\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1+\varDelta t}{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t+c\eta }{r}\right\rceil \le q_2,t_1=t_2+c\eta {\textsf { mod }}r, \end{aligned}$$

and

$$\begin{aligned} {\textsf {p}}_4:=\sum _{\mathcal {S}_4}{\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c). \end{aligned}$$

Similarly, we define \(\mathcal {S}_5\) as the set of \((t_1,t_2,\varDelta t,c,\eta )\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1+\varDelta t}{r}\right\rceil \le q_1,\left\lceil \frac{t_2+c\eta }{r}\right\rceil \le q_2,t_1+\varDelta t=t_2+c\eta {\textsf { mod }}r, \end{aligned}$$

and

$$\begin{aligned} {\textsf {p}}_5:=\sum _{\mathcal {S}_5}{\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c). \end{aligned}$$

We state here the following bounds on \({\textsf {p}}_1,{\textsf {p}}_2,{\textsf {p}}_3\), the proof of which we defer to Sect. 6:

Lemma 7

Under the assumption that \(N\log r>90\),

$$\begin{aligned} {\textsf {p}}_1&\le \frac{q_1q_2r}{N},\\ {\textsf {p}}_2&\le 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2+24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) ,\\ {\textsf {p}}_3&\le 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2+24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) . \end{aligned}$$

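Final Bound for \({\textsf {cp}}^r_{\lambda }[q_1,q_2]\). The sums \({\textsf {p}}_4\) and \({\textsf {p}}_5\) are obtained from \({\textsf {p}}_2\) and \({\textsf {p}}_3\) by exchanging the roles of the two trails, and hence of \(q_1\) and \(q_2\). Since the bounds for \({\textsf {p}}_2\) and \({\textsf {p}}_3\) in Lemma 7 are symmetric in \(q_1\) and \(q_2\), we have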

$$\begin{aligned} {\textsf {p}}_4&\le 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2+24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) ,\\ {\textsf {p}}_5&\le 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2+24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) . \end{aligned}$$

Using the union bound, we get

$$\begin{aligned} {{\textsf {cp}}^r_{\lambda }[q_1,q_2]\le {\textsf {p}}_1+{\textsf {p}}_2+{\textsf {p}}_3+{\textsf {p}}_4+{\textsf {p}}_5.} \end{aligned}$$

This gives us the required bound, which we state next in the form of a lemma.

Lemma 8

When \(N\log r>90\),

$$\begin{aligned} {\textsf {cp}}^r_{\lambda }[q_1,q_2]\le 32\cdot \left( \frac{q_1q_2r\log r}{N}\right) ^2+97\cdot (\log r)^2\cdot \left( \frac{q_1q_2r\log r}{N}\right) . \end{aligned}$$

Proof

As \(r\ge 3\), we have \(\log r>1\), so we can relax the bound of \({\textsf {p}}_1\) as

$$\begin{aligned} {\textsf {p}}_1\le \frac{q_1q_2r}{N}\le \frac{q_1q_2r}{N}\cdot (\log r)^3. \end{aligned}$$

The rest follows from Lemma 7.    \(\square \)
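The assembly of Lemma 8 from Lemma 7 can be checked numerically. The sketch below takes the constants of Lemma 7 as stated, bounds \({\textsf {p}}_4\) and \({\textsf {p}}_5\) like \({\textsf {p}}_2\) and \({\textsf {p}}_3\), and compares \({\textsf {p}}_1+\cdots +{\textsf {p}}_5\) with the right-hand side of Lemma 8 for integers \(r\ge 3\), where \(\log r>1\). The function names are illustrative.

```python
import math

def lemma7_rhs(q1, q2, r, N):
    """Lemma 7 right-hand sides: (bound on p1, bound on each of p2 and p3)."""
    x = q1 * q2 * r / N
    L = math.log(r)
    return x, 8 * L**2 * x**2 + 24 * L**3 * x

def lemma8_rhs(q1, q2, r, N):
    """Right-hand side of Lemma 8."""
    L = math.log(r)
    y = q1 * q2 * r * L / N
    return 32 * y**2 + 97 * L**2 * y

def union_of_five(q1, q2, r, N):
    """p1 + p2 + p3 + p4 + p5, with p4 and p5 bounded like p2 and p3."""
    p1, p_delayed = lemma7_rhs(q1, q2, r, N)
    return p1 + 4 * p_delayed

# 4 * 8 = 32 and 4 * 24 + 1 = 97; the "+1" absorbs p1 once (log r)^3 >= 1.
for r in range(3, 50):
    assert union_of_five(5, 7, r, 2**20) <= lemma8_rhs(5, 7, r, 2**20) * (1 + 1e-12)
```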

4.3 A More General Collision Attack

Previously, we looked at two main approaches for a collision attack: the single-trail attack and the two-trail attack, and we bounded their success probabilities. Now, we will bound the success probability of a more general collision attack. More specifically, we consider collision attacks subject to the restriction that is given in the statement of Theorem 1 in Sect. 1: every query is either chosen from a set of size m (with \(m\le q\)) of predetermined starting points, or is the response of a previous query. First, let us introduce the notion of a transcript.

Transcript. Let us consider any adversary that interacts with an oracle \(\mathcal {O}\). This interaction can be represented as a transcript, that is, as a list of queries made and answers returned. Let the transcript \({\textsf {tr}}\) be defined as the q-tuple of input-output pairs \({\textsf {tr}}= ((x_1, y_1), (x_2, y_2), \ldots , (x_q, y_q))\). Without loss of generality, we only consider adversaries that never repeat a query, i.e., all q queries are distinct.

Sources and Trails. For \(j,j'\in [q],j\ne j'\), we say that \(x_{j'}\) is a predecessor of \(x_j\) if

$$\begin{aligned} f(x_{j'})=x_j. \end{aligned}$$

We call \(x_j\) a source if it does not have a predecessor. If there exists a non-empty subset of the queries for which every query has a predecessor that is in the same subset, and no query has a predecessor outside the set, we call this subset a permutation cycle. Note that a permutation cycle forms a rho-shape with a tail of length zero. For a permutation cycle, we define the query \(x_j\) of the permutation cycle with the smallest index j to be a source.

Suppose that there are m sources among the q queries, which we call \(z_1,\ldots ,z_m\). Then we can view the attack as an m-trail attack, with the m trails starting from \(z_1,\ldots ,z_m\) and of lengths \(q_1,\ldots ,q_m\) respectively. Thus, each query that is not a source lies on one of these m trails.

If the collision attack is successful, then for some \(i,i'\in [q]\) with \(i\ne i'\), we have

$$\begin{aligned} f(x_i)=f(x_{i'}). \end{aligned}$$

In that case, one of the following must hold:

  • \(x_i\) and \(x_{i'}\) are on the same trail, say the one from \(z_p\) – in this case, a successful \(q_p\)-query single-trail attack starting from \(z_p\) has occurred;

  • \(x_i\) and \(x_{i'}\) are on different trails, say the ones from \(z_p\) and \(z_{p'}\) respectively – in this case, a successful \((q_p,q_{p'})\)-query two-trail attack starting from \((z_p,z_{p'})\) has occurred.

A Word on the Choice of \(q_1,\ldots ,q_m\). Since we allow the trails to collide and merge with each other, the trail lengths \(q_1,\ldots ,q_m\) are not uniquely determined: the queries on a merged trail can be counted on either trail, or on both. We get around this by counting each merged trail as part of just one of the pre-merging trails, while the other is taken to stop at the point of collision. This way, we ensure that \(\sum _{j=1}^mq_j=q\).
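One way to realise the decomposition into sources and trails, together with the counting convention just described, is the following Python sketch. It is an illustration under our own conventions: merged queries are credited to the trail that reaches them first, genuine sources are processed before permutation cycles, and `decompose_into_trails` is a hypothetical helper.

```python
def decompose_into_trails(transcript):
    """Split a transcript [(x_1, y_1), ..., (x_q, y_q)] with distinct inputs
    into trails, returned as (source input, trail length) pairs.
    Every query is counted on exactly one trail, so the lengths sum to q."""
    index_of = {x: j for j, (x, _) in enumerate(transcript)}
    outputs = {y for _, y in transcript}
    assigned = [False] * len(transcript)
    trails = []

    def walk(start):
        # Follow the trail until the next input is unqueried or already
        # counted on some trail (a merge, or the closing of a cycle).
        j, length = start, 0
        while j is not None and not assigned[j]:
            assigned[j] = True
            length += 1
            j = index_of.get(transcript[j][1])
        return length

    # Genuine sources: inputs that are not the response of any query.
    for j, (x, _) in enumerate(transcript):
        if x not in outputs:
            trails.append((x, walk(j)))
    # Anything left lies on a permutation cycle; its smallest index is the source.
    for j, (x, _) in enumerate(transcript):
        if not assigned[j]:
            trails.append((x, walk(j)))
    return trails

tr = [(0, 1), (1, 2), (2, 3), (5, 2), (7, 8), (8, 7)]
print(decompose_into_trails(tr))  # [(0, 3), (5, 1), (7, 2)]
```

In the example, the trail from 5 merges into the trail from 0 and is taken to stop there, and \(\{7,8\}\) is a permutation cycle.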

To bound the success probability of this more general collision attack, we can use the previously obtained bounds on the success probabilities of single-trail attacks and two-trail attacks along with the union bound. With notation as above we recall the following bounds:

  • Single-Trail Attack. For a q-query single-trail attack, Lemma 6 gives us the bound

    $$\begin{aligned} {\textsf {cp}}^r_{\rho }[q]\le 2\cdot \left( \frac{q^2\sqrt{r}}{N}\right) +2\cdot \sqrt{\frac{q^2r\log r}{N}}. \end{aligned}$$
  • Two-Trail Attack. For a \((q_1,q_2)\)-query two-trail attack, Lemma 8 gives us the bound

    $$\begin{aligned} {\textsf {cp}}^r_{\lambda }[q_1,q_2]\le 32\cdot \left( \frac{q_1q_2r\log r}{N}\right) ^2+97\cdot (\log r)^2\cdot \left( \frac{q_1q_2r\log r}{N}\right) . \end{aligned}$$

Let denote the probability that the collision adversary making q queries finds a collision on \(f^r\). For \(q_1,\ldots ,q_m\), with

$$\begin{aligned} \sum _{i=1}^mq_i=q, \end{aligned}$$

and let denote the probability that a collision attack with m trails of lengths \(q_1,\ldots ,q_m\) finds a collision on \(f^r\). Thus,

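By the union bound, the probability that a collision attack with m trails of lengths \(q_1,\ldots ,q_m\) finds a collision on \(f^r\) is at most

$$\begin{aligned} \sum _{i=1}^m{\textsf {cp}}^r_{\rho }[q_i]+\sum _{i=1}^{m-1}\sum _{j=i+1}^m{\textsf {cp}}^r_{\lambda }[q_i,q_j]. \end{aligned}$$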

We bound the two terms separately.

$$\begin{aligned} \sum _{i=1}^m{\textsf {cp}}^r_{\rho }[q_i]&=\sum _{i=1}^m\left[ 2\cdot \left( \frac{q_i^2\sqrt{r}}{N}\right) +2\cdot \sqrt{\frac{q_i^2r\log r}{N}}\right] \\&=2\cdot \left( \frac{\sqrt{r}}{N}\right) \cdot \sum _{i=1}^m q_i^2+2\cdot \sqrt{\frac{r\log r}{N}}\cdot \sum _{i=1}^m q_i\\&\le 2\cdot \left( \frac{\sqrt{r}}{N}\right) \cdot q^2+2\cdot \sqrt{\frac{r\log r}{N}}\cdot q\\&=2\cdot \left( \frac{q^2\sqrt{r}}{N}\right) +2\cdot \sqrt{\frac{q^2r\log r}{N}}; \end{aligned}$$
$$\begin{aligned} \sum _{i=1}^{m-1}\sum _{j=i+1}^m{\textsf {cp}}^r_{\lambda }[q_i,q_j]&=\sum _{i=1}^{m-1}\sum _{j=i+1}^m\Bigg [32\cdot \left( \frac{q_iq_jr\log r}{N}\right) ^2\\&\qquad \qquad +97\cdot (\log r)^2\cdot \left( \frac{q_iq_jr\log r}{N}\right) \Bigg ]\\&=32\cdot \left( \frac{r\log r}{N}\right) ^2\cdot \sum _{i=1}^{m-1}\sum _{j=i+1}^mq_i^2q_j^2\\&\qquad \qquad +97\cdot (\log r)^2\cdot \left( \frac{r\log r}{N}\right) \cdot \sum _{i=1}^{m-1}\sum _{j=i+1}^mq_iq_j\\&\le 16\cdot \left( \frac{r\log r}{N}\right) ^2\cdot q^4+49\cdot (\log r)^2\cdot \left( \frac{r\log r}{N}\right) \cdot q^2\\&=16\cdot \left( \frac{q^2r\log r}{N}\right) ^2+49\cdot (\log r)^2\cdot \left( \frac{q^2r\log r}{N}\right) . \end{aligned}$$

Since these bounds are free of \(q_1,\ldots ,q_m\), this proves Theorem 1 of the paper.
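The combination step can be checked numerically. The sketch below evaluates the union bound for random splits \(q_1,\ldots ,q_m\) of q and compares it with the expression obtained above, which no longer depends on the split (called `phi` here, an illustrative name). It relies on \(\sum _i q_i^2\le q^2\), \(\sum _{i<j}q_iq_j\le q^2/2\) and \(\sum _{i<j}q_i^2q_j^2\le q^4/2\).

```python
import math
import random

def phi(q, r, N):
    """Sum of the two split-free bounds derived above."""
    L = math.log(r)
    y = q * q * r * L / N
    return (2 * q * q * math.sqrt(r) / N + 2 * math.sqrt(q * q * r * L / N)
            + 16 * y**2 + 49 * L**2 * y)

def union_bound(parts, r, N):
    """Lemma 6 summed over trails plus Lemma 8 summed over pairs of trails."""
    L = math.log(r)
    single = sum(2 * a * a * math.sqrt(r) / N + 2 * math.sqrt(a * a * r * L / N)
                 for a in parts)
    pairs = 0.0
    for i, a in enumerate(parts):
        for b in parts[i + 1:]:
            y = a * b * r * L / N
            pairs += 32 * y**2 + 97 * L**2 * y
    return single + pairs

rng = random.Random(2024)
for _ in range(1000):
    parts = [rng.randint(1, 20) for _ in range(rng.randint(1, 8))]
    r = rng.randint(2, 10**6)
    assert union_bound(parts, r, 2**60) <= phi(sum(parts), r, 2**60) * (1 + 1e-9)
```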

5 Bounding the Advantage of Distinguishing f and \(f^r\)

5.1 Security Game

The Setup. An oracle \(\mathcal {O}\) imitating a function g takes q queries \(\left\{ x_i\mid i\in [q]\right\} \) and returns

$$\begin{aligned} \left\{ y_i=g(x_i)\mid i\in [q]\right\} . \end{aligned}$$

The q-tuple of input-output pairs of the oracle is called the transcript, denoted as

$$\begin{aligned} {\textsf {tr}}= ((x_1, y_1), (x_2, y_2), \ldots , (x_q, y_q)). \end{aligned}$$

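Both the real oracle \(\mathcal {O}_{\textsc {REAL}}\) and the ideal oracle \(\mathcal {O}_{\textsc {IDEAL}}\) will initially select a uniformly random function f. Then, \(\mathcal {O}_{\textsc {REAL}}\) goes on to imitate \(f^r\), while \(\mathcal {O}_{\textsc {IDEAL}}\) imitates f itself. For any adversary, we want to bound its advantage, defined as the absolute difference between the probability that it outputs 1 when interacting with \(\mathcal {O}_{\textsc {REAL}}\) and the probability that it outputs 1 when interacting with \(\mathcal {O}_{\textsc {IDEAL}}\).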

As in the collision attack of Sect. 4.3, we can view the transcript \({\textsf {tr}}\) as m trails of lengths \(q_1,\ldots ,q_m\) with sources \(z_1,\ldots ,z_m\), possibly with collisions, such that no query is counted in more than one trail, and hence

$$\begin{aligned} \sum _{j=1}^mq_j=q. \end{aligned}$$

For \(i\in [m]\), we shall use the notation

$$\begin{aligned} z_{i,1}&:=\mathcal {O}(z_i),\\ z_{i,j}&:=\mathcal {O}(z_{i,j-1}),2\le j\le q_i. \end{aligned}$$

Good and Bad Transcripts. We partition the set of attainable transcripts into a set \(\mathcal {T}_{{\textsf {good}}}\) of good transcripts, and a set \(\mathcal {T}_{{\textsf {bad}}}\) of bad transcripts. We say \({\textsf {tr}}\in \mathcal {T}_{{\textsf {bad}}}\) if either of the following holds:

  • For some \(i\in [m]\),

    $$\begin{aligned} z_{i,q_i}=z_i, \end{aligned}$$

    that is, the i-th trail forms a permutation cycle. Note that, by our construction of the trails, \(z_{i_1,j}\) cannot equal \(z_{i_2}\) unless \(i_1=i_2\).

  • For some \(i_1,i_2\in [m],j_1\in [q_{i_1}],j_2\in [q_{i_2}]\) with \((i_1,j_1)\ne (i_2,j_2)\), we have

    $$\begin{aligned} z_{i_1,j_1}=z_{i_2,j_2}, \end{aligned}$$

    that is, there is a \(\rho \)-collision on one of the trails (\(i_1=i_2\)), or there is a \(\lambda \)-collision on two of the trails (\(i_1\ne i_2\)).
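The two conditions for a bad transcript translate directly into code. The Python sketch below (hypothetical helper names; each trail is given as its source followed by the list of its responses) implements them. It also adds a small Monte Carlo estimate, for a hypothetical choice of m fixed sources, of the ideal-world event used below to bound the probability of a bad transcript by \(2q^2/N\): some response equals a source, or two responses collide. Until that event occurs every query is fresh, so the responses can be sampled as independent uniform values.

```python
import random

def is_bad(trails):
    """trails: list of (z_i, [z_{i,1}, ..., z_{i,q_i}]) pairs.
    Bad if a trail ends at its own source (a permutation cycle) or if two
    response positions coincide (a rho- or lambda-collision)."""
    seen = set()
    for source, responses in trails:
        if responses and responses[-1] == source:
            return True
        for v in responses:
            if v in seen:
                return True
            seen.add(v)
    return False

def estimate_collision_event(q, m, N, trials, seed=1):
    """Ideal-world estimate of Pr[some response hits a source or repeats]."""
    rng = random.Random(seed)
    sources = set(range(m))  # hypothetical fixed, distinct sources 0..m-1
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(q):
            y = rng.randrange(N)
            if y in sources or y in seen:
                hits += 1
                break
            seen.add(y)
    return hits / trials

q, m, N = 40, 4, 2**16
estimate = estimate_collision_event(q, m, N, trials=5000)
print(estimate, "<=", 2 * q * q / N)
```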

5.2 Applying the H-Coefficient Technique

Let us denote the probability distribution of the transcripts in the real world by \(\mathrm {Pr}_{\mathcal {O}_{\textsc {REAL}}}\), and in the ideal world by \(\mathrm {Pr}_{\mathcal {O}_{\textsc {IDEAL}}}\). Our proof will use Patarin’s H-coefficient technique [17].

Lemma 9

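(H-Coefficient Technique). Consider any adversary, and let \(\mathcal {T}= \mathcal {T}_{{\textsf {good}}}\cup \mathcal {T}_{{\textsf {bad}}}\) be a partition of the set of attainable transcripts. Let \(\varepsilon _1\) be such that for all \({\textsf {tr}}\in \mathcal {T}_{{\textsf {good}}}\):

$$\begin{aligned} \frac{\mathrm {Pr}_{\mathcal {O}_{\textsc {REAL}}}[{\textsf {tr}}]}{\mathrm {Pr}_{\mathcal {O}_{\textsc {IDEAL}}}[{\textsf {tr}}]}\ge 1-\varepsilon _1. \end{aligned}$$

Furthermore, let \(\varepsilon _2:=\mathrm {Pr}_{\mathcal {O}_{\textsc {IDEAL}}}\left[ {\textsf {tr}}\in \mathcal {T}_{{\textsf {bad}}}\right] \). Then the advantage of the adversary is at most \(\varepsilon _1+\varepsilon _2\).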

Proof

For a proof and a detailed explanation of this technique, see Chen and Steinberger [10].    \(\square \)

Probability of Bad Transcripts in Ideal Model. We can easily bound the probability that a transcript \({\textsf {tr}}\) from the ideal oracle \(\mathcal {O}_{\textsc {IDEAL}}\) is in \(\mathcal {T}_{{\textsf {bad}}}\). Suppose all of the q responses lie outside \(\left\{ z_i\mid i\in [m]\right\} \), and there is no collision between any of the responses. When this happens, \({\textsf {tr}}\) cannot be in \(\mathcal {T}_{{\textsf {bad}}}\). The probability of this is at least \(1-\displaystyle \frac{2q^2}{N}\): two responses collide with probability at most \(\displaystyle \frac{q^2}{N}\); and a response collides with a \(z_i\) with probability at most \(\displaystyle \frac{q^2}{N}\), since there are m different values of \(z_i\), and \(m\le q\). Thus,

$$\begin{aligned} \mathrm {Pr}_{\mathcal {O}_{\textsc {IDEAL}}}\left[ {\textsf {tr}}\in \mathcal {T}_{{\textsf {bad}}}\right] \le \frac{2q^2}{N}. \end{aligned}$$

Probability of Good Transcripts. We now focus only on transcripts in \(\mathcal {T}_{{\textsf {good}}}\). Let us consider a good and attainable transcript \({\textsf {tr}}\in \mathcal {T}_{{\textsf {good}}}\). For the ideal oracle, as f is uniformly random and the number of distinct inputs is q, we have

$$\begin{aligned} \mathrm {Pr}_{\mathcal {O}_{\textsc {IDEAL}}}[{\textsf {tr}}]=\frac{1}{N^q}. \end{aligned}$$

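Now we bound \(\mathrm {Pr}_{\mathcal {O}_{\textsc {REAL}}}[{\textsf {tr}}]\) for \({\textsf {tr}}\in \mathcal {T}_{{\textsf {good}}}\). Consider a \((q_1,\ldots ,q_m)\)-query m-trail collision attack on \(f^r\), with sources \(z_1,\ldots ,z_m\) respectively. Theorem 1 tells us that this attack fails with probability at least \(1-\phi (q,r)\), where

$$\begin{aligned} \phi (q,r):=2\cdot \left( \frac{q^2\sqrt{r}}{N}\right) +2\cdot \sqrt{\frac{q^2r\log r}{N}}+16\cdot \left( \frac{q^2r\log r}{N}\right) ^2+49\cdot (\log r)^2\cdot \left( \frac{q^2r\log r}{N}\right) . \end{aligned}$$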

We now observe that when this attack fails, the attack transcript is either isomorphic as a graph to \({\textsf {tr}}\), or contains a permutation cycle. A permutation cycle occurs when queries of \(f^r\) collide with a source \(z_i\), which has probability at most \(\displaystyle \frac{q^2r}{N}\), since there are m different values of \(z_i\) and \(m\le q\). Thus, the attack transcript is isomorphic to \({\textsf {tr}}\) with probability at least

$$\begin{aligned} 1-\phi (q,r)-\frac{q^2r}{N}. \end{aligned}$$

Now the graph of this attack transcript has \(q+m\) nodes, all distinct. Of these, the m sources are already fixed. The remaining q nodes take distinct values different from the sources, which can happen in \((N-m)(N-m-1)\cdots (N-m-q+1)\) ways. Now all of these graphs are equally likely to occur in the scenario described above, i.e., when the m-trail attack fails and does not contain a permutation cycle. One of the equally likely graphs is the graph of \({\textsf {tr}}\). Thus,

Applying the H-Coefficient Technique. Let \(R({\textsf {tr}})\) be the ratio of the probabilities of \({\textsf {tr}}\in \mathcal {T}_{{\textsf {good}}}\) under \(\mathcal {O}_{\textsc {REAL}}\) and \(\mathcal {O}_{\textsc {IDEAL}}\) respectively. Then we have shown above that

$$\begin{aligned} R({\textsf {tr}})\ge \left( 1-\phi (q,r)-\frac{q^2r}{N}\right) \cdot \frac{1}{\beta (q)}. \end{aligned}$$

From Lemma 1, we have

$$\begin{aligned} \beta (q)\le 1. \end{aligned}$$

Thus,

$$\begin{aligned} R({\textsf {tr}})\ge 1-\varepsilon _1 \end{aligned}$$

where

$$\begin{aligned} \varepsilon _1:=\phi (q,r)+\frac{q^2r}{N}. \end{aligned}$$

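Hence, by the H-coefficient technique of Lemma 9, the advantage of any adversary is at most

$$\begin{aligned} \varepsilon _1+\frac{2q^2}{N}=\phi (q,r)+\frac{q^2r}{N}+\frac{2q^2}{N}. \end{aligned}$$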

This proves Theorem 2 of the paper.

6 Proof of Lemma 7

Recalling the Setup. In Sect. 4 we defined three sets \(\mathcal {S}_1\), \(\mathcal {S}_2\), and \(\mathcal {S}_3\). \(\mathcal {S}_1\) is the set of \((t_1,t_2)\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1}{r}\right\rceil \le q_1,\left\lceil \frac{t_2}{r}\right\rceil \le q_2,t_1=t_2{\textsf { mod }}r; \end{aligned}$$

\(\mathcal {S}_2\) is the set of \((t_1,t_2,\varDelta t,c,\eta )\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1+\varDelta t+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2{\textsf { mod }}r, \end{aligned}$$

\(\mathcal {S}_3\) is the set of \((t_1,t_2,\varDelta t,c,\eta )\) values that satisfy the constraints

$$\begin{aligned} \left\lceil \frac{t_1+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2+\varDelta t{\textsf { mod }}r. \end{aligned}$$

We further defined the following:

$$\begin{aligned} {\textsf {p}}_1&=\sum _{\mathcal {S}_1}{\textsf {cp}}_{\lambda }(t_1,t_2);\\ {\textsf {p}}_2&=\sum _{\mathcal {S}_2}{\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c);\\ {\textsf {p}}_3&=\sum _{\mathcal {S}_3}{\textsf {cp}}_{\rho '}(t_1,t_2,\varDelta t,c). \end{aligned}$$

Lemma 7 claimed the following bounds for \({\textsf {p}}_1,{\textsf {p}}_2\) and \({\textsf {p}}_3\) (as long as \(N\log r>90\)):

$$\begin{aligned} {\textsf {p}}_1&\le \frac{q_1q_2r}{N},\\ {\textsf {p}}_2&\le 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2+24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) ,\\ {\textsf {p}}_3&\le 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2+24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) . \end{aligned}$$

In this section, we establish these bounds.

Bounding \({\textsf {p}}_1\). For this we need to bound \(\#\mathcal {S}_1\). This case is very simple. We observe that \(t_1\le q_1r\), so there are at most \(q_1r\) choices for \(t_1\). Once \(t_1\) is fixed, given the constraints \(t_1=t_2{\textsf { mod }}r\) and \(t_2\le q_2r\), there are at most \(q_2\) choices for \(t_2\). Thus, we have

$$\begin{aligned} \#\mathcal {S}_1\le q_1q_2r, \end{aligned}$$

which, using (5), gives the bound

$$\begin{aligned} {\textsf {p}}_1=\sum _{\mathcal {S}_1}{\textsf {cp}}_{\lambda }(t_1,t_2)\le \#\mathcal {S}_1\cdot \frac{1}{N}\le \frac{q_1q_2r}{N}. \end{aligned}$$
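For small parameters, \(\#\mathcal {S}_1\) can be counted directly. The sketch below assumes foot lengths \(t_1,t_2\ge 1\) (our assumption); under that convention \(\lceil t/r\rceil \le q\) is equivalent to \(t\le qr\), and the count equals \(q_1q_2r\) exactly.

```python
def count_S1(q1: int, q2: int, r: int) -> int:
    """Brute-force size of S_1, taking foot lengths t1, t2 >= 1."""
    return sum(1
               for t1 in range(1, q1 * r + 1)
               for t2 in range(1, q2 * r + 1)
               if (t1 - t2) % r == 0)

# Each t1 leaves exactly q2 values of t2 in the same residue class mod r.
for q1 in range(1, 5):
    for q2 in range(1, 5):
        for r in range(1, 8):
            assert count_S1(q1, q2, r) == q1 * q2 * r
```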

Towards Bounding \({\textsf {p}}_2\): Counting over \(t_1\), \(t_2\) and \(\varDelta t\). This is the most involved part of the calculations. For simplicity of notation we define the function

$$\begin{aligned} \zeta (\alpha ):=(\sqrt{2\alpha N}+3)^2 = 2\alpha N + 6\sqrt{2\alpha N} + 9. \end{aligned}$$

Recall that \(\mathcal {S}_2\) is the set of all \((t_1, t_2,\varDelta t, c,\eta )\) satisfying

$$\begin{aligned} \left\lceil \frac{t_1+\varDelta t+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2{\textsf { mod }}r. \end{aligned}$$

We begin by fixing a choice of c and \(\eta \). We want to bound the number of choices for \((t_1,t_2,\varDelta t)\). For this we relax the constraints a little. Let \(\mathcal {S}_2'=\mathcal {S}_2'(c,\eta )\) be the set of values for \((t_1,t_2,\varDelta t)\) satisfying

$$\begin{aligned} t_1\le q_1r,\varDelta t\le q_2r, t_2\le q_2r, t_1+c\eta =t_2{\textsf { mod }}r. \end{aligned}$$

Now we fix a real number \(\alpha >0\), and split \(\mathcal {S}_2'\) into two disjoint sets:

$$\begin{aligned} \mathcal {S}_2'^+[\alpha ]&:=\left\{ (t_1,t_2,\varDelta t)\in \mathcal {S}_2'\mid \max (t_1,\varDelta t)\ge \sqrt{2\alpha N}+3\right\} ,\\ \mathcal {S}_2'^-[\alpha ]&:=\left\{ (t_1,t_2,\varDelta t)\in \mathcal {S}_2'\mid \max (t_1,\varDelta t)<\sqrt{2\alpha N}+3\right\} . \end{aligned}$$

For \(\mathcal {S}_2'^+[\alpha ]\), there are at most \(q_1r\) choices for \(t_1\) and at most \(q_2r\) choices for \(\varDelta t\), and for each of these choices, we have at most \(q_2\) choices for \(t_2\). Thus,

$$\begin{aligned} \#\mathcal {S}_2'^+[\alpha ]\le q_1q_2^2r^2. \end{aligned}$$

For \(\mathcal {S}_2'^-[\alpha ]\), there are at most \(\sqrt{2\alpha N}+3\) choices for \(t_1\) and at most \(\sqrt{2\alpha N}+3\) choices for \(\varDelta t\), and for each of these choices, since choosing \(t_1\) also fixes \(t_2{\textsf { mod }}r\), we have at most \(q_2\) choices for \(t_2\). Thus,

$$\begin{aligned} \#\mathcal {S}_2'^-[\alpha ]\le (\sqrt{2\alpha N} + 3)^2\cdot q_2=\zeta (\alpha )\cdot q_2. \end{aligned}$$
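The split of \(\mathcal {S}_2'\) can likewise be checked by brute force. In the Python sketch below, \(t_1\), \(t_2\) and \(\varDelta t\) range over positive integers (an assumption made so that the per-coordinate counts match those used in the text), and the parameters are small illustrative values rather than realistic ones.

```python
import math

def split_S2prime(q1, q2, r, c, eta, threshold):
    """Brute-force sizes of S_2'^+ and S_2'^- for fixed (c, eta)."""
    plus = minus = 0
    for t1 in range(1, q1 * r + 1):
        for dt in range(1, q2 * r + 1):
            for t2 in range(1, q2 * r + 1):
                if (t1 + c * eta - t2) % r != 0:
                    continue  # violates t1 + c*eta = t2 (mod r)
                if max(t1, dt) >= threshold:
                    plus += 1
                else:
                    minus += 1
    return plus, minus

q1, q2, r, c, eta = 2, 2, 6, 4, 1
alpha, N = 1.0, 16
T = math.sqrt(2 * alpha * N) + 3        # threshold sqrt(2*alpha*N) + 3
plus, minus = split_S2prime(q1, q2, r, c, eta, T)
assert plus <= q1 * q2**2 * r**2        # bound on #S_2'^+[alpha]
assert minus <= T**2 * q2               # zeta(alpha) * q2 bound on #S_2'^-[alpha]
print(plus, minus)                      # 160 128
```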

When \((t_1,t_2,\varDelta t)\in \mathcal {S}_2'^ + [\alpha ]\),

$$\begin{aligned} t_1 + t_2+\varDelta t + c\eta \ge \sqrt{2\alpha N} + 3, \end{aligned}$$

so that according to (6):

$$\begin{aligned} {\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\le \frac{e^{-\alpha }}{N^2}. \end{aligned}$$

When \((t_1,t_2,\varDelta t)\in \mathcal {S}_2'^-[\alpha ]\), (7) gives us

$$\begin{aligned} {\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\le \frac{1}{N^2}. \end{aligned}$$

Let

$$\begin{aligned} {\textsf {p}}_2(c,\eta )&:=\sum _{\mathcal {S}_2'}{\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\\&=\sum _{\mathcal {S}_2'^+[\alpha ]}{\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)+\sum _{\mathcal {S}_2'^-[\alpha ]}{\textsf {cp}}_{\lambda \rho }(t_1,t_2,\varDelta t,c)\\&\le q_1q_2^2r^2\cdot \frac{e^{-\alpha }}{N^2}+\zeta (\alpha )\cdot q_2\cdot \frac{1}{N^2}\\&=\frac{q_2}{N^2}\cdot \left[ q_1q_2r^2\cdot e^{-\alpha }+\zeta (\alpha )\right] . \end{aligned}$$

Towards Bounding \({\textsf {p}}_2\): Counting over c and \(\eta \). We next bound the number of choices for \((c,\eta )\) that satisfy the constraints. Again, we relax the constraints a little. Let be the set of \((c,\eta )\) values such that

$$\begin{aligned} c\eta \le q_1r. \end{aligned}$$

Next we fix \(d={\textsf {gcd}}(c,r)\). Let denote the set

c now takes values over multiples of d. We split the counting into two parts:

  • When \(c\le q_1d\), we recall that \(\eta \) is defined as the smallest solution to \(t_1+c\eta =t_2{\textsf { mod }}r.\) Since \(d={\textsf {gcd}}(c,r)\), the residue \(c\eta {\textsf { mod }}r\) depends only on \(\eta {\textsf { mod }}\frac{r}{d}\), so the smallest solution satisfies \({\displaystyle \eta \le \frac{r}{d}}.\) Thus, there are at most \(q_1\) choices of c and for each there are at most \({\displaystyle \frac{r}{d}}\) choices for \(\eta \), so in all there are at most \({\displaystyle \frac{q_1r}{d}}\) such choices for \(\eta \) and c.

  • When \(c>q_1d\), we use the bounds \(c\le q_1r\) and \(\eta \le \displaystyle \frac{q_1r}{c}\). Let \(z=\displaystyle \frac{c}{d}\). Thus, as c runs over all multiples of d from \((q_1+1)\cdot d\) to \(q_1r\), z takes all integer values from \(q_1+1\) to \({\displaystyle \frac{q_1r}{d}}\). Thus, the number of choices for \(\eta \) and c with \(c>q_1d\) is

    $$\begin{aligned} \sum _{z=q_1+1}^{\textstyle \frac{q_1r}{d}}\frac{q_1r}{zd}=\frac{q_1r}{d}\cdot \sum _{z=q_1+1}^{\textstyle \frac{q_1r}{d}}\frac{1}{z} \le \frac{q_1r}{d}\cdot \log \left( \frac{r}{d}\right) , \end{aligned}$$

    the last step following from Lemma 2.
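The two-case count can be sanity-checked by brute force on small parameters. The following is a minimal Python sketch, not the authors' code: it assumes \(\eta \ge 1\) and takes the counted pairs to be those with \(c\eta \le q_1r\) and \(\eta \le r/{\textsf {gcd}}(c,r)\); the helper names `smallest_eta`, `count_pairs` and `count_bound` are hypothetical. For r = 12 it checks that the smallest solution of the congruence never exceeds r/d, and that for every divisor d the number of pairs stays below \((q_1r/d)(1+\log (r/d))\).

```python
from math import gcd, log


def smallest_eta(t1: int, t2: int, c: int, r: int):
    """Smallest eta >= 1 with t1 + c*eta = t2 (mod r), or None if no solution exists."""
    for eta in range(1, r + 1):
        if (t1 + c * eta - t2) % r == 0:
            return eta
    return None


def count_pairs(q1: int, r: int, d: int) -> int:
    """Count pairs (c, eta) with gcd(c, r) = d, c*eta <= q1*r and 1 <= eta <= r/d."""
    total = 0
    for c in range(d, q1 * r + 1, d):  # c runs over the multiples of d
        if gcd(c, r) != d:
            continue
        # For c <= q1*d the cap r/d is binding; for c > q1*d the cap q1*r/c is.
        total += min(r // d, (q1 * r) // c)
    return total


def count_bound(q1: int, r: int, d: int) -> float:
    """The bound (q1*r/d) * (1 + log(r/d)) obtained by adding the two cases."""
    return (q1 * r / d) * (1 + log(r / d))


# Whenever the congruence is solvable, its smallest solution is at most r/d.
r, c = 12, 8  # here d = gcd(8, 12) = 4
for t1 in range(r):
    for t2 in range(r):
        eta = smallest_eta(t1, t2, c, r)
        if eta is not None:
            assert eta <= r // gcd(c, r)

# The per-divisor count never exceeds the bound.
for q1 in (1, 3, 7):
    for d in (1, 2, 3, 4, 6, 12):
        assert count_pairs(q1, r, d) <= count_bound(q1, r, d) + 1e-9
```

Note that the proof over-counts slightly: it ranges over all multiples of d, whereas the sketch keeps only those c with \({\textsf {gcd}}(c,r)=d\), which can only lower the count.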

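Putting these two together, the number of pairs \((c,\eta )\) with \({\textsf {gcd}}(c,r)=d\) is at most

$$\begin{aligned} \frac{q_1r}{d}+\frac{q_1r}{d}\cdot \log \left( \frac{r}{d}\right) =\frac{q_1r}{d}\cdot \left( 1+\log \left( \frac{r}{d}\right) \right) . \end{aligned}$$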

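Now, d can take values over all factors of r, so summing over \(d\mid r\), the total number of pairs \((c,\eta )\) is at most

$$\begin{aligned} \sum _{d\mid r}\frac{q_1r}{d}\cdot \left( 1+\log \left( \frac{r}{d}\right) \right) \le q_1\cdot (1+\log r)\cdot \sum _{d\mid r}\frac{r}{d}=q_1\cdot (1+\log r)\cdot \sigma (r), \end{aligned}$$

the last step coming from Lemma 4.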
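The summation over the divisors of r can be spot-checked in the same brute-force way. This sketch assumes that \(\sigma (r)\) denotes the sum of the positive divisors of r, so that \(\sum _{d\mid r} r/d=\sigma (r)\) (the map \(d\mapsto r/d\) just permutes the divisors); the helper names are hypothetical. For \(2\le r\le 60\) it enumerates all pairs with \(c\eta \le q_1r\) and \(\eta \le r/{\textsf {gcd}}(c,r)\), and compares their number with the sum of the per-divisor bounds and with \(q_1(1+\log r)\sigma (r)\).

```python
from math import gcd, log


def divisors(r: int) -> list[int]:
    """All positive divisors of r, by trial division."""
    return [d for d in range(1, r + 1) if r % d == 0]


def sigma(r: int) -> int:
    """Sum of the positive divisors of r (assumed meaning of sigma(r))."""
    return sum(divisors(r))


def total_count(q1: int, r: int) -> int:
    """Number of pairs (c, eta) with c*eta <= q1*r and 1 <= eta <= r/gcd(c, r)."""
    total = 0
    for c in range(1, q1 * r + 1):
        d = gcd(c, r)
        total += min(r // d, (q1 * r) // c)
    return total


for r in range(2, 61):
    # Summing r/d over the divisors d of r gives the divisor sum again.
    assert sum(r // d for d in divisors(r)) == sigma(r)
    for q1 in (1, 2, 5):
        # Per-divisor bounds (q1*r/d)(1 + log(r/d)), summed over d | r ...
        per_d = sum((q1 * r / d) * (1 + log(r / d)) for d in divisors(r))
        assert total_count(q1, r) <= per_d + 1e-9
        # ... are at most q1 * (1 + log r) * sigma(r), since log(r/d) <= log r.
        assert per_d <= q1 * (1 + log(r)) * sigma(r) + 1e-9
```

A finite check of this kind only illustrates the inequality; it does not replace the argument via Lemma 2 and Lemma 4.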

Finally, we observe that whenever \((t_1,t_2,\varDelta t,c,\eta )\in \mathcal {S}_2\), we have \((t_1,t_2,\varDelta t)\in \mathcal {S}_2'(c,\eta )\), and \((c,\eta )\) is one of the pairs counted above. Hence,

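$$\begin{aligned} {\textsf {p}}_2\le \sum _{(c,\eta )}{\textsf {p}}_2(c,\eta ), \end{aligned}$$

where the sum runs over the pairs \((c,\eta )\) counted above.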

This gives us the bound

$$\begin{aligned} {\textsf {p}}_2\le \frac{q_1q_2}{N^2}\cdot (1+\log r)\cdot \sigma (r)\cdot \left[ q_1q_2r^2\cdot e^{-\alpha } + \zeta (\alpha )\right] . \end{aligned}$$
(12)

Bounding \({\textsf {p}}_3\). Recall that \(\mathcal {S}_3\) is the set of all \((t_1,t_2,\varDelta t,c,\eta )\) satisfying

$$\begin{aligned} \left\lceil \frac{t_1+c\eta }{r}\right\rceil \le q_1,\left\lceil \frac{t_2+\varDelta t}{r}\right\rceil \le q_2,t_1+c\eta =t_2+\varDelta t{\textsf { mod }}r. \end{aligned}$$

The set \(\mathcal {S}_3\) is almost identical to the set \(\mathcal {S}_2\), and the counting arguments for \({\textsf {p}}_2\) carry over unchanged, since the relaxed constraints used there are valid for \({\textsf {p}}_3\) as well. Combined with (8), we have

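$$\begin{aligned} {\textsf {p}}_3\le \sum _{(c,\eta )}\frac{q_2}{N^2}\cdot \left[ q_1q_2r^2\cdot e^{-\alpha }+\zeta (\alpha )\right] , \end{aligned}$$

where the sum runs over the same pairs \((c,\eta )\) as for \({\textsf {p}}_2\).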

Thus, we have

$$\begin{aligned} {\textsf {p}}_3\le \frac{q_1q_2}{N^2}\cdot (1+\log r)\cdot \sigma (r)\cdot \left[ q_1q_2r^2\cdot e^{-\alpha }+\zeta (\alpha )\right] . \end{aligned}$$

Simplifying the Bounds. We now make a series of generous relaxations to obtain a simple bound for \({\textsf {p}}_2\) and \({\textsf {p}}_3\). Under the assumption that \(\sqrt{2\alpha N}+3\le \sqrt{3\alpha N}\), we have \(\zeta (\alpha )\le 3\alpha N.\) This assumption can be rewritten as

$$\begin{aligned} (\sqrt{3}-\sqrt{2})\cdot \sqrt{\alpha N}\ge 3. \end{aligned}$$

In other words,

$$\begin{aligned} \alpha N\ge 9(\sqrt{3}+\sqrt{2})^2=9(5+2\sqrt{6}). \end{aligned}$$

Now, \(2\sqrt{6}<5\), so \(9(5+2\sqrt{6})<90\), and a sufficient condition is therefore \(\alpha N\ge 90\). We now set \(\alpha =\log r\); the resulting assumption \(N\log r\ge 90\) is mild, since for \(r\ge 2\) it holds whenever \(N\ge 130\). For this choice of \(\alpha \), we have

$$\begin{aligned} \zeta (\alpha )\le 3N\log r, \end{aligned}$$
(13)

and

$$\begin{aligned} e^{-\alpha } = \frac{1}{r}. \end{aligned}$$
(14)
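The threshold arithmetic behind the condition \(\alpha N\ge 90\), together with identity (14), is easy to confirm numerically. The Python sketch below checks only these numbers; the bound \(\zeta (\alpha )\le 3\alpha N\) itself depends on the definition of \(\zeta \) given earlier and is not reproduced here. The function name `assumption_holds` is hypothetical.

```python
from math import exp, isclose, log, sqrt


def assumption_holds(alpha_n: float) -> bool:
    """Check sqrt(2*alpha*N) + 3 <= sqrt(3*alpha*N), written in terms of x = alpha*N."""
    return sqrt(2 * alpha_n) + 3 <= sqrt(3 * alpha_n)


# Exact threshold 9*(5 + 2*sqrt(6)) and the rounded-up sufficient value 90.
threshold = 9 * (5 + 2 * sqrt(6))
print(f"{threshold:.2f}")  # 89.09

assert threshold < 90
assert not assumption_holds(89.0)  # just below the exact threshold
# (sqrt(3) - sqrt(2)) * sqrt(x) is increasing in x, so every x >= 90 works.
assert all(assumption_holds(x) for x in range(90, 10_000))

# With alpha = log r and r >= 2, alpha*N >= 90 holds as soon as N >= 130.
assert 130 * log(2) >= 90 > 129 * log(2)

# Identity (14): e^{-alpha} = 1/r when alpha = log r.
for r in (2, 10, 1024):
    assert isclose(exp(-log(r)), 1 / r)
```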

Since \((5/3) \cdot \log r>1\) for \(r\ge 2\), we have

$$\begin{aligned} 1 + \log r<\frac{5}{3}\log r + \log r = \frac{8}{3}\log r. \end{aligned}$$
(15)

Finally, to bound \(\sigma (r)\), we use Lemma 5, which gives us

$$\begin{aligned} \sigma (r)<3r\log r. \end{aligned}$$
(16)
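Inequalities (15) and (16) can be spot-checked for all \(2\le r\le 10^4\). This sketch assumes \(\sigma (r)\) is the sum of the divisors of r and also checks the harmonic-sum bound \(\sigma (r)/r=\sum _{d\mid r}1/d\le 1+\log r\), which is one way to see (16) but not necessarily how Lemma 5 proves it. The names `divisor_sums` and `LIMIT` are hypothetical.

```python
from math import log


def divisor_sums(limit: int) -> list[int]:
    """sigma(n) for 0 <= n <= limit via a simple divisor sieve (sig[0] is unused)."""
    sig = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            sig[multiple] += d
    return sig


LIMIT = 10_000
sig = divisor_sums(LIMIT)
for r in range(2, LIMIT + 1):
    lr = log(r)
    assert 5 / 3 * lr > 1          # premise used for (15)
    assert 1 + lr < 8 / 3 * lr     # inequality (15)
    assert sig[r] <= r * (1 + lr)  # harmonic-sum bound on sigma(r)
    assert sig[r] < 3 * r * lr     # inequality (16)
```

The sieve costs about \(\sum _{d\le L} L/d\approx L\log L\) additions, which is negligible for \(L=10^4\).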

Plugging (13)–(16) into (12), we have

$$\begin{aligned} {\textsf {p}}_2&\le \frac{q_1q_2}{N^2}\cdot 3r\log r \cdot \frac{8}{3}\log r\cdot (q_1q_2r^2\cdot \frac{1}{r}+3N\log r)\\&= 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2 + 24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) . \end{aligned}$$

Similarly,

$$\begin{aligned} {\textsf {p}}_3\le 8\cdot (\log r)^2\cdot \left( \frac{q_1q_2r}{N}\right) ^2 + 24\cdot (\log r)^3\cdot \left( \frac{q_1q_2r}{N}\right) . \end{aligned}$$
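The algebra in the last simplification step can also be verified numerically: writing \(x=q_1q_2r/N\), the product obtained by plugging (13)–(16) into (12) should equal \(8(\log r)^2x^2+24(\log r)^3x\). The sketch below evaluates both sides at a few arbitrary parameter choices (hypothetical values, chosen only for illustration) and checks that they agree up to floating-point rounding.

```python
from math import isclose, log


def relaxed_product(q1: int, q2: int, r: int, n: float) -> float:
    """Left-hand side: (12) with sigma(r), 1 + log r, zeta(alpha), e^{-alpha} replaced by (13)-(16)."""
    lr = log(r)
    return (q1 * q2 / n**2) * (3 * r * lr) * (8 / 3 * lr) * (q1 * q2 * r**2 / r + 3 * n * lr)


def simplified(q1: int, q2: int, r: int, n: float) -> float:
    """Right-hand side: 8 (log r)^2 x^2 + 24 (log r)^3 x with x = q1*q2*r/N."""
    lr = log(r)
    x = q1 * q2 * r / n
    return 8 * lr**2 * x**2 + 24 * lr**3 * x


for q1, q2, r, n in [(1, 1, 2, 2.0**20), (10, 5, 1000, 2.0**64), (3, 7, 97, 2.0**128)]:
    assert isclose(relaxed_product(q1, q2, r, n), simplified(q1, q2, r, n), rel_tol=1e-9)
```

Since \(q_1q_2r/N\) is small in the regime of interest, the second term, with its \((\log r)^3\) factor, dominates.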

This completes the proof of Lemma 7.

7 Conclusion and Future Work

We studied the iterated random function problem and proved the first bound in this setting that is tight up to a factor of \((\log r)^3\). In previous work, the iterated random function problem was treated as a special case of CBC-MAC based on a random function f. We obtained our bound by analysing the probability of a common class of collision attacks and applying Patarin’s H-coefficient technique to bound the advantage of distinguishing \(f^r\) from f. Improving the \((\log r)^3\) factor in the security bound is an interesting topic for future work.