1 Introduction

Lattice-Based Signatures. Lattice-based cryptography has proved to be a versatile way of achieving a very wide range of cryptographic primitives with strong security guarantees that are also believed to hold in the postquantum setting. For a while, it was largely confined to the realm of theoretical cryptography, mostly concerned with asymptotic efficiency, but it has made major strides towards practicality in recent years. Significant progress has been made in terms of practical constructions, refined concrete security estimates and fast implementations. As a result, lattice-based schemes are seen as strong contenders in the NIST postquantum standardization process.

In terms of practical signature schemes in particular, lattice-based constructions broadly fit within either of two large frameworks: Fiat–Shamir type constructions on the one hand, and hash-and-sign constructions on the other.

Fiat–Shamir lattice based signatures rely on a variant of the Fiat–Shamir paradigm [16] developed by Lyubashevsky, called “Fiat–Shamir with aborts” [31], which has proved particularly fruitful. It has given rise to numerous practically efficient schemes [2, 8, 23] including the two second round NIST candidates Dilithium [10, 33] and qTESLA [5].

The hash-and-sign family has a longer history, dating back to Goldreich–Goldwasser–Halevi [22] signatures as well as NTRUSign [24]. Those early proposals were shown to be insecure [12, 19, 21, 40], however, due to a statistical dependence between the distribution of signatures and the signing key. That issue was only overcome with the development of lattice trapdoors by Gentry, Peikert and Vaikuntanathan [20]. In the GPV scheme, signatures follow a distribution that is provably independent of the secret key (a discrete Gaussian supported on the public lattice), but which is hard to sample from without knowing a secret, short basis of the lattice. The scheme is quite attractive from a theoretical standpoint (for example, it is easier to establish QROM security for it than for Fiat–Shamir type schemes), but suffers from large keys and a potentially costly procedure for discrete Gaussian sampling over a lattice. Several follow-up works have then striven to improve its concrete efficiency [13, 34, 37, 42, 49], culminating in two main efficient and compact implementations: the scheme of Ducas, Lyubashevsky and Prest (DLP) [11], and its successor, NIST second round candidate Falcon [47], both instantiated over NTRU lattices [24] in power-of-two cyclotomic fields. One can also mention NIST first round candidates pqNTRUSign [52] and DRS [44] as members of this family, the latter of which actually fell prey to a clever statistical attack [51] in the spirit of those against GGH and NTRUSign.

Side-Channel Analysis of Lattice-Based Signatures. With the NIST postquantum standardization process underway, it is crucial to investigate the security of lattice-based schemes not only in a pure algorithmic sense, but also with respect to implementation attacks, such as side-channels. For lattice-based signatures constructed using the Fiat–Shamir paradigm, this problem has received a significant amount of attention in the literature, with numerous works [4, 6, 7, 14, 43, 50] pointing out vulnerabilities with respect to timing attacks, cache attacks, power analysis and other types of side-channels. Those attacks have proved particularly devastating against schemes using discrete Gaussian sampling, such as the celebrated BLISS signature scheme [8]. In response, several countermeasures have also been proposed [27, 28, 39], some of them provably secure [3, 4], but the side-channel arms race does not appear to have subsided quite yet.

In contrast, the case of hash-and-sign lattice-based signatures, including DLP and Falcon, remains largely unexplored, despite concerns being raised regarding their vulnerability to side-channel attacks. For example, the NIST status report on first round candidates, announcing the selection of Falcon to the second round, notes that “more work is needed to ensure that the signing algorithm is secure against side-channel attacks”. The relative lack of cryptanalytic works regarding these schemes can probably be attributed to the fact that the relationship between secret keys and the information that leaks through side-channels is a lot more subtle than in the Fiat–Shamir setting.

Indeed, in Fiat–Shamir style schemes, the signing algorithm uses the secret key very directly (it is combined linearly with other elements to form the signature), and as a result, side-channel leakage on sensitive variables, like the random nonce, easily leads to key exposure. By comparison, the way the signing key is used in GPV type schemes is much less straightforward. The key is used to construct the trapdoor information used for the lattice discrete Gaussian sampler; in the case of the samplers [13, 20, 30] used in GPV, DLP and Falcon, that information is essentially the Gram–Schmidt orthogonalization (GSO) of a matrix associated with the secret key. Moreover, due to the way that GSO matrix is used in the sampling algorithm, only a small amount of information about it is liable to leak through side-channels, and how that small amount relates to the signing key is far from clear. To the best of our knowledge, neither the problem of identifying a clear side-channel leakage, nor that of relating such a leakage to the signing key, has been tackled in the literature so far.

Our Contributions. In this work, we initiate the study of how side-channel leakage impacts the security of hash-and-sign lattice-based signatures, focusing our attention on the two most notable practical schemes in that family, namely DLP and Falcon. Our contributions towards that goal are mainly threefold.

First, we identify a specific leakage of the implementations of both DLP and Falcon (at least in its original incarnation) with respect to timing side-channels. As noted above, the lattice discrete Gaussian sampler used in signature generation relies on the Gram–Schmidt orthogonalization of a certain matrix associated with the secret key. Furthermore, the problem of sampling a discrete Gaussian distribution supported over the lattice is reduced to sampling one-dimensional discrete Gaussians with standard deviations computed from the norms of the rows of that GSO matrix. In particular, the one-dimensional sampler has to support varying standard deviations, which is not easy to do in constant time. Unsurprisingly, the target implementations both leak that standard deviation through timing side-channels; specifically, they rely on rejection sampling, and the acceptance rate of the corresponding loop is directly related to the standard deviation. As a result, timing attacks will reveal the Gram–Schmidt norms of the matrix associated to the secret key (or rather, an approximation thereof, to a precision increasing with the number of available samples).

Second, we use algebraic number theoretic techniques to elucidate the link between those Gram–Schmidt norms and the secret key. In fact, we show that the secret key can be entirely reconstructed from the knowledge of those Gram–Schmidt norms (at least if they are known exactly), in a way which crucially relies on the algebraic structure of the corresponding lattices.

Since both DLP and Falcon work in an NTRU lattice, the signing key can be expressed as a pair (f, g) of small elements in a cyclotomic ring \(\mathcal R= \mathbb {Z}[\zeta ]\) (of power-of-two conductor, in the case of those schemes). The secret, short basis of the NTRU lattice is constructed by blocks from the multiplication matrices of f and g (and related elements F, G) in a certain basis of \(\mathcal R\) as a \(\mathbb {Z}\)-algebra (DLP uses the usual power basis, whereas Falcon uses the power basis in bit-reversed order; this apparently small difference interestingly plays a crucial role in this work). It is then easily seen that the Gram matrix of the first half of the lattice basis is essentially the multiplication matrix associated with the element \(u = f\bar{f}+g\bar{g}\), where the bar denotes the complex conjugation \(\bar{\zeta } = \zeta ^{-1}\). From that observation, we deduce that knowing the Gram–Schmidt norms of the lattice basis is essentially equivalent to knowing the leading principal minors of the multiplication matrix of u, which is a real, totally positive element of \(\mathcal R\).

We then give general efficient algorithms, both for the power basis (DLP case) and for the bit-reversed order power basis (Falcon case), which recover an arbitrary totally positive element u (up to a possible automorphism of the ambient field) given the leading principal minors of its multiplication matrix. The case of the power basis is relatively easy: we can actually recover the coefficients iteratively one by one, with each coefficient obtained as the solution of a quadratic equation over \(\mathbb {Q}\) depending only on the minors and the previous coefficients. The bit-reversed order power basis is more contrived, however; recovery is then carried out recursively, by reduction to the successive subfields of the power-of-two cyclotomic tower.

Finally, to complete the recovery, we need to deduce f and g from u. We show that this can be done using the public key \(h = g/f\bmod q\): we can use it to reconstruct both the relative norm \(f\bar{f}\) of f, and the ideal \((f)\subset \mathcal R\). That data can then be plugged into the Gentry–Szydlo algorithm [21] to obtain f in polynomial time, and hence g. Those steps, though simple, are also of independent interest, since they can be applied to the side-channel attack against BLISS described in [14], in order to get rid of the expensive factorization of an algebraic norm, and hence make the attack efficient for all keys (instead of a small percentage of weak keys as originally stated).

Our third contribution is to actually collect timing traces for the DLP scheme and mount the concrete key recovery. This is not an immediate consequence of the previous points, since our totally positive element recovery algorithm a priori requires the exact knowledge of Gram–Schmidt norms, whereas side-channel leakage only provides approximations (and since some of the squared Gram–Schmidt norms are rational numbers of very large height, recovering them exactly would require an unrealistic number of traces). As a result, the recovery algorithm has to be combined with some pruned tree search in order to account for approximate inputs. In practice, for the larger parameters of DLP signatures (with a claimed security level of 192 bits), we manage to recover the key with good probability using \(2^{33}\) to \(2^{35}\) DLP timing traces.

Carrying out such an experiment in the Falcon setting, however, is left as a challenging open problem for further work. This is because adapting the bit-reversed order totally positive recovery algorithm to deal with approximate inputs appears to be much more difficult (instead of sieving integers whose square lies in some specified interval, one would need to find the cyclotomic integers whose square lies in some target set, which does not even look simple to describe).

The source code of the attack is available at https://github.com/yuyang-crypto/Key_Recovery_from_GSnorms.

Related Work. As noted above, the side-channel security of Fiat–Shamir lattice-based signatures has been studied extensively, including in [4, 6, 7, 14, 43, 50]. However, the only implementation attacks we are aware of against hash-and-sign schemes are fault analysis papers [15, 35]: side-channel attacks have not been described so far to the best of our knowledge.

Aside from the original implementations of DLP and Falcon, which are the focus of this paper, several others have appeared in the literature. However, they usually do not aim for side-channel security [36, 41] or only make the base discrete Gaussian sampler (with fixed standard deviation) constant time [29], but do not eliminate the leakage of the varying standard deviations. As a result, those implementations are also vulnerable to the attacks of this paper.

This is not the case, however, for Pornin’s very recent, updated implementation of Falcon, which uses a novel technique proposed by Prest, Ricosset and Rossi [48], combined with other recent results on constant time rejection sampling for discrete Gaussian distributions [4, 53], in order to eliminate the timing leakage of the lattice discrete Gaussian sampler. This technique applies to discrete Gaussian sampling over \(\mathbb {Z}\) with varying standard deviations, when those deviations only take values in a small range. It is then possible to eliminate the dependence on the standard deviation in the rejection sampling by scaling the target distribution to match the acceptance rate of the maximal possible standard deviation. The small range ensures that the overhead of this countermeasure is relatively modest. Thanks to this countermeasure, we stress that the most recent official implementation of Falcon is already protected against the attacks of this paper. Nevertheless, we believe our results underscore the importance of applying such countermeasures.

Organization of the Paper. Following some preliminary material in Sect. 2, Sect. 3 is devoted to recalling some general facts about signature generation for hash-and-sign lattice-based schemes. Section 4 then gives a roadmap of our attack strategy, and provides some details about the final steps (how to deduce the secret key from the totally positive element \(u=f\bar{f}+g\bar{g}\)). Section 5 describes our main technical contribution: the algorithms that recover u from the Gram–Schmidt norms, both in the DLP and in the Falcon setting. Section 6 delves into the details of the side-channel leakage, showing how the implementations of the Gaussian samplers of DLP and Falcon do indeed reveal the Gram–Schmidt norms through timing side-channels. Finally, Sect. 7 presents our concrete experiments against DLP, including the tree search strategy to accommodate approximate Gram–Schmidt norms and experimental results in terms of timing and number of traces.

Notation. We use bold lowercase letters for vectors and bold uppercase for matrices. The zero vector is \(\mathbf {0}\). We denote by \(\mathbb {N}\) the set of non-negative integers and by \(\log \) the natural logarithm. Vectors are in row form, and we write \(\mathbf {B}= (\mathbf {b}_0,\dotsc , \mathbf {b}_{n-1})\) to denote that \(\mathbf {b}_i\) is the i-th row of \(\mathbf {B}\). For a matrix \(\mathbf {B}\in \mathbb {R}^{n\times m}\), we denote by \(\mathbf {B}_{i,j}\) the entry in the i-th row and j-th column of \(\mathbf {B}\), where \(i\in \{{0,\dotsc , n-1}\}\) and \(j\in \{{0,\dotsc ,m-1}\}\). For \(I\subseteq [0,n), J \subseteq [0, m)\), we denote by \(\mathbf {B}_{I\times J}\) the submatrix \((\mathbf {B}_{i,j})_{i\in I,j\in J}\). In particular, we write \(\mathbf {B}_{I} = \mathbf {B}_{I\times I}\). Let \(\mathbf {B}^t\) denote the transpose of \(\mathbf {B}\).

Given \(\mathbf {u}= (u_0,\dotsc ,u_{n-1})\) and \(\mathbf {v}= (v_0,\dotsc ,v_{n-1})\), their inner product is \(\langle {\mathbf {u}, \mathbf {v}}\rangle = \sum _{i=0}^{n-1}u_iv_i\). The \(\ell _2\)-norm of \(\mathbf {v}\) is \(\Vert \mathbf {v}\Vert = \sqrt{\langle {\mathbf {v}, \mathbf {v}}\rangle }\) and the \(\ell _\infty \)-norm is \(\Vert \mathbf {v}\Vert _\infty = \max _i|v_i|\). The determinant of a square matrix \(\mathbf {B}\) is denoted by \(\det (\mathbf {B})\), so that \(\det \left( \mathbf {B}_{[0,i]}\right) \) is the i-th leading principal minor of \(\mathbf {B}\).

Let D be a distribution. We write \(z\hookleftarrow D\) when the random variable z is sampled from D, and denote by D(x) the probability that \(z=x\). The expectation of a random variable z is \(\mathbb {E}[z]\). We write \(\mathcal {N}(\mu , \sigma ^2)\) the normal distribution of mean \(\mu \) and variance \(\sigma ^2\). We let U(S) be the uniform distribution over a finite set S. For a real-valued function f and any countable set S in the domain of f, we write \(f(S)= \sum _{x\in S} f(x)\).

2 Preliminaries

A lattice \(\mathcal {L}\) is a discrete additive subgroup of \(\mathbb {R}^m\). If it is generated by \(\mathbf {B}\in \mathbb {R}^{n\times m}\), we also write \(\mathcal {L}:= \mathcal {L}(\mathbf {B}) = \{{\mathbf {x}\mathbf {B}\mid \mathbf {x}\in \mathbb {Z}^n}\}\). If \(\mathbf {B}\) has full row rank, then we call \(\mathbf {B}\) a basis and n the rank of \(\mathcal {L}\).

2.1 Gram–Schmidt Orthogonalization

Let \(\mathbf {B}= (\mathbf {b}_0,\dotsc , \mathbf {b}_{n-1}) \in \mathbb {Q}^{n\times m}\) of rank n. The Gram-Schmidt orthogonalization of \(\mathbf {B}\) is \(\mathbf {B}= \mathbf {L}\mathbf {B}^*\), where \(\mathbf {L}\in \mathbb {Q}^{n\times n}\) is lower-triangular with 1 on its diagonal and \(\mathbf {B}^* = (\mathbf {b}_0^*,\dotsc , \mathbf {b}_{n-1}^*)\) is a matrix with pairwise orthogonal rows. We call \(\Vert \mathbf {b}_i^*\Vert \) the i-th Gram-Schmidt norm of \(\mathbf {B}\), and let \(\Vert \mathbf {B}\Vert _{GS} = \max _i \Vert \mathbf {b}_i^*\Vert \).

The Gram matrix of \(\mathbf {B}\) is \(\mathbf {G}= \mathbf {B}\mathbf {B}^t\), and satisfies \(\mathbf {G}= \mathbf {L}\mathbf {D}\mathbf {L}^t\) where \(\mathbf {D}= \mathrm {diag}\left( \Vert \mathbf {b}_i^*\Vert ^2\right) \). This is also known as the Cholesky decomposition of \(\mathbf {G}\), and such a decomposition exists for any symmetric positive definite matrix. The next proposition follows from the triangular structure of \(\mathbf {L}\).

Proposition 1

Let \(\mathbf {B}\in \mathbb {Q}^{n\times m}\) be of rank n and \(\mathbf {G}\) its Gram matrix. Then for every integer \(0\le k\le n-1\), we have \(\det \left( \mathbf {G}_{[0,k]}\right) = \prod _{i=0}^{k} \Vert \mathbf {b}_i^*\Vert ^2\).
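As a sanity check, Proposition 1 can be verified numerically on a toy basis; the sketch below (exact rational arithmetic, illustrative helper names of our own) computes both sides of the identity.

```python
# Verifying Proposition 1 on a small example: the k-th leading principal
# minor of the Gram matrix G = B B^t equals prod_{i<=k} ||b_i*||^2.
# Exact rational arithmetic; helper names are illustrative, not from the paper.
from fractions import Fraction

def gram_schmidt_sq_norms(B):
    """Squared Gram-Schmidt norms of the rows of B."""
    ortho, sq_norms = [], []
    for row in B:
        b = list(map(Fraction, row))
        for o, n2 in zip(ortho, sq_norms):
            mu = sum(x * y for x, y in zip(b, o)) / n2
            b = [x - mu * y for x, y in zip(b, o)]
        ortho.append(b)
        sq_norms.append(sum(x * x for x in b))
    return sq_norms

def leading_minor(G, k):
    """det(G_[0,k]) by exact LU elimination (pivots are nonzero for G > 0)."""
    A = [[Fraction(G[i][j]) for j in range(k + 1)] for i in range(k + 1)]
    det = Fraction(1)
    for i in range(k + 1):
        det *= A[i][i]
        for r in range(i + 1, k + 1):
            fac = A[r][i] / A[i][i]
            A[r] = [a - fac * b for a, b in zip(A[r], A[i])]
    return det

B = [[2, 0, 1], [1, 3, 0], [0, 1, 2]]
G = [[sum(bi[t] * bj[t] for t in range(3)) for bj in B] for bi in B]
n2 = gram_schmidt_sq_norms(B)
for k in range(3):
    prod = Fraction(1)
    for i in range(k + 1):
        prod *= n2[i]
    assert leading_minor(G, k) == prod
```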

Let \(\mathbf {M}= \left( \begin{array}{cc} \mathbf {A} & \mathbf {B}\\ \mathbf {C} & \mathbf {D}\end{array} \right) \), where \(\mathbf {A}\in \mathbb {R}^{n\times n}\), \(\mathbf {D}\in \mathbb {R}^{m\times m}\) are invertible matrices. Then \(\mathbf {M}/ \mathbf {A}= \mathbf {D}- \mathbf {C}\mathbf {A}^{-1}\mathbf {B}\in \mathbb {R}^{m\times m}\) is called the Schur complement of \(\mathbf {A}\). It holds that

$$\begin{aligned} \det (\mathbf {M}) = \det (\mathbf {A})\det (\mathbf {M}/\mathbf {A}). \end{aligned}$$
(1)
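Identity (1) is likewise easy to check on a small example; in the sketch below the matrix and the block sizes are arbitrary illustrative choices, and exact rational arithmetic avoids rounding issues.

```python
# Checking det(M) = det(A) det(M/A) with the Schur complement
# M/A = D - C A^{-1} B, on an arbitrary 4x4 integer example.
from fractions import Fraction

def det(M):
    A = [[Fraction(x) for x in row] for row in M]
    n, d = len(A), Fraction(1)
    for i in range(n):
        p = next(r for r in range(i, n) if A[r][i] != 0)  # partial pivoting
        if p != i:
            A[i], A[p] = A[p], A[i]
            d = -d
        d *= A[i][i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    return d

M = [[4, 1, 2, 0],
     [1, 3, 0, 1],
     [2, 0, 5, 1],
     [0, 1, 1, 2]]
A = [row[:2] for row in M[:2]]
B = [row[2:] for row in M[:2]]
C = [row[:2] for row in M[2:]]
D = [row[2:] for row in M[2:]]
dA = det(A)
# explicit inverse of the 2x2 block A
Ainv = [[A[1][1] / dA, -A[0][1] / dA], [-A[1][0] / dA, A[0][0] / dA]]
CAinv = [[sum(C[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
schur = [[D[i][j] - sum(CAinv[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert det(M) == dA * det(schur)
```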

2.2 Parametric Statistics

Let \(D_p\) be some distribution determined by parameter p. Let \(\mathbf {X} = (X_1,\dotsc ,X_n)\) be a vector of observed samples of \(X\hookleftarrow D_p\). The log-likelihood function with respect to \(\mathbf {X}\) is

$$\begin{aligned} \ell _\mathbf {X}(p) = \sum _{i=1}^{n} \log (D_p(X_i)). \end{aligned}$$

Provided the log-likelihood function is bounded, a maximum likelihood estimator for samples \(\mathbf {X}\) is a real \(\text {MLE}(\mathbf {X})\) maximizing \(\ell _\mathbf {X}(p)\). The Fisher information is

$$\begin{aligned} \mathcal {I}(p) = -\mathbb {E}\left[ \frac{d^2}{d p^2}\ell _X(p)\right] . \end{aligned}$$

Viewing \(\text {MLE}(\mathbf {X})\) as a random variable, it is known (e.g. [26, Theorem 6.4.2]) that \(\sqrt{n}(\text {MLE}(\mathbf {X}) - p)\) converges in distribution to \(\mathcal {N}(0, \mathcal {I}(p)^{-1})\). When the target distribution is geometric, the maximum likelihood estimator and the Fisher information are well known. The second statement of the next lemma follows directly from a Gaussian tail bound.

Lemma 1

Let \(\text {Geo}_p\) denote a geometric distribution with parameter p, and \(\mathbf {X} = (X_1,\dotsc ,X_n)\) be samples from \(\text {Geo}_p\). Then we have \(\text {MLE}(\mathbf {X}) = \frac{n}{\sum _{i=1}^n X_i}\) and \(\sqrt{n}(\text {MLE}(\mathbf {X}) - p)\) converges in distribution to \(\mathcal N(0, p^2(1-p))\). In particular, when n is large, then for any \(\alpha \ge 1\), we have \(|\text {MLE}(\mathbf {X}) - p| \le \alpha \cdot p\sqrt{\frac{1-p}{n}}\) except with probability at most \(2\exp (-\alpha ^2/2)\).
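A small simulation illustrates Lemma 1; the sampler and the parameters below are illustrative choices, not taken from the paper.

```python
# Empirical sketch of Lemma 1: for n samples from Geo_p (support {1, 2, ...},
# mean 1/p), the estimator MLE = n / sum(X_i) concentrates around p at rate
# roughly p * sqrt((1 - p) / n).
import math, random

def sample_geometric(p, rng):
    # inverse-transform sampling: X = ceil(log(U) / log(1 - p))
    return math.ceil(math.log(rng.random()) / math.log(1.0 - p))

rng = random.Random(1)
p, n = 0.3, 200_000
xs = [sample_geometric(p, rng) for _ in range(n)]
mle = n / sum(xs)
# alpha = 6: the lemma's bound then fails with probability at most 2*exp(-18)
bound = 6 * p * math.sqrt((1 - p) / n)
assert abs(mle - p) <= bound
```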

2.3 Discrete Gaussian Distributions

Let \(\rho _{\sigma ,\mathbf {c}}(\mathbf {x}) = \exp \left( -\frac{\Vert \mathbf {x}-\mathbf {c}\Vert ^2}{2\sigma ^2}\right) \) be the n-dimensional Gaussian function with center \(\mathbf {c}\in \mathbb {R}^n\) and standard deviation \(\sigma \). When \(\mathbf {c}=\mathbf {0}\), we just write \(\rho _{\sigma }(\mathbf {x})\). The discrete Gaussian over a lattice \(\mathcal {L}\) with center \(\mathbf {c}\) and standard deviation parameter \(\sigma \) is defined by the probability function

$$D_{\mathcal {L},\sigma ,\mathbf {c}}(\mathbf {x}) = \frac{\rho _{\sigma ,\mathbf {c}}(\mathbf {x})}{\rho _{\sigma ,\mathbf {c}}(\mathcal {L})}, \forall \mathbf {x}\in \mathcal {L}.$$

In this work, the case \(\mathcal {L}= \mathbb {Z}\) is of particular interest. It is well known that \(\int _{-\infty }^{\infty } \rho _{\sigma , c}(x)\text {d}x = \sigma \sqrt{2\pi }\). Notice that \(D_{\mathbb {Z},\sigma ,c}\) is equivalent to \(i + D_{\mathbb {Z},\sigma ,c-i}\) for an arbitrary \(i \in \mathbb {Z}\), hence it suffices to consider the case where \(c \in [0,1)\). The half discrete integer Gaussian, denoted by \(D^+_{\mathbb {Z},\sigma ,c}\), is defined by

$$D^+_{\mathbb {Z},\sigma ,c}(x) = \frac{\rho _{\sigma , c}(x)}{\rho _{\sigma , c}(\mathbb {N})}, \forall x \in \mathbb {N}.$$

We again omit the center when \(c=0\). For any \(\epsilon >0\), the (scaled) smoothing parameter \(\eta _{\epsilon }'(\mathbb {Z})\) is the smallest \(s>0\) such that \(\rho _{1/s\sqrt{2\pi }}(\mathbb {Z})\le 1+\epsilon \). In practice, \(\epsilon \) is very small, say \(2^{-50}\). The smoothing parameter makes it possible to quantify precisely how the discrete Gaussian differs from the standard Gaussian function.

Lemma 2

([38], implicit in Lemma 4.4). If \(\sigma \ge \eta _{\epsilon }'(\mathbb {Z})\), then \(\rho _\sigma (c + \mathbb {Z}) \in [\frac{1-\epsilon }{1+\epsilon }, 1]\rho _\sigma (\mathbb {Z})\) for any \(c \in [0,1)\).

Corollary 1

If \(\sigma \ge \eta _{\epsilon }'(\mathbb {Z})\), then \(\rho _\sigma (\mathbb {Z})\in [1, \frac{1+\epsilon }{1-\epsilon }]\sqrt{2\pi }\sigma \).

Proof

Since \(\int _{0}^{1} \rho _{\sigma }(\mathbb {Z}+c)\text {d}c = \int _{-\infty }^{\infty } \rho _{\sigma }(x)\text {d}x = \sqrt{2\pi }\sigma \), the claim follows from Lemma 2.    \(\square \)
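These estimates can be observed numerically; in the sketch below, \(\sigma = 2\) is an arbitrary choice comfortably above the smoothing parameter for tiny \(\epsilon \), and the helper is ours.

```python
# Numerical illustration of Lemma 2 and Corollary 1: rho_sigma(Z) matches
# sqrt(2*pi)*sigma up to a negligible factor, and the mass rho_sigma(c + Z)
# barely depends on the center c.
import math

def rho(sigma, c, radius=40):
    # truncated sum of exp(-(x - c)^2 / (2 sigma^2)) over integers x
    lo, hi = math.floor(c - radius * sigma), math.ceil(c + radius * sigma)
    return sum(math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for x in range(lo, hi + 1))

sigma = 2.0
total = rho(sigma, 0.0)
ideal = math.sqrt(2 * math.pi) * sigma
assert abs(total / ideal - 1.0) < 1e-9          # Corollary 1
assert abs(rho(sigma, 0.37) / total - 1.0) < 1e-9   # Lemma 2
```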

2.4 Power-of-Two Cyclotomic Fields

For the rest of this article, we let \(n = 2^\ell \) for some integer \(\ell \ge 1\). We let \(\zeta _n\) be a primitive 2n-th root of unity. Then \(\mathcal K_n = \mathbb {Q}(\zeta _n)\) is the n-th power-of-two cyclotomic field, and comes together with its ring of algebraic integers \(\mathcal R_n = \mathbb {Z}[\zeta _n]\). It is also equipped with n field automorphisms forming its Galois group, which is commutative in this case. It can be seen that \(\mathcal K_{n/2}=\mathbb {Q}(\zeta _{n/2})\) is the subfield of \(\mathcal K_n\) fixed by the automorphism \(\sigma (\zeta _n)=-\zeta _n\) of \(\mathcal K_{n}\), as \(\zeta _n^2 = \zeta _{n/2}\). This leads to a tower of field extensions and their corresponding rings of integers:

$$\begin{array}{ccccccccc} \mathcal K_{n} & \supseteq & \mathcal K_{n/2} & \supseteq & \cdots & \supseteq & \mathcal K_{1} & = & \mathbb {Q}\\ \cup & & \cup & & \cdots & & \cup & & \\ \mathcal R_{n} & \supseteq & \mathcal R_{n/2} & \supseteq & \cdots & \supseteq & \mathcal R_{1} & = & \mathbb {Z}\\ \end{array}$$

Given an extension \(\mathcal K_n|\mathcal K_{n/2}\), the relative trace \(\mathrm {Tr}: \mathcal K_n\rightarrow \mathcal K_{n/2}\) is the \(\mathcal K_{n/2}\)-linear map given by \(\mathrm {Tr}(f) = f+\sigma (f)\). Similarly, the relative norm is the multiplicative map \(\mathrm {N}(f)=f\cdot \sigma (f) \in \mathcal K_{n/2}\). Both maps send integers in \(\mathcal K_{n}\) to integers in \(\mathcal K_{n/2}\). For all \(f \in \mathcal K_n\), it holds that \(f = (\mathrm {Tr}(f) + \zeta _n\mathrm {Tr}(\zeta _n^{-1}f))/2\).
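These identities are easy to check computationally; the sketch below represents elements of \(\mathcal R_n\) by coefficient vectors, with helper names of our own.

```python
# Computational check of the splitting f = (Tr(f) + zeta_n Tr(zeta_n^{-1} f))/2
# in R_n = Z[x]/(x^n + 1), where sigma(zeta_n) = -zeta_n acts by negating
# odd-index coefficients.

def sigma(f):
    return [(-c if i % 2 else c) for i, c in enumerate(f)]

def add(f, g):
    return [a + b for a, b in zip(f, g)]

def mul_zeta(f):
    # multiplication by zeta_n: shift with wraparound sign flip (zeta_n^n = -1)
    return [-f[-1]] + f[:-1]

def mul_zeta_inv(f):
    # zeta_n^{-1} = -zeta_n^{n-1}: inverse shift
    return f[1:] + [-f[0]]

def trace(f):
    # relative trace Tr(f) = f + sigma(f)
    return add(f, sigma(f))

f = [3, -1, 4, 1, -5, 9, -2, 6]                       # an element of R_8
lhs = add(trace(f), mul_zeta(trace(mul_zeta_inv(f))))
assert lhs == [2 * c for c in f]                      # equals 2f, as claimed
assert all(trace(f)[i] == 0 for i in range(1, 8, 2))  # Tr(f) lies in the subring
```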

We are also interested in the field automorphism \(\zeta _n\mapsto \zeta _n^{-1}=\bar{\zeta _n}\), which corresponds to complex conjugation. We call adjoint the image \(\bar{f}\) of f under this automorphism. The fixed subfield \(\mathcal K^+_n := \mathbb {Q}(\zeta _n+\zeta _n^{-1})\) is known as the totally real subfield and contains the self-adjoint elements, that is, those such that \(f =\bar{f}\). Another way to describe self-adjoint elements is to say that all their complex embeddings are in fact real. Elements whose embeddings are all positive are called totally positive elements, and we denote their set by \(\mathcal K^{++}_n\). A standard example of such an element is given by \(f\bar{f}\) for any \(f\in \mathcal K_n\). It is well known that the Galois automorphisms act as permutations of these embeddings, so that a totally positive element stays totally positive under the action of the Galois group.

Representation of Cyclotomic Numbers. We also have \(\mathcal K_n \simeq \mathbb {Q}[x]/(x^n+1)\) and \(\mathcal R_n \simeq \mathbb {Z}[x]/(x^n+1)\), so that elements in cyclotomic fields can be seen as polynomials. In this work, each \(f = \sum _{i=0}^{n-1} f_i\zeta _n^i\in \mathcal K_{n}\) is identified with its coefficient vector \((f_0,\cdots ,f_{n-1})\). Then the inner product of f and g is \(\langle {f, g}\rangle = \sum _{i=0}^{n-1}f_ig_i\), and we write \(\Vert f\Vert \), resp. \(\Vert f\Vert _\infty \), the \(\ell _2\)-norm, resp. \(\ell _\infty \)-norm, of f. In this representation, it can be checked that \(\bar{f} = (f_0, -f_{n-1}, \dotsc , -f_1)\) and that \(\langle { f,gh}\rangle = \langle {f\bar{g}, h}\rangle \) for all \(f,g,h \in \mathcal K_n\). In particular, the constant coefficient of \(f\bar{g}\) is \(\langle {f, g}\rangle =\langle {f\bar{g}, 1}\rangle \). A self-adjoint element f has coefficients \((f_0, f_1, \dotsc , f_{n/2-1}, 0, -f_{n/2-1}, \dotsc , -f_1)\).

Elements in \(\mathcal K_n\) can also be represented by their matrix of multiplication in the basis \(1, \zeta _n, \dotsc , \zeta _n^{n-1}\). In other words, the map \(\mathcal A_{n}: \mathcal K_n \rightarrow \mathbb {Q}^{n\times n}\) defined by

$$ \mathcal A_{n}(f) = \left( \begin{array}{cccc} f_0 & f_{1} & \cdots & f_{n-1} \\ -f_{n-1} & f_0 & \cdots & f_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ -f_{1} & -f_{2} & \cdots & f_0 \end{array} \right) = \left( \begin{array}{c} f \\ \zeta _n\cdot f \\ \vdots \\ \zeta _n^{n-1}\cdot f \end{array} \right) $$

is a ring isomorphism. We have \(fg=g\cdot \mathcal A_n(f)\). We can also see that \(\mathcal A_n(\bar{f}) = \mathcal A_n(f)^t\) which justifies the term “adjoint”. We deduce that the matrix of a self-adjoint element is symmetric. It can be observed that a totally positive element \(A \in \mathcal K_n\) corresponds to the symmetric positive definite matrix \(\mathcal A_n(A)\).
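The stated properties of \(\mathcal A_n\) can be verified directly on coefficient vectors; the following sketch (illustrative helper names of our own) checks \(\mathcal A_n(\bar{f}) = \mathcal A_n(f)^t\) and the row-vector identity \(fg = g\cdot \mathcal A_n(f)\).

```python
# Direct check of the multiplication-matrix representation: rows of A_n(f)
# are the coefficient vectors of f, zeta*f, ..., zeta^{n-1}*f, the adjoint
# corresponds to transposition, and f*g has coefficient vector g . A_n(f).

def mul_zeta(f):
    return [-f[-1]] + f[:-1]

def mult_matrix(f):
    rows, cur = [], list(f)
    for _ in range(len(f)):
        rows.append(cur)
        cur = mul_zeta(cur)
    return rows

def adjoint(f):
    # conjugate: (f_0, -f_{n-1}, ..., -f_1)
    return [f[0]] + [-c for c in reversed(f[1:])]

def ring_mul(f, g):
    # schoolbook multiplication in Z[x]/(x^n + 1)
    n, out = len(f), [0] * len(f)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] += fi * gj
            else:
                out[i + j - n] -= fi * gj
    return out

f, g = [2, 0, -1, 3], [1, 1, 0, -2]
A = mult_matrix(f)
At = [[A[j][i] for j in range(4)] for i in range(4)]
assert mult_matrix(adjoint(f)) == At               # A_n(conj f) = A_n(f)^t
assert ring_mul(f, g) == [sum(g[i] * A[i][j] for i in range(4)) for j in range(4)]
```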

For efficiency reasons, the scheme Falcon uses another representation corresponding to the tower structure. If \(f=(f_0,\dotsc , f_{n-1})\in \mathcal K_n\), we let \(f_e=\mathrm {Tr}(f)/2 = (f_0, f_2,\dotsc , f_{n-2})\) and \(f_o=\mathrm {Tr}(\zeta _n^{-1}f)/2 = (f_1, f_3, \dotsc , f_{n-1})\). Let \(\mathbf {P}_{n} \in \mathbb {Z}^{n\times n}\) be the permutation matrix corresponding to the bit-reversal order. We define \(\mathcal F_{n}(f) = \mathbf {P}_{n}\mathcal A_{n}(f)\mathbf {P}_{n}^t\). In particular, it is also symmetric positive definite when f is a totally positive element. As shown in [13], it holds that

$$\begin{aligned} \mathcal F_{n}(f) = \left( \begin{array}{cc} \mathcal F_{n/2}(f_e) & \mathcal F_{n/2}(f_o)\\ \mathcal F_{n/2}(\zeta _{n/2}f_o) & \mathcal F_{n/2}(f_e) \end{array} \right) . \end{aligned}$$
(2)
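The block structure (2) can be checked directly for small n; the sketch below builds \(\mathcal F_n(f) = \mathbf {P}_n\mathcal A_n(f)\mathbf {P}_n^t\) from scratch (helper names and coefficients are our own illustrative choices) and compares the four blocks for \(n = 8\).

```python
# Checking recursion (2) for n = 8: form F_n(f) = P_n A_n(f) P_n^t with P_n
# the bit-reversal permutation, then compare its n/2 x n/2 blocks against
# F_{n/2} applied to f_e, f_o and zeta_{n/2} f_o.

def mul_zeta(f):
    # multiplication by zeta in Z[x]/(x^len + 1)
    return [-f[-1]] + f[:-1]

def mult_matrix(f):
    # rows: coefficient vectors of f, zeta*f, ..., i.e. A_n(f)
    rows, cur = [], list(f)
    for _ in range(len(f)):
        rows.append(cur)
        cur = mul_zeta(cur)
    return rows

def bitrev_perm(n):
    bits = n.bit_length() - 1
    return [int(format(i, f'0{bits}b')[::-1], 2) for i in range(n)]

def F(f):
    n, A, p = len(f), mult_matrix(f), bitrev_perm(len(f))
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]

f = [5, 1, -2, 0, 3, -1, 0, 2]             # an element of R_8
fe, fo = f[0::2], f[1::2]                  # Tr(f)/2 and Tr(zeta^{-1} f)/2
M, h = F(f), 4
blocks = {
    'tl': [row[:h] for row in M[:h]], 'tr': [row[h:] for row in M[:h]],
    'bl': [row[:h] for row in M[h:]], 'br': [row[h:] for row in M[h:]],
}
assert blocks['tl'] == F(fe) == blocks['br']
assert blocks['tr'] == F(fo)
assert blocks['bl'] == F(mul_zeta(fo))     # zeta_{n/2} * f_o
```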

2.5 NTRU Lattices

Given \(f,g \in \mathcal R_n\) such that f is invertible modulo some \(q \in \mathbb {Z}\), we let \(h=f^{-1}g \bmod q\). The NTRU lattice determined by h is \(\mathcal L_\text {NTRU}= \{ (u,v) \in \mathcal R_n^2\,:\, u+vh = 0 \bmod q\}\). Two bases of this lattice are of particular interest for cryptography:

$$ \mathbf {B}_\text {NTRU}= \begin{pmatrix} q & 0 \\ -h & 1\end{pmatrix} ~\text {and}~ \mathbf {B}_{f,g} = \begin{pmatrix} g & -f \\ G & -F \end{pmatrix}, $$

where \(F,G \in \mathcal R_n\) are such that \(fG-gF = q\). Indeed, the former basis usually acts as the public key, while the latter is the secret key, also called the trapdoor basis, when f, g, F, G are short vectors. In practice, these matrices are represented using either the operator \(\mathcal A_n\) [11] or \(\mathcal F_n\) [47]:

$$\mathbf {B}^{\mathcal A}_{f,g} = \left( \begin{array}{cc} \mathcal A_{n}(g) & \mathcal A_{n}(-f)\\ \mathcal A_{n}(G) & \mathcal A_{n}(-F) \end{array} \right) \ \ \ \ \text {and} \ \ \ \ \mathbf {B}^{\mathcal F}_{f,g} = \left( \begin{array}{cc} \mathcal F_{n}(g) & \mathcal F_{n}(-f)\\ \mathcal F_{n}(G) & \mathcal F_{n}(-F) \end{array} \right) . $$
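A toy example illustrates the defining lattice relation (tiny illustrative parameters and brute-force inversion; real schemes use large n, q and dedicated inversion algorithms).

```python
# Toy NTRU sketch with n = 4, q = 5: for h = f^{-1} g mod q, the secret-basis
# row (g, -f) satisfies u + v*h = 0 mod q, i.e. it lies in L_NTRU.
from itertools import product

n, q = 4, 5

def ring_mul(f, g):
    # multiplication in Z_q[x]/(x^n + 1)
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] += fi * gj
            else:
                out[i + j - n] -= fi * gj
    return [c % q for c in out]

def inverse_mod_q(f):
    one = [1] + [0] * (n - 1)
    for cand in product(range(q), repeat=n):   # brute force, illustration only
        if ring_mul(f, list(cand)) == one:
            return list(cand)
    return None

f, g = [1, 1, 0, 0], [0, 1, 2, 1]
finv = inverse_mod_q(f)
assert finv is not None                        # f is invertible mod q
h = ring_mul(finv, g)
u, v = g, [-c for c in f]                      # the row (g, -f) of B_{f,g}
assert all((a + b) % q == 0 for a, b in zip(u, ring_mul(v, h)))
```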

3 Hash-and-Sign over NTRU Lattices

Gentry, Peikert and Vaikuntanathan introduced in [20] a generic and provably secure hash-and-sign framework based on trapdoor sampling. This paradigm has since been instantiated over NTRU lattices, giving rise to practically efficient cryptosystems: the DLP [11] and Falcon [47] signature schemes.

In the NTRU-based hash-and-sign scheme, the secret key is a pair of short polynomials \((f, g) \in \mathcal R_n^2\) and the public key is \(h = f^{-1}g \bmod q\). The trapdoor basis \(\mathbf {B}_{f,g}\) (of \(\mathcal L_\text {NTRU}\)) derives from (fg) by computing \(F,G\in \mathcal R_n\) such that \(fG-gF =q\). In both the DLP signature and Falcon, the trapdoor basis has a bounded Gram-Schmidt norm: \(\Vert \mathbf {B}_{f,g}\Vert _{GS}\le 1.17\sqrt{q}\) for compact signatures.

The signing and verification procedure is described on a high level as follows:

[Figure: high-level description of the signing and verification procedure.]

Lattice Gaussian samplers [20, 42] are nowadays a standard tool to generate signatures that are provably statistically independent of the secret basis. However, such samplers are also a notorious target for side-channel attacks. This work is no exception: it attacks non-constant-time implementations of the lattice Gaussian samplers at the heart of both DLP and Falcon, which are based on the KGPV sampler [30] or its ring variant [13]. Precisely, while previous attacks target Gaussians with public standard deviations, our attack learns the secret-dependent Gaussian standard deviations involved in the KGPV sampler.

3.1 The KGPV Sampler and Its Variant

The KGPV sampler is a randomized variant of Babai’s nearest plane algorithm [1]: instead of rounding each center to the closest integer, the KGPV sampler determines the integral coefficients according to some integer Gaussians. It is shown in [20] that under a certain smoothness condition, the algorithm outputs a sample from a distribution negligibly close to the target Gaussian. Its formal description is illustrated in Algorithm 3.1.

Note that in the KGPV sampler (or its ring variant), the standard deviations of integer Gaussians are inversely proportional to the Gram-Schmidt norms of the input basis. In the DLP scheme, \(\mathbf {B}\) is in fact the trapdoor basis \(\mathbf {B}^{\mathcal A}_{f,g} \in \mathbb {Z}^{2n\times 2n}\).
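To make this dependence concrete, here is a deliberately naive sampler (not the actual DLP or Falcon code; names and parameters are our own illustrative choices): rejection sampling for \(D_{\mathbb {Z},\sigma ,c}\) from a fixed proposal window has an acceptance rate proportional to \(\sigma \), so the iteration count observably reflects \(\sigma _i = \sigma /\Vert \mathbf {b}_i^*\Vert \), and hence the secret Gram–Schmidt norms.

```python
# Illustrative sketch: a naive rejection sampler for D_{Z,sigma,c} whose
# loop count depends on sigma.  Since KGPV-style samplers use
# sigma_i = sigma / ||b_i*||, timing such loops leaks Gram-Schmidt norms.
import math, random

def sample_z(sigma, c, rng, window=64):
    """Sample D_{Z,sigma,c} by rejection from a fixed window around c."""
    lo = math.floor(c) - window
    count = 0
    while True:
        count += 1
        x = lo + rng.randrange(2 * window + 1)
        if rng.random() < math.exp(-((x - c) ** 2) / (2 * sigma ** 2)):
            # acceptance rate ~ sqrt(2*pi)*sigma / (2*window + 1)
            return x, count

rng = random.Random(0)
avg = []
for sigma in (1.2, 2.5):                      # stand-ins for two sigma_i's
    counts = [sample_z(sigma, 0.5, rng)[1] for _ in range(2000)]
    avg.append(sum(counts) / len(counts))
# the larger sigma accepts more often, hence measurably fewer loop iterations
assert avg[1] < avg[0]
```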

The Ducas–Prest Sampler. Falcon uses a variant of the KGPV algorithm which stems naturally from Ducas–Prest’s fast Fourier nearest plane algorithm [13]. It exploits the tower structure of power-of-two cyclotomic rings. Just like the KGPV sampler, the Ducas–Prest sampler fundamentally relies on integer Gaussian sampling to output Gaussian vectors. We omit its algorithmic description, as it is not needed in this work. Overall, what matters is to understand that the standard deviations of the integer Gaussians involved are also of the form \(\sigma _i = \sigma /\Vert \mathbf {b}_{i}^*\Vert \), but that \(\mathbf {B}= \mathbf {B}^{\mathcal F}_{f,g}\) in this context.

[Algorithm 3.1: the KGPV sampler]

4 Side-Channel Attack Against Trapdoor Samplers: A Roadmap

Our algorithm proceeds as follows:

  1. Side-channel leakage: extract the \(\Vert \mathbf {b}_i^*\Vert \)’s associated to \(\mathbf {B}_{f,g}^\mathcal A\), resp. \(\mathbf {B}_{f,g}^\mathcal F\), via the timing leakage of the integer Gaussian sampler in the DLP scheme, resp. Falcon.

  2. Totally positive recovery: from the given \(\Vert \mathbf {b}_i^*\Vert \)’s, recover a Galois conjugate u of \(f\overline{f} + g\overline{g} \in \mathcal K^{++}_n\).

  3. Final recovery: compute f from u and the public key \(g/f \bmod q\).

Steps 1 and 2 of the attack are the focus of Sects. 6 and 5 respectively. Below, we describe how the third step is performed. First, we recover the element \(f\overline{g}\), using the fact that it has small coefficients. More precisely, the \(j^{\text {th}}\) coefficient is \(\langle {f, \zeta _n^jg}\rangle \), where f and \(\zeta _n^jg\) are independent and identically distributed according to \(D_{\mathbb {Z}^n,r}\), with \(r=1.17\sqrt{\frac{q}{2n}}\). By [32, Lemma 4.3], we know that all these coefficients are of size much smaller than q/2 with high probability. Now, we can compute \(v = u\overline{h}(1+h\overline{h})^{-1} \bmod q\), where \(h = f^{-1}g\bmod q\) is the public verification key. We readily see that \(v = f\overline{g} \bmod q\) if and only if \(u = f\overline{f} + g\overline{g}\). If u is any other conjugate of \(f\overline{f}+g\overline{g}\), then most likely the coefficients of v will look random in \((-q/2, q/2]\). This can essentially be interpreted as an instance of the NTRU assumption, namely that h is indistinguishable from a random element modulo q. When this happens, we simply consider another conjugate of u, until we obtain a distinguishably small element, which must then be \(f\overline{g}\) (not just in reduction modulo q, but in fact over the integers).
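To illustrate the distinguishing step, here is a small Python sketch with our own toy helper names and toy parameters (not the actual DLP ones): it computes \(f\overline{g}\) in \(\mathbb {Z}[x]/(x^n+1)\), reduces it modulo q, and checks that its centered lift is abnormally small, whereas a uniformly random element of \(\mathbb {Z}_q^n\) is not.

```python
import random

def nmul(a, b):
    """Multiplication in Z[x]/(x^n + 1) (negacyclic convolution)."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                c[i + j] += ai * bj
            else:
                c[i + j - n] -= ai * bj
    return c

def conj(a):
    """Adjoint x -> x^{-1}: conj(a)_0 = a_0 and conj(a)_k = -a_{n-k}."""
    return [a[0]] + [-a[-k] for k in range(1, len(a))]

def centered_lift(v, q):
    """Lift Z_q coefficients to the centered range (-q/2, q/2]."""
    out = []
    for c in v:
        r = c % q
        out.append(r - q if r > q // 2 else r)
    return out

def looks_small(v, q, factor=4):
    """Heuristic distinguisher: all centered coefficients well below q/2."""
    return all(abs(c) < q // factor for c in centered_lift(v, q))
```

With short (e.g. ternary) f and g, every coefficient of \(f\overline{g}\) is an inner product of two short vectors, hence far below q/2, while a wrong conjugate yields a vector that fails this test with overwhelming probability.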

Once this is done, we can deduce the reduction modulo q of \(f\bar{f} \equiv f\bar{g} / \bar{h} \pmod q\), which again coincides with \(f\bar{f}\) over the integers with high probability (if we again lift elements of \(\mathbb {Z}_q\) to \((-q/2,q/2]\), except for the constant coefficient, which should be lifted positively). This boils down to the fact that, with high probability, \(f\overline{f}\) has its constant coefficient in (0, q) and the others in \((-q/2, q/2)\). Indeed, the constant coefficient of \(f\overline{f}\) is \(\Vert f\Vert ^2\), and the others are the \(\langle {f, \zeta _n^jf}\rangle \)’s with \(j\ge 1\). By a standard Gaussian tail bound, we can show that \(\Vert f\Vert ^2\le q\) with high probability. As for the \(\langle {f, \zeta _n^jf}\rangle \)’s, despite the dependency between f and \(\zeta _n^jf\), we can still expect \(|\langle {f, \zeta _n^jf}\rangle | < q/2\) for all \(j\ge 1\) with high probability. We leave the details to the full version [17].

Next, we compute the ideal (f) from the knowledge of \(f\overline{f}\) and \(f\overline{g}\). Indeed, as f and g are coprime by the key generation algorithm, we directly have \((f) = (f\overline{f}) + (f\overline{g})\). At this point, we have obtained both the ideal (f) and the relative norm \(f\bar{f}\) of f over the totally real subfield. This data is exactly what we need to apply the Gentry–Szydlo algorithm [21], and finally recover f itself in polynomial time. Note furthermore that the practicality of the Gentry–Szydlo algorithm for the dimensions we consider (\(n=512\)) has been validated in previous work [14].

Comparison with Existing Method. As part of their side-channel analysis of the BLISS signature scheme, Espitau et al. [14] used the Howgrave-Graham–Szydlo algorithm to recover an NTRU secret f from \(f\overline{f}\). They successfully solved a small proportion \(({\approx } 7\%)\) of NTRU instances with \(n=512\) in practice. The Howgrave-Graham–Szydlo algorithm first recovers the ideal (f) and then calls the Gentry–Szydlo algorithm, as we do above. The bottleneck of this method lies in its reliance on integer factorization for ideal recovery: the integers involved can become quite large for an arbitrary f, so that recovery cannot be done in classical polynomial time in general. This is why only a small proportion of instances can be solved in practice.

However, the technique we describe above bypasses this expensive factorization step by exploiting the arithmetic properties of the NTRU secret key. In particular, it is immediate to obtain a two-element description of (f), so that the Gentry–Szydlo algorithm can be run as soon as \(f\bar{f}\) and \(f\bar{g}\) are computed. This significantly improves the applicability and efficiency of Espitau et al.’s side-channel attack against BLISS [14]. The question of avoiding the reliance on the Gentry–Szydlo algorithm by using the knowledge of \(f\overline{g}\) and \(f\overline{f}\) remains open, however.

5 Recovering Totally Positive Elements

Totally positive elements in \(\mathcal K_n\) correspond to symmetric positive definite matrices with an inner structure coming from the algebra of the field. In particular, it is enough to know a single row of the matrix to recover the corresponding field element. Hence one may expect that the diagonal part of the LDL decomposition also suffices to perform a recovery. In this section, we show that this is indeed the case, provided we know the diagonal exactly.

Recall on the one hand that the \(\mathcal A_n\) representation is the skew circulant matrix in which each diagonal consists of the same entries. On the other hand, the \(\mathcal F_n\) representation does not follow the circulant structure, but it is compatible with the tower of rings structure, i.e. its sub-matrices are the \(\mathcal F_{n/2}\) representations of elements in the subfield \(\mathcal K_{n/2}\). Each representation leads to a distinct approach, described in Sects. 5.1 and 5.2 respectively.

While the algorithms of this section can be used independently, they are naturally related to hash-and-sign over \(\text {NTRU}\) lattices. Let \(\mathbf {B}\) be a matrix representation of some secret key \((g,-f)\), and \(\mathbf {G}= \mathbf {B}\mathbf {B}^t\). Then the diagonal part of \(\mathbf {G}\)’s LDL decomposition contains the \(\Vert \mathbf {b}_i^*\Vert ^2\)’s, and \(\mathbf {G}\) is a matrix representation of \(f\overline{f} + g\overline{g} \in \mathcal K^{++}_n\). As illustrated in Sect. 4, the knowledge of \(u=f\overline{f} + g\overline{g}\) allows one to recover the secret key in polynomial time. The results in this section therefore pave the way for a better use of secret Gram-Schmidt norms.
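The link between the LDL decomposition of \(\mathbf {G}=\mathbf {B}\mathbf {B}^t\) and the Gram-Schmidt norms can be checked numerically. The sketch below is generic (not NTRU-specific, all names ours): it computes the diagonal of the LDL decomposition and compares it with the squared Gram-Schmidt norms of the rows of \(\mathbf {B}\).

```python
def ldl_diag(G):
    """Diagonal D of the decomposition G = L D L^t, L unit lower triangular."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for i in range(n):
        for j in range(i):
            L[i][j] = (G[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
        L[i][i] = 1.0
        D[i] = G[i][i] - sum(L[i][k] ** 2 * D[k] for k in range(i))
    return D

def gs_sq_norms(B):
    """Squared Gram-Schmidt norms ||b_i*||^2 of the rows of B."""
    Bs = [list(map(float, row)) for row in B]
    for i in range(len(B)):
        for j in range(i):
            nj = sum(x * x for x in Bs[j])
            mu = sum(x * y for x, y in zip(Bs[i], Bs[j])) / nj
            Bs[i] = [x - mu * y for x, y in zip(Bs[i], Bs[j])]
    return [sum(x * x for x in row) for row in Bs]
```

For any basis \(\mathbf {B}\), the two computations agree, which is exactly why timing leakage on the \(\sigma _i\)'s translates into knowledge of the diagonal part of the LDL decomposition.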

In practice however, we will obtain only approximations of the \(\Vert \mathbf {b}_i^*\Vert \)’s. The algorithms of this section must then be tweaked to handle the approximation error. The case of \(\mathcal A_n\) is dealt with in Sect. 7.1. While we do not solve the “approximate” case of \(\mathcal F_n\), we believe our “exact” algorithms to be of independent interest to the community.

5.1 Case of the Power Basis

The goal of this section is to obtain the next theorem. It involves the heuristic argument that certain rational quadratic equations always admit exactly one integer root, which will correspond to a coefficient of the recovered totally positive element. Experimentally, when it happens that there are two integer roots and the wrong one is chosen, the algorithm “fails” with overwhelming probability at the next step: the next discriminant does not lead to integer roots.

Theorem 1

Let \(u\in \mathcal R_n \cap \mathcal K^{++}_n\). Write \(\mathcal A_n(u) = \mathbf {L}\cdot \mathrm {diag}(\lambda _i)_i \cdot \mathbf {L}^t\). There is a (heuristic) algorithm \(\mathsf {Recovery}_\mathcal A\) that, given the \(\lambda _i\)’s, computes u or \(\sigma (u)\). It runs in time \(\widetilde{O}(n^3\log \Vert u\Vert _\infty )\).

The complexity analysis is given in the full version [17]. In Sect. 7.2, a version tweaked to handle approximations of the \(\lambda _i\)’s is given, which may achieve quasi-quadratic complexity. It is in any case very efficient in practice, and it is used in our attack against the DLP signature scheme.

We now describe Algorithm 5.1. By Proposition 1, \(\prod _{j=0}^{i} \lambda _j = \det \left( \mathcal A_n(u)_{[0,i]}\right) \) is an integer, thus we take \(m_i = \prod _{j=0}^{i} \lambda _j\) instead of the \(\lambda _i\)’s as input, so as to work with integers. It holds that \(u_0 = \det \left( \mathcal A_n(u)_{[0,0]}\right) = \lambda _0\). By the self-adjointness of u, we only need to consider the first n/2 coefficients. For any \(0 \le i < n/2-1\), we have

$$ \mathcal A_n(u)_{[0,i+1]} = \begin{pmatrix} & & & u_{i+1} \\ & \mathcal A_n(u)_{[0,i]} & & \vdots \\ & & & u_1 \\ u_{i+1} & \dots & u_1 & u_0 \end{pmatrix}. $$

Let \(\mathbf {v}_i = (u_{i+1}, \dotsc , u_1)\). By the definition of the Schur complement and Proposition 1, we see that

$$\begin{aligned} \frac{\det \left( \mathcal A_n(u)_{[0,i+1]}\right) }{ \det \left( \mathcal A_n(u)_{[0,i]}\right) } = u_0 - \mathbf {v}_{i}\mathcal A_n(u)_{[0,i]}^{-1}\mathbf {v}_{i}^t, \end{aligned}$$

where the left-hand side is actually \(\lambda _{i+1}\), and the right-hand side gives a quadratic equation in \(u_{i+1}\) with rational coefficients that can be computed from the knowledge of \((u_0,\dotsc , u_i)\). When \(i=0\), the equation is equivalent to \(\lambda _0\lambda _1 = u_0^2-u_1^2\): the two candidates for \(u_1\) differ only in sign. Once \(u_1\) is chosen, for each \(i\ge 1\) the quadratic equation has, with very high probability, a unique integer solution, namely the correct \(u_{i+1}\). This leads to Algorithm 5.1. Note that the sign of \(u_1\) determines whether the algorithm recovers u or \(\sigma (u)\). This comes from the fact that \(\mathcal A_n(u) = \mathrm {diag}((-1)^i)_{i\le n} \cdot \mathcal A_n(\sigma (u))\cdot \mathrm {diag}((-1)^i)_{i\le n}\).
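A toy Python version of this exact recovery can be sketched as follows (our own code, not the paper's implementation; instead of assuming a unique integer root, it keeps all integer roots as candidate branches, in the spirit of the tree search used later for noisy inputs):

```python
from fractions import Fraction
import math

def det(M):
    """Exact determinant over Q, by Gaussian elimination with row pivoting."""
    M = [row[:] for row in M]
    n, sign, d = len(M), 1, Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign
        d *= M[c][c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= fac * M[c][k]
    return sign * d

def sqrt_frac(x):
    """Exact square root of a non-negative rational, or None."""
    a, b = math.isqrt(x.numerator), math.isqrt(x.denominator)
    return Fraction(a, b) if (a * a, b * b) == (x.numerator, x.denominator) else None

def integer_roots(a, b, c):
    """Integer solutions of a x^2 + b x + c = 0 with rational coefficients."""
    if a == 0:
        return [] if b == 0 else ([int(-c / b)] if (-c / b).denominator == 1 else [])
    disc = b * b - 4 * a * c
    s = sqrt_frac(disc) if disc >= 0 else None
    if s is None:
        return []
    roots = {(-b + s) / (2 * a), (-b - s) / (2 * a)}
    return sorted(int(r) for r in roots if r.denominator == 1)

def recover_from_minors(minors, n):
    """Given m_i = det(A_n(u)_[0,i]) for i = 0..n/2, return candidate lists
    (u_0, ..., u_{n/2}); the sign of u_1 is fixed, so we get u or sigma(u)."""
    u0 = Fraction(minors[0])
    u1 = math.isqrt(int(u0 * u0 - minors[1]))   # lambda_0*lambda_1 = u_0^2 - u_1^2
    prefixes = [[u0, Fraction(u1)]]
    for k in range(2, n // 2 + 1):
        extended = []
        for u in prefixes:
            # det of the (k+1)x(k+1) Toeplitz block, as a quadratic in x = u_k
            def D(x):
                return det([[Fraction(x) if abs(a - b) == k else u[abs(a - b)]
                             for b in range(k + 1)] for a in range(k + 1)])
            d0, dp, dm = D(0), D(1), D(-1)
            qa, qb = (dp + dm) / 2 - d0, (dp - dm) / 2
            for r in integer_roots(qa, qb, d0 - minors[k]):
                extended.append(u + [Fraction(r)])
        prefixes = extended
    return prefixes
```

Interpolating the determinant at \(x\in \{0,1,-1\}\) is a simple way to obtain the quadratic equation; Sect. 7.2 describes a much faster way to compute it.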

[Algorithm 5.1: \(\mathsf {Recovery}_\mathcal A\)]

5.2 Case of the Bit-Reversed Order Basis

In this section, we are given the diagonal part of the LDL decomposition \(\mathcal F_n(u)=\mathbf {L}'\mathrm {diag}(\lambda _i)\mathbf {L}'^t\), which rewrites as \((\mathbf {L}'^{-1}\mathbf {P}_n)\mathcal A_n(u)(\mathbf {L}'^{-1}\mathbf {P}_n)^t = \mathrm {diag}(\lambda _i)\). Since the triangular structure is shuffled by the bit-reversal representation, recovering u from the \(\lambda _i\)’s is not as straightforward as in the previous section. Nevertheless, the compatibility of the \(\mathcal F_n\) operator with the tower of extensions can be exploited. This gives a recursive approach that stems from natural identities between the trace and norm maps relative to the extension \(\mathcal K_n\,|\,\mathcal K_{n/2}\), crucially uses the self-adjointness and total positivity of u, and fundamentally relies on computing square roots in \(\mathcal R_n\).

Theorem 2

Let \(u\in \mathcal R_n \cap \mathcal K^{++}_n\). Write \(\mathcal F_n(u) =\mathbf {L}'\cdot \mathrm {diag}(\lambda _i)_i \cdot \mathbf {L}'^t\). There is a (heuristic) algorithm that, given the \(\lambda _i\)’s, computes a conjugate of u. It runs in \(\widetilde{O}(n^3\log \Vert u\Vert _\infty )\).

The recursiveness of the algorithm and its reliance on square roots will force it to always work “up to Galois conjugation”. In particular, at some point we will assume heuristically that only one of the conjugates of a value computed within the algorithm is in a given coset of the subgroup of relative norms in the quadratic subfield. Since that constraint only holds with negligible probability for random values, the heuristic is essentially always verified in practice. Recall that we showed in Sect. 4 how to recover the needed conjugate in practice by a distinguishing argument.

The rest of the section describes the algorithm, while the complexity analysis is presented in the full version [17]. First, we observe from

$$ \mathrm {Tr}(u) + \zeta _n\mathrm {Tr}(\zeta _n^{-1}u) = 2u = 2\bar{u} = \overline{\mathrm {Tr}(u)} + \zeta _n^{-1} \overline{\mathrm {Tr}(\zeta _n^{-1}u)} $$

that \(\mathrm {Tr}(u)\) is self-adjoint. The positivity of u implies that \(\mathrm {Tr}(u) \in \mathcal K^{++}_{n/2}\). From Eq. (2), we know that the first n/2 minors of \(\mathcal F_n(u)\) are the minors of \(\mathcal F_{n/2}(\mathrm {Tr}(u)/2)\). The identity above also shows that \(\mathrm {Tr}(\zeta _n^{-1}u)\) is a square root of the element \(\zeta _{n/2}^{-1}\mathrm {Tr}(\zeta _n^{-1}u)\overline{\mathrm {Tr}(\zeta _n^{-1}u)}\) in \(\mathcal K_{n/2}\). Thus, if we knew \(\mathrm {Tr}(\zeta _n^{-1}u)\overline{\mathrm {Tr}(\zeta _n^{-1}u)}\), we could reduce the problem of computing \(u\in \mathcal K_n\) to computations in \(\mathcal K_{n/2}\): more precisely, to recovering a totally positive element from “its minors” and to a square root computation.

It turns out that \(\mathrm {Tr}(\zeta _n^{-1}u)\overline{\mathrm {Tr}(\zeta _n^{-1}u)}\) can be computed by going down the tower as well. One can see that

$$\begin{aligned} \mathrm {Tr}(u)^2 - 4\mathrm {N}(u) = \mathrm {Tr}(\zeta _n^{-1}u)\overline{\mathrm {Tr}(\zeta _n^{-1}u)}, \end{aligned}$$
(3)

where \(\mathrm {N}(u)\) is totally positive since u (and therefore \(\sigma (u)\)) is. This identity can be thought of as a “number field version” of the \(\mathcal F_n\) representation. Indeed, recall that \(u_e = (1/2)\mathrm {Tr}(u)\) and \(u_o=(1/2)\mathrm {Tr}(\zeta _n^{-1}u)\). Then, by the block determinant formula and the fact that \(\mathcal F_n\) is a ring isomorphism, we see that

$$\begin{aligned} \det \mathcal F_n(u) = \prod _{i=0}^{n-1}\lambda _i = \det \big (\mathcal F_{n/2}(u_e)^2 - \mathcal F_{n/2}(u_o\overline{u_o})\big ). \end{aligned}$$

This strongly suggests a link between the successive minors of \(\mathcal F_n(u)\) and the element \(\mathrm {N}(u)\). The next lemma makes this relation precise, and essentially amounts to taking Schur complements in the above formula.
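Both Eq. (3) and the even/odd decomposition \(u_e = (1/2)\mathrm {Tr}(u)\), \(u_o = (1/2)\mathrm {Tr}(\zeta _n^{-1}u)\) can be sanity-checked numerically. The following toy Python script (our own helper names, with \(\mathcal R_m \cong \mathbb {Z}[y]/(y^m+1)\) represented by coefficient lists) verifies the identity for a self-adjoint \(u = f\overline{f}+g\overline{g}\) with n = 8; note that the identity relies on u being self-adjoint.

```python
import random

def nmul(a, b):
    """Multiplication in Z[y]/(y^m + 1)."""
    m = len(a)
    c = [0] * m
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < m:
                c[i + j] += ai * bj
            else:
                c[i + j - m] -= ai * bj
    return c

def conj(a):
    """Adjoint y -> y^{-1}."""
    return [a[0]] + [-a[-k] for k in range(1, len(a))]

def tr_down(u):
    """Tr(u) relative to K_n | K_{n/2}: twice the even part, in R_{n/2}."""
    return [2 * c for c in u[0::2]]

def tr_twist_down(u):
    """Tr(zeta_n^{-1} u): twice the odd part, in R_{n/2}."""
    return [2 * c for c in u[1::2]]

def norm_down(u):
    """N(u) = u * sigma(u) = u_e^2 - y * u_o^2, in R_{n/2}."""
    ue, uo = u[0::2], u[1::2]
    uo2 = nmul(uo, uo)
    y_uo2 = [-uo2[-1]] + uo2[:-1]        # multiplication by y mod y^{m}+1
    return [a - b for a, b in zip(nmul(ue, ue), y_uo2)]
```

Running the check below confirms \(\mathrm {Tr}(u)^2 - 4\mathrm {N}(u) = \mathrm {Tr}(\zeta _n^{-1}u)\overline{\mathrm {Tr}(\zeta _n^{-1}u)}\) exactly, as elements of \(\mathcal R_{n/2}\).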

Lemma 3

Let \(u\in \mathcal K^{++}_n\) and \(\widehat{u} = \frac{2\mathrm {N}(u)}{\mathrm {Tr}(u)}\in \mathcal K^{++}_{n/2}\). Then for \(0< k< n/2\), we have

$$\det \left( \mathcal F_{n}(u)_{{[0, k+\frac{n}{2})}}\right) = \det \big (\mathcal F_{n/2}(u_e)\big )\det \big (\mathcal F_{n/2}(\widehat{u})_{{[0,k)}}\big ).$$

Proof

Let \(\mathbf {G}= \mathcal F_{n}(u)\) and \(\mathbf {B}= \mathcal F_{n/2}(u_o)_{[0,\frac{n}{2}) \times [0,k)}\) in order to write

$$\mathbf {G}_{[0, \frac{n}{2} + k)} = \begin{pmatrix} \mathcal F_{n/2}(u_e) & \mathbf {B}\\ \mathbf {B}^t & \mathcal F_{n/2}(u_e)_{[0,k)} \end{pmatrix},$$

with \(\mathbf {B}^t=\mathcal F_{n/2}(\overline{u_o})_{[0,k) \times [0,\frac{n}{2})}\). Let \(\mathbf {S}= \mathbf {G}_{[0, \frac{n}{2} + k)}/\mathcal F_{n/2}(u_e) = \mathcal F_{n/2}(u_e)_{[0,k)} - \mathbf {B}\mathcal F_{n/2}(u_e)^{-1}\mathbf {B}^t\). Since \(\mathcal F_n\) is a ring isomorphism, a routine computation shows that \(\mathbf {S}= \mathcal F_{n/2}(\widehat{u})_{[0,k)}\). The proof follows from Eq. (1).    \(\square \)

Lemma 3 tells us that knowing \(\mathrm {Tr}(u)\) and the principal minors of \(\mathcal F_n(u)\) is enough to recover those of \(\mathcal F_{n/2}(\widehat{u})\), so that the computations in \(\mathcal K_{n}\) are again reduced to computing a totally positive element in \(\mathcal K_{n/2}\) from its minors. Then, from Eq. (3), we can obtain \(\mathrm {Tr}(\zeta _n^{-1}u)\overline{\mathrm {Tr}(\zeta _n^{-1}u)}\). The last step is then to compute a square root of \(\zeta _{n/2}^{-1}\mathrm {Tr}(\zeta _n^{-1}u)\overline{\mathrm {Tr}(\zeta _n^{-1}u)}\) in \(\mathcal K_{n/2}\) to recover \(\pm \mathrm {Tr}(\zeta _n^{-1}u)\). In particular, this step leads to u or its conjugate \(\sigma (u)\). As observed above, this ultimately translates into recovering only a conjugate of u.

Lastly, when \(n=2\), that is, when we work in \(\mathbb {Q}(i)\), a totally positive element is in fact in \(\mathbb {Q}_+\). This leads to Algorithm 5.2 which, for the sake of simplicity, is presented in the general context of \(\mathcal K_n\) to match the description above. The algorithm \(\mathsf {TowerRoot}\) of Step 9 computes square roots in \(\mathcal K_n\); a quasi-quadratic version for algebraic integers is presented and analyzed in the next section.

[Algorithm 5.2: \(\mathsf {TowerRecovery}_\mathcal F\)]

The whole procedure constructs a binary tree, as illustrated in Fig. 1. The algorithm can be made to rely essentially only on algebraic integers, which also helps in analyzing its complexity. This gives the claim of Theorem 2 (see the full version [17] for details). At Step 6, the algorithm finds the (heuristically unique) conjugate \(\widehat{u}\) of \(\widetilde{u}\) such that \(\widehat{u}\cdot u^+\) is a relative norm (since we must have \(\widehat{u}\cdot u^+ = \mathrm {N}(u)\) by the above). In practice, in the integral version of this algorithm, we carry out this test not as a norm check but as an integrality test.

Fig. 1. Binary tree built by \(\mathsf {TowerRecovery}_\mathcal F\).

5.2.1 Computing Square Roots in Cyclotomic Towers

In this section, we focus on computing square roots of algebraic integers: given \(s = t^2 \in \mathcal R_n\), compute t. The reason for focusing on integers is that both our Algorithm 5.2 and practical applications deal only with algebraic integers. A previous approach was suggested in [25], relying on finding primes with a small splitting pattern in \(\mathcal R_n\), computing square roots in several finite fields, and brute-forcing to find the correct candidate. A hassle in analyzing this approach is to first find a prime sufficiently larger than an arbitrary input that splits into, say, two factors in \(\mathcal R_n\). Omitting the cost of finding such a prime, this algorithm can be shown to run in \(\widetilde{O}(n^2(\log \Vert s\Vert _\infty )^2)\). Our recursive approach does not theoretically rely on finding a suitable prime, and again exploits the tower structure to achieve the next claim.

Theorem 3

Given a square s in \(\mathcal R_n\), there is a deterministic algorithm that computes \(t \in \mathcal R_n\) such that \(t^2=s\) in time \(\widetilde{O}(n^2\log \Vert s\Vert _\infty )\).

Recall that the subfield \(\mathcal K_{n/2}\) is fixed by the automorphism \(\sigma (\zeta _n) = -\zeta _n\). For any element t in \(\mathcal R_n\), recall that \(t = \frac{1}{2}(\mathrm {Tr}(t) + \zeta _n\mathrm {Tr}(\zeta _n^{-1} t))\), where \(\mathrm {Tr}\) is the trace relative to this extension. We can also see that

$$\begin{aligned} \mathrm {Tr}(t)^2&= \mathrm {Tr}(t^2) + 2\mathrm {N}(t) = \mathrm {Tr}(s) + 2\mathrm {N}(t),\nonumber \\ \mathrm {Tr}(\zeta _n^{-1} t)^2&= \zeta _n^{-2}( \mathrm {Tr}(t^2) - 2\mathrm {N}(t)) = \zeta _{n/2}^{-1}(\mathrm {Tr}(s) - 2\mathrm {N}(t)), \end{aligned}$$
(4)

where \(\mathrm {N}\) denotes the relative norm. Hence recovering \(\mathrm {Tr}(t)\) and \(\mathrm {Tr}(\zeta _n^{-1} t)\) amounts to computing the square roots of elements in \(\mathcal R_{n/2}\) determined by s and \(\mathrm {N}(t)\). The fact that \(\mathrm {N}(s) = \mathrm {N}(t)^2\) leads to Algorithm 5.3.

[Algorithm 5.3: \(\mathsf {TowerRoot}\)]

Notice that square roots are only known up to sign. This means that an algorithm exploiting the tower structure of fields must perform several sign checks to ensure that it lifts the correct root to the next extension. For our algorithm, we only need to check the sign of \(\mathrm {N}(t)\) (the signs of \(\mathrm {Tr}(t)\) and \(\mathrm {Tr}(\zeta _n^{-1} t)\) can be determined by checking whether their current values allow s to be recovered). This verification happens at Step 6 of Algorithm 5.3, where, after computing the square root of \(\mathrm {N}(s)\), we know \((-1)^b\mathrm {N}(t)\) for some \(b\in \{0,1\}\). It relies on noticing that, from Eq. (4), \(T_b := \mathrm {Tr}(s)+2\cdot (-1)^b\mathrm {N}(t)\) is a square in \(\mathcal K_{n/2}\) if and only if \(b=0\), in which case \(T_b = \mathrm {Tr}(t)^2\). (Otherwise, \(\zeta _n^{-2}T_b\) is the square \(\mathrm {Tr}(\zeta _n^{-1} t)^2\) in \(\mathcal K_{n/2}\).) This observation can be extended to a sign check that runs in \(\widetilde{O}(n\cdot \log \Vert s\Vert _\infty )\). The detailed analysis is given in the full version [17].

In practice, we can use the following approach: since n is small, we can easily precompute a prime integer p such that \(p -1\equiv n\bmod 2n\). For such a prime, there is a primitive \(n^{\text {th}}\) root \(\omega \) of unity in \(\mathbb {F}_p\), and such a root cannot be a square in \(\mathbb {F}_p\) (else 2n would divide \(p-1\)). Checking squareness then amounts to checking which of \(T_b(\omega )\) or \(\omega ^{-2}T_b(\omega )\) is a square \(\bmod \, p\) by computing a Legendre symbol. While we need such primes for every power of 2 smaller than n, in any case this check is done in quasi-linear time. Compared to [25], the size of p here does not matter.
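For concreteness, here is a small Python sketch of this check (our own helper names; p and \(\omega \) are found by brute force, which is fine since n is small). The map \(y \mapsto \omega \) is a ring homomorphism \(\mathcal R_{n/2} \rightarrow \mathbb {F}_p\) because \(\omega ^{n/2} = -1\), and \(\omega \) is a quadratic non-residue by construction. The sketch ignores the degenerate case \(T_b(\omega ) \equiv 0 \bmod p\).

```python
def is_prime(p):
    """Trial division, enough for the small primes involved here."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def check_prime_and_root(m):
    """Find p with p - 1 = 2m mod 4m and w with w^m = -1 mod p, so that
    y -> w embeds Z[y]/(y^m + 1) into F_p and w is a non-square mod p."""
    p = 2 * m + 1
    while True:
        if is_prime(p):
            for w in range(2, p):
                if pow(w, m, p) == p - 1:     # w^m = -1, hence ord(w) = 2m
                    return p, w
        p += 4 * m

def is_square_image(T, p, w):
    """Legendre symbol of T(w) mod p (assumes T(w) != 0 mod p)."""
    v = 0
    for c in reversed(T):                     # Horner evaluation of T at w
        v = (v * w + c) % p
    return pow(v, (p - 1) // 2, p) == 1
```

A ring square maps to a square mod p, while \(\omega ^{\pm 1}\) times a square maps to a non-square, which is exactly the distinction needed at Step 6.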

Let us denote by \(\mathsf {SQRT}(n, S)\) the complexity of Algorithm 5.3 for an input \(s \in \mathcal R_n\) with coefficients of size \(S = \log \Vert s\Vert _\infty \). Using e.g. FFT based multiplication of polynomials, \(\mathrm {N}(s)\) can be computed in \(\widetilde{O}(n S)\), and has bitsize at most \(2S+\log n\). Recall that the so-called canonical embedding of any \(s\in \mathcal K_n\) is the vector \(\tau (s)\) of its evaluations at the roots of \(x^n+1\). It is well-known that it satisfies \(\Vert \tau (s) \Vert = \sqrt{n}\Vert s\Vert \), so that \(\Vert \tau (s)\Vert _\infty \le n \Vert s\Vert _\infty \) by norm equivalence. If \(s=t^2\) we see that \(\Vert \tau (s)\Vert _\infty = \Vert \tau (t)\Vert _\infty ^2\). Using again norm equivalence, we obtain \(\Vert t\Vert _\infty \le \sqrt{n}\Vert s\Vert _\infty ^{1/2}\). In the case of \(\mathrm {N}(s) = \mathrm {N}(t)^2\), we obtain that \(\mathrm {N}(t)\) has size at most \(S+\log n\). The cost of \(\mathsf {CheckSqr}\) is at most \(\widetilde{O}(n S)\), so we obtain

$$\begin{aligned} \mathsf {SQRT}(n, S) = \mathsf {SQRT}\left( \frac{n}{2}, 2S + \log n\right) + 2\mathsf {SQRT}\left( \frac{n}{2}, S + \log n\right) + \widetilde{O}(n S). \end{aligned}$$

A tedious computation (see the full version [17] for details) gives us Theorem 3.

6 Side-Channel Leakage of the Gram–Schmidt Norms

Our algorithms in Sect. 5 rely on the knowledge of the exact Gram-Schmidt norms \(\Vert \mathbf {b}_i^*\Vert \). In this section, we show that in the original implementations of DLP and Falcon, approximations of the \(\Vert \mathbf {b}_i^*\Vert \)’s can be obtained by exploiting the leakage induced by non-constant-time rejection sampling.

In previous works targeting the rejection phase, the standard deviation of the sampler was a public constant. This work deals with a different situation, as both the centers and the standard deviations used by the samplers of DLP and Falcon are secret values determined by the secret key. These samplers output Gaussian vectors by relying on an integer Gaussian sampler, which performs rejection sampling. The secret standard deviation for the \(i^{\text {th}}\) integer Gaussian is computed as \(\sigma _i =\sigma /\Vert \mathbf {b}_i^*\Vert \) for some fixed \(\sigma \), so that exposure of the \(\sigma _i\)’s means the exposure of the Gram-Schmidt norms. The idea of the attack stems from the simple observation that the acceptance rate of the sampler is essentially a linear function of its current \(\sigma _i\). In this section, we show how, by a timing attack, one may recover all acceptance rates from sufficiently many signatures by computing a well-chosen maximum likelihood estimator. Recovering approximations of the \(\Vert \mathbf {b}_i^*\Vert \)’s then follows straightforwardly.

6.1 Leakage in the DLP Scheme

We first target the Gaussian sampling in the original implementation [46], described in Algorithms 6.1 and 6.2. It samples “shifted” Gaussian integers by relying on three layers of Gaussian integer sampling with rejection. More precisely, the target Gaussian distribution at the “top” layer has a center which depends on secret data and varies with each call. To deal with the varying center, the “shifted” sample is generated by combining a zero-centered sampler with rejection sampling. Yet the zero-centered sampler has the same standard deviation as the “shifted” one, and this standard deviation depends on the secret key. At the “intermediate” layer, also by rejection sampling, the sampler rectifies a public zero-centered sample into a secret-dependent one.

At the “bottom” layer, the algorithm \(\mathsf {IntSampler}\) actually follows the BLISS sampler [8], which is already subject to side-channel attacks [7, 14, 43]. We stress again that our attack does not target this algorithm, so the reader can assume a constant-time version of it is used here. The weakness we exploit is a non-constant-time implementation of Algorithm 6.2 in the “intermediate” layer. We now describe how to approximate the \(\sigma _i\)’s using this leakage.

[Algorithms 6.1 and 6.2: the DLP Gaussian samplers]

Let \(\widehat{\sigma } = \sqrt{\frac{1}{2\log (2)}}\) be the standard deviation of the Gaussian at the “bottom” layer and \(k_i = \lceil \frac{\sigma _i}{\widehat{\sigma }}\rceil \). It can be verified that the average acceptance probability of Algorithm 6.2 is \(AR(\sigma _i) = \frac{\rho _{\sigma _i}(\mathbb {Z})}{\rho _{k_i\widehat{\sigma }}(\mathbb {Z})}\). As required by the KGPV algorithm, we know that \(k_i\widehat{\sigma } \ge \sigma _i \ge \eta _{\epsilon }'(\mathbb {Z})\), and by Corollary 1 we have \(AR(\sigma _i) \in \frac{\sigma _i}{k_i\widehat{\sigma }}\cdot \left[ \frac{1-\epsilon }{1+\epsilon }, 1\right] \). Since \(\epsilon \) is very small in this context, we do not lose much by assuming that \(AR(\sigma _i) = \frac{\sigma _i}{k_i\widehat{\sigma }}\).

Next, for a given \(\sigma _i\), the number of trials before Algorithm 6.2 outputs its result follows a geometric distribution \(\text {Geo}_{AR(\sigma _i)}\). We let \(\overline{AR}_i\) be the maximum likelihood estimator for \(AR(\sigma _i)\) associated to N executions of the KGPV sampler, which we compute using Lemma 1. We now want to determine the \(k_i\)’s in order to compute \(\overline{\sigma _i} = k_i\widehat{\sigma }\overline{AR}_i\). Concretely, for the suggested parameters, we can set \(k_i = 3\) for all i at the beginning and measure the \(\overline{AR}_i\)’s. Because the first half of the \(\sigma _i\)’s lie in a small interval and increase slowly, it may happen at some step that \(\overline{AR}_{i+1}\) is significantly smaller than \(\overline{AR}_{i}\) (say, \(1.1\cdot \overline{AR}_{i+1} < \overline{AR}_{i}\)). This means that \(k_{i+1} = k_i+1\), and we then increase all the subsequent \(k_{i}\)’s by one. This approach can be repeated until the last estimator is obtained, and works well in practice. Lastly, Lemma 1 tells us that for large enough \(\alpha \) and p, taking \(N\ge 2^{2(p+\log \alpha )}\) implies \(|\overline{\sigma }_i-\sigma _i|\le 2^{-p}\cdot \sigma _i\) for all \(0\le i< 2n\) with high probability.
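As an illustration, the following Python simulation (a stand-in for real timing traces, with toy parameter values of our choosing) reproduces the estimation: trial counts are geometric with parameter \(AR(\sigma _i)\), and the maximum likelihood estimate \(N/\sum \text {trials}\), scaled by \(k_i\widehat{\sigma }\), recovers \(\sigma _i\).

```python
import math
import random

SIGMA_HAT = math.sqrt(1 / (2 * math.log(2)))   # "bottom"-layer standard deviation

def simulate_trial_counts(sigma_i, k_i, n_sig, rng):
    """Stand-in for a timing trace: each output of the intermediate layer
    costs Geometric(AR) iterations, with AR = sigma_i / (k_i * sigma_hat)."""
    ar = sigma_i / (k_i * SIGMA_HAT)
    counts = []
    for _ in range(n_sig):
        c = 1
        while rng.random() > ar:
            c += 1
        counts.append(c)
    return counts

def estimate_sigma(counts, k_i):
    """MLE of a geometric parameter (N / total trials), scaled back to sigma_i."""
    ar_hat = len(counts) / sum(counts)
    return k_i * SIGMA_HAT * ar_hat
```

With a few hundred thousand simulated samples, the relative error on \(\sigma _i\) drops well below one percent, in line with the \(N \ge 2^{2(p+\log \alpha )}\) requirement of Lemma 1.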

From [11], the constant \(\sigma \) is publicly known. This yields approximations \(\overline{b_i} = \frac{\sigma }{\overline{\sigma }_i}\), which we expect to match \(\Vert \mathbf {b}_i^*\Vert \) up to p bits of accuracy.

6.2 Leakage in the Falcon Scheme

We now describe how the original implementation of Falcon presents a similar leakage of Gram–Schmidt norms via timing side-channels. In contrast to the previous section, the integer sampler of Falcon is based on one public half-Gaussian sampler and some rejection sampling to reflect sensitive standard deviations and centers. The procedure is shown in Algorithm 6.3.

[Algorithm 6.3: the Falcon integer sampler]

Our analysis does not target the half-Gaussian sampler \(D_{\mathbb {Z},\widehat{\sigma }}^{+}\) where \(\widehat{\sigma }=2\), so that we omit its description. It can be implemented in a constant-time way [29], but this has no bearing on the leakage we describe.

We first consider the center c and the standard deviation \(\sigma _i\) to be fixed. Following Algorithm 6.3, we let \(p(z,b) = \exp \left( \frac{z^2}{2\widehat{\sigma }^2} - \frac{(b + (2b-1)z-c)^2}{2\sigma _i^2} \right) \) be the acceptance probability and note that

$$ p(z,0) = \frac{1}{\rho _{\hat{\sigma }}(z)}\exp \left( -\frac{(-z-c)^2}{2\sigma _i^2}\right) ~~\text {and}~~ p(z,1) =\frac{1}{\rho _{\hat{\sigma }}(z)} \exp \left( -\frac{(z+1-c)^2}{2\sigma _i^2}\right) . $$

Then the average acceptance probability for fixed c and \(\sigma _i\) satisfies

$$\begin{aligned} \mathbb E_{z,b}\big [p(z,b)] =&\, \frac{1}{2\rho _{\hat{\sigma }}(\mathbb {N})}\sum _{z\in \mathbb {N}} \left( \exp \left( -\frac{(-z-c)^2}{2\sigma _i^2}\right) + \exp \left( -\frac{(z+1-c)^2}{2\sigma _i^2}\right) \right) \\ =&\,\frac{\rho _{\sigma _i}(\mathbb {Z}-c)}{2\rho _{\hat{\sigma }}(\mathbb {N})}. \end{aligned}$$

As \(\widehat{\sigma } \ge \sigma _i \ge \eta _{\epsilon }'(\mathbb {Z})\) for a very small \(\epsilon \), we can again use Lemma 2 to obtain \(\rho _{\sigma _i}(\mathbb {Z}-c) \approx \rho _{\sigma _i}(\mathbb {Z})\). This allows us to treat the average acceptance probability as a function \(AR(\sigma _i)\), independent of c. Using that \(2\rho _{\widehat{\sigma }}^+(\mathbb {N}) = \rho _{\widehat{\sigma }}(\mathbb {Z})+1\) combined with Corollary 1, we write \(AR(\sigma _i) = \frac{\sigma _i\sqrt{2\pi }}{1+2\sqrt{2\pi }}\). Then an application of Lemma 1 gives the number of traces needed to approximate \(\sigma _i\) up to any desired accuracy.
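The derivation above can be checked numerically with truncated sums. The quick sanity script below (illustrative parameter values of our choosing) compares the expectation of \(p(z,b)\) over the half-Gaussian with the closed form \(\rho _{\sigma _i}(\mathbb {Z}-c)/(2\rho _{\widehat{\sigma }}(\mathbb {N}))\).

```python
import math

def lhs_expectation(sigma_i, sigma_hat, c, zmax=100):
    """E_{z,b}[p(z,b)], z ~ half-Gaussian D^+_{Z,sigma_hat}, b uniform in {0,1}."""
    rho_hat = sum(math.exp(-z * z / (2 * sigma_hat ** 2)) for z in range(zmax))
    acc = 0.0
    for z in range(zmax):
        weight = math.exp(-z * z / (2 * sigma_hat ** 2)) / rho_hat
        p0 = math.exp(z * z / (2 * sigma_hat ** 2) - (z + c) ** 2 / (2 * sigma_i ** 2))
        p1 = math.exp(z * z / (2 * sigma_hat ** 2) - (z + 1 - c) ** 2 / (2 * sigma_i ** 2))
        acc += weight * 0.5 * (p0 + p1)
    return acc

def rhs_closed_form(sigma_i, sigma_hat, c, zmax=100):
    """rho_{sigma_i}(Z - c) / (2 rho_{sigma_hat}(N)), truncated."""
    rho_shift = sum(math.exp(-(z - c) ** 2 / (2 * sigma_i ** 2))
                    for z in range(-zmax, zmax + 1))
    rho_hat = sum(math.exp(-z * z / (2 * sigma_hat ** 2)) for z in range(zmax))
    return rho_shift / (2 * rho_hat)
```

The two quantities agree up to the (negligible) truncation error, for any center c, which is what makes the acceptance rate usable as a c-independent function of \(\sigma _i\).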

7 Practical Attack Against the DLP Scheme

For the methods in Sect. 6, measurement errors are inevitable in practice. To mount a practical attack, we must take them into account. In this section, we show that it is feasible to compute a totally positive element even with noisy diagonal coefficients of its LDL decomposition.

First, we adapt the algorithm \(\mathsf {Recovery}_\mathcal A\) (Algorithm 5.1) to noisy inputs in Sect. 7.1. To determine each coefficient, we need to solve a quadratic inequality instead of an equation, due to the noise. As a consequence, each quadratic inequality may lead to several candidates for the coefficient. Depending on whether candidates exist, the algorithm either extends prefixes that may lead to a valid solution or eliminates wrong ones. The algorithm thus behaves as a tree search.

Then we detail in Sect. 7.2 some implementation techniques to accelerate the recovery algorithm in the context of the DLP scheme. While the algorithm is easy to follow, adapting it to the practical noisy case is not trivial.

Finally, we report experimental results in Sect. 7.3. In conclusion, given the full timing leakage of about \(2^{34}\) signatures, one can in practice break, with good probability, the DLP parameter set claimed to achieve 192-bit security. We provide some theoretical support for this figure in Sect. 7.4.

7.1 Totally Positive Recovery with Noisy Inputs

Section 5.1 sketched the exact recovery algorithm. To tackle measurement errors, we introduce a new parameter denoting the error bound. The modified algorithm proceeds in the same way: given a prefix \((A_0,\cdots , A_{l-1})\), it computes all possible \(A_l\)’s satisfying the error bound condition, and extends or eliminates the prefix according to whether it can lead to a valid solution. A formal algorithmic description is provided in Algorithm 7.1. For convenience, we use the (noisy) diagonal coefficients (i.e. the secret Gram-Schmidt norms) of the LDL decomposition as input. In fact, Proposition 1 has shown the equivalence between the diagonal part and the principal minors. In addition, we include the prefix in the input for ease of description. The initial prefix is \(\mathsf{prefix} = \overline{A_0} = \lfloor {\overline{d_0}}\rceil \). Clearly, the correct A must be in the final candidate list.

[Algorithm 7.1: totally positive recovery from noisy inputs]

7.2 Practical Tweaks in the DLP Setting

Targeting the DLP signature scheme, we implemented our side-channel attack. The following techniques significantly boost the practical performance of the recovery algorithm and reduce the number of required signatures.

Fast Computation of the Quadratic Equation. Exploiting the Toeplitz structure of \(\mathcal A_n(A)\), we propose a fast algorithm to compute the quadratic equation, i.e. \((Q_a, Q_b, Q_c)\), that requires only O(l) multiplications and additions. The idea is as follows. Let \(\mathbf {T}_i = \mathcal A_n(A)_{[0,i]}\). Let \(\mathbf {u}_i = (A_1,\cdots , A_i)\) and \(\mathbf {v}_i = (A_i,\cdots , A_1)\), then

$$\mathbf {T}_i = \begin{pmatrix} \mathbf {T}_{i-1} &{} \mathbf {v}_i^t\\ \mathbf {v}_i &{} A_0 \end{pmatrix} = \begin{pmatrix} A_0 &{} \mathbf {u}_i\\ \mathbf {u}_i^t &{} \mathbf {T}_{i-1} \end{pmatrix}.$$

Let \(\mathbf {r}_i = \mathbf {v}_i\mathbf {T}_{i-1}^{-1}\) and \(\mathbf {s}_i = \mathbf {u}_i\mathbf {T}_{i-1}^{-1}\), which is the reverse of \(\mathbf {r}_i\), and let \(d_i = A_0 - \langle {\mathbf {v}_i, \mathbf {r}_i}\rangle = A_0 - \langle {\mathbf {u}_i, \mathbf {s}_i}\rangle \). A straightforward computation shows that

$$\mathbf {T}_i^{-1} = \begin{pmatrix} \mathbf {T}_{i-1}^{-1} + \mathbf {r}_i^t\mathbf {r}_i / d_i &{} \ -\mathbf {r}_i^t / d_i\\ -\mathbf {r}_i / d_i &{} \ 1/d_i \end{pmatrix}.$$

Let \(f_i = \langle {\mathbf {r}_i, \mathbf {u}_i}\rangle = \langle {\mathbf {s}_i, \mathbf {v}_i}\rangle \), then the quadratic equation of \(A_i\) is

$$\begin{aligned} d_i = A_0 - \langle {\mathbf {v}_i, \mathbf {r}_i}\rangle = A_0 - (A_i - f_{i-1})^2 / d_{i-1} - \langle {\mathbf {v}_{i-1}, \mathbf {r}_{i-1}}\rangle . \end{aligned}$$

Remark that \(d_i\) is the square of the last Gram-Schmidt norm. Since the input consists of \(\overline{d_i}\), a noisy version of \(d_i\), combining it with \( f_{i-1}, \mathbf {v}_{i-1}, \mathbf {r}_{i-1}\) determines all possible \(A_i\)'s. Once \(A_i\) is recovered, one can then compute \(\mathbf {r}_i\), \(\mathbf {s}_i\) according to

$$\mathbf {s}_i = \begin{pmatrix} \mathbf {s}_{i-1} - \frac{A_i - f_{i-1}}{d_{i-1}}\mathbf {r}_{i-1}&\ \frac{A_i - f_{i-1}}{d_{i-1}} \end{pmatrix} $$

and further compute \(d_{i}, f_{i}\). As the recovery algorithm starts with \(i=1\) (i.e. \(\mathsf{prefix} = A_0\)), we can compute the sequences \(\{{d_{i}}\}, \{{f_{i}}\}, \{{\mathbf {r}_i}\}, \{{\mathbf {s}_i}\}\) on the fly.
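The recursion fits in a few lines. The sketch below (our own naming) computes all pivots \(d_i\) of the Toeplitz matrix generated by \((A_0,\dots ,A_{l-1})\); it uses the identity \(\langle \mathbf {v}_{i-1}, \mathbf {r}_{i-1}\rangle = A_0 - d_{i-1}\) to simplify the update of \(d_i\), and maintains \(f_i\), \(\mathbf {r}_i\), \(\mathbf {s}_i\) on the fly as described.

```python
import numpy as np

def pivots_toeplitz(A):
    """All LDL pivots (squared Gram-Schmidt norms) of the symmetric
    Toeplitz matrix generated by A = (A_0, ..., A_{l-1}), computed
    with O(l) operations per step via the recursion above."""
    d, f = float(A[0]), 0.0      # d_0 = A_0 and f_0 = <r_0, u_0> = 0
    r = np.zeros(0)              # r_i = v_i T_{i-1}^{-1}, initially empty
    pivots = [d]
    for i in range(1, len(A)):
        c = (A[i] - f) / d                   # new last entry of s_i
        d -= c * (A[i] - f)                  # d_i = d_{i-1} - (A_i - f_{i-1})^2 / d_{i-1}
        s = np.append(r[::-1] - c * r, c)    # s_i, using s_{i-1} = reverse of r_{i-1}
        r = s[::-1]                          # r_i is the reverse of s_i
        f = r @ np.asarray(A[1:i + 1], dtype=float)  # f_i = <r_i, u_i>
        pivots.append(d)
    return pivots
```

In the attack, the same update yields the quadratic \((Q_a, Q_b, Q_c)\) in the unknown \(A_i\), since \(d_i = d_{i-1} - (A_i - f_{i-1})^2/d_{i-1}\) with \(d_{i-1}\) and \(f_{i-1}\) already known.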

Remark 1

The input matrix is very well conditioned, so we can use a precision of only \(O(\log n)\) bits.

Remark 2

The above method implies an algorithm of complexity \(\widetilde{O}(n^2)\) for the exact case (Sect. 5.1).

Pruning. We expect that when a mistake is made in the prefix, the error committed on the subsequent Gram-Schmidt norms will be larger. We therefore propose to prune a prefix when \(\sum _{k=i}^j e_k^2/\sigma _k^2\ge B_{j-i}\) for some \(i\le j\), where \(e_k\) is the difference between the measured k-th squared Gram-Schmidt norm and that of the prefix. The bound \(B_l\) is selected so that, when each \(e_k\) is a Gaussian of standard deviation \(\sigma _k\), the pruning condition is triggered only with probability \(\tau /\sqrt{l}\). The failure probability \(\tau \) is geometrically decreased until the correct solution is found.
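A sketch of this windowed test follows. The chi-square tail bound is approximated here with the Wilson–Hilferty transformation (our own choice; any tail bound would do), and the exact indexing of the bounds and the value of \(\tau \) are illustrative.

```python
import math
from statistics import NormalDist

def chi2_upper_bound(k, p):
    """Approximate B such that P(chi-square with k degrees of freedom >= B)
    is about p, via the Wilson-Hilferty cube approximation."""
    z = NormalDist().inv_cdf(1.0 - p)
    return k * (1.0 - 2.0 / (9.0 * k) + z * math.sqrt(2.0 / (9.0 * k))) ** 3

def should_prune(errors, sigmas, tau):
    """Discard the prefix if on some window [i, j] the normalized squared
    errors sum above the bound; errors[k] is the gap between the measured
    k-th squared Gram-Schmidt norm and the prefix's, sigmas[k] its
    estimated standard deviation."""
    m = len(errors)
    for i in range(m):
        acc = 0.0
        for j in range(i, m):
            acc += (errors[j] / sigmas[j]) ** 2
            l = j - i + 1            # window length = degrees of freedom
            if acc >= chi2_upper_bound(l, tau / math.sqrt(l)):
                return True
    return False
```

A correct prefix, whose errors are genuine measurement noise, survives each window with probability close to one, while a wrong prefix quickly accumulates a large normalized error on some window.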

Verifying Candidates. Let \(A = f\overline{f} + g\overline{g}\); then \(f\overline{f} = A(1+h\overline{h})^{-1}\bmod q\). As mentioned in Sect. 4, all coefficients of \(f\overline{f}\) except the constant one are much smaller than the modulus q. This can be used to check whether a candidate is correct. In addition, since both A(x) and \(A(-x)\) may appear among the final candidates, we also check \(A(1+h(-x)\overline{h}(-x))^{-1}\) to ensure that the correct \(A(-x)\) is not eliminated. Once either A(x) or \(A(-x)\) is found, we terminate the algorithm.
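For concreteness, here is a toy sketch of this check over \(\mathbb {Z}_q[x]/(x^N+1)\). The parameters and all names are ours, and a real attack would use \(n = 512\) with NTT-based multiplication and inversion rather than the quadratic-time arithmetic and generic linear solve below; the point is only that dividing a correct candidate A by \(1 + h\overline{h}\) modulo q yields \(f\overline{f}\), whose centered coefficients other than the constant one are small.

```python
import numpy as np

Q, N = 1021, 8   # toy parameters: a prime close to DLP's q ~ 2^10, tiny degree

def nmul(a, b):
    """Product in Z_Q[x]/(x^N + 1) (negacyclic convolution)."""
    c = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - N] -= a[i] * b[j]
    return c % Q

def adj(a):
    """Adjoint a(1/x) mod x^N + 1: reversed coefficients, negated tail."""
    a = np.asarray(a, dtype=np.int64)
    return np.concatenate(([a[0]], -a[1:][::-1])) % Q

def inv(a):
    """Inverse in Z_Q[x]/(x^N + 1) via Gaussian elimination mod the prime Q."""
    # column i of M holds a * x^i, so M @ u is the product a * u
    M = np.column_stack([nmul(a, np.eye(N, dtype=np.int64)[i]) for i in range(N)])
    b = np.zeros(N, dtype=np.int64)
    b[0] = 1
    for c in range(N):
        p = next(r for r in range(c, N) if M[r, c] % Q)   # pivot row
        M[[c, p]], b[[c, p]] = M[[p, c]], b[[p, c]]
        t = pow(int(M[c, c]), -1, Q)
        M[c], b[c] = M[c] * t % Q, b[c] * t % Q
        for r in range(N):
            if r != c and M[r, c]:
                b[r] = (b[r] - M[r, c] * b[c]) % Q
                M[r] = (M[r] - M[r, c] * M[c]) % Q
    return b

def center(a):
    """Lift coefficients to the symmetric range around 0."""
    return (np.asarray(a) + Q // 2) % Q - Q // 2

def plausible(A, h, bound=10):
    """Check a candidate A: A / (1 + h * adj(h)) mod Q should equal
    f * adj(f), all of whose coefficients but the constant one are small."""
    one = np.eye(N, dtype=np.int64)[0]
    u = nmul(np.asarray(A, dtype=np.int64) % Q, inv((one + nmul(h, adj(h))) % Q))
    return bool(np.all(np.abs(center(u)[1:]) < bound))
```

With small f and g and \(h = g f^{-1} \bmod q\), the true \(A = f\overline{f} + g\overline{g}\) passes this test, while a perturbed candidate generically fails, since its quotient picks up large pseudorandom coefficients modulo q.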

The Use of Symplecticity. As observed in [18], the trapdoor basis \(\mathbf {B}_{f,g}\) is q-symplectic, and thus \(\Vert \mathbf {b}_i^*\Vert \cdot \Vert \mathbf {b}_{2n-1-i}^*\Vert = q\). Based on this, we combine the samples of the i-th and \((2n-1-i)\)-th Gaussians to approximate \(\Vert \mathbf {b}_i^*\Vert \). This helps refine the approximations and thus reduce the number of signatures needed for a practical attack.
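A sketch of this combination (our own formulation): since the constraint \(\Vert \mathbf {b}_i^*\Vert ^2 \cdot \Vert \mathbf {b}_{2n-1-i}^*\Vert ^2 = q^2\) is multiplicative, a natural way to merge an estimate with the one implied by its partner is a geometric mean, which also makes the refined estimates satisfy the constraint exactly.

```python
import numpy as np

def refine_with_symplecticity(est, q):
    """est[i] is a noisy estimate of the i-th squared Gram-Schmidt norm;
    by q-symplecticity the i-th and (2n-1-i)-th squared norms multiply
    to q**2, so each estimate is merged with the value implied by its
    partner via a geometric mean."""
    est = np.asarray(est, dtype=float)
    implied = q * q / est[::-1]      # estimates implied by the partners
    return np.sqrt(est * implied)    # geometric mean, preserves the constraint
```

By construction, paired outputs multiply exactly to \(q^2\), and each refined value moves toward the true norm whenever the partner's estimate carries independent information.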

7.3 Experimental Results

We validate the recovery algorithm on practical DLP instances. Experiments are conducted on the parameter set claimed to achieve 192-bit security, in which

$$\begin{aligned} n = 512,~~q\approx 2^{10},~~\sigma = 2.2358\sqrt{q},~~\Vert \mathbf {b}_i^*\Vert \le 1.17\sqrt{q}. \end{aligned}$$

The leakage data we extract is the number of iterations of the centered Gaussian samplings (Algorithm 6.2). To obtain it, we added some instrumentation to Prest's C++ implementation [46]. The centered Gaussian samplings depend only on the secret key itself, not on the hashed message. Hence, instead of executing the complete signing procedure, we only perform the centered Gaussian samplings. By sample size, we mean the number of collected Gaussian samples. In fact, due to the rejection sampling in Algorithm 6.1, about N/2 signatures are required to generate N samples per centered Gaussian.

We tested our algorithm on ten instances, and the results are shown in Table 1. Producing the dataset of \(2^{36.5}\) samples for a given key took about 36 hours on our 48-core machine (two weeks for all 10 distinct keys).

In one instance, the recovery algorithm found millions of candidate solutions whose Gram-Schmidt norms were closer to the noisy ones than those of the correct solution, in the sense that they had a larger \(\tau \). This indicates that the recovery algorithm is relatively close to optimal.

Table 1. Experimental validation of the recovery of \(f\overline{f} + g\overline{g}\). The first column and the first row indicate the time limit and the logarithm of the sample size used, respectively. The remaining data show how many instances out of 10 were solved correctly within the time limit and with the given number of samples.

7.4 Precision Required on the Gram–Schmidt Norms

We attempt here to give a closed formula for the number of samples needed. Recall that the relative error on the squared Gram-Schmidt norms is \(\varTheta (1/\sqrt{N})\), where N is the number of samples.

A fast recovery corresponds to the case where only one root is close to an integer; in particular, increasing the new coefficient by one must change the Gram-Schmidt norm by \(\varOmega (1/\sqrt{N})\). This is not an equivalence, because the quadratic form has another root, but we will assume this condition is enough.

Let \(b_1\) be the first row of \(\begin{pmatrix} \mathcal A_n(f)&\mathcal A_n(g)\end{pmatrix}\), and \(b_i\) the i-th row for \(i\ge 2\). We define \(pb_i\) as the projection of \(b_1\) orthogonally to \(b_2,\dots ,b_{i-1}\), and expect that \(\Vert pb_i\Vert \approx \sqrt{\frac{2n-i+2}{2n}}\Vert b_1\Vert \). Consider the Gram matrix of the family \(b_1,\dots ,b_{i-1},b_{i}\pm \frac{pb_i}{\Vert b_1\Vert ^2}\). We have changed only the top-right/bottom-left coefficients by \(\pm 1\), besides the bottom-right coordinate. Clearly this does not change the i-th Gram-Schmidt vector, so the absolute change in the i-th squared Gram-Schmidt norm is

$$\begin{aligned} \left\| b_i\pm \frac{pb_i}{\Vert b_1\Vert ^2}\right\| ^2-\Vert b_i\Vert ^2\approx \pm \frac{\langle b_i,pb_i\rangle }{\Vert b_1\Vert ^2}. \end{aligned}$$

The Gram-Schmidt norm squared is roughly \(\Vert pb_i\Vert ^2\).

Getting only one solution at each step with constant probability corresponds to

$$\begin{aligned} \langle b_i,pb_i\rangle \ge \frac{\Vert b_i\Vert \Vert pb_i\Vert }{\sqrt{2n-i+2}} \end{aligned}$$

(assuming the scalar product is distributed as a Gaussian) which means a total number of samples of

$$\begin{aligned} N=\varTheta \left( \frac{\sqrt{2n-i+2}\,\Vert pb_i\Vert ^2\Vert b_1\Vert ^2}{\Vert b_i\Vert \Vert pb_i\Vert }\right) ^2=\varTheta (n\Vert b_1\Vert ^4)=\varTheta (nq^2). \end{aligned}$$

This gives roughly \(2^{29}\) samples, which is similar to what the search algorithm requires.

Getting only one solution at each step with probability \(1-1/n\) corresponds to

$$\begin{aligned} \langle b_i,pb_i\rangle \ge \frac{\Vert b_i\Vert \Vert pb_i\Vert }{n\sqrt{2n-i+2}} \end{aligned}$$

and \(N=\varTheta (n^3q^2)\). This would be roughly \(2^{47}\) samples.

8 Conclusion and Future Work

In this paper, we have investigated the side-channel security of the two main efficient hash-and-sign lattice-based signature schemes: DLP and Falcon (focusing on their original implementations, although our results carry over to several later implementations as well). The two main takeaways of our analysis are that:

  1. the Gram–Schmidt norms of the secret basis leak through timing side-channels; and

  2. knowing the Gram–Schmidt norms makes it possible to fully recover the secret key.

Interestingly, however, there is a slight mismatch between these two results: the side-channel leakage only provides approximate values of the Gram–Schmidt norms, whereas secret key recovery a priori requires exact values. We are able to bridge this gap in the case of DLP by combining the recovery algorithm with a pruned tree search. This lets us mount a concrete attack that, in practice, recovers the key from \(2^{33}\) to \(2^{35}\) traces for the high-security parameters of DLP (claiming 192 bits of security).

However, the gap remains in the case of Falcon: we do not know how to modify our recovery algorithm to deal with approximate inputs, and as a result we cannot apply it in a concrete attack. This is left as a challenging open problem for future work.

Also left for future work, on the more theoretical side, is the problem of giving an intrinsic description of our recovery algorithms in terms of algebraic quantities associated with the corresponding totally positive elements (or, equivalently, of giving an algebraic interpretation of the LDL decomposition for algebraically structured self-adjoint matrices). In particular, in the Falcon case, our approach shows that the Gram–Schmidt norms characterize the Galois conjugacy class of a totally positive element. This strongly suggests that they should admit a nice algebraic description, but it remains elusive for now.

The final recovery step in our attack, namely computing f from \(f\bar{f} + g\bar{g}\), relies heavily on the structure of NTRU lattices. Further investigation is needed to understand the impact of Gram-Schmidt norm leakage on hash-and-sign schemes over other lattices. For unstructured lattices, however, there appears to be a strong obstruction to at least a full key recovery attack, simply due to the dimension of the problem: there are only n Gram-Schmidt norms, but \(O(n^2)\) secret coefficients to recover.

On a positive note, we finally recall that the problem of finding countermeasures against the leakage discussed in this paper is fortunately already solved, thanks to the recent work of Prest, Ricosset and Rossi [48]. That countermeasure has very recently been implemented in Falcon [45], so the leak can be considered patched. The overhead of the countermeasure is modest in the case of Falcon, thanks to the small range in which the possible standard deviations lie; however, it could become more costly for samplers that need to accommodate a wider range of standard deviations.

An alternative countermeasure is to use Peikert's convolution sampling [42] instead of the KGPV approach, as it eliminates the need for varying standard deviations and is easier to implement even without floating-point arithmetic [9]. It does, however, have the drawback of sampling wider Gaussians, and hence leads to less compact parameter choices.