1 Introduction

In recent years, many property-preserving encryption (PPE) and property-revealing encryption (PRE) schemes have been proposed with improved efficiency and/or security. These advances have encouraged the development of encrypted database (EDB) systems. CryptDB [22], proposed by Popa et al., was the first practical EDB system for executing data manipulations on encrypted data. Thanks to its onion encryption model and its proxy architecture, CryptDB supports most basic operations on ciphertexts with acceptable efficiency.

As a kind of PPE, order-preserving encryption (OPE) has attracted increasing attention because of its applications to EDBs. OPE was first proposed for numeric data by Agrawal et al. [3]; the order of plaintexts can be obtained by comparing their ciphertexts directly. Later, order-revealing encryption (ORE) was proposed by Boneh et al. [6] as a generalization of OPE, in which the ciphertexts reveal their order through a dedicated comparison algorithm rather than by direct comparison.

Even though OPE and ORE aim at leaking nothing other than the order of ciphertexts, many attacks have been proposed against OPE and ORE in recent years. Naveed et al. [20] proposed several inference attacks against the deterministic encryption (DTE) and OPE in CryptDB. Durak et al. [10] showed that some ORE schemes, whose security is discussed on uniform inputs, could make the plaintext recovery of some known attacks more accurate on nonuniform data. They also proposed an attack, aiming at multiple encrypted columns of correlated data, which reveals more information than prior attacks against columns individually. Grubbs et al. [13] proposed new leakage-abuse attacks that achieve high-correctness recovery on OPE-encrypted data. They also presented the first attack on frequency-hiding OPE proposed in [16].

1.1 Our Contributions

In this paper, we first demonstrate the power of file-injection attacks (FIAs) on OPE/ORE by developing two categories of FIA schemes against OPE/ORE (to the best of our knowledge, the first such attacks). The underlying assumptions and workflows of FIAs are briefly described in Sect. 3.1. Our FIAs are generic and powerful, in the sense that they exploit only the ideal leakage of OPE/ORE. Specifically, for our FIAs to work, the adversary needs only the plaintext space, some old order by or range queries, and the corresponding cipher result sets returned from the EDB. In particular, the adversary needs neither the ability to compare ciphertexts with the ORE comparison algorithm nor the ability to obtain ciphertexts outside the result sets of the order by and range queries.

In comparison with other attacks against OPE/ORE proposed in recent years, our FIAs rely upon less demanding conditions and are more effective (particularly for attacking systems, such as encrypted email systems, that support data sharing or transfer). For example, compared with the attacks against OPE/ORE proposed in [13, 20], our FIAs have the following features simultaneously: (1) no need for data density or frequency information, and (2) genericity against any OPE/ORE with ideal leakage. Furthermore, in Appendix A, we compare our attacks in detail with the chosen-plaintext attack (CPA) and the inference attack (IA) and clarify their advantages. We present more details and experiments in the extended version [25] of this paper. Moreover, the experimental results show that our FIAs pose a severe threat to most existing OPE and ORE schemes, with high efficiency and a 100% recovery rate.

A strong security property against FIAs is forward security, which ensures that previous data manipulations do not cause any leakage of newly inserted data. In other words, it is infeasible for the server to produce a correct response when applying an old query to newly inserted ciphertexts encrypted by a forward secure scheme. To the best of our knowledge, no OPE/ORE construction to date offers forward security to thwart FIAs.

In this work, we give the formal definition of forward security for OPE/ORE, which might be of independent interest. We then propose a compilation framework for achieving forward secure ORE schemes that resist FIAs. Specifically, the compilation framework is applicable to most existing OPE/ORE schemes and transforms them into forward secure ones. The resultant forward secure schemes leak nothing about newly inserted data that match previous order by or range queries. Moreover, the compilation framework is constructed with the goal of minimizing the extra burden incurred on computation and storage. In particular, the compilation uses only simple cryptographic tools such as pseudo-random functions (PRFs), keyed hash functions and trapdoor permutations (TDPs). Finally, we run experiments to analyze the additional cost incurred when applying our compilation framework to some prominent OPE/ORE schemes developed in recent years.

1.2 Related Work

Order-Preserving Encryption (OPE). Agrawal et al. [3] first proposed an OPE scheme for numeric data. Afterwards, OPE was formally studied by Boldyreva et al. [4], where, in particular, two leakage profiles were introduced. Boldyreva et al. [5] analyzed the one-wayness security of OPE, and showed that any OPE scheme must have immutable large ciphertexts if the scheme is constructed for leaking only order and frequency information. Popa et al. [21] proposed an OPE scheme in order tree structure, which is the first OPE scheme achieving the security of IND-OCPA (indistinguishability under ordered chosen-plaintext attack). Kerschbaum [16] proposed a frequency-hiding OPE scheme, which supports the security of IND-FA-OCPA (indistinguishability under frequency-analyzing ordered chosen-plaintext attack) for the first time. Later, a partial order preserving encryption (POPE), with a method for frequency-hiding, was developed by Roche et al. [23].

Order-Revealing Encryption (ORE). ORE was first generalized from OPE by Boneh et al. [6]. Their ORE scheme is built upon multilinear maps, which provides better security but at the cost of worse efficiency. Chenette et al. [9] proposed the first practical ORE, which achieves a simulation-based security w.r.t. some leakage functions that precisely quantify what is leaked by the scheme. Recently, Cash et al. [8] presented a general construction of ORE with reduced leakage as compared to [9], but at the cost of using a new type of “property-preserving” hash function based on bilinear maps.

File-Injection Attack on SSE. As a kind of query-recovery attack, Islam et al. [15] initiated the study of FIA against searchable symmetric encryption (SSE), by showing that a curious service provider can recover most of the keyword-search queries with high accuracy. Cash et al. [7] further improved the power of the attack initiated in [15], by assuming less knowledge about the files of clients even in a larger plaintext space. Besides encrypted email systems like Pmail [2], they also discussed how their active attacks (e.g., query recovery attacks, partial plaintext recovery attacks, FIAs) might be used to break other systems such as those in [14, 17]. Zhang et al. [26] showed that FIA can recover the keyword-search queries with just a few injected files, even for SSE with low leakage. Their attacks outperform the attacks proposed in [7, 15] in efficiency and in the amount of prior knowledge required of the adversary.

2 Preliminaries

In this section we introduce some fundamental knowledge of TDP, ORE and OPE. We use standard notations and conventions below for writing probabilistic algorithms, experiments and protocols. If \(\mathcal {D}\) denotes a domain, \(x \xleftarrow {\$} \mathcal {D}\) is the operation of picking an element uniformly at random from \(\mathcal {D}\). If \(\mathbf {S}\) is a set, then for any k, \(0\le k\le |\mathbf {S}|-1\), \(\mathbf {S}[k]\) denotes the \((k+1)\)-th element in \(\mathbf {S}\). If \(\alpha \) is neither an algorithm nor a set, then \(x\leftarrow \alpha \) is a simple assignment statement. If A is a probabilistic algorithm, then \(A(x_1, x_2, \cdots ; r)\) is the result of running A on inputs \(x_1, x_2, \cdots \) and coins r. We let \(A(x_1, x_2, \cdots )\rightarrow y\) denote the experiment of picking r at random and letting y be \(A(x_1, x_2, \cdots ; r)\). By \(\mathbb {P}[R_1; \cdots ; R_n: E]\) we denote the probability of event E, after the ordered execution of random processes \(R_1, \cdots , R_n\).

Definition 1

(Trapdoor Permutation). A tuple of polynomial-time algorithms (\(\mathsf {KeyGen}, \mathsf {\Pi }, \mathsf {Inv}\)) over a domain \(\mathcal {D}\) is a family of trapdoor permutations (or, sometimes, a trapdoor permutation informally), if it satisfies the following properties:

  • \( \mathsf {KeyGen}(1^\lambda ) \rightarrow (I, \mathrm {td}) \). On input a security parameter \(\lambda \), the parameter generation algorithm outputs a pair of parameters \((I, \mathrm {td})\). Each pair of the parameters defines a set \(\mathcal {D}_I = \mathcal {D}_{\mathrm {td}}\) with \( |I| \geqslant \lambda \). Informally, \(\textit{I}\) (resp., \(\mathrm {td}\)) is said to be the public key (resp., secret key) of TDP.

  • \(\mathsf {KeyGen_1}(1^\lambda ) \rightarrow I\). Let \(\mathsf {KeyGen_1}\) be the algorithm that executes \(\mathsf {KeyGen}\) and returns I as the only result. Then (\(\mathsf {KeyGen_1}\), \(\mathsf {\Pi }\)) is a family of one-way permutations.

  • \(\mathsf {Inv}_{\mathrm {td}}(y) \rightarrow x\). \(\mathsf {Inv}\) is a deterministic inverting algorithm such that, for every pair \((I,\mathrm {td})\) output by \(\mathsf {KeyGen}\)(\(1^\lambda \)), any \(x \in \mathcal {D}_{\mathrm {td}} = \mathcal {D}_I\) and \(y=\mathsf {\Pi }_I(x)\), it holds that \(\mathsf {Inv}_{\mathrm {td}}(y) = x\). For simplicity of presentation, we also write the algorithm \(\mathsf {Inv}_{\mathrm {td}}\) as \(\mathsf {\Pi }_{\mathrm {td}}^{-1}\), and we write \(\mathsf {\Pi }^k_{I}(x)=\overbrace{\mathsf {\Pi }_{I}(\mathsf {\Pi }_{I}(\cdots \mathsf {\Pi }_{I}(x)\cdots ))}^{k\ \mathrm {TDPs}}\) for some integer \(k\ge 1\).
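
As an illustration of Definition 1, the following minimal sketch instantiates the three algorithms with textbook RSA over tiny, insecure parameters. The choice of RSA, the function names (keygen, perm, inv, perm_k) and the concrete primes are assumptions made for this example only; they are not prescribed by the definition.

```python
from math import gcd

def keygen(p=61, q=53, e=17):
    """Toy KeyGen: returns the public index I = (n, e) and the trapdoor td = (n, d).

    The primes are far too small to be secure; they only keep the example readable.
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    assert gcd(e, phi) == 1, "e must be invertible modulo phi(n)"
    d = pow(e, -1, phi)  # modular inverse, available in Python 3.8+
    return (n, e), (n, d)

def perm(I, x):
    """Pi_I(x): the forward permutation on {0, ..., n-1}, computable from I alone."""
    n, e = I
    return pow(x, e, n)

def inv(td, y):
    """Inv_td(y) = Pi_td^{-1}(y): inversion, which requires the trapdoor."""
    n, d = td
    return pow(y, d, n)

def perm_k(I, x, k):
    """Pi_I^k(x): apply the forward permutation k times."""
    for _ in range(k):
        x = perm(I, x)
    return x
```

Applying inv three times to perm_k(I, x, 3) returns x, which mirrors the notation \(\mathsf {\Pi }^k_{I}\) introduced above.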

2.1 Definition of ORE

Definition 2

(Order-Revealing Encryption). A secret-key encryption scheme is an order-revealing encryption (ORE), if the scheme can be expressed as a tuple of algorithms \(\mathsf {ORE} = (\mathsf {ORE{.}Setup}, \mathsf {ORE{.}Encrypt}, \mathsf {ORE{.}Compare})\) which is defined over a well-ordered domain \(\mathcal {M}\).

  • \(\mathsf {ORE{.}Setup}(1^\lambda ) \rightarrow (pp,sp)\). On input of a security parameter \(\lambda \), the setup algorithm outputs the set of public parameters pp and the set of secret parameters sp, which includes the secret key for the encryption algorithm.

  • \(\mathsf {ORE{.}Encrypt}(pp,sp,m,\sigma _1) \rightarrow c\). On input of pp, sp and a set \(\sigma _1\) of other auxiliary parameters (that are not generated in the setup algorithm), the encryption algorithm encrypts the input plaintext \(m\in \{0,1\}^*\) to a ciphertext c that can reveal the correct order with other ciphertexts.

  • \(\mathsf {ORE{.}Compare}(pp,sp,c_1,c_2,\sigma _2) \rightarrow b\). On input of pp, sp, two ciphertexts \(c_1,c_2\), and the set \(\sigma _2\) of other auxiliary parameters, the comparison algorithm returns a bit \(b\in \{0,1\}\) as the result of order.

The ORE definition in other literature is simpler and keeps only the necessary parameters. Our definition above is more involved; the additional parameters are used to better describe the framework presented later. With the above formulation, we aim for a generic and basic definition of ORE, where \(\sigma _1\) and \(\sigma _2\) may depend upon and vary with the concrete implementations of ORE. As a consequence, we do not introduce many details (that may vary with different implementations) or components like clients for interactive queries (as our FIAs are w.r.t. the generic OPE/ORE structure).

Leakage Profiles. The ideal leakage profile, the random order-preserving function profile, the most significant-differing bit profile, the RtM profile and the MtR profile are five leakage profiles that have been proposed in the literature. The first two were described by Boldyreva et al. [4], and the others were described by Chenette et al. [9].

We remark that, in Sect. 3, our FIAs are generic in the sense that they are constructed only with the ideal leakage profile. The ideal leakage profile just reveals the order and the frequency of the plaintexts. More precisely, only the leakage of order is necessary for our FIAs.

An adversary is said to be adaptive, if it is allowed to adaptively select data to be encrypted by the clients and then stored back to the server. Roughly speaking, an ORE scheme is said to be \(\mathcal {L}\)-adaptively-secure, if any probabilistic polynomial-time (PPT) adaptive adversary cannot learn more than the leakage as described according to the leakage profile \(\mathcal {L}\).

2.2 Definition of OPE

Order-preserving encryption (OPE) is a simplified case of ORE. The ciphertext domain \(\mathcal {C}\) of OPE needs to be well-ordered exactly as the plaintext domain \(\mathcal {M}\).

Definition 3

(Order-Preserving Encryption). A secret-key encryption scheme is an order-preserving encryption (OPE), if the scheme can be expressed as a tuple of algorithms OPE = (\(\mathsf {OPE.Setup},\,\mathsf {OPE.Encrypt}\)), which is defined over a well-ordered plaintext domain \(\mathcal {M}\) and a well-ordered ciphertext domain \(\mathcal {C}\).

  • \(\mathsf {OPE{.}Setup}(1^\lambda ) \rightarrow (pp,sp)\). On input of a security parameter \(\lambda \), the setup algorithm outputs the set of public parameters pp and the set of secret parameters sp, which includes the secret key for the encryption algorithm.

  • \(\mathsf {OPE{.}Encrypt}(pp,sp,m,\sigma _1)\rightarrow c\). On input of pp, sp, and a set \(\sigma _1\) of other auxiliary parameters, the encryption algorithm encrypts the input plaintext m to a ciphertext c that preserves the correct order with other ciphertexts.
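
To make the order-preserving property of Definition 3 concrete, here is a toy sketch in which the secret parameter is simply a random strictly increasing table. The names ope_setup and ope_encrypt, the expansion factor and the use of Python's non-cryptographic random module are all assumptions for illustration; this is not a secure OPE scheme and it omits pp and \(\sigma _1\).

```python
import random

def ope_setup(domain_size, expansion=8, seed=None):
    """Toy setup: sample a strictly increasing map from plaintext indices to ciphertexts.

    The returned table plays the role of the secret parameters sp; table[m] is the
    ciphertext of the plaintext with index m in the ordered domain.
    """
    rng = random.Random(seed)  # NOT a cryptographic generator; illustration only
    ciphertext_space = range(domain_size * expansion)
    return sorted(rng.sample(ciphertext_space, domain_size))

def ope_encrypt(sp, m):
    """Toy encryption: look up the ciphertext assigned to plaintext index m."""
    return sp[m]
```

Because the table is sorted and its entries are distinct, m1 < m2 implies ope_encrypt(sp, m1) < ope_encrypt(sp, m2), so ciphertexts can be compared directly without any comparison algorithm.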

3 File-Injection Attacks on OPE/ORE

3.1 Assumptions and Basic Workflow

A file-injection attack relies on the following five assumptions: (1) The target system has a dependable component used for data sharing or data transmission. (2) The adversary possesses the plaintext space of the target ciphertexts, and can have correct ciphertexts stored by sending forged data to the client without arousing suspicion. (3) The adversary possesses some old encrypted queries and can obtain the correct result sets from the server. (4) The adversary can only get the ciphertexts included in the result sets; if a plaintext injected by the adversary does not match the queries, the corresponding ciphertext will not be known to the adversary. (5) The adversary is unable to forge queries or execute any PPE/PRE algorithm.

The basic workflow of FIA is briefly described as follows:

  • First, the adversary forges some data and sends them to the client from the server. After being encrypted by the client, the resultant ciphertexts of the forged data are sent back to the server for storing.

  • Second, the adversary replays some old queries and infers the responses from the database management system (DBMS) with the leakage of newly inserted data.

  • Third, the adversary adaptively repeats the first two steps. The data are recovered once the adversary has obtained enough leakage.

In some application scenarios, such as encrypted email systems (e.g., Pmail [2]) or the systems in [14, 17], FIA can be easily executed. Assuming that the server has already responded to many email-order requests and recorded many encrypted data manipulation statements, the adversary can forge some emails and send them to the client. When the new emails are encrypted and sent back to the DBMS, the adversary can take advantage of the entire set of ciphertexts, as well as the old queries, to collect more leakage and infer the corresponding plaintexts.

Unlike the FIAs against SSE, our FIAs against OPE/ORE are data-recovery attacks, which are more powerful. Moreover, the forged data are less likely to be detected because the forged part is smaller. Furthermore, we extend the concept of FIA: in our attacks, files are not limited to data elements in a NoSQL database, but can be any kind of data that fits the target system.

Table 1. Notations in Sect. 3

3.2 Notations

Table 1 lists the meaning of some simple notations, which helps in understanding the two FIA algorithms against ideal-secure OPEs/OREs presented in Sect. 3. Let \(\mathbf {R}^i_{q}\) and \(\mathbf {R}_q\) denote, respectively, the result set of query q before the \((i+1)\)-th file-injection and the current result set of query q. Let \(c\xleftarrow {file\ injection} m\) denote the process in which the adversary sends the forged plaintext m from the server to the client, and the resultant ciphertext c is sent back by the client and stored in the EDB. Let a and b denote indices of data, which give their positions in their domains or sets. Let \(\mathsf {mid}(a,b)\) denote an arbitrary scheme for efficient median calculation, regardless of the rounding method. Let \(\mathbf {d}\) denote a structure containing two indices (a, b), and let \(\mathbf {dqueue}\) denote a queue of such structures. Let \(m_l\) and \(m_r\) (resp., \(q.c_l\) and \(q.c_r\)) denote the left and right boundary plaintexts (resp., ciphertexts) of the range condition in a range query q. We use composite notation, in which a subscript or suffix qualifies the main symbol. Hence, we let \(m_c\) denote the plaintext of the ciphertext c, let \(\mathcal {M}_{a,b}\) denote the plaintext space between a and b, and let d.a and d.b denote the parameters a and b in the structure d. Let \(\mathcal {M}[\mathsf {mid}(a,b)]\) denote the \((k+1)\)-th element in \(\mathcal {M}\) for \(k=\mathsf {mid}(a,b)\).

Fig. 1. Depth first binary search (a) and breadth first binary search (b). (Color figure online)

3.3 Binary Search

The two FIA algorithms presented below are based on a common algorithm: binary search. The difference between the two FIA algorithms lies in the search types they employ: one uses the traditional binary search, which resembles a depth first traversal, and the other uses a breadth first traversal. The traditional binary search finds the position of a target value within a sorted array by testing the order of the target value and the median value. In this work, we adopt the idea of breadth first traversal in the second FIA algorithm, with which we can obtain relatively near data (around the target) that do not match the range condition.

We show two types of binary search in Fig. 1, where the colored nodes are the passed nodes with their order marked, and the crosses mark the target nodes. In the second FIA algorithm, our FIA attacker, with the range query determined by \((m_l,m_r)\), needs to find a value \(m_1\) matching the range condition, and a pair of relatively near unmatched values \((m_2,m_3)\) in the file-injected dataset, such that \(m_2<m_l<m_1<m_r<m_3\). The details are presented in Sect. 3.5.
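
The two traversal orders can be sketched as follows. This is an illustrative reading of Fig. 1, not the authors' code; the function names and the floor-based choice of \(\mathsf {mid}\) are assumptions.

```python
from collections import deque

def mid(a, b):
    """One possible median rule (floor); the rounding method is not essential."""
    return (a + b) // 2

def depth_first_probes(a, b, target):
    """Indices probed by classic binary search for `target` within [a, b]."""
    probes = []
    while a <= b:
        k = mid(a, b)
        probes.append(k)
        if k == target:
            break
        if k < target:
            a = k + 1  # the target lies to the right of the probe
        else:
            b = k - 1  # the target lies to the left of the probe
    return probes

def breadth_first_probes(a, b):
    """Yield midpoints level by level: every interval is split once before any
    of its halves is refined, so probes spread evenly over [a, b]."""
    queue = deque([(a, b)])
    while queue:
        lo, hi = queue.popleft()
        if lo > hi:
            continue
        k = mid(lo, hi)
        yield k
        queue.append((lo, k - 1))
        queue.append((k + 1, hi))
```

For the index range 0..14, the depth first search for target 10 probes 7, 11, 9, 10, whereas the breadth first order starts 7, 3, 11, 1, 5, 9, 13 and needs no feedback about which side of a probe the target lies on.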

3.4 Basic FIA with order by Query

Our FIAs use two kinds of order queries: order by queries and range queries. The order by query (e.g., select * from table_1 order by column_1), which ensures that the result data are ordered, is one of the two kinds of data manipulation language (DML) statements that are based on the order of data. The other is the range query, with relational operators such as “<” and “>”. In Sects. 3.4 and 3.5, we present the attack models and the FIA algorithms, assuming that the attacker possesses order by queries and range queries, respectively.

The attack model of basic FIA, with order by queries, consists of the adversarial information (i.e., leakage) and the adversarial goal. As to the adversarial information, we limit the power of the adversary so that the attacks are more practical. Specifically, the adversary only possesses, as adversarial information, the plaintext space \(\mathcal {M}\), the set \(\mathbf {Q}\) of old order by queries, and the result sets of those queries with forged data. In particular, the adversary does not have any information about data outside the result sets of the old queries. As to the adversarial goal, we distinguish two types: recovering the plaintext of a single ciphertext, and recovering the plaintexts of all the ciphertexts in the result sets. This distinction facilitates the discussion of time complexity, as we show later. We formalize the attack model as follows:

$$\mathbf {Leakage{:}}\qquad \mathcal {L}(\mathcal {M},\ \mathbf {Q},\ \mathbf {R}_{\mathbf {Q}}=\{\bigcup _{q \in \mathbf {Q}, 0\le i\le \omega } \mathbf {R}^i_{q|\mathrm {ordered}}\} )$$
$$\mathbf {Goal{:}}\qquad m_c\ (c \in \mathbf {R}^0_{q|\mathrm {ordered}},\ q \in \mathbf {Q})\ or\ {\mathbf {M}_{\mathbf {C}}\ (\mathbf {C}=\bigcup _{q\in \mathbf {Q}} \mathbf {R}_q)}$$

where \(\mathbf {R}^i_{q|\mathrm {ordered}}\) denotes the ordered result set for order by query q before the \((i+1)\)-th file-injection, \(\mathbf {R}^0_{q|\mathrm {ordered}}\) denotes the original ordered result set for order by query q, \(\mathbf {M}_{\mathbf {C}}\ (\mathbf {C}=\bigcup _{q\in \mathbf {Q}} \mathbf {R}_q)\) denotes the plaintext set \(\mathbf {M}_{\mathbf {C}}\) corresponding to the ciphertext set \(\mathbf {C}\) in the current result sets for all the queries in \(\mathbf {Q}\). Here, \(\mathbf {M}_\mathbf {C}\) can also be expressed as a mapping relation precisely, denoted \(\mathcal {T}_{\mathbf {(M,C)}}\), between all the ciphertexts in \(\mathbf {C}\) (which includes all the original and forged data) and their corresponding plaintexts in \(\mathbf {M_C}\).

figure a

For ease of comprehension, Algorithm 1 describes the elementary FIA that uses a single order by query over an entire dataset. The adversary repeatedly probes for the plaintext of the target ciphertext c by injecting files and replaying an old query q. We use \(\mathsf {Comp}(c_i,c)\) to denote the order, as revealed by query q, between the i-th injected ciphertext \(c_i\) and the target ciphertext c, defined as follows:

$$\mathsf {Comp}(c_i,c)=\left\{ \begin{array}{rl} 0 &{} c_i=c\\ 1 &{} c_i>c\\ -1 &{} c_i<c. \end{array} \right. $$
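
The following minimal sketch shows how such comparison results can drive a binary search over the plaintext space. It is an illustrative reconstruction under stated assumptions, not the paper's Algorithm 1: the callback inject_and_compare stands in for one round of file-injection plus replay of the old order by query, and make_simulated_edb is a hypothetical test harness that knows the secret plaintext.

```python
def recover_plaintext(plaintext_space, inject_and_compare):
    """Recover the target plaintext by binary search over the ordered plaintext space.

    inject_and_compare(m) models one round: inject plaintext m, replay the old
    order by query, and return Comp(c_i, c) in {-1, 0, 1}.
    Returns (recovered plaintext or None, number of injections used).
    """
    a, b = 0, len(plaintext_space) - 1
    injections = 0
    while a <= b:
        k = (a + b) // 2
        injections += 1
        result = inject_and_compare(plaintext_space[k])
        if result == 0:
            return plaintext_space[k], injections
        if result < 0:   # injected ciphertext sorts before the target
            a = k + 1
        else:            # injected ciphertext sorts after the target
            b = k - 1
    return None, injections

def make_simulated_edb(secret_plaintext):
    """Hypothetical stand-in for client and server: since encryption preserves
    order, the position in the ordered result set equals the plaintext order."""
    def inject_and_compare(m):
        return (m > secret_plaintext) - (m < secret_plaintext)
    return inject_and_compare
```

With a plaintext space of 1000 values, the simulated attack recovers any target with at most 10 injections, in line with the logarithmic bound discussed next.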

Time Complexity. In the worst case, the time complexity of Algorithm 1 for recovering one plaintext is \(O(\log |\mathcal {M}|)\). When the adversarial goal is to recover all the N distinct ciphertexts in the entire result set, the worst-case time complexity is \(O(N\log |\mathcal {M}|-N\log N)\). In this case, the average time complexity of recovering a single ciphertext is smaller, because the order information obtained for one ciphertext also narrows the search on both sides of it. In other words, a file-injection for one target also reveals some order information about other target ciphertexts.
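
To make the saving explicit, the bound for recovering all N ciphertexts can be rewritten as follows. This is only a restatement of the bound stated above, and the sample sizes are illustrative assumptions.

$$N\log |\mathcal {M}|-N\log N = N\log \frac{|\mathcal {M}|}{N}$$

Hence the amortized cost per ciphertext is \(O(\log (|\mathcal {M}|/N))\). For instance, with \(|\mathcal {M}|=2^{20}\) and \(N=2^{10}\), this is on the order of 10 file-injections per ciphertext rather than 20 (taking base-2 logarithms and ignoring constants).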

In Algorithm 1, we only take advantage of the leakage \(\mathcal {L}_1(\mathcal {M},\ q,\ \mathbf {R}_{q}')\), where \(\mathbf {R}_{q}'=\mathbf {R}_q\setminus \mathbf {R}^0_{q}\) is the result set after file-injections excluding the original result set. Because the leakage of the original result set \(\mathbf {R}_{q}^{0}\) is in the ideal leakage profile, we can only get some order information between the target ciphertext \(c_{\mathrm {target}}\) and other ciphertexts. In other words, we can rewrite the original result set as

$$\mathbf {R}^0_{q}=\{\mathbf {C}_{\mathrm {ordered}}^{-},c_{\mathrm {target}},\mathbf {C}_{\mathrm {ordered}}^{+}\}$$

where \(\mathbf {C}_{\mathrm {ordered}}^{-}\) is the set of ordered ciphertexts which are smaller than the target, and \(\mathbf {C}_{\mathrm {ordered}}^{+}\) is the set of ordered ciphertexts which are greater than the target. Under the assumption of knowing nothing about the original ciphertexts except their order information, we can only take advantage of \(|\mathbf {C}_{\mathrm {ordered}}^{-}|\) and \(|\mathbf {C}_{\mathrm {ordered}}^{+}|\) to curtail the plaintext space. We delete the first \(|\mathbf {C}_{\mathrm {ordered}}^{-}|\) plaintexts and the last \(|\mathbf {C}_{\mathrm {ordered}}^{+}|\) plaintexts from the ordered plaintext space \(\mathcal {M}\), and then we get a smaller new plaintext space \(\mathcal {M}'\) for the target \(c_{\mathrm {target}}\). Thus, the time complexity of recovering a single ciphertext becomes \(O(\mathrm {log}|\mathcal {M}'|)\) which is even smaller now. In this way, the adversary can adaptively curtail the plaintext space according to the number of ciphertexts on both sides after each file-injection.
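
The curtailing step can be sketched as below. The function name is hypothetical, and the sketch assumes, as in the discussion of nonrepetitive ciphertexts above, that the ciphertexts counted on each side of the target are pairwise distinct.

```python
def curtail_plaintext_space(plaintext_space, n_smaller, n_greater):
    """Return the reduced space M' for the target ciphertext.

    n_smaller = |C^-_ordered| distinct ciphertexts sort before the target, so the
    target cannot be among the first n_smaller plaintexts; symmetrically, it
    cannot be among the last n_greater = |C^+_ordered| plaintexts.
    """
    end = len(plaintext_space) - n_greater  # explicit end index; also correct when n_greater == 0
    return plaintext_space[n_smaller:end]
```

For example, with 100 ordered plaintexts, 10 distinct ciphertexts below the target and 20 above it, the candidate space shrinks from 100 to 70 values before any file is injected.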

Moreover, an improved method with hierarchical idea is presented in the extended version [25] of this paper.

3.5 FIA with Range Queries

The attack model of FIA with range queries also consists of the adversarial information and the adversarial goal. As to the adversarial information, the adversary has only the plaintext space \(\mathcal {M}\), the old range queries in \(\mathbf {Q}\), and the result sets of those queries without inner order. In this setting, the leakage is less than that with order by queries, because the adversary only knows the result set matching the range conditions without knowing the inner order. As to the adversarial goal, the adversary needs to recover the boundary plaintexts of the range conditions as well as all the plaintexts matching the range conditions. We formalize the attack model as follows:

$$\mathbf {Leakage{:}}\qquad \mathcal {L}(\mathcal {M},\ \mathbf {Q},\ \mathbf {R}_{\mathbf {Q}}=\{\bigcup _{q \in \mathbf {Q}, 0\le i\le \omega } \mathbf {R}^i_{q}\})$$
$$\mathbf {Goal{:}}\quad \mathbf {M}_l,\ \mathbf {M}_r,\ {\mathbf {M}_{\mathbf {C}}\ (\mathbf {C}=\{c\ |\ q.c_l<c<q.c_r,q\in \mathbf {Q}\})}$$

where \(\mathbf {M}_\mathbf {C}\) can be expressed precisely as a mapping relation, denoted \(\mathcal {T}_{\mathbf {(M,C)}}\), between all the ciphertexts in \(\mathbf {C}\) (which includes all the original and forged data) and their plaintexts in \(\mathbf {M}_{\mathbf {C}}\); \(\mathbf {R}_q^i\) is not ordered; and \(\mathbf {M}_l\) and \(\mathbf {M}_r\) contain all the boundary plaintexts of the range queries in \(\mathbf {Q}\). Our construction achieves the goal in three steps:

  • First, the adversary must find a plaintext matching the range condition, whether its ciphertext is in the original EDB or not.

  • Second, the adversary recovers the boundary plaintexts using Algorithm 1.

  • Third, the adversary recovers all the plaintexts of the ciphertexts matching the range condition by several file-injections.

Here, to describe the FIA scheme briefly, Algorithm 2 is based on utilizing a single range query without any order by operation. In the following descriptions, q denotes the range query with the boundary ciphertexts denoted \(q{.}{c_l}\) and \(q{.}{c_r}\) respectively. \(\mathbf {M}_{\mathbf {R}_{q}}\) denotes the plaintext set corresponding to the cipher result set \(\mathbf {R}_{q}\) for query q.

In Algorithm 2, we adopt the breadth first search, because under the assumption of FIA the adversary does not know the order between file-injected data and the boundary ciphertexts in case the file-injected data do not match the range condition. With this limitation, the breadth first search is beneficial to find a plaintext matching the condition, and to get the relatively near unmatching plaintexts that are necessary for recovering the boundary plaintexts. Then, the boundary plaintexts \(m_l\) and \(m_r\) are recovered by calling Algorithm 1. Finally, the plaintext set \(\mathbf {M}_{\mathbf {R}_{q}}\) is recovered by several file-injections over the entire plaintext set matching the condition.
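
The first two steps can be sketched as follows, under the assumption that the adversary's only feedback per injection is whether the new ciphertext appears in the replayed range query's result set. The callback matches(k) models injecting the plaintext with index k in \(\mathcal {M}\); all function names are hypothetical, and the sketch is an illustration rather than the paper's Algorithm 2. It returns the smallest and largest matching indices; for a strict range condition the boundary plaintexts are the neighbouring elements of \(\mathcal {M}\). The third step (mapping each matched ciphertext to its plaintext) is omitted.

```python
from collections import deque

def find_match_bfs(n, matches):
    """Step 1: probe indices 0..n-1 breadth first until one matches the range.

    Returns (k, below, above): a matching index k, the nearest probed unmatched
    index below it (or -1) and the nearest one above it (or n)."""
    unmatched = []
    queue = deque([(0, n - 1)])
    while queue:
        a, b = queue.popleft()
        if a > b:
            continue
        k = (a + b) // 2
        if matches(k):
            below = max((u for u in unmatched if u < k), default=-1)
            above = min((u for u in unmatched if u > k), default=n)
            return k, below, above
        unmatched.append(k)  # no order information is learned from a miss
        queue.append((a, k - 1))
        queue.append((k + 1, b))
    return None

def first_match(lo, hi, matches):
    """Smallest matching index in (lo, hi]; lo is unmatched (or -1), hi matches."""
    while hi - lo > 1:
        k = (lo + hi) // 2
        if matches(k):
            hi = k
        else:
            lo = k
    return hi

def last_match(lo, hi, matches):
    """Largest matching index in [lo, hi); lo matches, hi is unmatched (or n)."""
    while hi - lo > 1:
        k = (lo + hi) // 2
        if matches(k):
            lo = k
        else:
            hi = k
    return lo

def recover_range(n, matches):
    """Steps 1 and 2: locate a match, then binary search outward for both ends."""
    found = find_match_bfs(n, matches)
    if found is None:
        return None
    k, below, above = found
    return first_match(below, k, matches), last_match(k, above, matches)
```

The nearer the unmatched probes collected in step 1 are to the matching value, the fewer injections the two boundary searches need, which is the benefit of the breadth first order described above.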

figure b

In practice, most boundary values are far from arbitrary. For instance, multiples of \(10^{\gamma }\ (\gamma =0,1,2,\ldots )\) are frequently used in range queries over numerical data, and the 26 letters usually serve the same purpose over string data. By exploiting the frequencies of the plaintexts lying between adjacent common boundary plaintexts, the adversary may recover them more rapidly with a few file-injections, instead of performing the first step.

Owing to space limitations, the analysis of time complexity, the discussion of FIA with both order by queries and range queries, the description of our experiments, and the FIA against frequency-hiding OPE are presented in the extended version [25] of this paper.

4 Formulating Forward Secure ORE

Forward security is a strong property of the dynamic SSE leakage profile. For a dynamic SSE scheme, forward security means that previous data manipulations do not cause any leakage of newly inserted data. Stefanov et al. [24] proposed this notion informally, together with the concept of backward security, which ensures that previous data manipulations do not leak any information about newly deleted data. In this work, we extend these concepts from SSE to OPE/ORE. Specifically, we give the definitions of forward security and backward security informally, as follows:

Definition 4

(Forward/Backward Security). An \(\mathcal {L}\)-adaptively-secure ORE scheme is forward (resp., backward) secure if the leakage profile, denoted \(\mathcal {L}_\mathsf {update}\), of the update operation for \(\mathsf {update}=\mathsf {add}\) (resp., \(\mathsf {update}=\mathsf {delete}\)) can be described as follows:

$$\mathcal {L}_\mathsf {update}(\mathsf {update}, \mathbf {W}_\mathsf {update})=(\mathsf {update}, \mathbf {IND}_\mathsf {update})$$

where \(\mathsf {add}\) (resp., \(\mathsf {delete}\)) denotes the addition (resp., deletion) of data. \( \mathbf {W}_\mathsf {update}\) is the data set of the update operations, in which the data have their own data storage structure, indices, and constraints according to the database. \(\mathbf {IND}_\mathsf {update}\) is a set that only describes the modified column (in SQL database) or the document (in NoSQL database) and the indices of updated data.

Informally, a forward secure ORE ensures that previous data order manipulations do not leak any information about newly inserted data. Meanwhile, new data order manipulations can be executed normally, and correctly leak the order information about the newly inserted data. In a forward secure ORE scheme, \(\mathbf {W}_\mathsf {update}\) of a simple insertion can be briefly described as \(\mathbf {W}_\mathsf {add}=(m,s)\), where s denotes the order space of the related data on which order queries may be executed. For SQL databases, s can represent a column of a table; for NoSQL databases, s can represent a set of documents. \(\mathbf {IND}_\mathsf {update}\) of a simple insertion can be briefly described as \(\mathbf {IND}_\mathsf {add}=(j,s)\), where the incremental timestamp j is initially set to 0 and is shared by all the manipulations.

Let e denote the intermediate ciphertext without forward security. Let \(\mathbf {op}(s)\) denote the order pattern of an order space s, which lists all the timestamps of the order queries. \(\mathbf {Hist}(s)\) contains all the data-updating histories of s as well as the index \(index_s\) of s. Here, we only use it to list all the data-addition histories over the time. More formally, they can be defined as:

$$\mathbf {op}(s)=\{j:(j,s,\mathsf {order}) \in \mathbf {List_{SQL}}\}$$
$$\mathbf {Hist}(s)=\{index_s,(j,\mathsf {add},e):(j,s,\mathsf {add},e) \in \mathbf {List_{SQL}}\}$$

where \(\mathsf {add}\) denotes the addition manipulation, \(\mathsf {order}\) denotes the ordering manipulation, and \(\mathbf {List_{SQL}}\) denotes the list of data manipulations. We give the formal definition of forward secure ORE below.
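
As a small illustration of these two definitions, the sketch below computes \(\mathbf {op}(s)\) and \(\mathbf {Hist}(s)\) from a manipulation list. The tuple layout of the log entries, the string tags and the function names are assumptions chosen to mirror the set-builder notation; they are not taken from the paper.

```python
def order_pattern(list_sql, s):
    """op(s): timestamps j of all order manipulations (j, s, "order") on order space s."""
    return {entry[0] for entry in list_sql if entry[1] == s and entry[2] == "order"}

def history(list_sql, s, index_s):
    """Hist(s): the index of s together with every addition (j, "add", e) on s,
    where e is the intermediate ciphertext, listed in log order."""
    additions = [
        (entry[0], "add", entry[3])
        for entry in list_sql
        if entry[1] == s and entry[2] == "add"
    ]
    return index_s, additions

# Hypothetical manipulation log: (j, s, "add", e) or (j, s, "order").
example_log = [
    (0, "col_a", "add", "e0"),
    (1, "col_a", "order"),
    (2, "col_b", "add", "e1"),
    (3, "col_a", "add", "e2"),
    (4, "col_a", "order"),
]
```

On this log, order_pattern(example_log, "col_a") is {1, 4}, and history(example_log, "col_a", 7) lists the additions at timestamps 0 and 3 alongside the (arbitrarily chosen) index 7.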

Definition 5

(Forward Secure ORE). Let the algorithm tuple

$$\varGamma ={(\mathsf {ORE\_Setup},\mathsf {ORE\_Encrypt},\mathsf {ORE\_Compare})}$$

be an ORE scheme. Let \(\mathcal {A}\) denote a PPT adaptive adversary. Define a real security game FS-ORE-R\(_\mathcal {A}^\varGamma (\lambda )\), in which \(\mathcal {A}\) gets the public parameters output by \(\mathsf {ORE\_Setup}(\lambda )\) and gets adaptive access to the encryption oracle and the comparison oracle. Based on the given public parameters and all the answers received from the oracles, \(\mathcal {A}\) outputs a bit as the result of the game. Define an ideal security game FS-ORE-I\(_{\mathcal {A},\mathcal {S},\mathcal {L}_{\varGamma }}^\varGamma (\lambda )\), in which a PPT simulator \(\mathcal {S}\) takes only the leakage profile \(\mathcal {L}_{\varGamma }\) as input. \(\mathcal {L}_{\varGamma }\) has two parts, as follows:

$$\mathcal {L}_\mathsf {update}(\mathsf {add},(m,s))=(\mathsf {add},(j,s))$$
$$\mathcal {L}_\mathsf {compare}(c_1,c_2,s)=(\mathbf {op}(s),\mathbf {Hist}(s))$$

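The simulator \(\mathcal {S}\) outputs a bit as the result of the ideal game. The scheme \(\varGamma \) is said to be forward secure if the following inequality holds for any sufficiently large \(\lambda \):

$$\left| \Pr \left[ \text {FS-ORE-R}_\mathcal {A}^\varGamma (\lambda )=1\right] -\Pr \left[ \text {FS-ORE-I}_{\mathcal {A},\mathcal {S},\mathcal {L}_{\varGamma }}^\varGamma (\lambda )=1\right] \right| \le \mathrm {negl}(\lambda )$$

where \(\mathrm {negl}(\lambda )\) denotes a negligible function.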

5 A Compilation Framework for Forward Secure ORE

To the best of our knowledge, none of the existing OPE and ORE schemes in the literature achieves forward security, strictly speaking. The qualification “strictly speaking” concerns only the special case of POPE [23]: [23] does not state whether its interactive processes require client authorization. In the common application scenarios of OPE/ORE in practice, no client authorization is required for querying. However, if client authorization is mandated, POPE has forward security.

In the general case, the ciphertexts in an EDB do not cover the entire ciphertext space; in other words, they are usually not dense. Thus, it is difficult to recover all the stored ciphertexts correctly with the limited leakage of OPE/ORE. However, according to our FIA constructions and experiments, FIA schemes are powerful and effective in practice at recovering data encrypted by OPE/ORE without forward security. Though forward security can be achieved in general with oblivious RAM (ORAM) [11, 12], ORAM incurs a massive performance overhead [19] (large bandwidth consumption, multiple data round-trips, and/or large client storage complexity). Thus, it is desirable to have practical forward secure OPE/ORE schemes.

Table 2. Notations in Sect. 5

In this section, we present a practical compilation framework that transforms most of the existing OPE/ORE schemes into forward secure ones. To ease the understanding of the framework, we first give the meaning of some notations in Table 2.

5.1 Basic Ideas

With forward security, the add operation should leak nothing to the server. In other words, the server should not be able to distinguish the ciphertexts output by a forward secure ORE from ciphertexts produced by a perfect encryption scheme when they have just been inserted into the database and have not yet undergone any search operation. To realize this goal, the ciphertext e generated by the original OPE/ORE is salted in our compilation framework. We use a trapdoor permutation (TDP) to link the salts, which reduces the bandwidth consumption.

The salt is a hash value of an order token OT in our construction. To insert a new datum into the EDB (say, the \((i+1)\)-th insertion, \(i\ge 0\)), the client generates an order token OT\(_{i}\) based on the TDP scheme \(\mathsf {\Pi }\), its secret key sk, and the previous order token OT\(_{i-1}\). If \(\mathrm {OT}_i\) is the first order token in the order space (\(i=0\)), it is randomly selected from the domain of order tokens \(\mathcal {OT}\). To reduce the client storage, the client stores only the latest order token \(\mathrm {OT}_i\) and the corresponding counter i in our basic construction. When an order query needs to be executed, the client sends the current order token \(\mathrm {OT}_i\) and the counter i to the server. The server can then calculate all the earlier order tokens with the public key pk, and obtains the original OPE/ORE ciphertexts by desalting. Finally, the server computes the comparison result with the comparison algorithm of the original OPE/ORE and returns it to the client.
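The token chain described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction of Algorithm 3: textbook RSA with a deliberately tiny modulus stands in for the TDP \(\mathsf {\Pi }\), HMAC-SHA256 stands in for the keyed hash, and all function names are hypothetical. The client moves forward along the chain with sk, while the server walks backward with pk only. The sketch requires Python 3.9 or later.

```python
import hashlib
import hmac
import secrets

# Toy RSA trapdoor permutation over Z_N (assumption: stands in for the TDP).
# Both factors are Mersenne primes; the modulus is far too small to be secure.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537                          # public exponent (part of pk)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (sk)


def tdp_forward(x: int) -> int:
    """Public direction: anyone holding pk can evaluate it."""
    return pow(x, E, N)


def tdp_inverse(x: int) -> int:
    """Trapdoor direction: requires sk."""
    return pow(x, D, N)


def first_token() -> int:
    """OT_0 is drawn at random from the token domain."""
    return secrets.randbelow(N)


def next_token(prev: int) -> int:
    """Client: derive OT_i from OT_{i-1} using sk."""
    return tdp_inverse(prev)


def recover_tokens(ot_i: int, i: int) -> list[int]:
    """Server: given (OT_i, i) and pk, recompute [OT_0, ..., OT_i]."""
    tokens = [ot_i]
    for _ in range(i):
        tokens.append(tdp_forward(tokens[-1]))
    tokens.reverse()
    return tokens


def salt(k_s: bytes, token: int) -> bytes:
    """Salt = keyed hash of the order token (HMAC-SHA256 as a stand-in)."""
    return hmac.new(k_s, token.to_bytes(16, "big"), hashlib.sha256).digest()


# Client side: three insertions produce OT_0, OT_1, OT_2.
chain = [first_token()]
for _ in range(2):
    chain.append(next_token(chain[-1]))

# Server side: from (OT_2, 2) and pk alone, the whole chain is recomputed.
assert recover_tokens(chain[-1], 2) == chain
```

Because the server can only apply the public direction, a token revealed for an earlier query does not let it derive the tokens, and hence the salts, of later insertions.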

5.2 The Compilation Framework

Given any OPE or ORE scheme, denoted \(\varGamma =(\mathsf {ORE\_Setup},\mathsf {ORE\_Encrypt},\mathsf {ORE\_Compare})\), the compiled ORE scheme is described in Algorithm 3, which is denoted by \({\varGamma }_{fp}=(\mathsf {Setup},\mathsf {Encrypt},\mathsf {Compare})\). In Algorithm 3, the parts of the original OPE/ORE are only briefly described.

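[Algorithm 3: the compiled ORE scheme \({\varGamma }_{fp}=(\mathsf {Setup},\mathsf {Encrypt},\mathsf {Compare})\)]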

In our construction, we let \(\lambda \) denote the security parameter. Let \(k_0\) denote the main key of our compilation framework. For each order space s, the key \(k_s\) of the keyed hash function \(\mathsf {H}\) is calculated by the pseudo-random function \(\mathsf {PRF}_{k_0}[s]\). Let \(\mathsf {add}\) denote the addition/insertion of data. The order tokens are calculated with the TDP one by one in sequence, and the hash values of these tokens are XORed with the original OPE/ORE ciphertexts to generate the final ciphertexts, without extra storage consumption at the server side. The salt of the final ciphertext has \(\lambda \) bits and is removed in the comparison algorithm. In the comparison algorithm, we let \(c_{s_\alpha }\) and \(c_{s_\beta }\) denote two ciphertexts to be compared in the order space s, with indices \(\alpha \) and \(\beta \) respectively. We let \(e_{s_\alpha }\) and \(e_{s_\beta }\) denote their respective intermediate ciphertexts output by the original OPE/ORE.
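The salting and desalting steps can be illustrated with the following minimal sketch. It makes several assumptions that are not taken from Algorithm 3: the original OPE is replaced by an insecure affine placeholder, the intermediate ciphertext e is exactly \(\lambda = 128\) bits long, HMAC-SHA256 plays the role of both \(\mathsf {PRF}\) and \(\mathsf {H}\), the ciphertext with index \(\alpha \) is salted with the token of the same index, and the server is handed \(k_s\) together with the already recovered order tokens at query time. The tokens are modeled as opaque byte strings, so the TDP chain itself is not shown.

```python
import hashlib
import hmac
import secrets
from typing import List

LAMBDA_BYTES = 16  # salt length: lambda = 128 bits (assumption)


def prf(k0: bytes, s: str) -> bytes:
    """k_s = PRF_{k0}[s], instantiated here with HMAC-SHA256."""
    return hmac.new(k0, s.encode(), hashlib.sha256).digest()


def salt(k_s: bytes, token: bytes) -> bytes:
    """lambda-bit salt H_{k_s}(OT), truncated HMAC-SHA256."""
    return hmac.new(k_s, token, hashlib.sha256).digest()[:LAMBDA_BYTES]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ope_encrypt(m: int) -> bytes:
    """Placeholder for ORE_Encrypt: order preserving, NOT secure."""
    return (3 * m + 7).to_bytes(LAMBDA_BYTES, "big")


def toy_ope_compare(e1: bytes, e2: bytes) -> int:
    """Placeholder for ORE_Compare: returns -1, 0 or 1."""
    return (e1 > e2) - (e1 < e2)


def encrypt(k_s: bytes, token: bytes, m: int) -> bytes:
    """Final ciphertext c = e XOR H_{k_s}(OT)."""
    return xor(toy_ope_encrypt(m), salt(k_s, token))


def compare(k_s: bytes, tokens: List[bytes],
            c_a: bytes, alpha: int, c_b: bytes, beta: int) -> int:
    """Server: desalt both ciphertexts, then run the original comparison."""
    e_a = xor(c_a, salt(k_s, tokens[alpha]))
    e_b = xor(c_b, salt(k_s, tokens[beta]))
    return toy_ope_compare(e_a, e_b)


# Client inserts three values into the order space "salary".
k0 = secrets.token_bytes(32)
k_s = prf(k0, "salary")
tokens = [secrets.token_bytes(16) for _ in range(3)]  # stand-ins for OT_0..OT_2
messages = [50, 20, 80]
ciphertexts = [encrypt(k_s, t, m) for t, m in zip(tokens, messages)]

# After an order query reveals the tokens, comparisons work as usual.
assert compare(k_s, tokens, ciphertexts[0], 0, ciphertexts[1], 1) == 1
assert compare(k_s, tokens, ciphertexts[1], 1, ciphertexts[2], 2) == -1
```

Since the salt is XORed directly into e, the final ciphertext has the same length as the intermediate one, which matches the remark that no extra server-side storage is needed.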

Due to space limitations, the methods for data deletion and batch encryption are deferred to Appendix B.

5.3 Analysis of Forward Security

In our framework, the ciphertexts output by the original OPE/ORE are XORed with salts generated in a one-way manner. Hence, newly inserted data leak nothing to the server as long as they have not been queried. Once the data have been queried and desalted, their ciphertexts fall back to the security level of the original OPE/ORE scheme against an adversary that monitors the server continuously. Hence, the security of the compiled forward secure ORE cannot be weaker than that of the original OPE/ORE. On the other hand, our compilation framework is effective against FIAs, because the forged data do not leak any information under the old queries: they can be desalted only after a new, authentic order query from the client.

Due to space limitations, the formal proof of forward security is presented in the extended version [25] of this paper. Moreover, the description of our experiments is deferred to Appendix C.

6 Conclusion and Future Work

In this work, we study the leakage of OPE and ORE. We propose generic yet devastating FIA attacks that exploit only the ideal leakage of OPE/ORE. We also propose various improved methods to further boost their efficiency. Compared with existing attacks against OPE/ORE, our FIA attacks rely on less demanding conditions and can be more effective. We ran experiments on real datasets to test the performance, and the results show that our FIA attacks pose a severe threat to most of the existing OPE and ORE schemes, with high efficiency and a 100% recovery rate.

We then formulate forward secure ORE, which may be of independent interest. To resist FIAs, we propose a practical compilation framework for transforming most existing OPE/ORE schemes into forward secure ones. Finally, we ran experiments on some prominent OPE/ORE schemes developed in recent years, and the results show that our compilation framework is practical and useful for most of the systems.

Our compilation framework does not apply to OPE/ORE schemes that store the inserted data in order trees. Achieving forward security for these OPE/ORE schemes is an interesting direction for future research.