1 Introduction

Steganographic protocols enable one to “embed” covert messages into inconspicuous data over a public communication channel in such a way that no one, aside from the sender and the intended receiver, can even detect the presence of the secret message. The steganographic communication problem can be described using Simmons’ [13] formulation: prisoners Alice and Bob wish to communicate securely in the presence of an adversary, called the “Warden,” who monitors whether they exchange “conspicuous” messages. In particular, Alice and Bob may exchange messages that adhere to a certain channel distribution that represents “inconspicuous” communication. By controlling the messages that are transmitted over such a channel, Alice and Bob may exchange messages that cannot be detected by the Warden. There have been two approaches to formalizing this problem, one based on information theory [2, 8, 15] and one based on complexity theory [5]. Most steganographic constructions supported by provable security guarantees are instantiations of the following basic procedure (often referred to as “rejection sampling”).

The problem specifies a family of message distributions (the “channel distributions”) that provide a number of possible options for a so-called “covertext” to be transmitted. Additionally, the sender and the receiver possess some sort of private function indexed by the shared secret key (typically a keyed hash function, MAC, or other similar function) that maps channel messages to a single bit. In order to send a message bit m, the sender draws a covertext from the channel distribution, applies the function to the covertext and checks whether it evaluates to the bit m she originally wished to transmit. If this is the case, the covertext is transmitted (serving as the “stegotext”). In case of failure, this procedure is repeated. While this is a fairly concrete procedure, there are a number of choices to be made with both practical and theoretical significance. From the security viewpoint, one is primarily interested in the choice of the function that is shared between the sender and the receiver. From a practical viewpoint, one is primarily interested in how the channel is implemented and whether it conforms to the various constraints that are imposed on it by the steganographic protocol specifications (e.g., are independent draws from the channel allowed? does the channel remember previous draws? etc.).
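To make the rejection-sampling loop concrete, the following Python sketch embeds a single bit. The byte-valued channel and the parity function standing in for the shared keyed function are illustrative assumptions for this sketch, not part of any particular construction.

```python
import random

def rejsam(sample_covertext, f, m, max_draws=2):
    """Rejection sampling: draw covertexts until one evaluates to the
    desired bit m, giving up after max_draws attempts (the final draw is
    transmitted regardless, so the receiver may decode the wrong bit)."""
    c = sample_covertext()
    for _ in range(max_draws - 1):
        if f(c) == m:
            break
        c = sample_covertext()
    return c

# Illustrative channel: uniform bytes; shared function: parity of the byte.
channel = lambda: random.randrange(256)
parity = lambda c: bin(c).count("1") % 2

c = rejsam(channel, parity, m=1)
# Since parity is balanced, two draws succeed with probability 1 - (1/2)^2 = 3/4.
```

The receiver simply recomputes parity(c); bounding the failure probability for general channels is the content of Lemma 4 below.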

The shared key between Alice and Bob can be an expensive resource; to focus on this parameter, we define a notion of overhead equal to the ratio of the length of the secret key to the length of the message. Prior work in statistically secure steganography either gives high overhead or considers restricted covertext distributions. For instance, in the information-theoretic model, Cachin [2] demonstrated a steganographic protocol that works on restricted covertext distributions where the channel is a stationary distribution produced by a sequence of independent repetitions of the same experiment. Under this uniformity assumption, he uses sequences of covertexts to encode the message and obtains optimal overhead. In the complexity-theoretic setting, Hopper et al. [5, 6] provided a provably secure stegosystem that pairs rejection sampling with a pseudorandom function family to offer security for general (history-dependent) channel distributions with constant min-entropy. However, this protocol has a few drawbacks. First, cast in the information-theoretic setting, the secret key shared by Alice and Bob must be long enough to specify a suitable random function, which yields an overhead polynomial in the length of the message. In the complexity-theoretic setting, from an efficiency viewpoint, their construction requires about two evaluations of a pseudorandom function per transmitted bit. Constructing efficient pseudorandom functions is possible either generically [4] or, more efficiently, based on specific number-theoretic assumptions [10]. Nevertheless, pseudorandom function families are a conceptually complex and fairly expensive cryptographic primitive. For example, the evaluation of the Naor–Reingold pseudorandom function on an input x requires O(|x|) modular exponentiations. Similarly, the generic construction [4] requires O(k) PRG doublings of the input string where k is the length of the key.

Our protocol remedies these shortcomings. We show how it is possible to attain constant overhead for general channel distributions with constant min-entropy. The only assumptions employed in our analysis are that the channel alphabet has size polynomial in the length of the message m and that the required security is 2^{−|m|}. Furthermore, our protocol in the computational setting is much more efficient: in particular, while the Hopper et al. stegosystem requires two evaluations of a pseudorandom function per bit, amounting to a linear (in the key-size) number of applications of the underlying PRG (in the standard construction for pseudorandom functions of [4]), in our stegosystem we require a constant number of PRG applications per bit. So the number of cryptographic operations per bit transmitted drops from linear to constant.

Central to our approach for improving the efficiency and overhead for general distributions is the use of combinatorial constructs such as almost t-wise independent function families given by Alon et al. [1]. Our protocol is based on the rejection-sampling technique outlined above in combination with an explicit almost t-wise independent family of functions. We note that such combinatorial constructions have been extremely useful for derandomization methods and here, to the best of our knowledge, are employed for the first time in the design of steganographic protocols. The present paper is an extended version based on preliminary work that appeared in [7]; the present version includes a full security analysis that works for any constant min-entropy (as opposed to min-entropy of 1 bit that was assumed in this previous work).

2 Definitions and Tools

The security of a steganography protocol is measured by the adversary’s ability to distinguish between “normal” and “covert” messages over a communication channel. To characterize normal communication we need to define and formalize the communication channel. We follow the standard terminology used in the literature [2, 5, 6, 14]: Let Σ={σ 1,…,σ s } denote an alphabet and treat the channel as a family of random variables \(\mathcal{C} = \{C_{h}\}_{h \in {\varSigma}^{\ast}}\); each C h is supported on Σ. These channel distributions model a history-dependent notion of channel data that captures real-life communication. Such a channel induces a natural distribution on Σ n for any n: σ 1 is drawn from C ϵ , and each subsequent σ i is drawn from \(C_{\sigma_{1} \ldots \sigma_{i-1}}\). (Here we let ϵ denote the empty string.) Recall that the min-entropy of a random variable X, taking values in a set V, is the quantity

$$H_\infty(X) \triangleq \min_{v \in V} \bigl(-\log \Pr[X = v] \bigr)\,. $$

We say that a channel \(\mathcal{C}\) has min-entropy δ if for all h∈Σ ∗, H ∞ (C h )≥δ.
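A quick Python illustration of this definition, with a toy history-indexed channel (the distributions are made up for the example):

```python
import math

def min_entropy(dist):
    """H_inf(X) = min_v (-log2 Pr[X = v]) = -log2 (max_v Pr[X = v])."""
    return -math.log2(max(dist.values()))

# A channel as a family {history: distribution over the alphabet}.
channel = {
    "":  {"a": 0.50, "b": 0.25, "c": 0.25},   # C_epsilon
    "a": {"a": 0.25, "b": 0.50, "c": 0.25},   # C_a
}
# The channel has min-entropy delta if every C_h has min-entropy >= delta.
delta = min(min_entropy(d) for d in channel.values())
```

Here every C_h assigns probability at most 1/2 to any symbol, so delta = 1.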

2.1 One-time Stegosystems; The Steganographic Models

Steganography has been studied in two natural (but implicit up to now) communication models differing in Alice’s ability to sample from the channel.

Current History Model

The first model we study is the one adopted by Hopper et al. [5]. In this model, Alice—and consequently the steganographic encoding protocol—has access to a channel oracle that provides samples from the channel for the current history. Alice is given no means of sampling from C h for other histories. We call this the current history model. In this case, one can imagine that the channel is determined by a complex environment: while Alice is permitted to sample from the channel determined by the current environment, she cannot simulate potential future environments. Naturally, the communication history is updated when a symbol is transmitted on the wire from Alice to Bob. Formally, if h 1 ,h 2 ,…,h ℓ ∈Σ have been transmitted along the channel thus far, Alice may sample solely from \(C_{h_{1} \circ \cdots \circ h_{\ell}}\) and send an element of her choice.

Look-Ahead Model

The second model we study—the look-ahead model—was adopted by von Ahn and Hopper [14]. This model is a relaxation of the “current history” model: Alice is now provided with a means for sampling “deep into the channel.” In particular, Alice—and consequently the steganographic encoding protocol—has access to a channel oracle that can sample from the channel for any history. Formally, during the embedding process, Alice may sample from \(C_{h_{1}\circ \cdots \circ h_{\ell}}\) for any future history h=h 1 ∘⋯∘h ℓ she wishes (though Alice is constrained to be efficient and so can make no more than polynomially many queries of polynomial length). This more generous model allows Alice to transform a channel C with min-entropy δ into a channel C (τ) with min-entropy τδ. Specifically, the channel C (τ) is defined over the alphabet Σ τ, whose elements we write as vectors h=(h 1 ,…,h τ ). The distribution \(C^{(\tau)}_{\mathbf{h}^{1}, \ldots, \mathbf{h}^{n}}\) is determined by the channel C with history \(\overline{h} = h^{1}_{1} \circ \cdots \circ h_{\tau}^{1} \circ h^{2}_{1} \circ \cdots \circ h^{n}_{\tau}\). Below we give the definition of a one-time stegosystem that works in either of the above models and is a steganographic system that enables the one-time steganographic transmission of a message provided that the two parties share a suitable key.
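Before turning to the formal definition, the channel transformation from C to C (τ) can be sketched in Python. For simplicity the sketch assumes a memoryless channel (one fixed distribution); for a genuinely history-dependent channel each of the τ draws would be conditioned on the symbols sampled so far, which is exactly what the look-ahead oracle permits.

```python
import itertools
import math

def product_channel(dist, tau):
    """C^(tau): treat tau-tuples of symbols as one super-symbol.
    For a memoryless channel the tuple probabilities simply multiply."""
    out = {}
    for combo in itertools.product(dist, repeat=tau):
        p = 1.0
        for s in combo:
            p *= dist[s]
        out[combo] = p
    return out

base = {"a": 0.5, "b": 0.5}               # min-entropy delta = 1
boosted = product_channel(base, 3)        # alphabet Sigma^3
h_inf = -math.log2(max(boosted.values()))  # tau * delta = 3
```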

Definition 1

A one-time stegosystem consists of three probabilistic polynomial-time algorithms

$$S = (\mathit {SK}, \mathit {SE}, \mathit {SD}), $$

where:

  • SK is the key generation algorithm; we write SK(1k)=κ. It produces a key κ of length k.

  • SE is the embedding procedure and has access to the channel; \(\mathit {SE}(\kappa,m; \mathcal{O}) = s\in {\varSigma}^{*}\). The embedding procedure takes into account the history h of communication that has taken place between Alice and Bob thus far and begins its operation corresponding to this history. It takes as input the key κ of length k, a message m of length ℓ=ℓ(k) and a (probabilistic) oracle \(\mathcal{O}\) that allows SE to draw independent samples repeatedly from C h in the current history model. In the look-ahead model, the oracle \(\mathcal{O}\) accepts as input a (polynomial-length) history h′∈Σ ∗ and allows SE to draw independent samples repeatedly from C h∘h′ . The output is the stegotext s∈Σ ∗. Observe that in a one-time stegosystem, once a security parameter k is chosen, the length of the message is a fixed function of k. As described above, the access that SE has to the channel is dictated by the model of communication.

  • SD is the extraction procedure; SD(κ,c)=m or fail. It takes as input the key κ of length k, and some c∈Σ ∗. The output is a message m or the token fail.

We next define a notion of correctness for a one-time stegosystem.

Definition 2

(Correctness)

A one-time stegosystem (SK,SE,SD) is said to be (ϵ,δ)-correct provided that for all channels \(\mathcal{C}\) of min-entropy δ, we have ∀h∈Σ ∗

$$\forall m \in\{0,1\}^{\ell(k)} \Pr\bigl[\mathit {SD}\bigl(\kappa, \mathit {SE}(\kappa,m; \mathcal{O})\bigr)\neq m \mid \kappa \gets \mathit {SK}\bigl(1^k\bigr) \bigr] \leq \epsilon. $$

In general, we treat both ϵ=ϵ(k) and δ=δ(k) as functions of the security parameter k, and the oracle \(\mathcal{O}\) as a function of the history h.

In the following paragraphs, we define security for a one-time stegosystem. One-time stegosystem security is based on the indistinguishability between a transmission that contains a steganographically embedded message and a transmission that contains no embedded message. The adversarial game discussed next models the behavior of the warden in Simmons’ formulation of the problem discussed earlier.

An adversary \(\mathcal{A}\) against a one-time stegosystem S=(SK,SE,SD) is a pair of algorithms \(\mathcal{A}=(\mathit {SA}_{1}, \mathit {SA}_{2})\), that plays the following game, denoted \(G^{\mathcal{A}}(1^{k})\):

  1.

    A key κ is generated by SK(1k).

  2.

    Algorithm SA 1 receives as input the security parameter k and outputs a triple \((m , \textrm{aux}, h_{\sf c}) \in M_{\ell} \times \{0,1\}^{\ast}\times {\varSigma}^{*}\), where m is the challenge plaintext, \(h_{\sf c}\) is the history of the channel that the adversary wishes to use for the steganographic embedding to start, and aux is some auxiliary information that will be passed to SA 2. Note that SA 1 is provided access to \(\mathcal{C}\) via an oracle \(\mathcal{O}(h)\), which takes the history h as input. \(\mathcal{O}(\cdot)\), on input h, returns to SA 1 an element c selected according to C h . This way, the warden can learn about the channel distribution for any history.

  3.

    A bit b is chosen uniformly at random.

    • If b=0 let \(c^{\ast}\gets \mathit {SE}(\kappa,m; \mathcal{O})\) where \(\mathcal{O}\) samples covertexts from \(C_{h_{\mathsf{c}}}\).

    • If b=1 let c ∗ =c 1 ∘⋯∘c λ where \(\lambda= | \mathit {SE}(\kappa,m; \mathcal{O})|\) and \(c_{i} \stackrel{r}{\gets} C_{h_{c} \circ c_{1} \circ \cdots \circ c_{i-1}}\).

  4.

    The input for SA 2 is 1k, h c , c and aux. SA 2 outputs a bit b′. If b′=b then we say that (SA 1,SA 2) succeeded and write \(G^{\mathcal{A}}(1^{k}) = \mathrm{success}\).

The advantage of the adversary \(\mathcal{A}\) over a stegosystem S is defined as

$$\mbox{{\bf Adv}}_S^\mathcal{A}(k) = \biggl\vert \Pr \bigl[ G^\mathcal{A}\bigl(1^k\bigr) = \mathrm{success} \bigr] - \frac{1}{2} \biggr\vert. $$

The probability includes the coin tosses of \(\mathcal{A}\) and SE, as well as the coin tosses of \(G^{\mathcal{A}}(1^{k})\). The (information-theoretic) insecurity of the stegosystem is defined as

$$\mbox{\bf{InSec}}_{S}(k) = \max_{\mathcal{A} }\bigl\{\mbox{ \bf{Adv}}_S^\mathcal{A}(k)\bigr\}\,, $$

this maximum taken over all (time unbounded) adversaries \(\mathcal{A}\).

Definition 3

(Security)

We say that a stegosystem is (ϵ,δ)-secure if for all channels with min-entropy δ we have \(\mbox{\bf{InSec}}_{S}(k) \leq \epsilon\).

As above, in general we treat both ϵ=ϵ(k) and δ=δ(k) as functions of k, the security parameter.

Overhead

The overhead of a one-time stegosystem expresses the relation between the key length k and the message length ℓ; specifically, we adopt the ratio β=k/ℓ as our measure of overhead.

This paper is an extended version of a previous abstract that appeared in [7]. The work presented by Kiayias et al. [7] considers the scenario where the communication channel has min-entropy at least 1 in the current history model. In this paper, we present steganography protocols in both the current history and the look-ahead models. We also present explicit constructions of error-correcting codes using Forney’s [3] concatenation scheme. Furthermore, our protocols operate on any communication channel with min-entropy δ>0.

2.2 Error-Correcting Codes

Our steganographic construction requires an efficient family of codes that can recover from errors introduced by certain binary symmetric channels. In particular, we require a version of the Shannon coding theorem [11, 12] that yields explicit control on the various parameters of the code as the rate approaches the capacity of the channel. We present this theorem in this section.

For an element x∈{0,1} n , we let B p (x) be the random variable equal to x⊕e, where e∈{0,1} n is a random error vector defined by independently assigning each e i =1 with probability p. (Here x⊕e denotes the vector with the ith coordinate equal to x i ⊕e i .) The classical coding theorem [12] asserts that for every pair of real numbers 0<R<C≤1 and n∈ℕ, there is a binary code A⊂{0,1} n with log|A|/n≥R, and a θ>0, so that for each a∈A, maximum-likelihood decoding recovers a from B p (a) with probability 1−e^{−θn}, where p is determined from C as

$$H(p) = p \log p^{-1} + (1-p) \log (1-p)^{-1} = 1-C. $$

The quantity C is called the capacity of the binary symmetric channel and determines the random variable B p ; the quantity R=log|A|/n is the rate of the code A. In this language, the coding theorem asserts that at transmission rates lower than the capacity of the channel, there exist codes that correct random errors with exponentially decaying failure probability (in n, the length of the code). We formalize our requirements below.
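The two quantities in play, the error model B p and the binary entropy H(p), are easy to simulate; the Python sketch below merely illustrates the definitions above.

```python
import math
import random

def H(p):
    """Binary entropy: H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p))."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc(x, p, rng):
    """B_p(x): flip each coordinate of x independently with probability p."""
    return [b ^ (rng.random() < p) for b in x]

# Capacity of the binary symmetric channel with cross-over probability p.
p = 0.11
C = 1 - H(p)          # roughly 1/2 for p = 0.11
noisy = bsc([0] * 1000, p, random.Random(0))
```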

Definition 4

An error-correcting code of rate r is a pair of functions E=(Enc,Dec), where Enc:{0,1}^{rn}→{0,1}^{n} is the encoding algorithm and Dec:{0,1}^{n}→{0,1}^{rn} the corresponding decoding algorithm. Specifically, we say that E is an (r,p,ϵ)-code if for all m∈{0,1}^{rn},

$$\Pr\bigl[\mathrm {Dec}\bigl(\mathrm {Enc}(m) \oplus {e}\bigr) = m\bigr] \geq 1 - \epsilon, $$

where e=(e 1,…,e n ) and each e i is independently distributed in {0,1} so that Pr[e i =1]≤p. We say that E is efficient if both Enc and Dec are computable in polynomial time in n.
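A repetition code is a degenerate but self-contained instance of this interface: a (1/t,p,ϵ)-code with very poor rate, far from the near-capacity codes of Theorem 1, but it makes Definition 4 concrete.

```python
def enc_rep(msg_bits, reps=5):
    """Repetition encoding: rate r = 1/reps."""
    return [b for b in msg_bits for _ in range(reps)]

def dec_rep(code_bits, reps=5):
    """Majority-vote decoding; corrects up to floor(reps/2) flips per block."""
    return [int(sum(code_bits[i:i + reps]) * 2 > reps)
            for i in range(0, len(code_bits), reps)]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
word = enc_rep(msg)
word[3] ^= 1      # one flip in the first block
word[22] ^= 1     # one flip in the fifth block
recovered = dec_rep(word)   # both flips are corrected by majority vote
```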

We record the theorem below with the proof in Appendix A.

Theorem 1

(Based on Shannon [11, 12], Forney [3])

For any p∈[1/4,1/2] and any R<1−H(p), there is an efficient (R,p,ϵ)-code for which there are θ∈[0,1] and n 0 ∈ℕ such that ϵ≤2^{−θn/log n} for any n≥n 0 . Furthermore, we have θ^{−1}=Θ(Z^{−1}) and log n 0 =Θ(Z^{−2} log Z^{−1}) as Z→0, where Z=1−H(p)−R.

2.3 Function Families and Almost t-Wise Independence

We will employ the notion of (almost) t-wise independent function families (cf. [1, 9]).

Definition 5

A family \(\mathcal{F}\) of Boolean functions on {0,1}v is said to be ϵ-away from t-wise independent or (v,t,ϵ)-independent if for any t distinct domain elements q 1,q 2,…,q t we have

$$ \sum_{\alpha\in \{0,1\}^t} \biggl\vert \Pr_{f}\bigl[f(q_1)f(q_2) \cdots f(q_t)=\alpha\bigr]-\frac{1}{2^t}\biggr\vert \leq \epsilon, $$
(1)

where f is chosen uniformly from \(\mathcal{F}\).
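The sum in (1) can be evaluated by brute force for small families. As a sanity check, the affine functions f(x)=a·x⊕b over {0,1}^v form a family that is exactly 2-wise independent (distance 0), while the two constant functions are maximally far from it; the code below is a toy illustration, not the construction behind Theorem 3.

```python
from itertools import product

def twise_distance(family, points):
    """Left-hand side of (1): distance of (f(q_1), ..., f(q_t)) from uniform
    on {0,1}^t when f is drawn uniformly from the family."""
    t = len(points)
    counts = {alpha: 0 for alpha in product((0, 1), repeat=t)}
    for f in family:
        counts[tuple(f(q) for q in points)] += 1
    return sum(abs(c / len(family) - 2 ** -t) for c in counts.values())

v = 3
dot = lambda a, x: sum(ai & xi for ai, xi in zip(a, x)) % 2
affine = [lambda x, a=a, b=b: dot(a, x) ^ b
          for a in product((0, 1), repeat=v) for b in (0, 1)]
q1, q2 = (0, 0, 1), (1, 0, 1)
d_affine = twise_distance(affine, [q1, q2])                      # exactly 0
d_const = twise_distance([lambda x: 0, lambda x: 1], [q1, q2])   # equals 1
```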

The above is equivalent to the following formulation quantified over all computationally unbounded adversaries \(\mathcal{A}\):

$$ \Bigl|\Pr_{f \stackrel{r}{\gets}\mathcal{F}}\bigl[ \mathcal{A}^{f[t]} \bigl(1^v\bigr) = 1\bigr] - \Pr_{f \stackrel{r}{\gets}\mathcal{R}}\bigl[ \mathcal{A}^{f[t]}\bigl(1^v\bigr)=1\bigr]\Bigr|\leq \epsilon, $$
(2)

where \(\mathcal{R}\) is the collection of all functions from {0,1}v to {0,1} and \(\mathcal{A}^{f[t]}\) is an unbounded adversary that is allowed to determine up to t queries to the function f before it outputs its bit. The equivalence is formally stated (without proof) as follows.

Lemma 2

\(\mathcal{F}_{\kappa}\) is ϵ′-away from t-wise independence according to (1) if and only if \(\mathcal{F}_{\kappa}\) is ϵ′-away from t-wise independence according to (2) above.

We employ the construction of almost t-wise independent sample spaces given by Naor and Naor [9], and Alon et al. [1]. The following theorem is a restatement of Theorem 3 from Alon et al. [1].

Theorem 3

([1, 9])

There exist families of Boolean functions \(\mathcal{F}^{v}_{t,\epsilon}\) on {0,1}^v that are ϵ-away from t-wise independent, are indexed by keys of length (2+o(1))(log v+t/2+log(ϵ^{−1})), and are computable in polynomial time.

2.4 Rejection Sampling

A common method used in steganography employing a channel distribution is that of rejection sampling, described below (cf. [2, 5]).

Rejection Sampling in the Current History Model

In the current history model, assuming that one wishes to transmit a single bit m and employs a random function f:{0,1}^{d}×Σ→{0,1} that is secret from the adversary, one performs the following “rejection-sampling” process:

The procedure \({\operatorname {rejsam}}^{f}_{h}(m)\): draw c 1 from C h ; if f(c 1 )=m, output c 1 ; otherwise draw c 2 from C h and output c 2 .

Here, Σ denotes the output alphabet of the channel, h denotes the history of the channel at the start of the process, and C h denotes the distribution on Σ given by the channel with history h. The receiver (also privy to the function f) applies the function to the received message c∈Σ and recovers m with probability greater than 1/2. The sender and the receiver may employ a joint state (e.g., a counter) that need not be secret from the adversary. Note that the above process performs only two draws from the channel with the same history (more draws could, in principle, be performed, but we justify our choice of two draws in Lemma 8 of Sect. 3.1.2). These draws are assumed to be independent. One basic property of rejection sampling, which we prove below and which is helpful for our construction, is the following:

Lemma 4

If f is drawn uniformly at random from the collection of all functions \(\mathcal{R} = \{ f : {\varSigma} \to \{0,1\}\;\}\) and \(\mathcal{C}\) has min-entropy δ, then

$$\Pr_{f \gets \mathcal{R}}\bigl[f\bigl({\operatorname {rejsam}}_h^{f}(m)\bigr) \neq m\bigr] \leq p, $$

where p=(1+2^{−δ})/4.

Proof

Define the event E to be

$$E = \bigl[f({\mathsf{c}}_1 ) = m\bigr] \lor \bigl[f({\mathsf{c}}_1) \neq m \land f({\mathsf{c}}_2 ) = m \bigr]; $$

thus E is the event that rejection sampling is successful for m.

Here c 1,c 2 are two independent random variables distributed according to the channel distribution C h and h is determined by the history of channel usage. Recalling that Σ={σ 1,…,σ s } is the support of the channel distribution C h , let p i =Pr[C h =σ i ] denote the probability that σ i occurs. As f is chosen uniformly at random,

$$\Pr\bigl[f({\mathsf{c}}_1 ) = m\bigr] = \frac{1}{2}. $$

Then Pr[E]=1/2+Pr[A], where A is the event that f(c 1 )≠m∧f(c 2 )=m. To bound Pr[A], let D denote the event that c 1 ≠c 2 . Observe that conditioned on D, A occurs with probability exactly 1/4; on the other hand, A cannot occur simultaneously with \(\overline{D}\). Thus

$$\Pr[E] = \frac{1}{2} + \Pr[A\mid D] \cdot \Pr[D] + \Pr[A \mid \overline{D}]\cdot \Pr[\overline{D}] = \frac{1}{2} + \frac{1}{4} \Pr[D]. $$

To bound Pr[D], note that

$$\Pr[\bar{D}] = \sum_i p_i^2 \leq \max_i p_i \sum_i p_i = \max_i p_i $$

and hence that Pr[D]≥1−max i p i . Considering that H ∞ (C h )≥δ, we have max i p i ≤2^{−δ} and hence the success probability is

$$\Pr[E] \geq \frac{1}{2} +\frac{1}{4}\cdot \Bigl(1- \max_i p_i\Bigr) \geq \frac{1}{2} + \frac{1}{4} \biggl(1-\frac{1}{2^\delta} \biggr) = 1 - p, $$

from which the statement of the lemma follows. □

The above lemma is a generalization of a similar result that appeared in [5], which dealt with the special case where the min-entropy is 1 bit. The application of the rejection-sampling procedure described above in our stegosystem implies that a message bit transmitted over the communication channel may be flipped with a certain probability p. This can be viewed as an overlaid binary symmetric channel with a cross-over probability p. The error-correcting code is introduced to recover from these cross-over errors.
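Lemma 4’s bound is easy to check by simulation. The sketch below redraws a fresh truly random f for every trial; for the flat two-symbol channel used here the bound p=(1+2^{−δ})/4=3/8 is in fact met with equality (Pr[D]=1/2 exactly), so the empirical error rate should hover around 0.375.

```python
import random

def rejsam_error_rate(dist, trials, rng):
    """Empirical Pr[f(rejsam_h^f(m)) != m] with a fresh random f per trial,
    for a fixed channel distribution dist = {symbol: probability}."""
    symbols, probs = zip(*dist.items())
    errors = 0
    for _ in range(trials):
        f = {s: rng.randrange(2) for s in symbols}   # truly random f
        m = rng.randrange(2)
        c = rng.choices(symbols, probs)[0]           # first draw
        if f[c] != m:
            c = rng.choices(symbols, probs)[0]       # second (final) draw
        errors += f[c] != m
    return errors / trials

dist = {"a": 0.5, "b": 0.5}      # min-entropy delta = 1
bound = (1 + 2 ** -1) / 4        # Lemma 4: p = (1 + 2^-delta)/4 = 0.375
rate = rejsam_error_rate(dist, 20000, random.Random(0))
```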

Rejection Sampling in the Look-Ahead Model

In the look-ahead model, the rejection-sampling procedure above can be coupled with the channel transformation described in Sect. 2.1. In particular, transforming a channel C with min-entropy δ into a channel C (τ) with min-entropy δτ, one can carry out the rejection-sampling process above with samples drawn from Σ τ. The binary symmetric channel cross-over probability is then p=(1+2^{−δτ})/4.

3 The Construction

In this section we outline our construction of a one-time stegosystem as an interaction between Alice (the sender) and Bob (the receiver). First, we focus on the construction in the current history model and defer the discussion for the look-ahead model to Sect. 3.2.

3.1 Our Stegosystem for the Current History Model

In the current history model, Alice and Bob wish to communicate over a channel with distribution \(\mathcal{C}\) over an alphabet Σ. We assume that \(\mathcal{C}\) has min-entropy δ, so that ∀h∈Σ ∗, H ∞ (C h )≥δ. As in the statement of Lemma 4, let p=(1+2^{−δ})/4. The construction we describe below uses two parameters: r∈(0,1−H(p)) and \(\epsilon_{\mathcal{F}}\in (0,1)\). Alice and Bob agree on the following:

An error-correcting code.:

Let E=(Enc,Dec) be an efficient (r,p,ϵ enc)-code of length n from Theorem 1. The theorem asserts that p and r determine a bound for the decoding error probability ϵ enc assuming that n is suitably large.

An almost t-wise independent function family.:

Let \(\mathcal{F}\) be the function family that is \((\log n + \log |{\varSigma}|, 2n, \epsilon_{\mathcal{F}})\)-independent, indexed by keys of length k, from Theorem 3. Recall from the theorem that \(\epsilon_{\mathcal{F}}\) together with our choices of v=log n+log|Σ| and t=2n determine the required key length k. We treat elements of \(\mathcal{F}\) as Boolean functions on {1,…,n}×Σ and, for such a function f, we let f i :Σ→{0,1} denote the function f i (σ)=f(i,σ).

We will analyze the stegosystem below in terms of the parameters n,r,δ, \(\epsilon_{\mathcal{F}}\), relegating the discussion of how these parameters determine the overall efficiency, correctness and security of the system to Sect. 3.3.

Key generation consists of selecting an element \(f \in \mathcal{F}\). This will be facilitated by sharing a random bit string κ of length k. Alice and Bob then communicate using the algorithms SE for embedding and SD for extracting as described in Fig. 1.

Fig. 1. Encryption and Decryption algorithms for the one-time stegosystem in the current history model

In SE, after applying the error-correcting code E, we use \(\operatorname {rejsam}^{f_{i}}_{h}(m_{i})\) to obtain an element c i of the channel for each bit m i of the encoded message and update the history h. The resulting stegotext c 1 ⋯c n is denoted \(c_{\operatorname {stego}}\). In SD, the received stegotext is parsed block by block by evaluating the keyed function f i at c i ; each evaluation yields one message bit. After performing this for each received block, an n-bit string is obtained, which is then decoded via Dec. Note that we sample at most twice from the channel for each bit we wish to send. The error-correcting code is needed to recover from the errors introduced by this process. The detailed correctness, security and overhead analysis for both models follow in the next sections.
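Putting the pieces together, the following Python sketch runs the SE/SD pipeline end to end. It substitutes simple stand-ins for the real ingredients: a repetition code for the concatenated code of Theorem 1, an independent truly random function f i per position for the almost 2n-wise independent family, and a memoryless uniform channel; all parameter values are illustrative.

```python
import random

def stego_roundtrip(msg, dist, reps, rng):
    """SE then SD: ECC-encode, embed each bit by two-draw rejection sampling
    with a per-position function f_i, extract by evaluating f_i, and decode."""
    symbols, probs = zip(*dist.items())
    enc = [b for b in msg for _ in range(reps)]                  # repetition ECC
    fs = [{s: rng.randrange(2) for s in symbols} for _ in enc]   # "shared key"
    stego = []
    for f, m in zip(fs, enc):
        c = rng.choices(symbols, probs)[0]
        if f[c] != m:
            c = rng.choices(symbols, probs)[0]
        stego.append(c)                                          # c_stego
    got = [f[c] for f, c in zip(fs, stego)]                      # extraction
    return [int(sum(got[i:i + reps]) * 2 > reps)                 # majority Dec
            for i in range(0, len(got), reps)]

rng = random.Random(42)
msg = [rng.randrange(2) for _ in range(16)]
out = stego_roundtrip(msg, {s: 0.125 for s in "abcdefgh"}, 25, rng)
```

With eight uniform symbols (δ=3) each embedded bit flips with probability (1+2^{−3})/4≈0.28, and 25 repetitions push the per-bit decoding error below one percent.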

3.1.1 Correctness

In this section we argue the correctness of our one-time stegosystem in the current history model. We examine the minimum message length needed to achieve (ϵ,δ)-correctness for any choice of these parameters. We are particularly interested in the case when δ is small (perhaps even approaching 0 as a function of k), as the difficulty of parameter selection is amplified in this case (in contrast, when δ is bounded away from 0, the cross-over probability p is bounded away from 1/2 and parameter selection is simplified).

Theorem 5

For any ϵ,δ>0, consider the current history model stegosystem (SK,SE,SD) of Sect. 3.1 under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\) where p=(1+2^{−δ})/4. Then the stegosystem is (ϵ,δ)-correct so long as the message has length

$${\varOmega} \bigl(\delta^{-2} \cdot \log \bigl({\epsilon}^{-1} \bigr)\log \log \bigl({\epsilon}^{-1}\bigr) \bigr) + 2^{O(\delta^{-4})} $$

as δ→0 while the dependency on δ vanishes when δ is bounded away from 0.

Proof

Let us first consider the case where the function f corresponding to the shared key between the two participants is a truly random function. In this case, by Lemma 4, the underlying communication channel simulates a binary symmetric channel with cross-over probability p=(1+2^{−δ})/4. Based on this fact and Theorem 1, the probability of error in reception would be at most 2^{−θn/log n} for sufficiently large n. Specifically, it should hold that n≥n 0 for some n 0 that satisfies log n 0 =Θ(Z^{−2} log Z^{−1}) where Z=1−H(p)−r. Also recall that θ=Θ(Z). Given the statement of the theorem we can postulate that Z=Ω(1−H(p)) and as a result Z^{−1}=O((1−H(p))^{−1}). Observe now that the choice of p=(1+2^{−δ})/4 implies that

$$\bigl(1-H(p)\bigr)^{-1} = O\bigl(\bigl(1 - 2^{-\delta} \bigr)^{-2}\bigr) $$

in the light of Proposition 15. It follows that Z^{−1}=O((1−2^{−δ})^{−2}) and thus \(n_{0} = 2^{O ((1 - 2^{-\delta})^{-4})} Z^{-1}\). From this we see that the minimum message length is of the form \(2^{O ((1 - 2^{-\delta})^{-4})}\) in order to attain an error-correction bound of the form 2^{−θn/log n}. To force this latter function to be below, say, ϵ/2 we need to select n/log n=Ω(θ^{−1} log(1/ϵ)), which implies a lower bound for n of the form Ω((1−2^{−δ})^{−2}⋅log(1/ϵ) log log(1/ϵ)). The above guarantees an error of at most ϵ/2 when the function f corresponding to the shared key between the two participants is a truly random function. Now we consider the case where the selection of f is based on an ϵ-away from t-wise independent family of functions. Given that the postulated distance of our function family from truly random functions is at most ϵ/2, we see that

$$\forall m \in\{0,1\}^{\ell},\quad \Pr\bigl[\mathit {SD}\bigl(\kappa, \mathit {SE}( \kappa,m; \mathcal{O})\bigr) \neq m \mid \kappa \gets \mathit {SK}\bigl(1^k \bigr) \bigr] \leq \epsilon $$

which establishes the correctness of the stegosystem for messages of length that are suitably large as postulated, since (1−2^{−δ})^{−2}=O(δ^{−2}) for small values of δ≤1 while for larger δ we have (1−2^{−δ})^{−2}=O(1). □

3.1.2 Security

In this section we argue the security of our one-time stegosystem in the current history model. First, we observe that the output of the rejection-sampling function \({\operatorname {rejsam}}^{f}_{h}\), with a truly random function f, is indistinguishable from the channel distribution C h (this folklore result was implicit in previous work—we prove it formally below). We then show that if f is selected from a family that is \(\epsilon_{\mathcal{F}}\)-away from 2n-wise independent, the advantage of an adversary \(\mathcal{A}\) in distinguishing between the output of the steganographic embedding protocol SE and the channel \(\mathcal{C}_{h}\) is bounded above by \(\epsilon_{\mathcal{F}}\). Let \(\mathcal{R} =\{f : {\varSigma} \to \{0,1\} \}\). We will show the following:

Theorem 6

For any ϵ,δ>0, consider the current history stegosystem (SK,SE,SD) of Sect. 3.1 under the parameter constraint \(\epsilon_{\mathcal{F}} \leq \epsilon\). Then the stegosystem is (ϵ,δ)-secure so long as the key has length

$$\bigl(2 + o(1)\bigr) \biggl(\frac{1}{r}\cdot \ell + \log\log |{\varSigma}| + \log \bigl({\epsilon}^{-1}\bigr) \biggr), $$

where is the message length and r is the rate of the error-correcting code employed by the stegosystem.

Anticipating the proof of the theorem we start with some preliminary results. First, we characterize the probability distribution of the rejection-sampling function:

Proposition 7

Fix some function f:Σ→{0,1} and channel history h∈Σ ∗. The function \({\operatorname {rejsam}}^{f}_{h}(m)\) is a random variable with probability distribution expressed by the following function: Let c∈Σ and m∈{0,1}. Let \(\mathsf {miss}_{f}(m) = \Pr_{c' \gets \mathcal{C}_{h}} [f(c') \neq m]\) and \(p_{c} = \Pr_{c' \gets \mathcal{C}_{h}}[c' = c]\). Then

$$\Pr\bigl[{\operatorname {rejsam}}^{f}_{h}(m) = c\bigr] = \begin{cases} p_{c} \cdot \mathsf {miss}_{f}(m) & \text{if } f(c) \neq m, \\ p_{c} \cdot \bigl(1 + \mathsf {miss}_{f}(m)\bigr) & \text{if } f(c) = m. \end{cases} $$

Proof

Let c 1 and c 2 be the two (independent) samples drawn from C h during rejection sampling. (For simplicity, we treat the process as having drawn two samples even in the case where it succeeds on the first draw.) Note, now, that in the case where f(c)≠m, the value c is the result of the rejection-sampling process precisely when f(c 1 )≠m and c 2 =c; as these samples are independent, this occurs with probability miss f (m)⋅p c . In the case where f(c)=m, however, we observe c whenever c 1 =c or f(c 1 )≠m and c 2 =c. As these events are disjoint, their union occurs with probability p c ⋅(miss f (m)+1), as desired. □

Lemma 8

For any h∈Σ ∗ and m∈{0,1}, the random variable \({\operatorname {rejsam}}^{f}_{h}(m)\) is perfectly indistinguishable from the channel distribution C h when f is drawn uniformly at random from \(\mathcal{R}\).

Proof

Let f be a random function, as described in the statement of the lemma. Fixing the elements c and m, we condition on the event E ≠ that f(c)≠m. In light of Proposition 7, for any f drawn under this conditioning we see that \(\Pr[ {\operatorname {rejsam}}^{f}_{h}(m) = c]\) is equal to

$$\Pr_{c' \gets \mathcal{C}_h}\bigl[c' = c\bigr] \cdot \mathsf {miss}_{f}(m) = p_c \cdot \mathsf {miss}_{f}(m), $$

where we have written \(\mathsf {miss}_{f}(m) = \Pr_{c' \gets \mathcal{C}_{h}} [f(c') \neq m]\) and \(p_{c} = \Pr_{c' \gets \mathcal{C}_{h}}[c' = c]\). Conditioned on E_{≠}, then, the probability of observing c is

$$\mathbf{E}_f \bigl[p_c \cdot \mathsf {miss}_{f}(m)\,{\big|}\,E_{\neq}\bigr] = p_c \biggl( p_c + \frac{1}{2}(1 - p_c) \biggr), $$

where the above follows from the fact that in the conditional space we can expand the expectation of miss_f(m) as

$$\mathbf{E}_f\bigl[\mathsf {miss}_{f}(m)\,\big|\,E_{\neq}\bigr] = p_c \cdot 1 + \sum_{c' \neq c} p_{c'} \cdot \frac{1}{2} = p_c + \frac{1}{2}(1 - p_c).$$

Letting E_{=} be the event that f(c)=m, we similarly compute

$$\mathbf{E}_f \bigl[p_c \cdot \bigl(1+ \mathsf {miss}_{f}(m)\bigr)\,{\big|}\,E_{=}\bigr] = p_c \biggl(1 + \frac{1}{2}(1 - p_c) \biggr). $$

As \(\Pr[E_{=}]=\Pr[E_{\neq}]=1/2\), we conclude that the probability of observing c is exactly

$$\frac{1}{2} \biggl(p_c \biggl( p_c + \frac{1 - p_c}{2} \biggr) + p_c \biggl(1 + \frac{1 - p_c}{2} \biggr) \biggr) = p_c, $$

as desired. □
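The averaging argument can likewise be verified exactly for a small alphabet: enumerating all 2^{|Σ|} functions f and averaging the output distribution of rejection sampling recovers the channel distribution, as Lemma 8 asserts. The sketch below uses a hypothetical three-letter channel with illustrative probabilities:

```python
from itertools import product

# Hypothetical channel over a three-letter alphabet (illustrative values).
sigma = [0, 1, 2]
p = {0: 0.5, 1: 0.3, 2: 0.2}
m = 1

def rejsam_dist(f, m):
    # Exact output distribution of two-sample rejection sampling
    # (Proposition 7): p_c * miss if f(c) != m, p_c * (1 + miss) otherwise.
    miss = sum(p[c] for c in sigma if f[c] != m)
    return {c: (p[c] if f[c] == m else 0.0) + miss * p[c] for c in sigma}

# Average over all 2^|Sigma| functions f : Sigma -> {0,1}, i.e., over a
# truly random f.
fs = [dict(zip(sigma, bits)) for bits in product([0, 1], repeat=len(sigma))]
avg = {c: sum(rejsam_dist(f, m)[c] for f in fs) / len(fs) for c in sigma}

# Lemma 8: averaged over a random f, rejection sampling is exactly the channel.
for c in sigma:
    assert abs(avg[c] - p[c]) < 1e-12
```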

Having established the behavior of the rejection-sampling function when a truly random function is used, we proceed to examine the behavior of rejection sampling in our setting, where the function is drawn from a function family that is \(\epsilon_{\mathcal{F}}\)-away from 2n-wise independent. In particular, this establishes the security bound claimed in Theorem 6:

Proof of Theorem 6

Consider the following two games \(G_{1}^{\mathcal{A}}\) and \(G_{2}^{\mathcal{A}}\), which can be played with the adversary \(\mathcal{A}\). Here \(\lambda= | \mathit {SE}(\kappa,m; \mathcal{O})|\).

\(G_{1}^{\mathcal{A}}(1^{k})\)

1. κ←{0,1}^k

2. \((m^{\ast},s)\gets \mathit {SA}^{\mathcal{O}(h)}_{1}(1^{k},h)\), m^∗∈{0,1}^ℓ

3. \(b \stackrel{r}{\gets} \{0,1\}\)

4. if b=1 then \(c^{\ast} \gets \mathit {SE}(\kappa,m^{\ast};\mathcal{O})\), else \(c^{\ast} \gets \mathcal{C}_{h}^{\lambda}\)

5. b′←SA_2(c^∗,s)

6. if b=b′ then success

\(G_{2}^{\mathcal{A}}(1^{k})\)

1. \(f \gets \mathcal{R}\)

2. \((m^{\ast},s)\gets \mathit {SA}^{\mathcal{O}(h)}_{1}(1^{k},h)\), m^∗∈{0,1}^ℓ

3. \(b \stackrel{r}{\gets} \{0,1\}\)

4. if b=1 then \(c^{\ast} \gets \mathit {SE}(f,m^{\ast};\mathcal{O})\) (i.e., SE run with the truly random f in place of the keyed function), else \(c^{\ast} \gets \mathcal{C}_{h}^{\lambda}\)

5. b′←SA_2(c^∗,s)

6. if b=b′ then success

The two games differ only in the function used by the encoding procedure: in \(G_{1}^{\mathcal{A}}\) it is drawn from the family \(\mathcal{F}\), while in \(G_{2}^{\mathcal{A}}\) it is a truly random function. By Lemma 8, in \(G_{2}^{\mathcal{A}}\) the output of rejection sampling is distributed identically to the channel, so the adversary has no advantage there; since \(\mathcal{F}\) is \(\epsilon_{\mathcal{F}}\)-away from 2n-wise independent, the success probabilities in the two games differ by at most \(\epsilon_{\mathcal{F}} \leq \epsilon\), and the theorem follows by the definition of insecurity. From Theorem 3 we find that the minimum key length required for security is (2+o(1))(ℓ/r+loglog|Σ|+log(ϵ^{−1})), where ℓ is the message length and r is the rate of the error-correcting code employed by the stegosystem. □

3.2 Adapting to the Look-Ahead Model

In this section we note the differences between the construction for the look-ahead model and that for the current history model. In this model, Alice and Bob agree to communicate over a channel with distribution \(C_{h}^{\tau}\) over the alphabet Σ^τ, where τ=δ^{−1}. The min-entropy is now \(H_{\infty}(C_{h}^{ (\delta^{-1} )}) \geq 1\), and the binary symmetric channel cross-over probability p is no more than 3/8. To recover from the cross-over errors, they use the error-correcting code E=(Enc,Dec), an efficient (r,3/8,ϵ_enc)-code of length n from Theorem 1. For the look-ahead model, we record the corollary below, which follows directly from Theorem 5.
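The min-entropy bound follows from the additivity of min-entropy over independent draws (assuming each of the τ=δ^{−1} component draws has min-entropy at least δ):

$$H_{\infty}\bigl(C_{h}^{ (\delta^{-1} )}\bigr) = \delta^{-1} \cdot H_{\infty}(C_{h}) \geq \delta^{-1} \cdot \delta = 1. $$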

Corollary 9

For any ϵ,δ>0, consider the look-ahead model stegosystem (SK,SE,SD) under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\) where p=3/8. Then the stegosystem is (ϵ,δ)-correct so long as the message has length Ω(log(1/ϵ)loglog(1/ϵ)).

In the current history model, as δ→0 the rejection-sampling procedure has a high probability of failure. This is because the overlaid binary symmetric channel cross-over probability converges to 1/2 very quickly, since p=(1+2^{−δ})/4. With p converging to 1/2, the binary symmetric channel becomes informationless. Consequently, we would have to employ error-correcting codes that can recover from very high error rates. This translates to a very high minimum message length requirement and explains the exponential dependence on δ^{−1}. In the look-ahead model, we amplify the entropy of the channel up to 1, thereby removing the minimum message length's exponential dependence on δ^{−1}. In either model, if we want to transmit a message shorter than the minimum message length, we can pad the original message to attain the required length. The rejection-sampling procedures in the two models differ only in the size of their domains; observe that this does not affect the security analysis. The corollary recorded below follows directly from Theorem 6:
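To see why the rate collapses as p approaches 1/2, recall that the capacity 1−H(p) of a binary symmetric channel vanishes quadratically in the distance of p from 1/2. The following sketch (illustrative only) checks the standard approximation 1−H(1/2−t) ≈ (2/ln 2)·t²:

```python
import math

def H(p):
    """Binary entropy (in bits)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Near p = 1/2 the capacity 1 - H(p) vanishes quadratically:
# 1 - H(1/2 - t) ~ (2 / ln 2) * t^2, so recovering from such a channel
# forces the rate of the error-correcting code down like t^2.
c = 2 / math.log(2)
for t in [0.1, 0.01, 0.001]:
    cap = 1 - H(0.5 - t)
    assert abs(cap / (t * t) - c) < 0.1 * c  # within 10% of the quadratic law
```

This quadratic decay is exactly what produces the O(δ^{−2}) overhead in the current history model, and the fixed p ≤ 3/8 of the look-ahead model is what avoids it.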

Corollary 10

For any ϵ,δ>0, consider the look-ahead stegosystem (SK,SE,SD) under the parameter constraint \(\epsilon_{\mathcal{F}} \leq \epsilon\). Then the stegosystem is (ϵ,δ)-secure so long as the key has length (2+o(1))(ℓ/r+loglog(|Σ|⋅δ^{−1})+log(1/ϵ)), where ℓ is the message length and r is the rate of the error-correcting code employed by the stegosystem.

3.3 Putting It All Together

The objective of this section is to combine the results of the previous sections and illustrate the results for our stegosystem in the two channel models. As our system is built on two-sample rejection sampling, a process that faithfully transmits each bit with cross-over probability p=(1+2^{−δ})/4, the target rate that we may approximate is 1−H(p). In the case of the look-ahead model, we have the cross-over probability p≤3/8, and the target rate that we may approximate is 1−H(3/8). Indeed, as described below, the system asymptotically converges to the rate of this underlying rejection-sampling channel. We remark that with sufficiently large channel entropy, there are ways to draw more samples during rejection sampling to reduce the error rate without compromising security, but this would not have any (asymptotic) bearing on our overhead objective.

Theorem 11

For any ϵ,δ>0, the stegosystem (SK,SE,SD) of Sect. 3.1 in the current history model under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\), where p=(1+2^{−δ})/4, is (ϵ,δ)-correct and (ϵ,δ)-secure so long as the message has length \({\varOmega}(\delta^{-2} \cdot \log (1/\epsilon)\log \log (1/\epsilon)) + 2^{O(\delta^{-2})}\) as δ→0, while the dependency on δ vanishes for δ→∞. When the size of the channel alphabet is polynomial in the length of the message m and ϵ=2^{−|m|}, (SK,SE,SD) has overhead O((1−H(p))^{−1})=O(δ^{−2}) as δ→0, while the dependency on δ vanishes for δ→∞.

The above theorem implies that for any fixed δ our stegosystem exhibits O(1) overhead, i.e., the ratio of the key length over the message length is constant.

Theorem 12

For any ϵ,δ>0, the stegosystem (SK,SE,SD) of Sect. 3.1 in the look-ahead model under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\), where p=3/8, is (ϵ,δ)-correct and (ϵ,δ)-secure so long as the message has length Ω(log(1/ϵ)loglog(1/ϵ)). When the size of the channel alphabet is polynomial in the length of the message m and ϵ=2^{−|m|}, (SK,SE,SD) exhibits O(1) overhead.

4 A Provably Secure Stegosystem for Longer Messages

In this section we show how to apply the “one-time” stegosystem of Sect. 3.1 together with a pseudorandom generator so that longer messages can be transmitted.

Definition 6

Let U k denote the uniform distribution over {0,1}k. A polynomial-time deterministic algorithm G is a pseudorandom generator (PRG) if the following conditions are satisfied:

Variable output:

For all seeds x∈{0,1}^k and all y∈ℕ, |G(x,1^y)|=y.

Pseudorandomness:

For every polynomial p, the ensemble of random variables \(\{G(U_k, 1^{p(k)})\}_{k \in \mathbb{N}}\) is computationally indistinguishable from the uniform ensemble \(\{U_{p(k)}\}_{k \in \mathbb{N}}\).

For a PRG G and 0<k<k′, if A is some statistical test, we define the advantage of A over the PRG as follows:

$$\mathbf{Adv}_{G}^A\bigl(k,k^{\prime}\bigr) = \Bigl\vert \Pr_{w \gets G(U_k, 1^{k^{\prime}})} \bigl[A(w) = 1\bigr] - \Pr_{w \gets U_{k^{\prime}}} \bigl[A(w) = 1\bigr]\Bigr\vert. $$

The insecurity of the above PRG G against all statistical tests A computable by circuits of size ≤P is then defined as

$$\mathbf{InSec}_{G}\bigl(k,k^{\prime};P\bigr) = \max_{A \in \mathcal{A}_P} \bigl\{\mbox{\bf{Adv}}_{G}^{A} \bigl(k,k^{\prime}\bigr)\bigr\} $$

where \(\mathcal{A}_{P}\) is the collection of statistical tests computable by circuits of size ≤P.
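As a toy illustration of these definitions (not a construction from the paper), the following sketch estimates the advantage of a trivial statistical test A against a deliberately insecure "generator" that always outputs the all-zero string; both the test and the generator are hypothetical, chosen only to make the advantage visibly large:

```python
import random

# Toy illustration of the advantage of a statistical test A: the "generator"
# always outputs the all-zero string, and A(w) = 1 iff w starts with a 0 bit.
random.seed(1)
kp = 64          # output length k'
trials = 20000

def bad_G(_seed):
    return "0" * kp

def A(w):
    return 1 if w[0] == "0" else 0

pr_g = sum(A(bad_G(None)) for _ in range(trials)) / trials   # exactly 1
pr_u = sum(A(format(random.getrandbits(kp), f"0{kp}b"))      # ~1/2 on uniform
           for _ in range(trials)) / trials
adv = abs(pr_g - pr_u)   # estimates Adv_G^A(k, k'); close to 1/2 here
assert adv > 0.4
```

A secure PRG is precisely one for which every efficiently computable test has advantage negligible in k.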

It is convenient for our application that typical PRGs have a procedure G′ such that if z=G(x,1^y), we have G(x,1^{y+y′})=G′(x,z,1^{y′}) (i.e., if one maintains z, one can extract the y′ bits that follow the first y bits without starting from the beginning).
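As an illustration of such an incrementally extendable generator, the sketch below uses a hypothetical counter-mode construction with SHA-256 as a stand-in (this is not the paper's instantiation, and counter-mode SHA-256 is used here only for illustration, not as a vetted PRG). The state z records the block counter and the unconsumed tail bits, so G_prime can continue the stream without regenerating the prefix:

```python
import hashlib

# Hypothetical counter-mode generator standing in for G: G(x, 1^y) emits the
# first y bits of SHA-256(x||0), SHA-256(x||1), ...  The state
# z = (next counter, unconsumed tail bits) lets G_prime continue the stream.

def _block(x: bytes, ctr: int) -> str:
    digest = hashlib.sha256(x + ctr.to_bytes(8, "big")).digest()
    return "".join(f"{b:08b}" for b in digest)

def G(x: bytes, y: int):
    out, ctr = "", 0
    while len(out) < y:
        out += _block(x, ctr)
        ctr += 1
    return out[:y], (ctr, out[y:])

def G_prime(x: bytes, z, y_more: int):
    ctr, out = z                    # resume from the saved state
    while len(out) < y_more:
        out += _block(x, ctr)
        ctr += 1
    return out[:y_more], (ctr, out[y_more:])

# Extending the stream agrees with generating it in one shot.
first, z = G(b"seed", 100)
more, _ = G_prime(b"seed", z, 60)
assert G(b"seed", 160)[0] == first + more
```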

Consider now the following stegosystem S′=(SK′,SE′,SD′) that can be used for steganographic transmission of longer messages using the one-time stegosystem S=(SK,SE,SD) defined in Sect. 3.1. S′ can handle messages of length polynomial in the security parameter k and employs a PRG G. The two players, Alice and Bob, share a key of length k denoted by x. The function SE′ is given the input x and the message m∈{0,1}^ν to be transmitted, of length ν=p(k) for some fixed polynomial p. SE′ in turn employs the PRG G to extract k′ bits (it computes κ=G(x,1^{k′}), |κ|=k′). The length k′ is selected to match the number of key bits required to transmit the message m using the one-time stegosystem of Sect. 3.1. Once the key κ of length k′ is produced by the PRG, the procedure SE′ invokes the one-time stegosystem on input κ,m,h. The function SD′ is defined in a straightforward way based on SD.

The computational insecurity of the stegosystem S′ is defined by adapting the definition of information-theoretic stegosystem security from Sect. 2.1 for the computationally bounded adversary as follows:

$$\mathbf{InSec}_{S^{\prime}}\bigl(k,k^{\prime};P\bigr) = \max_{\mathcal{A} \in \mathcal{A}_P} \bigl\{\mbox{\bf{Adv}}_{S'}^\mathcal{A} \bigl(k,k^{\prime}\bigr)\bigr\}, $$

this maximum taken over all adversaries \(\mathcal{A}\), where SA_1 and SA_2 have circuit size ≤P, and the definition of the advantage \(\mbox{\bf{Adv}}_{S'}^{\mathcal{A}}(k,k^{\prime})\) is obtained by suitably modifying the definition of \(\mbox{{\bf Adv}}_{S}^{\mathcal{A}}(k)\) in Sect. 2.1. In particular, we define a new adversarial game \(G^{\mathcal{A}}(1^{k},1^{k^{\prime}})\) which proceeds as the previous game \(G^{\mathcal{A}}(1^{k})\) in Sect. 2.1, except that in this new game the algorithms SA_1 and SA_2 receive as input the security parameter k′ and SE′ invokes SE as \(\mathit {SE}(\kappa,m ; \mathcal{O})\) where κ=G(x,1^{k′}). This matches the model of [5], which referred to such schemes as steganographically secret against chosen hiddentext attacks.

Theorem 13

The stegosystem S′=(SK′,SE′,SD′) is steganographically secret against chosen hiddentext attacks. In particular, employing a PRG G to transmit a message m, we get

$$\mathbf{InSec}_{S'}\bigl(k,k^{\prime};P\bigr) \leq \mathbf{InSec}_{G}\bigl(k,k^{\prime};P\bigr)+\mathbf{InSec}_{S}\bigl(k^{\prime}\bigr) $$

where \(\mathbf{InSec}_{S}(k^{\prime})\) is the information-theoretic insecurity defined in Sect. 2.1 and |m|=ℓ(k′).

Performance Comparison of the Stegosystem S′ and the Hopper, Langford, von Ahn System

The system of Hopper et al. [5] concerns a current history model where the min-entropy of all C_h is at least 1. In this case, we may select an (r,3/8,ϵ_enc)-error-correcting code. The system of Hopper et al. then correctly decodes a given message with probability at least 1−ϵ_enc and makes no more than 2n calls to a pseudorandom function family. Were one to use the pseudorandom function family of Goldreich et al. [4], this would involve the production of Θ(ℓ⋅log(ℓ⋅|Σ|)) pseudorandom bits, where ℓ is the message length. Of course, the security of the system depends on the security of the underlying pseudorandom generator with parameter k. On the other hand, with the same error-correcting code, the steganographic system described in this work utilizes O(ℓ+loglog|Σ|+log(1/ϵ)) pseudorandom bits, correctly decodes a given message with probability 1−ϵ, and possesses insecurity no more than ϵ. In order to compare the two schemes, note that by selecting ϵ=2^{−k}, both the decoding error and the security of the two systems differ by at most 2^{−k}, a negligible function in terms of the security parameter k. (Note also that the pseudorandom functions utilized in the above scheme have security no better than 2^{−k} with security parameter k.) In this case, the number of pseudorandom bits used by our system is

$$\bigl(2 + o(1)\bigr) \bigl(\ell + \log\log |{\varSigma}| + k \bigr) = {\varTheta} \bigl( \ell + k + \log \log |{\varSigma}|\bigr), $$

a non-trivial improvement over the Θ(ℓ⋅log(ℓ⋅|Σ|)) bits of the scheme above. In the look-ahead model, the number of pseudorandom bits used by our system is Θ(ℓ+loglog(|Σ|⋅δ^{−1})+k), as we operate on the concatenated channel \(C_{h}^{ (\delta^{-1} )}\).
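Ignoring the hidden constants in the Θ-bounds (taken to be 1 here purely for illustration) and using hypothetical parameter values, the gap between the two bit counts can be made concrete:

```python
import math

# Illustrative comparison of the pseudorandom-bit counts, with all hidden
# Theta-constants set to 1 and hypothetical parameter values.
ell = 10**6          # message length (hypothetical)
sigma_size = 2**32   # alphabet size |Sigma| (hypothetical)
k = 128              # security parameter (hypothetical)

hlv_bits = ell * math.log2(ell * sigma_size)            # Theta(ell * log(ell * |Sigma|))
ours_bits = ell + k + math.log2(math.log2(sigma_size))  # Theta(ell + k + loglog|Sigma|)

assert ours_bits < hlv_bits   # linear in ell vs. ell * log(ell * |Sigma|)
```

Under these illustrative values the count linear in ℓ is smaller by roughly the factor log(ℓ⋅|Σ|), matching the asymptotic comparison above.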