1 Introduction

Steganographic protocols enable one to “embed” covert messages into inconspicuous data over a public communication channel in such a way that no one, aside from the sender and the intended receiver, can even detect the presence of the secret message. The steganographic communication problem can be described using Simmons’ [13] formulation: prisoners Alice and Bob wish to communicate securely in the presence of an adversary, called the “Warden,” who monitors whether they exchange “conspicuous” messages. In particular, Alice and Bob may exchange messages that adhere to a certain channel distribution that represents “inconspicuous” communication. By controlling the messages that are transmitted over such a channel, Alice and Bob may exchange messages that cannot be detected by the Warden. There have been two approaches to formalizing this problem, one based on information theory [2, 8, 15] and one based on complexity theory [5]. Most steganographic constructions supported by provable security guarantees are instantiations of the following basic procedure (often referred to as “rejection sampling”).

The problem specifies a family of message distributions (the “channel distributions”) that provide a number of possible options for a so-called “covertext” to be transmitted. Additionally, the sender and the receiver possess some sort of private function indexed by the shared secret key (typically a keyed hash function, MAC, or other similar function) that maps channel messages to a single bit. In order to send a message bit m, the sender draws a covertext from the channel distribution, applies the function to the covertext and checks whether it evaluates to the bit m she originally wished to transmit. If this is the case, the covertext is transmitted (serving as the “stegotext”). In case of failure, this procedure is repeated. While this is a fairly concrete procedure, there are a number of choices to be made with both practical and theoretical significance. From the security viewpoint, one is primarily interested in the choice of the function that is shared between the sender and the receiver. From a practical viewpoint, one is primarily interested in how the channel is implemented and whether it conforms to the various constraints that are imposed on it by the steganographic protocol specifications (e.g., are independent draws from the channel allowed? does the channel remember previous draws? etc.).
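To make the rejection-sampling loop concrete, the following Python sketch embeds a single bit. The byte-valued channel and the parity function standing in for the shared keyed function are illustrative assumptions for this sketch, not part of any particular construction.

```python
import random

def rejsam(sample_covertext, f, m, max_draws=2):
    """Rejection sampling: draw covertexts until one evaluates to the
    desired bit m, giving up after max_draws attempts (the final draw is
    transmitted regardless, so the receiver may decode the wrong bit)."""
    c = sample_covertext()
    for _ in range(max_draws - 1):
        if f(c) == m:
            break
        c = sample_covertext()
    return c

# Illustrative channel: uniform bytes; shared function: parity of the byte.
channel = lambda: random.randrange(256)
parity = lambda c: bin(c).count("1") % 2

c = rejsam(channel, parity, m=1)
# Since parity is balanced, two draws succeed with probability 1 - (1/2)^2 = 3/4.
```

The receiver simply recomputes parity(c); bounding the failure probability for general channels is the content of Lemma 4 below.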

The shared key between Alice and Bob can be an expensive resource; to focus on this parameter, we define a notion of overhead equal to the ratio of the length of the secret key to the length of the message. Prior work in statistically secure steganography either gives high overhead or considers restricted covertext distributions. For instance, in the information-theoretic model, Cachin [2] demonstrated a steganographic protocol that works on restricted covertext distributions where the channel is a stationary distribution produced by a sequence of independent repetitions of the same experiment. Under this uniformity assumption, he uses sequences of covertexts to encode the message and obtains optimal overhead. In the complexity-theoretic setting, Hopper et al. [5, 6] provided a provably secure stegosystem that pairs rejection sampling with a pseudorandom function family to offer security for general (history-dependent) channel distributions with constant min-entropy. However, this protocol has a few drawbacks. First, cast in the information-theoretic setting, the secret key shared by Alice and Bob must be long enough to specify a suitable random function, which yields an overhead polynomial in the length of the message. In the complexity-theoretic setting, from an efficiency viewpoint, their construction requires about two evaluations of a pseudorandom function per transmitted bit. Constructing efficient pseudorandom functions is possible either generically [4] or, more efficiently, based on specific number-theoretic assumptions [10]. Nevertheless, pseudorandom function families are a conceptually complex and fairly expensive cryptographic primitive. For example, the evaluation of the Naor–Reingold pseudorandom function on an input x requires O(|x|) modular exponentiations. Similarly, the generic construction [4] requires O(k) PRG doublings of the input string where k is the length of the key.

Our protocol remedies these shortcomings. We show how it is possible to attain constant overhead for general channel distributions with constant min-entropy. The only assumptions employed in our analysis are that the channel alphabet has size polynomial in the length of the message m and that the required security is 2^{−|m|}. Furthermore, our protocol in the computational setting is much more efficient: in particular, while the Hopper et al. stegosystem requires two evaluations of a pseudorandom function per bit, amounting to a linear (in the key-size) number of applications of the underlying PRG (in the standard construction for pseudorandom functions of [4]), in our stegosystem we require a constant number of PRG applications per bit. So the number of cryptographic operations per bit transmitted drops from linear to constant.

Central to our approach for improving the efficiency and overhead for general distributions is the use of combinatorial constructs such as almost t-wise independent function families given by Alon et al. [1]. Our protocol is based on the rejection-sampling technique outlined above in combination with an explicit almost t-wise independent family of functions. We note that such combinatorial constructions have been extremely useful for derandomization methods and here, to the best of our knowledge, are employed for the first time in the design of steganographic protocols. The present paper is an extended version based on preliminary work that appeared in [7]; the present version includes a full security analysis that works for any constant min-entropy (as opposed to min-entropy of 1 bit that was assumed in this previous work).

2 Definitions and Tools

The security of a steganography protocol is measured by the adversary’s ability to distinguish between “normal” and “covert” messages over a communication channel. To characterize normal communication we need to define and formalize the communication channel. We follow the standard terminology used in the literature [2, 5, 6, 14]: Let Σ={σ 1,…,σ s } denote an alphabet and treat the channel as a family of random variables \(\mathcal{C} = \{C_{h}\}_{h \in {\varSigma}^{\ast}}\); each C h is supported on Σ. These channel distributions model a history-dependent notion of channel data that captures real-life communication. Such a channel induces a natural distribution on Σ n for any n: σ 1 is drawn from C ϵ , and each subsequent σ i is drawn from \(C_{\sigma_{1} \ldots \sigma_{i-1}}\). (Here we let ϵ denote the empty string.) Recall that the min-entropy of a random variable X, taking values in a set V, is the quantity

$$H_\infty(X) \triangleq \min_{v \in V} \bigl(-\log \Pr[X = v] \bigr)\,. $$

We say that a channel \(\mathcal{C}\) has min-entropy δ if for all h∈Σ ∗, H ∞ (C h )≥δ.
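A quick Python illustration of this definition, with a toy history-indexed channel (the distributions are made up for the example):

```python
import math

def min_entropy(dist):
    """H_inf(X) = min_v (-log2 Pr[X = v]) = -log2 (max_v Pr[X = v])."""
    return -math.log2(max(dist.values()))

# A channel as a family {history: distribution over the alphabet}.
channel = {
    "":  {"a": 0.50, "b": 0.25, "c": 0.25},   # C_epsilon
    "a": {"a": 0.25, "b": 0.50, "c": 0.25},   # C_a
}
# The channel has min-entropy delta if every C_h has min-entropy >= delta.
delta = min(min_entropy(d) for d in channel.values())
```

Here every C_h assigns probability at most 1/2 to any symbol, so delta = 1.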

2.1 One-time Stegosystems; The Steganographic Models

Steganography has been studied in two natural (but implicit up to now) communication models differing in Alice’s ability to sample from the channel.

Current History Model

The first model we study is the one adopted by Hopper et al. [5]. In this model, Alice—and consequently the steganographic encoding protocol—has access to a channel oracle that provides samples from the channel for the current history. Alice is given no means of sampling from C h for other histories. We call this the current history model. In this case, one can imagine that the channel is determined by a complex environment: while Alice is permitted to sample from the channel determined by the current environment, she cannot simulate potential future environments. Naturally, the communication history is updated when a symbol is transmitted on the wire from Alice to Bob. Formally, if h 1 ,h 2 ,…,h ℓ ∈Σ have been transmitted along the channel thus far, Alice may sample solely from \(C_{h_{1} \circ \cdots \circ h_{\ell}}\) and send an element of her choice.

Look-Ahead Model

The second model we study—the look-ahead model—was adopted by von Ahn and Hopper [14]. This model is a relaxation of the “current history” model: Alice is now provided with a means for sampling “deep into the channel.” In particular, Alice—and consequently the steganographic encoding protocol—has access to a channel oracle that can sample from the channel for any history. Formally, during the embedding process, Alice may sample from \(C_{h_{1}\circ \cdots \circ h_{\ell}}\) for any future history h=h 1 ∘⋯∘h ℓ she wishes (though Alice is constrained to be efficient and so can make no more than polynomially many queries of polynomial length). This more generous model allows Alice to transform a channel C with min-entropy δ into a channel C (τ) with min-entropy τδ. Specifically, the channel C (τ) is defined over the alphabet Σ τ, whose elements we write as vectors h=(h 1 ,…,h τ ). The distribution \(C^{(\tau)}_{\mathbf{h}^{1}, \ldots, \mathbf{h}^{n}}\) is determined by the channel C with history \(\overline{h} = h^{1}_{1} \circ \cdots \circ h_{\tau}^{1} \circ h^{2}_{1} \circ \cdots \circ h^{n}_{\tau}\). Below we give the definition of a one-time stegosystem that works in either of the above models and is a steganographic system that enables the one-time steganographic transmission of a message provided that the two parties share a suitable key.
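Before turning to the formal definition, the channel transformation from C to C (τ) can be sketched in Python. For simplicity the sketch assumes a memoryless channel (one fixed distribution); for a genuinely history-dependent channel each of the τ draws would be conditioned on the symbols sampled so far, which is exactly what the look-ahead oracle permits.

```python
import itertools
import math

def product_channel(dist, tau):
    """C^(tau): treat tau-tuples of symbols as one super-symbol.
    For a memoryless channel the tuple probabilities simply multiply."""
    out = {}
    for combo in itertools.product(dist, repeat=tau):
        p = 1.0
        for s in combo:
            p *= dist[s]
        out[combo] = p
    return out

base = {"a": 0.5, "b": 0.5}               # min-entropy delta = 1
boosted = product_channel(base, 3)        # alphabet Sigma^3
h_inf = -math.log2(max(boosted.values()))  # tau * delta = 3
```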

Definition 1

A one-time stegosystem consists of three probabilistic polynomial-time algorithms

$$S = (\mathit {SK}, \mathit {SE}, \mathit {SD}), $$

where:

  • SK is the key generation algorithm; we write SK(1k)=κ. It produces a key κ of length k.

  • SE is the embedding procedure and has access to the channel; \(\mathit {SE}(\kappa,m; \mathcal{O}) = s\in {\varSigma}^{*}\). The embedding procedure takes into account the history h of communication that has taken place between Alice and Bob thus far and begins its operation corresponding to this history. It takes as input the key κ of length k, a message m of length ℓ=ℓ(k) and a (probabilistic) oracle \(\mathcal{O}\) that allows SE to draw independent samples repeatedly from C h in the current history model. In the look-ahead model, the oracle \(\mathcal{O}\) accepts as input a (polynomial-length) history h′∈Σ ∗ and allows SE to draw independent samples repeatedly from C h∘h′ . The output is the stegotext s∈Σ ∗. Observe that in a one-time stegosystem, once a security parameter k is chosen, the length of the message is a fixed function of k. As described above, the access that SE has to the channel is dictated by the model of communication.

  • SD is the extraction procedure; SD(κ,c)=m or fail. It takes as input the key κ of length k, and some c∈Σ ∗. The output is a message m or the token fail.

We next define a notion of correctness for a one-time stegosystem.

Definition 2

(Correctness)

A one-time stegosystem (SK,SE,SD) is said to be (ϵ,δ)-correct provided that for all channels \(\mathcal{C}\) of min-entropy δ, we have ∀h∈Σ ∗

$$\forall m \in\{0,1\}^{\ell(k)} \Pr\bigl[\mathit {SD}\bigl(\kappa, \mathit {SE}(\kappa,m; \mathcal{O})\bigr)\neq m \mid \kappa \gets \mathit {SK}\bigl(1^k\bigr) \bigr] \leq \epsilon. $$

In general, we treat both ϵ=ϵ(k) and δ=δ(k) as functions of the security parameter k, and the oracle \(\mathcal{O}\) as a function of the history h.

In the following paragraphs, we define security for a one-time stegosystem. One-time stegosystem security is based on the indistinguishability between a transmission that contains a steganographically embedded message and a transmission that contains no embedded message. The adversarial game discussed next models the behavior of the warden in Simmons’ formulation of the problem discussed earlier.

An adversary \(\mathcal{A}\) against a one-time stegosystem S=(SK,SE,SD) is a pair of algorithms \(\mathcal{A}=(\mathit {SA}_{1}, \mathit {SA}_{2})\), that plays the following game, denoted \(G^{\mathcal{A}}(1^{k})\):

  1.

    A key κ is generated by SK(1k).

  2.

    Algorithm SA 1 receives as input the security parameter k and outputs a triple \((m , \textrm{aux}, h_{\sf c}) \in M_{\ell} \times \{0,1\}^{\ast}\times {\varSigma}^{*}\), where m is the challenge plaintext, \(h_{\sf c}\) is the history of the channel that the adversary wishes to use for the steganographic embedding to start, and aux is some auxiliary information that will be passed to SA 2. Note that SA 1 is provided access to \(\mathcal{C}\) via an oracle \(\mathcal{O}(h)\), which takes the history h as input. \(\mathcal{O}(\cdot)\), on input h, returns to SA 1 an element c selected according to C h . This way, the warden can learn about the channel distribution for any history.

  3.

    A bit b is chosen uniformly at random.

    • If b=0 let \(c^{\ast}\gets \mathit {SE}(\kappa,m; \mathcal{O})\) where \(\mathcal{O}\) samples covertexts from \(C_{h_{\mathsf{c}}}\).

    • If b=1 let c ∗ =c 1 ∘⋯∘c λ where \(\lambda= | \mathit {SE}(\kappa,m; \mathcal{O})|\) and \(c_{i} \stackrel{r}{\gets} C_{h_{c} \circ c_{1} \circ \cdots \circ c_{i-1}}\).

  4.

    The input for SA 2 is 1k, h c , c and aux. SA 2 outputs a bit b′. If b′=b then we say that (SA 1,SA 2) succeeded and write \(G^{\mathcal{A}}(1^{k}) = \mathrm{success}\).

The advantage of the adversary \(\mathcal{A}\) over a stegosystem S is defined as

$$\mbox{{\bf Adv}}_S^\mathcal{A}(k) = \biggl\vert \Pr \bigl[ G^\mathcal{A}\bigl(1^k\bigr) = \mathrm{success} \bigr] - \frac{1}{2} \biggr\vert. $$

The probability includes the coin tosses of \(\mathcal{A}\) and SE, as well as the coin tosses of \(G^{\mathcal{A}}(1^{k})\). The (information-theoretic) insecurity of the stegosystem is defined as

$$\mbox{\bf{InSec}}_{S}(k) = \max_{\mathcal{A} }\bigl\{\mbox{ \bf{Adv}}_S^\mathcal{A}(k)\bigr\}\,, $$

this maximum taken over all (time unbounded) adversaries \(\mathcal{A}\).

Definition 3

(Security)

We say that a stegosystem is (ϵ,δ)-secure if for all channels with min-entropy δ we have \(\mbox{\bf{InSec}}_{S}(k) \leq \epsilon\).

As above, in general we treat both ϵ=ϵ(k) and δ=δ(k) as functions of k, the security parameter.

Overhead

The overhead of a one-time stegosystem expresses the relation between the key length k and the message length ℓ; specifically, we adopt the ratio β=k/ℓ as our measure of overhead.

This paper is an extended version of a previous abstract that appeared in [7]. The work presented by Kiayias et al. [7] considers the scenario where the communication channel has min-entropy at least 1 in the current history model. In this paper, we present steganography protocols in both the current history and the look-ahead models. We also present explicit constructions of error-correcting codes using Forney’s [3] concatenation scheme. Furthermore, our protocols operate on any communication channel with min-entropy δ>0.

2.2 Error-Correcting Codes

Our steganographic construction requires an efficient family of codes that can recover from errors introduced by certain binary symmetric channels. In particular, we require a version of the Shannon coding theorem [11, 12] that yields explicit control on the various parameters of the code as the rate approaches the capacity of the channel. We present this theorem in this section.

For an element x∈{0,1} n , we let B p (x) be the random variable equal to x⊕e, where e∈{0,1} n is a random error vector defined by independently assigning each e i =1 with probability p. (Here x⊕e denotes the vector with the ith coordinate equal to x i ⊕e i .) The classical coding theorem [12] asserts that for every pair of real numbers 0<R<C≤1 and n∈ℕ, there is a binary code A⊂{0,1} n with log|A|/n≥R, and a θ>0, so that for each a∈A, maximum-likelihood decoding recovers a from B p (a) with probability 1−e^{−θn}, where p is determined from C as

$$H(p) = p \log p^{-1} + (1-p) \log (1-p)^{-1} = 1-C. $$

The quantity C is called the capacity of the binary symmetric channel and determines the random variable B p ; the quantity R=log|A|/n is the rate of the code A. In this language, the coding theorem asserts that at transmission rates lower than the capacity of the channel, there exist codes that correct random errors with exponentially decaying failure probability (in n, the length of the code). We formalize our requirements below.
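The two quantities in play, the error model B p and the binary entropy H(p), are easy to simulate; the Python sketch below merely illustrates the definitions above.

```python
import math
import random

def H(p):
    """Binary entropy: H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p))."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc(x, p, rng):
    """B_p(x): flip each coordinate of x independently with probability p."""
    return [b ^ (rng.random() < p) for b in x]

# Capacity of the binary symmetric channel with cross-over probability p.
p = 0.11
C = 1 - H(p)          # roughly 1/2 for p = 0.11
noisy = bsc([0] * 1000, p, random.Random(0))
```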

Definition 4

An error-correcting code of rate r is a pair of functions E=(Enc,Dec), where Enc:{0,1}^{rn}→{0,1}^{n} is the encoding algorithm and Dec:{0,1}^{n}→{0,1}^{rn} the corresponding decoding algorithm. Specifically, we say that E is an (r,p,ϵ)-code if for all m∈{0,1}^{rn},

$$\Pr\bigl[\mathrm {Dec}\bigl(\mathrm {Enc}(m) \oplus {e}\bigr) = m\bigr] \geq 1 - \epsilon, $$

where e=(e 1,…,e n ) and each e i is independently distributed in {0,1} so that Pr[e i =1]≤p. We say that E is efficient if both Enc and Dec are computable in polynomial time in n.
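A repetition code is a degenerate but self-contained instance of this interface: a (1/t,p,ϵ)-code with very poor rate, far from the near-capacity codes of Theorem 1, but it makes Definition 4 concrete.

```python
def enc_rep(msg_bits, reps=5):
    """Repetition encoding: rate r = 1/reps."""
    return [b for b in msg_bits for _ in range(reps)]

def dec_rep(code_bits, reps=5):
    """Majority-vote decoding; corrects up to floor(reps/2) flips per block."""
    return [int(sum(code_bits[i:i + reps]) * 2 > reps)
            for i in range(0, len(code_bits), reps)]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
word = enc_rep(msg)
word[3] ^= 1      # one flip in the first block
word[22] ^= 1     # one flip in the fifth block
recovered = dec_rep(word)   # both flips are corrected by majority vote
```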

We record the theorem below with the proof in Appendix A.

Theorem 1

(Based on Shannon [11, 12], Forney [3])

For any p∈[1/4,1/2] and any R<1−H(p), there is an efficient (R,p,ϵ)-code for which there are θ∈[0,1] and n 0 ∈ℕ such that ϵ≤2^{−θn/log n} for any n≥n 0 . Furthermore, we have θ^{−1}=Θ(Z^{−1}) and log n 0 =Θ(Z^{−2} log Z^{−1}) as Z→0, where Z=1−H(p)−R.

2.3 Function Families and Almost t-Wise Independence

We will employ the notion of (almost) t-wise independent function families (cf. [1, 9]).

Definition 5

A family \(\mathcal{F}\) of Boolean functions on {0,1}v is said to be ϵ-away from t-wise independent or (v,t,ϵ)-independent if for any t distinct domain elements q 1,q 2,…,q t we have

$$ \sum_{\alpha\in \{0,1\}^t} \biggl\vert \Pr_{f}\bigl[f(q_1)f(q_2) \cdots f(q_t)=\alpha\bigr]-\frac{1}{2^t}\biggr\vert \leq \epsilon, $$
(1)

where f is chosen uniformly from \(\mathcal{F}\).
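The sum in (1) can be evaluated by brute force for small families. As a sanity check, the affine functions f(x)=a·x⊕b over {0,1}^v form a family that is exactly 2-wise independent (distance 0), while the two constant functions are maximally far from it; the code below is a toy illustration, not the construction behind Theorem 3.

```python
from itertools import product

def twise_distance(family, points):
    """Left-hand side of (1): distance of (f(q_1), ..., f(q_t)) from uniform
    on {0,1}^t when f is drawn uniformly from the family."""
    t = len(points)
    counts = {alpha: 0 for alpha in product((0, 1), repeat=t)}
    for f in family:
        counts[tuple(f(q) for q in points)] += 1
    return sum(abs(c / len(family) - 2 ** -t) for c in counts.values())

v = 3
dot = lambda a, x: sum(ai & xi for ai, xi in zip(a, x)) % 2
affine = [lambda x, a=a, b=b: dot(a, x) ^ b
          for a in product((0, 1), repeat=v) for b in (0, 1)]
q1, q2 = (0, 0, 1), (1, 0, 1)
d_affine = twise_distance(affine, [q1, q2])                      # exactly 0
d_const = twise_distance([lambda x: 0, lambda x: 1], [q1, q2])   # equals 1
```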

The above is equivalent to the following formulation quantified over all computationally unbounded adversaries \(\mathcal{A}\):

$$ \Bigl|\Pr_{f \stackrel{r}{\gets}\mathcal{F}}\bigl[ \mathcal{A}^{f[t]} \bigl(1^v\bigr) = 1\bigr] - \Pr_{f \stackrel{r}{\gets}\mathcal{R}}\bigl[ \mathcal{A}^{f[t]}\bigl(1^v\bigr)=1\bigr]\Bigr|\leq \epsilon, $$
(2)

where \(\mathcal{R}\) is the collection of all functions from {0,1}v to {0,1} and \(\mathcal{A}^{f[t]}\) is an unbounded adversary that is allowed to determine up to t queries to the function f before it outputs its bit. The equivalence is formally stated (without proof) as follows.

Lemma 2

\(\mathcal{F}_{\kappa}\) is ϵ′-away from t-wise independence according to (1) if and only if \(\mathcal{F}_{\kappa}\) is ϵ′-away from t-wise independence according to (2) above.

We employ the construction of almost t-wise independent sample spaces given by Naor and Naor [9], and Alon et al. [1]. The following theorem is a restatement of Theorem 3 from Alon et al. [1].

Theorem 3

([1, 9])

There exist families of Boolean functions \(\mathcal{F}^{v}_{t,\epsilon}\) on {0,1}^v that are ϵ-away from t-wise independent, are indexed by keys of length (2+o(1))(log v+t/2+log(ϵ^{−1})), and are computable in polynomial time.

2.4 Rejection Sampling

A common method used in steganography employing a channel distribution is that of rejection sampling, described below (cf. [2, 5]).

Rejection Sampling in the Current History Model

In the current history model, assuming that one wishes to transmit a single bit m and employs a random function f:{0,1}^{d}×Σ→{0,1} that is secret from the adversary, one performs the following “rejection-sampling” process:

The procedure \({\operatorname {rejsam}}^{f}_{h}(m)\): draw c 1 from C h ; if f(c 1 )=m, output c 1 ; otherwise draw c 2 from C h and output c 2 .

Here, Σ denotes the output alphabet of the channel, h denotes the history of the channel at the start of the process, and C h denotes the distribution on Σ given by the channel with history h. The receiver (also privy to the function f) applies the function to the received message c∈Σ and recovers m with probability greater than 1/2. The sender and the receiver may employ a joint state (e.g., a counter) that need not be secret from the adversary. Note that the above process performs only two draws from the channel with the same history (more draws could, in principle, be performed, but we justify our choice of two draws in Lemma 8 of Sect. 3.1.2). These draws are assumed to be independent. One basic property of rejection sampling, which we prove below and which is helpful for our construction, is the following:

Lemma 4

If f is drawn uniformly at random from the collection of all functions \(\mathcal{R} = \{ f : {\varSigma} \to \{0,1\}\;\}\) and \(\mathcal{C}\) has min-entropy δ, then

$$\Pr_{f \gets \mathcal{R}}\bigl[f\bigl({\operatorname {rejsam}}_h^{f}(m)\bigr) \neq m\bigr] \leq p, $$

where p=(1+2^{−δ})/4.

Proof

Define the event E to be

$$E = \bigl[f({\mathsf{c}}_1 ) = m\bigr] \lor \bigl[f({\mathsf{c}}_1) \neq m \land f({\mathsf{c}}_2 ) = m \bigr]; $$

thus E is the event that rejection sampling is successful for m.

Here c 1,c 2 are two independent random variables distributed according to the channel distribution C h and h is determined by the history of channel usage. Recalling that Σ={σ 1,…,σ s } is the support of the channel distribution C h , let p i =Pr[C h =σ i ] denote the probability that σ i occurs. As f is chosen uniformly at random,

$$\Pr\bigl[f({\mathsf{c}}_1 ) = m\bigr] = \frac{1}{2}. $$

Then Pr[E]=1/2+Pr[A], where A is the event that f(c 1 )≠m∧f(c 2 )=m. To bound Pr[A], let D denote the event that c 1 ≠c 2 . Observe that conditioned on D, A occurs with probability exactly 1/4; on the other hand, A cannot occur simultaneously with \(\overline{D}\). Thus

$$\Pr[E] = \frac{1}{2} + \Pr[A\mid D] \cdot \Pr[D] + \Pr[A \mid \overline{D}]\cdot \Pr[\overline{D}] = \frac{1}{2} + \frac{1}{4} \Pr[D]. $$

To bound Pr[D], note that

$$\Pr[\bar{D}] = \sum_i p_i^2 \leq \max_i p_i \sum_i p_i = \max_i p_i $$

and hence that Pr[D]≥1−max i p i . Considering that H ∞ (C h )≥δ, we have max i p i ≤2^{−δ} and hence the success probability is

$$\Pr[E] \geq \frac{1}{2} +\frac{1}{4}\cdot \Bigl(1- \max_i p_i\Bigr) \geq \frac{1}{2} + \frac{1}{4} \biggl(1-\frac{1}{2^\delta} \biggr) = 1 - p, $$

from which the statement of the lemma follows. □

The above lemma is a generalization of a similar result that appeared in [5], which dealt with the special case where the min-entropy is 1 bit. The application of the rejection-sampling procedure described above in our stegosystem implies that a message bit transmitted over the communication channel may be flipped with a certain probability p. This can be viewed as an overlaid binary symmetric channel with a cross-over probability p. The error-correcting code is introduced to recover from these cross-over errors.
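Lemma 4’s bound is easy to check by simulation. The sketch below redraws a fresh truly random f for every trial; for the flat two-symbol channel used here the bound p=(1+2^{−δ})/4=3/8 is in fact met with equality (Pr[D]=1/2 exactly), so the empirical error rate should hover around 0.375.

```python
import random

def rejsam_error_rate(dist, trials, rng):
    """Empirical Pr[f(rejsam_h^f(m)) != m] with a fresh random f per trial,
    for a fixed channel distribution dist = {symbol: probability}."""
    symbols, probs = zip(*dist.items())
    errors = 0
    for _ in range(trials):
        f = {s: rng.randrange(2) for s in symbols}   # truly random f
        m = rng.randrange(2)
        c = rng.choices(symbols, probs)[0]           # first draw
        if f[c] != m:
            c = rng.choices(symbols, probs)[0]       # second (final) draw
        errors += f[c] != m
    return errors / trials

dist = {"a": 0.5, "b": 0.5}      # min-entropy delta = 1
bound = (1 + 2 ** -1) / 4        # Lemma 4: p = (1 + 2^-delta)/4 = 0.375
rate = rejsam_error_rate(dist, 20000, random.Random(0))
```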

Rejection Sampling in the Look-Ahead Model

In the look-ahead model, the rejection-sampling procedure above can be coupled with the channel transformation described in Sect. 2.1. In particular, transforming a channel C with min-entropy δ into a channel C (τ) with min-entropy δτ, one can carry out the rejection-sampling process above with samples drawn from Σ τ. The binary symmetric channel cross-over probability is then p=(1+2^{−δτ})/4.

3 The Construction

In this section we outline our construction of a one-time stegosystem as an interaction between Alice (the sender) and Bob (the receiver). First, we focus on the construction in the current history model and defer the discussion for the look-ahead model to Sect. 3.2.

3.1 Our Stegosystem for the Current History Model

In the current history model, Alice and Bob wish to communicate over a channel with distribution \(\mathcal{C}\) over an alphabet Σ. We assume that \(\mathcal{C}\) has min-entropy δ, so that ∀h∈Σ ∗, H ∞ (C h )≥δ. As in the statement of Lemma 4, let p=(1+2^{−δ})/4. The construction we describe below uses two parameters: r∈(0,1−H(p)) and \(\epsilon_{\mathcal{F}}\in (0,1)\). Alice and Bob agree on the following:

An error-correcting code.:

Let E=(Enc,Dec) be an efficient (r,p,ϵ enc)-code of length n from Theorem 1. The theorem asserts that p and r determine a bound for the decoding error probability ϵ enc assuming that n is suitably large.

An almost t-wise independent function family.:

Let \(\mathcal{F}\) be the function family that is \((\log n + \log |{\varSigma}|, 2n, \epsilon_{\mathcal{F}})\)-independent, indexed by keys of length k, from Theorem 3. Recall from the theorem that \(\epsilon_{\mathcal{F}}\) together with our choices of v=log n+log|Σ| and t=2n determine the required key length k. We treat elements of \(\mathcal{F}\) as Boolean functions on {1,…,n}×Σ and, for such a function f, we let f i :Σ→{0,1} denote the function f i (σ)=f(i,σ).

We will analyze the stegosystem below in terms of the parameters n,r,δ, \(\epsilon_{\mathcal{F}}\), relegating the discussion of how these parameters determine the overall efficiency, correctness and security of the system to Sect. 3.3.

Key generation consists of selecting an element \(f \in \mathcal{F}\). This will be facilitated by sharing a random bit string κ of length k. Alice and Bob then communicate using the algorithms SE for embedding and SD for extracting as described in Fig. 1.

Fig. 1. Encryption and Decryption algorithms for the one-time stegosystem in the current history model

In SE, after applying the error-correcting code E, we use \(\operatorname {rejsam}^{f_{i}}_{h}(m_{i})\) to obtain an element c i of the channel for each bit m i of the encoded message and update the history h. The resulting stegotext c 1 ⋯c n is denoted \(c_{\operatorname {stego}}\). In SD, the received stegotext is parsed block by block by evaluating the keyed function f i at c i ; each evaluation yields one message bit. After performing this for each received block, an n-bit string is obtained, which is then decoded via Dec. Note that we sample at most twice from the channel for each bit we wish to send. The error-correcting code is needed to recover from the errors introduced by this process. The detailed correctness, security and overhead analysis for both models follow in the next sections.
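Putting the pieces together, the following Python sketch runs the SE/SD pipeline end to end. It substitutes simple stand-ins for the real ingredients: a repetition code for the concatenated code of Theorem 1, an independent truly random function f i per position for the almost 2n-wise independent family, and a memoryless uniform channel; all parameter values are illustrative.

```python
import random

def stego_roundtrip(msg, dist, reps, rng):
    """SE then SD: ECC-encode, embed each bit by two-draw rejection sampling
    with a per-position function f_i, extract by evaluating f_i, and decode."""
    symbols, probs = zip(*dist.items())
    enc = [b for b in msg for _ in range(reps)]                  # repetition ECC
    fs = [{s: rng.randrange(2) for s in symbols} for _ in enc]   # "shared key"
    stego = []
    for f, m in zip(fs, enc):
        c = rng.choices(symbols, probs)[0]
        if f[c] != m:
            c = rng.choices(symbols, probs)[0]
        stego.append(c)                                          # c_stego
    got = [f[c] for f, c in zip(fs, stego)]                      # extraction
    return [int(sum(got[i:i + reps]) * 2 > reps)                 # majority Dec
            for i in range(0, len(got), reps)]

rng = random.Random(42)
msg = [rng.randrange(2) for _ in range(16)]
out = stego_roundtrip(msg, {s: 0.125 for s in "abcdefgh"}, 25, rng)
```

With eight uniform symbols (δ=3) each embedded bit flips with probability (1+2^{−3})/4≈0.28, and 25 repetitions push the per-bit decoding error below one percent.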

3.1.1 Correctness

In this section we argue the correctness of our one-time stegosystem in the current history model. We examine the minimum message length needed to achieve (ϵ,δ)-correctness for any choice of these parameters. We are particularly interested in the case when δ is small (perhaps even approaching 0 as a function of k), as the difficulty of parameter selection is amplified in this case (in contrast, when δ is bounded away from 0, the cross-over probability p is bounded away from 1/2 and parameter selection is simplified).

Theorem 5

For any ϵ,δ>0, consider the current history model stegosystem (SK,SE,SD) of Sect. 3.1 under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\) where p=(1+2^{−δ})/4. Then the stegosystem is (ϵ,δ)-correct so long as the message has length

$${\varOmega} \bigl(\delta^{-2} \cdot \log \bigl({\epsilon}^{-1} \bigr)\log \log \bigl({\epsilon}^{-1}\bigr) \bigr) + 2^{O(\delta^{-4})} $$

as δ→0 while the dependency on δ vanishes when δ is bounded away from 0.

Proof

Let us first consider the case where the function f corresponding to the shared key between the two participants is a truly random function. In this case, by Lemma 4, the underlying communication channel simulates a binary symmetric channel with cross-over probability p=(1+2^{−δ})/4. Based on this fact and Theorem 1, the probability of error in reception would be at most 2^{−θn/log n} for sufficiently large n. Specifically, it should hold that n≥n 0 for some n 0 that satisfies log n 0 =Θ(Z^{−2} log Z^{−1}) where Z=1−H(p)−r. Also recall that θ=Θ(Z). Given the statement of the theorem we can postulate that Z=Ω(1−H(p)) and as a result Z^{−1}=O((1−H(p))^{−1}). Observe now that the choice of p=(1+2^{−δ})/4 implies that

$$\bigl(1-H(p)\bigr)^{-1} = O\bigl(\bigl(1 - 2^{-\delta} \bigr)^{-2}\bigr) $$

in the light of Proposition 15. It follows that Z^{−1}=O((1−2^{−δ})^{−2}) and thus \(n_{0} = 2^{O ((1 - 2^{-\delta})^{-4})} Z^{-1}\). From this we see that the minimum message length is of the form \(2^{O ((1 - 2^{-\delta})^{-4})}\) in order to attain an error-correction bound of the form 2^{−θn/log n}. To force this latter function to be below, say, ϵ/2 we need to select n/log n=Ω(θ^{−1} log(1/ϵ)), which implies a lower bound for n of the form Ω((1−2^{−δ})^{−2}⋅log(1/ϵ) log log(1/ϵ)). The above guarantees an error of at most ϵ/2 when the function f corresponding to the shared key between the two participants is a truly random function. Now we consider the case where the selection of f is based on an ϵ-away from t-wise independent family of functions. Given that the postulated distance of our function family from truly random functions is at most ϵ/2, we see that

$$\forall m \in\{0,1\}^{\ell},\quad \Pr\bigl[\mathit {SD}\bigl(\kappa, \mathit {SE}( \kappa,m; \mathcal{O})\bigr) \neq m \mid \kappa \gets \mathit {SK}\bigl(1^k \bigr) \bigr] \leq \epsilon $$

which establishes the correctness of the stegosystem for messages of length that are suitably large as postulated, since (1−2^{−δ})^{−2}=O(δ^{−2}) for small values of δ≤1 while for larger δ we have (1−2^{−δ})^{−2}=O(1). □

3.1.2 Security

In this section we argue the security of our one-time stegosystem in the current history model. First, we observe that the output of the rejection-sampling function \({\operatorname {rejsam}}^{f}_{h}\), with a truly random function f, is indistinguishable from the channel distribution C h (this folklore result was implicit in previous work—we prove it formally below). We then show that if f is selected from a family that is \(\epsilon_{\mathcal{F}}\)-away from 2n-wise independent, the advantage of an adversary \(\mathcal{A}\) in distinguishing between the output of the steganographic embedding protocol SE and the channel \(\mathcal{C}_{h}\) is bounded above by \(\epsilon_{\mathcal{F}}\). Let \(\mathcal{R} =\{f : {\varSigma} \to \{0,1\} \}\). We will show the following:

Theorem 6

For any ϵ,δ>0, consider the current history stegosystem (SK,SE,SD) of Sect. 3.1 under the parameter constraint \(\epsilon_{\mathcal{F}} \leq \epsilon\). Then the stegosystem is (ϵ,δ)-secure so long as the key has length

$$\bigl(2 + o(1)\bigr) \biggl(\frac{1}{r}\cdot \ell + \log\log |{\varSigma}| + \log \bigl({\epsilon}^{-1}\bigr) \biggr), $$

where is the message length and r is the rate of the error-correcting code employed by the stegosystem.

Anticipating the proof of the theorem we start with some preliminary results. First, we characterize the probability distribution of the rejection-sampling function:

Proposition 7

Fix some function f:Σ→{0,1} and channel history h∈Σ ∗. The function \({\operatorname {rejsam}}^{f}_{h}(m)\) is a random variable with probability distribution expressed by the following function: Let c∈Σ and m∈{0,1}. Let \(\mathsf {miss}_{f}(m) = \Pr_{c' \gets \mathcal{C}_{h}} [f(c') \neq m]\) and \(p_{c} = \Pr_{c' \gets \mathcal{C}_{h}}[c' = c]\). Then

$$\Pr\bigl[{\operatorname {rejsam}}^{f}_{h}(m) = c\bigr] = \begin{cases} p_{c} \cdot \mathsf {miss}_{f}(m) & \text{if } f(c) \neq m, \\ p_{c} \cdot \bigl(1 + \mathsf {miss}_{f}(m)\bigr) & \text{if } f(c) = m. \end{cases} $$

Proof

Let c 1 and c 2 be the two (independent) samples drawn from C h during rejection sampling. (For simplicity, we treat the process as having drawn two samples even in the case where it succeeds on the first draw.) Note, now, that in the case where f(c)≠m, the value c is the result of the rejection-sampling process precisely when f(c 1 )≠m and c 2 =c; as these samples are independent, this occurs with probability miss f (m)⋅p c . In the case where f(c)=m, however, we observe c whenever c 1 =c or f(c 1 )≠m and c 2 =c. As these events are disjoint, their union occurs with probability p c ⋅(miss f (m)+1), as desired. □

Lemma 8

For any h∈Σ ∗ and m∈{0,1}, the random variable \({\operatorname {rejsam}}^{f}_{h}(m)\) is perfectly indistinguishable from the channel distribution C h when f is drawn uniformly at random from \(\mathcal{R}\).

Proof

Let f be a random function, as described in the statement of the lemma. Fixing the elements c and m, we condition on the event E ≠ that f(c)≠m. In light of Proposition 7, for any f drawn under this conditioning we see that \(\Pr[ {\operatorname {rejsam}}^{f}_{h}(m) = c]\) is equal to

$$\Pr_{c' \gets \mathcal{C}_h}\bigl[c' = c\bigr] \cdot \mathsf {miss}_{f}(m) = p_c \cdot \mathsf {miss}_{f}(m), $$

where we have written \(\mathsf {miss}_{f}(m) = \Pr_{c' \gets \mathcal{C}_{h}} [f(c') \neq m]\) and \(p_{c} = \Pr_{c' \gets \mathcal{C}_{h}}[c' = c]\). Conditioned on E_{≠}, then, the probability of observing c is

$$\mathbf{E}_f \bigl[p_c \cdot \mathsf {miss}_{f}(m)\,{\big|}\,E_{\neq}\bigr] = p_c \biggl( p_c + \frac{1}{2}(1 - p_c) \biggr), $$

where the above follows from the fact that in the conditional space we can expand the expectation of miss_f(m) as

$$\mathbf{E}_f\bigl[\mathsf {miss}_{f}(m)\,\big|\,E_{\neq}\bigr] = p_c \cdot 1 + \sum_{c' \neq c} p_{c'} \cdot \frac{1}{2} = p_c + \frac{1}{2}(1 - p_c).$$

Letting E_{=} be the event that f(c)=m, we similarly compute

$$\mathbf{E}_f \bigl[p_c \cdot \bigl(1+ \mathsf {miss}_{f}(m)\bigr)\,{\big|}\,E_{=}\bigr] = p_c \biggl(1 + \frac{1}{2}(1 - p_c) \biggr). $$

As \(\Pr[E_{=}]=\Pr[E_{\neq}]=1/2\), we conclude that the probability of observing c is exactly

$$\frac{1}{2} \biggl(p_c \biggl( p_c + \frac{1 - p_c}{2} \biggr) + p_c \biggl(1 + \frac{1 - p_c}{2} \biggr) \biggr) = p_c, $$

as desired. □
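The averaging argument can likewise be verified exactly for a small alphabet: enumerating all 2^{|Σ|} functions f and averaging the output distribution of rejection sampling recovers the channel distribution, as Lemma 8 asserts. The sketch below uses a hypothetical three-letter channel with illustrative probabilities:

```python
from itertools import product

# Hypothetical channel over a three-letter alphabet (illustrative values).
sigma = [0, 1, 2]
p = {0: 0.5, 1: 0.3, 2: 0.2}
m = 1

def rejsam_dist(f, m):
    # Exact output distribution of two-sample rejection sampling
    # (Proposition 7): p_c * miss if f(c) != m, p_c * (1 + miss) otherwise.
    miss = sum(p[c] for c in sigma if f[c] != m)
    return {c: (p[c] if f[c] == m else 0.0) + miss * p[c] for c in sigma}

# Average over all 2^|Sigma| functions f : Sigma -> {0,1}, i.e., over a
# truly random f.
fs = [dict(zip(sigma, bits)) for bits in product([0, 1], repeat=len(sigma))]
avg = {c: sum(rejsam_dist(f, m)[c] for f in fs) / len(fs) for c in sigma}

# Lemma 8: averaged over a random f, rejection sampling is exactly the channel.
for c in sigma:
    assert abs(avg[c] - p[c]) < 1e-12
```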

Having established the behavior of the rejection-sampling function when a truly random function is used, we proceed to examine the behavior of rejection sampling in our setting, where the function is drawn from a function family that is \(\epsilon_{\mathcal{F}}\)-away from 2n-wise independent. In particular, this establishes the security bound claimed in Theorem 6:

Proof of Theorem 6

Consider the following two games \(G_{1}^{\mathcal{A}}\) and \(G_{2}^{\mathcal{A}}\), which can be played with the adversary \(\mathcal{A}\). Here \(\lambda= | \mathit {SE}(\kappa,m; \mathcal{O})|\).

\(G_{1}^{\mathcal{A}}(1^{k})\)

1. κ←{0,1}^k

2. \((m^{\ast},s)\gets \mathit {SA}^{\mathcal{O}(h)}_{1}(1^{k},h)\), m^∗∈{0,1}^ℓ

3. \(b \stackrel{r}{\gets} \{0,1\}\)

4. if b=1 then \(c^{\ast} \gets \mathit {SE}(\kappa,m^{\ast};\mathcal{O})\), else \(c^{\ast} \gets \mathcal{C}_{h}^{\lambda}\)

5. b′←SA_2(c^∗,s)

6. if b=b′ then success

\(G_{2}^{\mathcal{A}}(1^{k})\)

1. \(f \gets \mathcal{R}\)

2. \((m^{\ast},s)\gets \mathit {SA}^{\mathcal{O}(h)}_{1}(1^{k},h)\), m^∗∈{0,1}^ℓ

3. \(b \stackrel{r}{\gets} \{0,1\}\)

4. if b=1 then \(c^{\ast} \gets \mathit {SE}(f,m^{\ast};\mathcal{O})\) (i.e., SE run with the truly random f in place of the keyed function), else \(c^{\ast} \gets \mathcal{C}_{h}^{\lambda}\)

5. b′←SA_2(c^∗,s)

6. if b=b′ then success

The two games differ only in the function used by the encoding procedure: in \(G_{1}^{\mathcal{A}}\) it is drawn from the family \(\mathcal{F}\), while in \(G_{2}^{\mathcal{A}}\) it is a truly random function. By Lemma 8, in \(G_{2}^{\mathcal{A}}\) the output of rejection sampling is distributed identically to the channel, so the adversary has no advantage there; since \(\mathcal{F}\) is \(\epsilon_{\mathcal{F}}\)-away from 2n-wise independent, the success probabilities in the two games differ by at most \(\epsilon_{\mathcal{F}} \leq \epsilon\), and the theorem follows by the definition of insecurity. From Theorem 3 we find that the minimum key length required for security is (2+o(1))(ℓ/r+loglog|Σ|+log(ϵ^{−1})), where ℓ is the message length and r is the rate of the error-correcting code employed by the stegosystem. □

3.2 Adapting to the Look-Ahead Model

In this section we note the differences between the construction for the look-ahead model and that for the current history model. In this model, Alice and Bob agree to communicate over a channel with distribution \(C_{h}^{\tau}\) over the alphabet Σ^τ, where τ=δ^{−1}. The min-entropy is now \(H_{\infty}(C_{h}^{ (\delta^{-1} )}) \geq 1\), and the binary symmetric channel cross-over probability p is no more than 3/8. To recover from the cross-over errors, they use the error-correcting code E=(Enc,Dec), an efficient (r,3/8,ϵ_enc)-code of length n from Theorem 1. For the look-ahead model, we record the corollary below, which follows directly from Theorem 5.
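The min-entropy bound follows from the additivity of min-entropy over independent draws (assuming each of the τ=δ^{−1} component draws has min-entropy at least δ):

$$H_{\infty}\bigl(C_{h}^{ (\delta^{-1} )}\bigr) = \delta^{-1} \cdot H_{\infty}(C_{h}) \geq \delta^{-1} \cdot \delta = 1. $$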

Corollary 9

For any ϵ,δ>0, consider the look-ahead model stegosystem (SK,SE,SD) under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\) where p=3/8. Then the stegosystem is (ϵ,δ)-correct so long as the message has length Ω(log(1/ϵ)loglog(1/ϵ)).

In the current history model, as δ→0 the rejection-sampling procedure has a high probability of failure. This is because the overlaid binary symmetric channel cross-over probability converges to 1/2 very quickly, since p=(1+2^{−δ})/4. With p converging to 1/2, the binary symmetric channel becomes informationless. Consequently, we would have to employ error-correcting codes that can recover from very high error rates. This translates to a very high minimum message length requirement and explains the exponential dependence on δ^{−1}. In the look-ahead model, we amplify the entropy of the channel up to 1, thereby removing the minimum message length's exponential dependence on δ^{−1}. In either model, if we want to transmit a message shorter than the minimum message length, we can pad the original message to attain the required length. The rejection-sampling procedures in the two models differ only in the size of their domains; observe that this does not affect the security analysis. The corollary recorded below follows directly from Theorem 6:
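To see why the rate collapses as p approaches 1/2, recall that the capacity 1−H(p) of a binary symmetric channel vanishes quadratically in the distance of p from 1/2. The following sketch (illustrative only) checks the standard approximation 1−H(1/2−t) ≈ (2/ln 2)·t²:

```python
import math

def H(p):
    """Binary entropy (in bits)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Near p = 1/2 the capacity 1 - H(p) vanishes quadratically:
# 1 - H(1/2 - t) ~ (2 / ln 2) * t^2, so recovering from such a channel
# forces the rate of the error-correcting code down like t^2.
c = 2 / math.log(2)
for t in [0.1, 0.01, 0.001]:
    cap = 1 - H(0.5 - t)
    assert abs(cap / (t * t) - c) < 0.1 * c  # within 10% of the quadratic law
```

This quadratic decay is exactly what produces the O(δ^{−2}) overhead in the current history model, and the fixed p ≤ 3/8 of the look-ahead model is what avoids it.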

Corollary 10

For any ϵ,δ>0, consider the look-ahead stegosystem (SK,SE,SD) under the parameter constraint \(\epsilon_{\mathcal{F}} \leq \epsilon\). Then the stegosystem is (ϵ,δ)-secure so long as the key has length (2+o(1))(ℓ/r+loglog(|Σ|⋅δ^{−1})+log(1/ϵ)), where ℓ is the message length and r is the rate of the error-correcting code employed by the stegosystem.

3.3 Putting It All Together

The objective of this section is to combine the results of the previous sections and illustrate the results for our stegosystem in the two channel models. As our system is built on two-sample rejection sampling, a process that faithfully transmits each bit with cross-over probability p=(1+2^{−δ})/4, the target rate that we may approximate is 1−H(p). In the case of the look-ahead model, we have the cross-over probability p≤3/8, and the target rate that we may approximate is 1−H(3/8). Indeed, as described below, the system asymptotically converges to the rate of this underlying rejection-sampling channel. We remark that with sufficiently large channel entropy, there are ways to draw more samples during rejection sampling to reduce the error rate without compromising security, but this would not have any (asymptotic) bearing on our overhead objective.

Theorem 11

For any ϵ,δ>0, the stegosystem (SK,SE,SD) of Sect. 3.1 in the current history model under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\), where p=(1+2^{−δ})/4, is (ϵ,δ)-correct and (ϵ,δ)-secure so long as the message has length \({\varOmega}(\delta^{-2} \cdot \log (1/\epsilon)\log \log (1/\epsilon)) + 2^{O(\delta^{-2})}\) as δ→0, while the dependency on δ vanishes for δ→∞. When the size of the channel alphabet is polynomial in the length of the message m and ϵ=2^{−|m|}, (SK,SE,SD) has overhead O((1−H(p))^{−1})=O(δ^{−2}) as δ→0, while the dependency on δ vanishes for δ→∞.

The above theorem implies that for any fixed δ our stegosystem exhibits O(1) overhead, i.e., the ratio of the key length over the message length is constant.

Theorem 12

For any ϵ,δ>0, the stegosystem (SK,SE,SD) of Sect. 3.1 in the look-ahead model under the parameter constraints r=Ω(1−H(p)) and \(\epsilon_{\mathcal{F}} \leq \epsilon/2\), where p=3/8, is (ϵ,δ)-correct and (ϵ,δ)-secure so long as the message has length Ω(log(1/ϵ)loglog(1/ϵ)). When the size of the channel alphabet is polynomial in the length of the message m and ϵ=2^{−|m|}, (SK,SE,SD) exhibits O(1) overhead.

4 A Provably Secure Stegosystem for Longer Messages

In this section we show how to apply the “one-time” stegosystem of Sect. 3.1 together with a pseudorandom generator so that longer messages can be transmitted.

Definition 6

Let U k denote the uniform distribution over {0,1}k. A polynomial-time deterministic algorithm G is a pseudorandom generator (PRG) if the following conditions are satisfied:

Variable output:

For all seeds x∈{0,1}^k and all y∈ℕ, |G(x,1^y)|=y.

Pseudorandomness:

For every polynomial p, the ensemble of random variables \(\{G(U_k, 1^{p(k)})\}_{k \in \mathbb{N}}\) is computationally indistinguishable from the uniform ensemble \(\{U_{p(k)}\}_{k \in \mathbb{N}}\).

For a PRG G and 0<k<k′, if A is some statistical test, we define the advantage of A over the PRG as follows:

$$\mathbf{Adv}_{G}^A\bigl(k,k^{\prime}\bigr) = \Bigl\vert \Pr_{w \gets G(U_k, 1^{k^{\prime}})} \bigl[A(w) = 1\bigr] - \Pr_{w \gets U_{k^{\prime}}} \bigl[A(w) = 1\bigr]\Bigr\vert. $$

The insecurity of the above PRG G against all statistical tests A computable by circuits of size ≤P is then defined as

$$\mathbf{InSec}_{G}\bigl(k,k^{\prime};P\bigr) = \max_{A \in \mathcal{A}_P} \bigl\{\mbox{\bf{Adv}}_{G}^{A} \bigl(k,k^{\prime}\bigr)\bigr\} $$

where \(\mathcal{A}_{P}\) is the collection of statistical tests computable by circuits of size ≤P.
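As a toy illustration of these definitions (not a construction from the paper), the following sketch estimates the advantage of a trivial statistical test A against a deliberately insecure "generator" that always outputs the all-zero string; both the test and the generator are hypothetical, chosen only to make the advantage visibly large:

```python
import random

# Toy illustration of the advantage of a statistical test A: the "generator"
# always outputs the all-zero string, and A(w) = 1 iff w starts with a 0 bit.
random.seed(1)
kp = 64          # output length k'
trials = 20000

def bad_G(_seed):
    return "0" * kp

def A(w):
    return 1 if w[0] == "0" else 0

pr_g = sum(A(bad_G(None)) for _ in range(trials)) / trials   # exactly 1
pr_u = sum(A(format(random.getrandbits(kp), f"0{kp}b"))      # ~1/2 on uniform
           for _ in range(trials)) / trials
adv = abs(pr_g - pr_u)   # estimates Adv_G^A(k, k'); close to 1/2 here
assert adv > 0.4
```

A secure PRG is precisely one for which every efficiently computable test has advantage negligible in k.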

It is convenient for our application that typical PRGs have a procedure G′ such that if z=G(x,1^y), we have G(x,1^{y+y′})=G′(x,z,1^{y′}) (i.e., if one maintains z, one can extract the y′ bits that follow the first y bits without starting from the beginning).
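As an illustration of such an incrementally extendable generator, the sketch below uses a hypothetical counter-mode construction with SHA-256 as a stand-in (this is not the paper's instantiation, and counter-mode SHA-256 is used here only for illustration, not as a vetted PRG). The state z records the block counter and the unconsumed tail bits, so G_prime can continue the stream without regenerating the prefix:

```python
import hashlib

# Hypothetical counter-mode generator standing in for G: G(x, 1^y) emits the
# first y bits of SHA-256(x||0), SHA-256(x||1), ...  The state
# z = (next counter, unconsumed tail bits) lets G_prime continue the stream.

def _block(x: bytes, ctr: int) -> str:
    digest = hashlib.sha256(x + ctr.to_bytes(8, "big")).digest()
    return "".join(f"{b:08b}" for b in digest)

def G(x: bytes, y: int):
    out, ctr = "", 0
    while len(out) < y:
        out += _block(x, ctr)
        ctr += 1
    return out[:y], (ctr, out[y:])

def G_prime(x: bytes, z, y_more: int):
    ctr, out = z                    # resume from the saved state
    while len(out) < y_more:
        out += _block(x, ctr)
        ctr += 1
    return out[:y_more], (ctr, out[y_more:])

# Extending the stream agrees with generating it in one shot.
first, z = G(b"seed", 100)
more, _ = G_prime(b"seed", z, 60)
assert G(b"seed", 160)[0] == first + more
```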

Consider now the following stegosystem S′=(SK′,SE′,SD′) that can be used for steganographic transmission of longer messages using the one-time stegosystem S=(SK,SE,SD) defined in Sect. 3.1. S′ can handle messages of length polynomial in the security parameter k and employs a PRG G. The two players, Alice and Bob, share a key of length k denoted by x. The function SE′ is given the input x and the message m∈{0,1}^ν to be transmitted, of length ν=p(k) for some fixed polynomial p. SE′ in turn employs the PRG G to extract k′ bits (it computes κ=G(x,1^{k′}), |κ|=k′). The length k′ is selected to match the number of key bits required to transmit the message m using the one-time stegosystem of Sect. 3.1. Once the key κ of length k′ is produced by the PRG, the procedure SE′ invokes the one-time stegosystem on input κ,m,h. The function SD′ is defined in a straightforward way based on SD.

The computational insecurity of the stegosystem S′ is defined by adapting the definition of information-theoretic stegosystem security from Sect. 2.1 for the computationally bounded adversary as follows:

$$\mathbf{InSec}_{S^{\prime}}\bigl(k,k^{\prime};P\bigr) = \max_{\mathcal{A} \in \mathcal{A}_P} \bigl\{\mbox{\bf{Adv}}_{S'}^\mathcal{A} \bigl(k,k^{\prime}\bigr)\bigr\}, $$

this maximum taken over all adversaries \(\mathcal{A}\), where SA_1 and SA_2 have circuit size ≤P, and the definition of the advantage \(\mbox{\bf{Adv}}_{S'}^{\mathcal{A}}(k,k^{\prime})\) is obtained by suitably modifying the definition of \(\mbox{{\bf Adv}}_{S}^{\mathcal{A}}(k)\) in Sect. 2.1. In particular, we define a new adversarial game \(G^{\mathcal{A}}(1^{k},1^{k^{\prime}})\) which proceeds as the previous game \(G^{\mathcal{A}}(1^{k})\) in Sect. 2.1, except that in this new game the algorithms SA_1 and SA_2 receive as input the security parameter k′ and SE′ invokes SE as \(\mathit {SE}(\kappa,m ; \mathcal{O})\) where κ=G(x,1^{k′}). This matches the model of [5], which referred to such schemes as steganographically secret against chosen hiddentext attacks.

Theorem 13

The stegosystem S′=(SK′,SE′,SD′) is steganographically secret against chosen hiddentext attacks. In particular, employing a PRG G to transmit a message m, we get

$$\mathbf{InSec}_{S'}\bigl(k,k^{\prime};P\bigr) \leq \mathbf{InSec}_{G}\bigl(k,k^{\prime};P\bigr)+\mathbf{InSec}_{S}\bigl(k^{\prime}\bigr) $$

where \(\mathbf{InSec}_{S}(k^{\prime})\) is the information-theoretic insecurity defined in Sect. 2.1 and |m|=ℓ(k′).

Performance Comparison of the Stegosystem S′ and the Hopper, Langford, von Ahn System

The system of Hopper et al. [5] concerns a current history model where the min-entropy of all C_h is at least 1. In this case, we may select an (r,3/8,ϵ_enc)-error-correcting code. The system of Hopper et al. then correctly decodes a given message with probability at least 1−ϵ_enc and makes no more than 2n calls to a pseudorandom function family. Were one to use the pseudorandom function family of Goldreich et al. [4], this would involve the production of Θ(ℓ⋅log(ℓ⋅|Σ|)) pseudorandom bits, where ℓ is the message length. Of course, the security of the system depends on the security of the underlying pseudorandom generator with parameter k. On the other hand, with the same error-correcting code, the steganographic system described in this work utilizes O(ℓ+loglog|Σ|+log(1/ϵ)) pseudorandom bits, correctly decodes a given message with probability 1−ϵ, and possesses insecurity no more than ϵ. In order to compare the two schemes, note that by selecting ϵ=2^{−k}, both the decoding error and the security of the two systems differ by at most 2^{−k}, a negligible function in terms of the security parameter k. (Note also that the pseudorandom functions utilized in the above scheme have security no better than 2^{−k} with security parameter k.) In this case, the number of pseudorandom bits used by our system is

$$\bigl(2 + o(1)\bigr) \bigl(\ell + \log\log |{\varSigma}| + k \bigr) = {\varTheta} \bigl( \ell + k + \log \log |{\varSigma}|\bigr), $$

a non-trivial improvement over the Θ(ℓ⋅log(ℓ⋅|Σ|)) bits of the scheme above. In the look-ahead model, the number of pseudorandom bits used by our system is Θ(ℓ+loglog(|Σ|⋅δ^{−1})+k), as we operate on the concatenated channel \(C_{h}^{ (\delta^{-1} )}\).
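Ignoring the hidden constants in the Θ-bounds (taken to be 1 here purely for illustration) and using hypothetical parameter values, the gap between the two bit counts can be made concrete:

```python
import math

# Illustrative comparison of the pseudorandom-bit counts, with all hidden
# Theta-constants set to 1 and hypothetical parameter values.
ell = 10**6          # message length (hypothetical)
sigma_size = 2**32   # alphabet size |Sigma| (hypothetical)
k = 128              # security parameter (hypothetical)

hlv_bits = ell * math.log2(ell * sigma_size)            # Theta(ell * log(ell * |Sigma|))
ours_bits = ell + k + math.log2(math.log2(sigma_size))  # Theta(ell + k + loglog|Sigma|)

assert ours_bits < hlv_bits   # linear in ell vs. ell * log(ell * |Sigma|)
```

Under these illustrative values the count linear in ℓ is smaller by roughly the factor log(ℓ⋅|Σ|), matching the asymptotic comparison above.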