Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Electronic voting has been adopted in several countries, such as the United States, Estonia, Australia, Norway, Switzerland, and France, to conduct legally binding elections (or at least trials for some of them). Electronic voting systems should ensure the same properties than the traditional paper ballots systems, despite the fact that malicious users may easily intercept ballots and try to forge fake ones. One crucial property is vote privacy: no one should know how a particular voter voted. Symbolic models have been very successful in the analysis of more traditional protocols that aim at confidentiality or authentication. Many decision techniques and several tools have been developed (see [13] to cite only a few) which have been successfully applied to a large number of case studies including widely deployed protocols such as TLS [4]. Vote privacy in symbolic models can be expressed through a rather simple and natural property [5]: an attacker should not be able to distinguish the situation where Alice votes 0 and Bob votes 1 from the situation where the votes are swapped:

$$\begin{aligned} V_{\mathsf {Alice}}(0)\mid V_{\mathsf {Bob}}(1)\approx V_{\mathsf {Alice}}(1)\mid V_{\mathsf {Bob}}(0) \end{aligned}$$

Despite its apparent simplicity, this property is difficult to check for several reasons. Firstly, most existing decision techniques apply to reachability properties (such as authentication and confidentiality) but not to indistinguishability properties. Another major difficulty comes from the fact that e-voting systems involve less standard cryptographic primitives and sometimes even specially designed, ad-hoc primitives (like for the protocol used in Norway [6]). Typical primitives in e-voting are homomorphic encryption, zero-knowledge proofs, reencryption mixnets, etc. Some techniques and tools [710] for indistinguishability properties have recently been developed to automatically check indistinguishability properties and some of them can handle part of the primitives needed in e-voting. For example, ProVerif and Akiss have both been successfully applied to analyse some voting protocols [5, 1014]. However, a third source of difficulty is the fact that voting systems are typically parametrized by the number of voters: both the bulletin board and the tally processes have to process as many ballots as they receive. This is typically modeled by considering processes parametrized by the number of voters. Even though parameterized protocols can be encoded in a formalism such as the applied pi calculus, such encodings are complicated and generally beyond the capabilities of what automated tools support. ProVerif, which to the best of our knowledge is the only tool that supports verification of indistinguishability properties for an unbounded number of sessions (i.e. allowing replication) generally fails to prove vote privacy. One exception is a case study of the Civitas voting system by Backes et al. [11] using ProVerif. The other tools for indistinguishability (e.g. SPEC [8], Akiss [10], and APTE [9]) can only handle a finite number of sessions. So case studies have to consider a finite number of voters [10, 1214] unless proofs are conducted by hand [13, 15].

Contributions. Our main contribution is a reduction result for a reasonably large class of voting protocols. If there is an attack on privacy for n voters, we show that there also exists one that only requires 3 voters: 2 honest voters are necessary to state the privacy property and then 1 dishonest is sufficient to find all existing attacks. This result significantly simplifies security proofs: there is no longer need to consider arbitrarily many voters, even in manual proofs. Moreover, this result allows the use of automated tools for checking equivalence properties and justifies previous proofs conducted for a fixed number of voters (provided at least one dishonest voter was considered).

Several protocols assume voters may revote several times. This is for example the case of Helios or Civitas. Revoting is actually crucial for coercion-resistance in Civitas. When revoting is allowed, this should be reflected in the model by letting the ballot box accept an unbounded number of ballots, and retaining only the valid ones according to the revote policy. This aspect is typically abstracted in any existing formal analysis. We show that we can simplify the analysis by reducing the total number of ballots to 10 for typical revoting policies (e.g. the last vote counts) and typical tally functions. Altogether, our result amounts in a finite model property: if there is an attack on privacy on n voters that may vote arbitrarily, then there is an attack that only requires 3 voters and at most 10 ballots. We can further reduce the number of ballots to 7 for a class of protocols that has identifiable ballots, that is ballots that reveal the corresponding public credentials. Of course, only 3 ballots are sufficient when revoting is disallowed.

Our result holds in a rather general setting provided that the e-voting system can be modeled as a process in the applied-pi calculus [16]. Of course, this reduction result cannot hold for arbitrary systems. For example, if the tally phase checks that at least 4 ballots are present then at least 4 voters are necessary to conduct an attack. So we model what we think to represent a “reasonable” class of e-voting systems. The process modeling the voter may be an arbitrary process as long as it does not depend on credentials of other voters and provided voters do not need to interact once the tally phase has started. This corresponds to the “vote and go” property, that is often desirable for practical reasons, but also excludes some protocols such as [17]. Once the vote is casted the authorities proceed as follows.

  • The bulletin board (if there is one) performs only public actions such as publishing a received ballot, possibly removing some parts and possibly after some public tests, i.e. tests that anyone could do as well. Typical public tests are checks of signature validity, well-formedness of the ballots, or validity of zero-knowledge proofs. Alternatively, we may consider an arbitrary bulletin board in case it is corrupted since it is then part of the adversarial environment.

  • Next, a revote policy is applied. We consider two particular revote policies: the policy which selects the last ballot, which is the most common one, and the policy that selects the first one, which encodes the situation where revoting is prohibited.

  • Finally, the tally is computed according to some counting function. We consider in particular two very common functions: the multiset and the additive counting functions. The multiset counting function returns the votes in an arbitrary order and corresponds for example to the output of a decryption mixnet. The additive counting function returns the number of votes received by each candidate.

We believe that these conditions are general enough to capture many existing e-voting schemes.

Applications. To illustrate the applicability of our result, we re-investigate several existing analyses of e-voting protocols. First, we consider several versions of the Helios protocol [18], both in its mixnet and homomorphic versions. These versions also include the Belenios [19] protocol. We are able to use the ProVerif tool to show privacy for the mixnet versions of these protocols for a bounded number of voters and ballots. Our reduction result allows immediately to conclude that vote privacy also holds for an arbitrary number of voters. The homomorphic version of Helios is out of reach of existing tools due to the presence of associative and commutative symbols. However, our reduction result does apply, which means that the manual proof of Helios conducted in [13] did not need to consider arbitrarily many voters and could be simplified. In case one wishes to adapt this proof to Belenios [19], our reduction result would alleviate the proof. The Prêt-à-Voter [20] protocol (PaV) has been analysed using ProVerif for 2 honest voters [12]. Adding a third, dishonest, voter, we can apply our result and obtain the first proof of vote privacy for an arbitrary number of voters. Unfortunately, ProVerif did not scale up to verify automatically the protocol in presence of a dishonest voter. We were also able to apply our result (and a proof using ProVerif) to a protocol by Moran and Naor and to the JCJ protocol implemented in Civitas (without a ProVerif proof).

Related work. To our knowledge, the only other reduction result applying to voting protocols was proposed by Dreier et al. [21]. Their result states that it is sufficient to prove vote privacy for two honest voters when the protocol is observationally equivalent to a protocol consisting of the parallel composition (not sharing any secret) of a partition of the set of voters. Applicability has however only been shown to examples where this trivially holds, e.g. [17, 22] as these protocols use completely public tallying mechanisms. In general, proving the required equivalence does not seem easier than proving directly vote secrecy. Moreover, it does not apply to some well known protocols such as Helios since a dishonest voter is needed to mount the vote replay attack [13].

The results of [23, 24] show how to reduce the number of agents, in the case of trace properties [23] and equivalence properties [24]. The major difference with our work is that [23, 24] simply reduce the number of agent identities while the number of sessions (or processes) remains the same. In contrast, we do not only reduce the number of voter identities but also the number of ballots the ballot box needs to process, yielding a simpler process.

2 Modelling Security Protocols

As usual in symbolic protocol analysis we model protocol messages as terms. Protocols are modelled in a process calculus, similar to the applied pi calculus [16].

2.1 Messages

We assume an infinite set of names \(\mathcal {N}= \{a,b,k,n,\ldots \}\) (which are used to represent keys, nonces, ...) and an infinite set of channels \(\mathcal {C}{h}= \{c,c_1, ch , ch _1,\ldots \}\) (which are used to represent communication channels). We also consider a set of variables \(\mathcal {X}= \{x, y, \ldots \}\), and a signature \(\varSigma \) consisting of a finite set of function symbols.

Terms are defined as names, variables, and function symbols applied to other terms. In particular, a channel is not a term. Let \(\mathsf {N} \subseteq \mathcal {N}\) and \(\mathsf {X} \subseteq \mathcal {X}\), the set of terms built from \(\mathsf {N}\) and \(\mathsf {X}\) by applying function symbols in \(\varSigma \) is denoted by \(\mathcal {T}(\varSigma , \mathsf {N} \cup \mathsf {X})\). We write \( fv (t)\) (resp. \( fn (t)\)) for the set of variables (resp. names) occurring in a term t. A term is ground if it does not contain any variable.

Example 1

We model asymmetric encryption, signatures, and pairs by the signature

$$\begin{aligned} \varSigma _{\mathsf {aenc}} \mathop {=}\limits ^{\text {def}} \{\mathsf {aenc}{/3}, \mathsf {adec}{/2}, \mathsf {pk}{/1}, \mathsf {sig}{/2},\mathsf {checksig}{/2},\mathsf {getmsg}{/1}, \mathsf {vk}{/1}, \langle \cdot ,\cdot \rangle {/2}, {\mathsf {\pi }_1}{/1}, {\mathsf {\pi }_2}{/1}\} \end{aligned}$$

where f / i denotes that f has arity i. Consider term \(t \mathop {=}\limits ^{\text {def}} \langle \mathsf {pk}(sk), \mathsf {aenc}(\mathsf {pk}(sk), r, m) \rangle \) where \(sk, r, m \in \mathcal {N}\). The term t represents a pair consisting of the public key \(\mathsf {pk}(sk)\) associated to the private key sk and the encryption of message m with public key \(\mathsf {pk}(sk)\) using randomness r. To improve readability, we may sometimes write \(\langle t_1, \ldots , t_n\rangle \) instead of \(\langle t_1, \langle \ldots \langle t_{n-1}, t_n \rangle \ldots \rangle \rangle \).

We denote by \(\ell =[t_1, \ldots , t_n]\) the list of terms \(t_1, \ldots , t_n\) and by \(t_0\,{:}{:}\,\ell \) the list obtained by adding the term \(t_0\) to the head of the list, i.e., \(t_0\,{:}{:}\,\ell =[t_0,t_1, \ldots , t_n]\). Sometimes we interpret lists as multisets and we write \(\ell _1 =^\#\ell _2\) for the equality of the multisets corresponding to these lists.

A substitution is a partial function from variables to terms. The substitution \(\sigma \) that maps \(x_i\) to \(t_i\) \((1 \le i \le n)\) is denoted \(\{ x_1 \mapsto t_1, \ldots , x_n \mapsto t_n \}\) and we write \({\text {dom}}(\sigma ) = \{x_1, \ldots , x_n\}\) for the domain of \(\sigma \). We denote by \(\emptyset \) the substitution whose domain is empty. We always suppose that substitutions are acyclic. As usual we extend substitutions to terms and write \(t\sigma \) for the application of \(\sigma \) to term t.

To model algebraic properties of cryptographic primitives, we define an equational theory by a finite set \(\mathsf {E}\) of equations \(u = v\) with \(u,v \in \mathcal {T}(\varSigma ,\mathcal {X})\). We define \(=_\mathsf {E}\) to be the smallest equivalence relation on terms, that contains \(\mathsf {E}\) and that is closed under application of function symbols and substitutions of terms for variables.

Example 2

Continuing Example 1 we define the equational theory \(\mathsf {E}_{\mathsf {aenc}}\) by the following equations.

$$\begin{aligned}{}\begin{array}[c]{rclr rcl} \mathsf {adec}(x_k, \mathsf {aenc}(\mathsf {pk}(x_k), x_r, x_m)) &{} = &{} x_m &{} &{} \mathsf {checksig}(\mathsf {sig}(x,y),\mathsf {vk}(y)) &{} = &{} \mathsf {ok}\\ \mathsf {\pi }_i(\langle x_1, x_2\rangle ) &{} = &{} x_i &{} (i\in \{1, 2\}) &{} \mathsf {getmsg}(\mathsf {sig}(x,y)) &{} = &{} x\\ \end{array} \end{aligned}$$

Then we have that \(\mathsf {adec}( sk, \mathsf {\pi }_2 (t)) =_{\mathsf {E}_{\mathsf {aenc}}} m\).

To illustrate our calculus we consider the Helios e-voting protocol as running example. The Helios protocol relies on zero knowledge proofs. We next specify the equational theory for the particular zero knowledge proofs built by the Helios participants.

Example 3

The Helios zero knowledge proofs can be modelled by the signature

figure a

In case of homomorphic tally, the voters should also prove that their vote is valid, which can be modeled in a similar way. When submitting an encrypted vote, voters are required to prove that the encryption is well-formed, that is to say, that they know the corresponding plaintext and randomness. This is reflected by the following equation.

$$\begin{aligned} \mathsf {checkzkp}_\mathsf{E}(\mathsf {zkp}_\mathsf{E}(xr, xv, \mathsf {aenc}(xpk, xr, xv)), \mathsf {aenc}(xpk, xr, xv)) = \mathsf {okzkp}_\mathsf{E}. \end{aligned}$$

In the decryption mixnets-based variant of the Helios protocol, the talliers output a zero knowledge proof of correct mix and decryption. Such a proof establishes that the output of the decryption mixnet is indeed a permutation of the content of the encrypted ballots received as input. This is captured by the following infinite set of equations. For all \(m\in \mathbb N\), and all \(\{i_1, \dots , i_m\} = \{1, \dots , m\}\),

$$\begin{aligned} \mathsf {checkzkp}^m_\mathsf{DM}(\mathsf {zkp}^m_\mathsf{DM}(xk, xciph, xplain), xciph, xplain) = \mathsf {okzkp}^m_\mathsf{DM} \end{aligned}$$

with \(xciph = (\mathsf {aenc}(\mathsf {pub}(xk), xr_1, xv_1), \dots , \mathsf {aenc}(\mathsf {pub}(xk), xr_m, xv_m))\) and \(xplain = (xv_{i_1}, \dots , xv_{i_m})\).

In all the examples of this section, we will consider the signature \(\varSigma = \varSigma _{\mathsf {aenc}} \cup \varSigma _{\mathsf {zkp}}\) and the equational theory \(\mathsf {E}= \mathsf {E}_{\mathsf {aenc}} \cup \mathsf {E}_{\mathsf {zkp}}\).

We say that a symbol \(+\) is associative and commutative (AC in short) w.r.t. an equational theory E if E contains the two equations:

$$\begin{aligned} x+y = y+x \quad \quad x+(y+z) = (x+y)+z \end{aligned}$$

2.2 Processes

We model protocols using a process calculus. Our plain processes are similar to plain processes in applied pi calculus [16] and are defined through the grammar given in Fig. 1 where c is a channel, \(t, t_1, t_2\) are terms, x is a variable, n is either a name or a channel, and \(i\in \mathbb {N}\) is an integer. The terms \(t, t_1, t_2\) may contain variables.

Fig. 1.
figure 1

Syntax of plain processes

The process 0 does nothing. \(P \mid Q\) behaves as the parallel execution of processes P and Q. \(\nu n. P\) restricts the scope of n. When n is a name, it typically represents a freshly generated, secret value, e.g., a key or a nonce, in P. When n is a channel, it declares a private channel, that cannot be accessed by the adversary. Replication ! P behaves as an unbounded number of copies of P. The conditional \( \mathsf {if}\ \ \ t_1=t_2\ \mathsf {then}\ P\ \mathsf {else}\ Q\ \) behaves as P if \(t_1\) and \(t_2\) are equal in the equational theory and as Q otherwise. The process \(c(x). P\) inputs a message t on channel c, binds it to x and then behaves as P where x has been replaced by t. \(\overline{c}\langle t\rangle .Q \) outputs message t on channel c before behaving as Q. Our calculus also introduces a phase instruction, in the spirit of [24, 25], denoted i : P. We denote by \( Phase (P)\) the set of phases that appears in P, that is the set of j such that j : Q occurs in P. By a slight abuse of notations, we write \( Phase (P)< Phase (Q)\) if any phase in \( Phase (P)\) is smaller than any phase in \( Phase (Q)\).

As usual, names and variables have scopes, which are delimited by restrictions and inputs. We write \( fv (P)\), \( bv (P)\), \( fn (P)\) and \( bn (P)\) for the sets of free and bound variables, and free and bound names of a plain process P respectively.

Example 4

A voter in Helios proceeds as follows. She computes her ballot by encrypting her vote with the public key \(\mathsf {pk}(skE)\) of the election. The corresponding secret key is shared among several election authorities, which is not modeled here. Then she casts her ballot together with her identity and a zero knowledge proof through an authenticated channel. All this information will be published on a public bulletin board. The process \(V(\mathsf {pk}(skE), cred, id, v)\) models the actions of a voter with identity id and credential cred casting a ballot for candidate v:

$$\begin{aligned} V(\mathsf {pk}(skE), cred, id, v) \mathop {=}\limits ^{\text {def}} \nu r.\ \overline{bb}\langle \langle id, \mathsf {sig}(bal,cred), prf \rangle \rangle \end{aligned}$$

where \(bal = \mathsf {aenc}(\mathsf {pk}(skE), r, v)\) and \( prf = \mathsf {zkp}_\mathsf{E}(r, v, bal)\). The authenticated channel is modelled by a signature although Helios relies on a login/password mechanism.

Extended processes keep track of additional information during an execution: the names that have been bound, the currently active processes that are running in parallel, the history of messages that were output by the process and the current phase.

Definition 1

(Extended process). An extended process is a tuple \((\mathcal {E} ; \mathcal {P} ; \varPhi ; i)\) where:

  • \(\mathcal {E}\) is a set of names and channels that are restricted in \(\mathcal {P}\) and \(\varPhi \);

  • \(\mathcal {P}\) is a multiset of plain processes with \( fv (\mathcal {P}) = \emptyset \);

  • \(\varPhi = \{x_1 \mapsto u_1, \ldots , x_n \mapsto u_n\}\) is a ground substitution where \(u_1, \ldots , u_n\) represent the messages previously output to the environment.

  • i is an integer denoting the current phase.

Example 5

The following extended process models two honest Helios voters \(id_A\) and \(id_B\) ready to cast their ballots \(v_A\) and \(v_B\) respectively in a first phase, and the Helios tallying authorities Tal ready to tally the cast ballots in a second phase

$$\begin{aligned} \mathsf {Helios}(v_A,v_B) \mathop {=}\limits ^{\text {def}} (\mathcal {E}_0, 1:V_A \mid 1:V_B \mid 2:Tal, \emptyset , 1) \end{aligned}$$

where \(\mathcal {E}_0\) is a set of names with \(cred_A,cred_B \in \mathcal {E}_0\),

$$\begin{aligned} V_A \mathop {=}\limits ^{\text {def}} V(\mathsf {pk}(skE), cred_A, id_A, v_A) \text{ and } V_B \mathop {=}\limits ^{\text {def}} V(\mathsf {pk}(skE), cred_B, id_B, v_B) \end{aligned}$$

model the two honest voters where V is defined in Example 4, and

$$\begin{aligned} Tal \mathop {=}\limits ^{\text {def}} bb(xb_A).bb(xb_B). T \end{aligned}$$

for some process T modelling the tallying authorities.

Given \(A = (\mathcal {E} ; \mathcal {P} ; \varPhi ; i)\), we define the set of free and bound names of A as \( fn (A) = ( fn (\mathcal {P})\cup fn (\varPhi )) \smallsetminus \mathcal {E}\), and \( bn (A) = bn (\mathcal {P}) \cup \mathcal {E}\). Similarly free and bound variables are defined as \( fv (A) = ( fv (\mathcal {P}) \cup {\text {dom}}(\varPhi ))\), and \( bv (A) = bv (\mathcal {P})\). An extended process A is closed if \( fv (A) ={\text {dom}}(\varPhi )\).

The operational semantics of our calculus is defined by a labelled transition system which allows to reason about processes that interact with their environment. The transition relation \(A \xrightarrow {\ell } B\) relates two ground extended processes A and B and is decorated by a label \(\ell \), which is either an input (\(c(M)\)), an output (\(\nu x. \overline{c}\langle x\rangle \)), or a silent action (\(\tau \)). Silent actions are standard, while visible input and output actions are interactions with the adversary on public channels. An output label \(\nu x. \overline{c}\langle x\rangle \) reflects that messages are output “by reference”: the label contains the variable added to \({\text {dom}}(\varPhi )\) which maps to the ground message that was output. The input label \(c(M)\) contains the term M used by the adversary to compute the message: M may be constructed from previous outputs (addressed through variables in \({\text {dom}}(\varPhi )\)), but is not allowed to use private names. The transition relation is formally defined in the companion technical report [26].

Notations. Given a set \(\mathcal S\) we denote by \(\mathcal S^*\) the set of all finite sequences of elements in \(\mathcal S\). We may also write \(\tilde{u}\) for the finite sequence \(u_1,\ldots ,u_n\). Let \(\mathcal {A}\) be the alphabet of actions (in our case this alphabet is infinite and contains the special symbol \(\tau \)). For every \(w \in \mathcal {A}^*\), the relation \(\xrightarrow {w}\) on processes is defined in the usual way, i.e., we write \(A \xrightarrow {w} A'\) when \(w = \ell _1\ell _2 \cdots \ell _n\) and \(A \xrightarrow {\ell _1} A_1 \xrightarrow {\ell _2} \ldots \xrightarrow {\ell _n} A'\). For \(s \in (\mathcal {A}\smallsetminus \{\tau \})^*\), the relation on processes is defined by: if, and only if there exists \(w \in \mathcal {A}^*\) such that \(A \xrightarrow {w} B\) and s is obtained by erasing all occurrences of \(\tau \) from w.

Example 6

Continuing our running example we illustrate the operational semantics by the following transitions

  • \(\mathcal {E}= \mathcal {E}_0 \cup \{r_A, r_B\}\),

  • \(\varPhi =\{y_A \mapsto \langle id_A, \mathsf {sig}(bal_A,cred_A), prf_A \rangle , y_B \mapsto \langle id_B, \mathsf {sig}(bal_B,cred_B), prf_B \rangle \}\) where \(bal_C = \mathsf {aenc}(\mathsf {pk}(skE), r_C, v_C)\) and \( prf _C = \mathsf {zkp}_\mathsf{E}(r_C, v_C, bal_C)\) for \(C\in \{A, B\}\).

A frame \(\varphi = \nu \mathcal {E}. \varPhi \) consists of a set of names \(\mathcal {E}\) and a substitution \(\varPhi = \{x_1 \mapsto u_1, \dots , x_n\mapsto u_n\}\). The names \(\mathcal {E}\) are bound in \(\varphi \) and can be \(\alpha \)-converted. Moreover names can be added (or removed) to (from) \(\mathcal {E}\) as long as they do not appear in \(\varPhi \). We write \(\varphi =_\alpha \varphi '\) when frames \(\varphi \) and \(\varphi '\) are equal up to \(\alpha \)-conversion and addition/removal of unused names. In this way two frames can always be rewritten to have the same set of bound names. When \(A = (\mathcal {E} ; \mathcal {P} ; \varPhi ; i)\) is an extended process, we define \(\phi (A) \mathop {=}\limits ^{\text {def}} \nu \mathcal {E}. \varPhi \).

Given a frame \(\varphi = \nu \mathcal {E}. \varPhi \) an attacker can construct new terms building on the terms exposed by \(\varphi \). For this the attacker applies a recipe on the frame. A recipe R for a frame \(\varphi \) is any term such that \( fn (R)\cap \mathcal {E}= \emptyset \) and \( fv (R)\subseteq {\text {dom}}(\varPhi )\). An attacker is unable to distinguish two sequences of messages if he cannot construct a test that distinguishes them. This notion is formally captured by static equivalence [16] of frames.

Definition 2

(Static equivalence). Two frames \(\varphi _1 =_\alpha \nu \mathcal {E}. \varPhi _1\) and \(\varphi _2 =_\alpha \nu \mathcal {E}. \varPhi _2\) are statically equivalent, noted \(\varphi _1\sim \varphi _2\) when \({\text {dom}}(\varPhi _1) = {\text {dom}}(\varPhi _2)\), and for all recipes M and N of \(\varphi _1\) we have that \(M\varPhi _1 =_\mathsf {E}N\varPhi _1 {~iff~} M\varPhi _2 =_\mathsf {E}N\varPhi _2\).

Note that in the above definition the frames \(\varphi _1\) and \(\varphi _2\) have the same set of recipes as they bind the same names \(\mathcal {E}\) and their substitutions have the same domain.

Example 7

Let \(\varPhi \) be the substitution of Example 6 and

$$\begin{aligned} \varPhi ' = \{y_A \mapsto \langle id_A, \mathsf {sig}(bal'_A,cred_A), prf'_A \rangle ,y_B \mapsto \langle id_B, \mathsf {sig}(bal'_B,cred_B), prf'_B \rangle \} \end{aligned}$$

where \(bal'_C = \mathsf {aenc}(\mathsf {pk}(skE), r_C, v_D)\) and \( prf'_C = \mathsf {zkp}_\mathsf{E}(r_C, v_D, bal'_C)\) for \(C, D\in \{A, B\}\) with \(C\not = D\). Since \(\mathsf {adec}(skE, \pi _1(\pi _1(\mathsf {getmsg}(y_A))))\varPhi =_\mathsf {E}v_A\), but \(\mathsf {adec}(skE, \pi _1(\pi _1(\mathsf {getmsg}(y_A))))\varPhi ' \not =_\mathsf {E}v_A\), we have that

$$\begin{aligned} \nu skE. \nu r_A. \nu r_B. \varPhi \sim _\mathsf {E}\nu skE. \nu r_A. \nu r_B. \varPhi ' \text { while } \nu r_A. \nu r_B. \varPhi \not \sim _\mathsf {E}\nu r_A. \nu r_B. \varPhi ' \end{aligned}$$

Indeed, an attacker may distinguish between these two frames as soon as he has the secret key skE, by simply decrypting the ballots.

Given two extended processes \(A_1\) and \(A_2\), we often write \(A_1\sim A_2\) for \(\phi (A_1)\sim \phi (A_2)\). Given an extended process A we define its set of traces as

We can now define what it means for an attacker to be unable to distinguish two processes even if he is allowed to actively interact with them. This notion of indistinguishability is naturally modelled by trace equivalence.

Definition 3

(Trace equivalence). Let A and B be two closed extended processes. A is trace included in B, written \(A \sqsubseteq B\), if for every trace \((tr, A')\in \mathsf {traces}(A)\) there exists \(B'\) such that \((tr, B')\in \mathsf {traces}(B)\) and \(A' \sim B'\). A and B are trace equivalent, denoted \(A\approx B\), if \(A \sqsubseteq B\) and \(B \sqsubseteq A\).

Intuitively, as the sequence of visible actions in the labels encode the adversary’s actions the definition requires that for the same interaction with the adversary the protocols produce indistinguishable outputs.

3 Modelling E-Voting Protocols

In this section we explain how we formally model e-voting protocols and state the assumptions needed for our results.

Since many e-voting protocols use zero-knowledge proofs, we consider a signature \(\varSigma \) with \(\mathsf {zkp}, \mathsf {checkzkp}, \mathsf {okzkp}\in \varSigma \) and we assume an equational theory that can be described by an AC-convergent (possibly infinite) rewrite theory such that the only rules in which \(\mathsf {zkp}\), \(\mathsf {checkzkp}\), and \(\mathsf {okzkp}\) occur, are of the form:

$$\begin{aligned} \mathsf {checkzkp}(\mathsf {zkp}(U_1, \dots , U_m), V_1, \dots , V_n) \rightarrow \mathsf {okzkp}\end{aligned}$$

where \(\mathsf {zkp}, \mathsf {checkzkp}, \mathsf {okzkp}\) do not occur in the \(U_i,V_j\). Since the terms \(U_i,V_j\) are left unspecified, this captures most existing zero-knowledge proofs. In particular, it covers the zero-knowledge proofs considered in Example 3.

A voting protocol is a family of processes \(\{ \varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d},\mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}) \}_{n_h, n_d,m \in \mathbb {N}}\) where

  • \(n_h\) and \(n_d\) are the number of honest and dishonest voters respectively;

  • \(\mathcal {C}r^h_{n_h}\) (resp. \(\mathcal {C}r^d_{n_d}\)) is the set of \(n_h\) (resp. \(n_d\)) voting credentials which determines the set of honest eligible voters (resp. dishonest eligible voters), such that \(\mathcal {C}r^h_{n_h}\cap \mathcal {C}r^d_{n_d} = \emptyset \). Each credential \(\tilde{cr}\in \mathcal {C}r^h_{n_h}\cup \mathcal {C}r^d_{n_d}\) is a sequence of terms;

  • m is the number of ballots accepted during the tally;

  • \(\mathcal {K}_\mathsf {pv}\) (resp. \(\mathcal {K}_\mathsf {pb}\)) is the set of all private (resp. public) material.

As usual it is sufficient to consider voting processes that model only the honest voters and the tally (the dishonest voters are left unspecified as part of the environment, and their credentials are public). We may assume w.l.o.g. that the tally process starts with a fresh phase and first reads the ballots on the board. Formally, we assume that voting processes are of the form:

$$\begin{aligned}&\varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})\ \mathop {=}\limits ^{\text {def}} V(\tilde{cr_1}) \mid V(\tilde{cr_2}) \mid \dots \mid V(\tilde{cr_{n_h}})\mid \\&\qquad \qquad \qquad \qquad \qquad \quad \mathsf {tall}:{bb(x_1).\; \ldots \; . bb(x_m).} T^{n, m}(\mathcal {C}r_n, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}) \end{aligned}$$

where \(\mathcal {C}r_n = \mathcal {C}r^h_{n_h}\cup \mathcal {C}r^d_{n_d}\), and for all \(i\in \{1, \dots , n_h\}\), \(\tilde{cr_i}\in \mathcal {C}r^h_{n_h}\). Furthermore, we require that \( Phase (V)<\mathsf {tall}\), \( Phase (T^{n, m})=\emptyset \) and \(T^{n, m}(\mathcal {C}r_n, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})\) contains at most one output which is performed on the channel tal. We note that from the above structure of a voting process it follows that all traces are prefixes of traces of the form

$$\begin{aligned} tr' {\cdot } \mathsf {phase}\ \mathsf {tall} {\cdot } bb(RB_1) {\dots } bb(RB_m) {\cdot } \nu y. \overline{tal}\langle y\rangle . \end{aligned}$$

\(V(\tilde{cr})\) models an honest voter, whose credentials are \(\tilde{cr}\). \(T^{n, m}(\mathcal {C}r_n, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})\) is the remainder of the tallier process. It is parameterised by the number m of ballots it accepts and the number n of eligible voters. We require that \(V(\tilde{cr})\) be independent of n and m and does not use any other credentials, i.e. \( fn (V(\tilde{cr})) \cap \mathcal {C}r_n \subseteq \{\tilde{cr}\}\). These are the only restrictions on the voter process and we believe them to be reasonable and natural.

An e-voting protocol proceeds in two phases: vote casting and tallying. During the vote phase all voters simply cast their ballots. The tally phase proceeds as follows. First m ballots are input. Then a public test is applied to these ballots to carry out a first validity check, e.g. verify some zero knowledge proofs ensuring that the ballots are well formed. Next, the revote policy is applied to remove votes cast by a same voter, e.g., keep only the last one. Finally, the process performs the tally and outputs the result.

3.1 Public Tests

As explained above, the ballot box may apply public tests to the casted ballots. Public tests are Boolean combinations over atomic formulas of the form \(M = N\) where \(M, N \in \mathcal {T}(\varSigma , \mathsf {X})\), i.e. they do not contain any names. An atomic formula is satisfied when \(M =_\mathsf {E}N\) and we lift satisfaction to tests as expected.

We assume a family of tests \(\{ \mathsf {Test}^m \}_{m\in \mathbb {N}}\) where m is the number of casted ballots that are tested and \(\mathsf {Test}^m\) contains m distinguished variables \(x_1, \ldots , x_m\) to be substituted by the ballots. We write \(\mathsf {Test}^m ([B_1, \ldots , B_m]) =\top \) when the test \(\mathsf {Test}^m \{x_1 \mapsto B_1, \ldots , x_m \mapsto B_m\}\) is satisfied. Finally we say that a test is voting-friendly whenever satisfaction is preserved on sublists of ballots, that is \(\mathsf {Test}^m([B_1, \dots , B_m]) = \top \) implies \(\mathsf {Test}^h([B_{i_1}, \dots , B_{i_h}]) = \top \) for any \(1\le i_1<\dots < i_h \le m\).

We believe this condition to be natural. It discards contrived tests that would accept a ballot only if another ballot is present. Conversely, we may consider tests that discard lists with duplicate ballots.

Example 8

The public test applied by the tallying authorities in the Helios protocol consists of two parts. First, a local test that checks the zero knowledge proofs of each submitted ballot, and second, a global test that checks that encrypted votes are pairwise distinct. This is to avoid the replay attack mentioned in [13]. Such checks are formally reflected by the family of tests \(\{\mathsf {Test}^m\}_{m\in \mathbb N}\) with

$$\begin{aligned} \begin{array}{l} Test^m([B_1, \dots , B_m]) \mathop {=}\limits ^{\text {def}} \mathop {\bigwedge }\nolimits _{i=1}^{i=m} \mathsf {lTest}(B_i)\ \mathop {\bigwedge }\nolimits _{i,j\in \{1, \dots ,m\}}^{i\not =j} \mathsf {gTest}(B_i, B_j) \end{array} \end{aligned}$$
$$\begin{aligned} \begin{array}{l}\mathsf {lTest}(B) \mathop {=}\limits ^{\text {def}} \left\{ \begin{array}{ll} \top &{} \text {if } B = \langle id, bal, prf \rangle \text { and } \mathsf {checkzkp}_\mathsf{E}(\mathsf {getmsg}(bal), prf ) =_\mathsf {E}\mathsf {okzkp}_\mathsf{E} \\ \bot &{} \text {otherwise} \end{array} \right. \end{array} \end{aligned}$$
$$\begin{aligned} \begin{array}{l} \mathsf {gTest}(B, B') \mathop {=}\limits ^{\text {def}} \left\{ \begin{array}{ll} \top &{} \text {if } B = \langle id, bal, prf \rangle \text { and } B' = \langle id', bal', prf '\rangle \\ &{} \text { and } \mathsf {getmsg}(bal) \not = \mathsf {getmsg}(bal') \\ \bot &{} \text {otherwise} \end{array} \right. \end{array} \end{aligned}$$

3.2 Revote Policies

Many e-voting protocols offer voters the possibility to cast several votes, keeping eventually only one vote per voter, e.g. the last submitted ballot. Which vote is kept depends on the particular policy. Re-voting intends to guarantee some protection against coercion. We formalize the notion of policy as a function \(\mathsf {Policy}^{n,m}\) which takes a list of m terms (intuitively, the vote and credential) and a set of n credentials (honest and dishonest) and returns the sublist of selected terms to be tallied. A protocol will depend on a family of such policy functions \(\{ \mathsf {Policy}^{n,m} \}_{n,m \in \mathbb {N}}\). We consider two particular, but standard revote policies. The most usual one selects the last cast vote:

$$\begin{aligned} \mathsf {Policy}_{\mathsf {last}}^{n,m}([V_1, \dots , V_m], \mathcal {C}r_n)\mathop {=}\limits ^{\text {def}}[V_{i_1}, \ldots , V_{i_k}] \end{aligned}$$

where each \(V_{i_j} = (v,\tilde{cr})\) is the last occurence of the credential \(\tilde{cr}\in \mathcal {C}r_n\) in the list \([V_1, \dots , V_m]\). We also consider the policy which only keeps the first vote of each voter:

$$\begin{aligned} \mathsf {Policy}_{\mathsf {first}}^{n,m}([V_1, \dots , V_m], \mathcal {C}r_n)\mathop {=}\limits ^{\text {def}}[V_{i_1}, \ldots , V_{i_k}] \end{aligned}$$

where each \(V_{i_j} = (v,\tilde{cr})\) is the first occurence of the credential \(\tilde{cr}\in \mathcal {C}r_n\) in the list \([V_1, \dots , V_m]\). Such a policy typically models the norevote policy (a voter cannot revote).

3.3 Extracting Ballots and Counting Votes

A voting protocol should tally the ballots “as expected”. Formally, what is expected can be formalized through an extract and a counting function.

Given a ballot B, and two sets of terms \(\mathcal {K}_\mathsf {pb}\) and \(\mathcal {K}_\mathsf {pv}\) representing the public and private material, the extraction function \(\mathsf {Extract}\) returns the corresponding vote and credential, or \(\bot \) when a ballot is not well formed., i.e., \(\mathsf {Extract}(B, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}) \in (\mathcal {V}\times \mathcal {C}r_n)\cup \{\bot \}.\) Moreover, we lift the extract function to lists of m ballots by applying the function pointwise, i.e., \(\mathsf {Extract}^m([B_1, \dots , B_m], \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}) \mathop {=}\limits ^{\text {def}}\)

$$\begin{aligned}{}[\mathsf {Extract}(B_1, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}), \dots , \mathsf {Extract}(B_m, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})] \end{aligned}$$

Similar extract functions have been introduced in [27] to define ballot privacy.

Example 9

The \(\mathsf {Extract}\) function for the Helios protocol decrypts the encrypted vote and associates it with the signature associated to the ballot:

$$\begin{aligned} \begin{array}{l} \mathsf {Extract}(B, \{skE\}, \{\mathsf {pk}(skE)\}) \mathop {=}\limits ^{\text {def}}\qquad \qquad \qquad \qquad \qquad \\[2mm] \left\{ \begin{array}{ll} (v, (id,cred)) &{} \text {if } B = \langle id, bal, prf \rangle \text { and } bal =_\mathsf {E}\mathsf {sig}(\mathsf {aenc}(\mathsf {pk}(skE), r, v),cred) \\ \bot &{} \text {otherwise} \end{array} \right. \end{array} \end{aligned}$$

Similarly the counting function defines how the protocol is supposed to tally the votes. The function \(\mathsf {Count}^\ell \) takes as input a list of \(\ell \) pairs \(( v, cr ) \in \mathcal {V}\times \mathcal {C}r\) and returns a list of terms as the election result.

Definition 4

Let \(\{ \mathsf {Count}^\ell \}_{\ell \in \mathbb {N}}\) be a family of counting functions. \(\{ \mathsf {Count}^\ell \}_{\ell \in \mathbb {N}}\) is voting-friendly if for all mn and lists of terms \(W_1\) of size m, \(W_2\) of size n we have that

  1. 1.

    if \(W_1 =^\#W_2\) then \(\mathsf {Count}^m(W_1) =^\#\mathsf {Count}^n(W_2)\);

  2. 2.

    if \(\mathsf {Count}^m(W_1) =^\#\mathsf {Count}^n(W_2)\) then \(\mathsf {Count}^{m+1}((v_1,cr_1)\,{:}{:}\,W_1) =^\#\mathsf {Count}^{n+1}((v_2,cr_2) \,{:}{:}\, W_2) \text { iff } v_1 = v_2\)

The first assumption requires that the result does not depend on the order in which votes are provided (intuitively, only valid votes are kept at this stage). We believe this property to be natural and it excludes contrived counting functions that would, e.g., only keep votes at even positions. The second assumption states that we may count “step by step”. This is more restrictive since it excludes the majority function, i.e., the function that only outputs the name of the candidate that received most votes. But, it captures the most common result functions, namely the multiset and the additive counting functions.

Example 10

The multiset counting function typically arises in mixnet based tallies, which simply output the list of votes (intuitively once votes have been shuffled).

$$\begin{aligned} \mathsf {Count}^1_\mathsf{Mix}([V_1])\mathop {=}\limits ^{\text {def}} [v] \text { and } \mathsf {Count}^m_\mathsf{Mix}([V_1, \dots , V_m]) \mathop {=}\limits ^{\text {def}} v\,{:}{:}\,Count^{m-1}_\mathsf{Mix}([V_2, \dots , V_m]) \end{aligned}$$

where \(V_1 = (v, \tilde{cr})\) and \(m>1\). The additive counting function can be defined similarly. For simplicity consider a binary vote, where we just want to count the number of 1’s:

$$\begin{aligned} \mathsf {Count}^1_\mathsf{HE}([V_1]) \mathop {=}\limits ^{\text {def}} v \text { and } \mathsf {Count}^m_\mathsf{HE}([V_1, \dots , V_m]) \mathop {=}\limits ^{\text {def}} v+Count^{m-1}_\mathsf{HE}([V_2, \dots , V_m]) \end{aligned}$$

where \(V_1 = (v, \tilde{cr})\), \(m>1\) and \(+\) is an AC symbol. Both functions are voting-friendly.

3.4 Properties

When verifying security properties of e-voting protocols it is common to only consider processes whose runs satisfy a particular property. For instance, vote secrecy is typically expressed as the indistinguishability of two processes modelling the situations where two honest voters swap their votes. We need however to ensure that these two honest voters have indeed cast their votes successfully to avoid trivial attacks. Indeed, in a run where the attacker blocks one of these voters, but not the other, the election result will be different and the two processes would be distinguished. Therefore when checking vote secrecy one typically adds a check that guarantees that the two honest votes are counted. We simply require that a check \(\mathsf {check}([b_1, \dots , b_m])\) applied to a list ballots \([b_1, \dots , b_m]\) satisfies the two following requirements:

  • If \(\mathsf {check}([b_1, \dots , b_m])\) holds then we can identify two (intuitively honest) ballots \(b_{i_1}\), \(b_{i_2}\) such that \(\mathsf {check}\) holds for any sublist containing \(b_{i_1}\) and \(b_{i_2}\).

  • If \(\mathsf {check}([b_1, \dots , b_m])\) does not hold then it does not hold either for any sublist of these ballots or if some ballots are replaced by invalid ones (that is replaced by \(\bot \)).

How such a check is implemented is left unspecified, it could be by listening to private channels, successively checking signatures, etc.

3.5 E-Voting Processes

As often when considering trace equivalence (e.g. [10, 24]), we assume processes to be deterministic. More precisely, we require the vote phase to be determinate: if the same sequence of labels leads to two different processes then the two resulting frames have to be statically equivalent. This typically holds for standard voting processes since the voter’s behaviour is deterministic. For the tallying phase we slightly relax this notion and require what we call almost determinate. This relaxed notion only requires that there exists an output of a tally (among all possible outputs, as the particular tally may be chosen non-deterministically) that ensures static equivalence. This allows us to capture some non-deterministic behaviors such as mixnet tally.

Definition 5

An e-voting protocol \( \{\varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})\}_{n_h, n_d,m\in \mathbb N} \) is almost determinate if for any set of names \(\mathcal {E}_0\), any initial attacker knowledge \(\varPhi _0\), any \(m,n_h, n_d\in \mathbb N\), and any traces \((tr, A_1), (tr, A_2)\in \mathsf {traces}(\mathcal {E}_0, \varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}), \varPhi _0, 0)\) we have that

We can now put all the pieces together and link e-voting protocols to the notions of public tests, revote policies, extraction and counting functions and properties.

Definition 6

An e-voting protocol \(\{\varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})\}_{n_h, n_d,m\in \mathbb N}\) is voting friendly w.r.t. \(\mathsf {check}\), \(\{ \mathsf {Test}^m \}_{m\in \mathbb {N}}\), \(\{ \mathsf {Policy}^{n,m}\}_{n,m\in \mathbb {N}}\), \(\mathsf {Extract}\), \(\{ \mathsf {Count}^\ell \}_{\ell \in \mathbb {N}}\) if it is almost determinate, if \(\{ \mathsf {Test}^m \}_{m\in \mathbb {N}}\), \(\{ \mathsf {Policy}^{n,m}\}_{n,m\in \mathbb {N}}\), \(\mathsf {Extract}\), are voting-friendly, and if for any set of names \(\mathcal {E}_0\), any initial attacker knowledge \(\varPhi _0\), any \(m,n_h, n_d\), and any trace \((tr' {\cdot } \nu x. \mathsf {phase}\ \mathsf {tall}. bb(RB_1) {\dots } bb(RB_m), A_1)\) of \((\mathcal {E}_0, \varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}), \varPhi _0, 0)\), the resulting list of ballots \(BB = [B_1, \dots , B_m]\) (where \(B_i = RB_i\phi (A_1)\)) satisfies the following properties.

1) The tally is successful (that is \((\nu y. \overline{tal}\langle y\rangle , A_2)\in \mathsf {traces}(A_1)\)) if and only if BB passes the test and the check (\(\mathsf {Test}^m(BB) = \top \) and \(\mathsf {check}(BB)=\top \))

2) Whenever the tally produces an output (that is \((\nu y. \overline{tal}\langle y\rangle , A_2)\in \mathsf {traces}(A_1)\)) then it outputs a triple \(y\phi (A_2) = \langle res, nvotes, zkp\rangle \) where

  • res is the result computed by counting the votes once the extraction function and the revote policy have been applied on the bulletin board;

  • nvotes is the number of votes that has been counted;

  • zkp is a (valid) zero-knowledge proof that would not be valid for any other list of ballots different from BB;

  • either res is the only result the tally can produce from BB (typically in the homomorphic case) or the tally can produce any permutation of it (typically in the mixnet case).

A fully formal definition can be found in the companion technical report [26]. We believe most existing protocols satisfy these requirements.

For many protocols ballots can be associated to the public credentials that were used to cast them. This is the case for Helios and some of its variants where ballots either contain the voter identity (in the original Helios) or are signed using private credentials (in the Belenios system). As we will see in the next section we can get tighter bounds for this class of protocols. Formally we define protocols with identifiable ballots as follows.

Definition 7

An e-voting protocol \(\{\varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})\}_{n_h, n_d,m\in \mathbb N}\) has identifiable ballots if for all \(n_h, n_d,m\in \mathbb N\), for any trace

$$\begin{aligned} (tr' {\cdot } \nu x. \mathsf {phase}\ \mathsf {tall}.bb(RB_1) {\dots } bb(RB_m) {\cdot } \nu y. \overline{tal}\langle y\rangle , A) \end{aligned}$$

of \(\varPi ^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}))\) there exists a recipe R and a variable x such that

$$\begin{aligned} \forall 1\le i\le m.\ {~if~} \mathsf {Extract}([RB_i\phi (A)], \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}) = (V, \tilde{cr}) {~then~} R_i\phi (A) = \mathsf {pub}(\tilde{cr}) \end{aligned}$$

where \(R_i = R\{x\mapsto RB_i\}\).

4 Main Results

Throughout the section we consider two voting protocols

$$\begin{aligned} \{\varPi _i^{n_h, n_d, m}(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb})\}_{n_h, n_d,m\in \mathbb N} \end{aligned}$$

for \(1\le i\le 2\) which are voting-friendly for \(\mathsf {check}_i\), \(\{ \mathsf {Test}^m \}_{m\in \mathbb {N}}\), \(\{\mathsf {Policy}^{n,m} \}_{n,m\in \mathbb {N}}\), \(\mathsf {Extract}_i^m\), \(\{ \mathsf {Count}_i^\ell \}_{\ell \in \mathbb {N}}\). Note that we assume the same public test for both protocols. Moreover we assume that \(n_h\ge 2\) and \(m\ge n_h+n_d\).

Let \(\mathcal {E}_0\) be a set of names, and \(\varPhi _0\) a ground substitution representing the initial attacker knowledge. \(\{A^{n_h, n_d,m}_0\}_{n_h, n_d,m\in \mathbb N}\) and \(\{B^{n_h, n_d,m}_0\}_{n_h, n_d,m\in \mathbb N}\) are two families of extended processes defined as follows

$$\begin{aligned} \begin{array}{lclr} A^{n_h, n_d,m}_0 &{} \mathop {=}\limits ^{\text {def}} (\mathcal {E}_0\cup \mathcal {C}r^h_{n_h}, \varPi ^{n_h, n_d, m}_1(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}), \varPhi _0, 0) &{} \forall n_h, n_d,m\in \mathbb N \\ B^{n_h, n_d,m}_0 &{} \mathop {=}\limits ^{\text {def}} (\mathcal {E}_0\cup \mathcal {C}r^h_{n_h}, \varPi ^{n_h, n_d, m}_2(\mathcal {C}r^h_{n_h}, \mathcal {C}r^d_{n_d}, \mathcal {K}_\mathsf {pv}, \mathcal {K}_\mathsf {pb}), \varPhi _0, 0) &{} \forall n_h, n_d,m\in \mathbb N \end{array} \end{aligned}$$

Our reduction results apply to equivalences of the form \(A^{n_h, n_d, m}_0 \approx B^{n_h, n_d, m}_0\) for all \(m,n_h, n_d\). Vote privacy is typically modelled in this way [5]. The proofs of the results presented in this section could not be included due to lack of space, but are available in the technical report [26].

Our first result states that attacks on such equivalences require at most 3 voters.

Proposition 1

If \(A^{k_h, k_d,\ell }_0 \not \approx B^{k_h, k_d,\ell }_0\) then \(\ A^{2, k'_d,\ell }_0 \not \approx B^{2, k'_d,\ell }_0\) for \(k'_d=0\) or \(k'_d=1\).

Note that this case does not yet bound the number of ballots to be considered. In particular, when re-voting is allowed the attacker may a priori need to submit several ballots in order to distinguish the two processes. In other words, the ballot box is still parameterized by the number of ballots to be received. However, whenever we assume that \(\varPi _1\) and \(\varPi _2\) do not allow voters to revote, we can deduce immediately that 3 ballots suffice to capture any attack. More formally, we encode this situation by letting \(k = \ell \) and considering the re-vote policy that only keeps the first vote of each voter.

Theorem 1

If \(\{\mathsf {Policy}^{n,m} \}_{n,m\in \mathbb {N}}=\{\mathsf {Policy}_{\mathsf {first}}^{n,m} \}_{n,m\in \mathbb {N}}\) and \(A^{k_h,k_d,k}_0 \not \approx B^{k_h, k_d, k}_0\) where \(k=k_h+k_d\), then \(A^{2,k'_d, k'}_0 \not \approx B^{2, k'_d,k'}_0\) for \(k'_d=0\) or \(k'_d=1\) and \(k' = 2+k'_d\).

Intuitively, the case where \(k'_d =0\) corresponds to the case where an attacker can distinguish the processes playing only with two honest voters. This case for instance arises when analyzing a naive protocol where each voter simply signs his vote, hence offering no anonymity at all. The case where \(k'_d=1\) corresponds to the case where the attacker computes a vote which depends on the honest votes. The above results state that an attacker does not need more then one ballot in that case. An example of such an attack is the vote copy attack on Helios described in [13]. We could actually encode any attack with 2 voters into an attack with 3 voters by letting the adversary play like a useless, honest, voter. This would require however to formalize the fact that the attacker may always simulate an honest voter, that is, the voting process.

We now consider the case where re-voting is allowed. In this case we can bound the number of ballots that need to be considered to \(4+2k\) (for k number of voters in total).

Proposition 2

If \(A^{k_h, k_d,\ell }_0 \not \approx B^{k_h, k_d,\ell }_0\), then there exists \(\ell _{min} \le 4+2k\) such that \(A^{k_h, k_d,\ell _{min}}_0 \not \approx B^{k_h, k_d,\ell _{min}}_0\) where \(k=k_h+k_d\).

Combining the reductions on the number of voters and the number of ballots we obtain the following theorem.

Theorem 2

If \(A^{k_h, k_d,\ell }_0 \not \approx B^{k_h, k_d,\ell }_0\), then there exists \( k'_d\in \{0, 1\},\ \ell _{min} \le 4+2k\) such that \(A^{2, k'_d,\ell _{min}}_0 \not \approx B^{2, k'_d,\ell _{min}}_0\) where \(k=2+k'_d\).

This is an immediate consequence of Propositions 1 and 2 and yields a bound of 4+2\(\times \)3=10. When protocols have identifying ballots (Definition 7) we can tighten our reduction of the number of ballots: we only need to consider \(4+k\) ballots.

Corollary 1

If \(\varPi _1\) and \(\varPi _2\) have identifying ballots and \(A^{k_h, k_d,\ell }_0 \not \approx B^{k_h, k_d,\ell }_0\), then \(\exists \ell _{min} \le 4+k.\ A^{k_h, k_d,\ell _{min}}_0 \not \approx B^{k_h, k_d,\ell _{min}}_0\) where \(k=k_h+k_d\).

This is a corollary of the proof of Proposition 2. With identifiable ballots, we know that the ballots selected by the revoting policy on the left and on the right hand-side are the same. Again, we combine this result with the reduction on the number of voters.

Theorem 3

If \(\varPi _1\) and \(\varPi _2\) have identifying ballots and \(A^{k_h, k_d,\ell }_0 \not \approx B^{k_h, k_d,\ell }_0\) then \(\exists k'_d\in \{0, 1\},\ \ell _{min} \le 4+k\) such that \(A^{2, k'_d,\ell _{min}}_0 \not \approx B^{2, k'_d,\ell _{min}}_0\) where \(k=2+k'_d\).

This follows from Corollary 1 and Proposition 1 and yields a bound of 4+3=7 ballots.

5 Case Studies

We apply our results on several case studies: several versions of Helios [18, 19, 28] and Prêt-à-Voter [20], as well as the JCJ protocol [29] implemented in the Civitas system [30]. For some of these protocols we show that the ProVerif verification tool [1] can be used to perform a security proof that, thanks to our results, is valid for an arbitrary number of voters and ballots.

For the other protocols, ProVerif is not able to verify the protocols, either due to the fact that equational theories with AC symbols are not supported by ProVerif or simply because of a state explosion problem. In these cases we show that our results nevertheless apply. Given recent progress in automated verification for equivalence properties [9, 10, 31] we hope that verification of some of these protocols will be possible soon. Our results would also be useful to simplify proofs by hand.

The results in this section are summarized in Fig. 2. Our hypotheses were always satisfied wherever applicable. For several protocols, we could not conduct the analysis with ProVerif, either because the equational theory is out of reach of the tool or because we had to stop ProVerif execution after a couple of hours. The case studies are further detailed in the companion report [26]. The results in this section rely on ProVerif scripts available at http://3voters.gforge.inria.fr.

Fig. 2.
figure 2

Summary of application of our results on case studies. A \(\times \) in the “ProVerif” column indicates that we could not successfully run the analysis with ProVerif.

6 Conclusion

In this paper we propose reduction results for e-voting protocols that apply to vote privacy. We believe they also apply to stronger properties such as receipt-freeness. Our first reduction result states that whenever there is an attack, there is also an attack with only two honest voters and at most one dishonest voter. This considerably simplifies the proofs and encodings otherwise needed to verify such protocols using automated verification tools. We moreover consider the case where the protocol allows a voter to cast multiple votes and selects one vote according to a given re-vote policy, e.g. select the last vote casted. In that case verifying privacy is still complicated even when restricted to three voters. We therefore show a second reduction result that allows to consider at most 10 ballots. In case the protocol has identifiable ballots we reduce the number of necessary ballots to 7. We have shown that the hypotheses of our theorems are satisfied by many protocols: several variants of Helios, Prêt-à-Voter, as well as Civitas. For several of these protocols we were able to apply automated tool verification and provide the first automated proofs for an unbounded number of voters and ballots. For the decryption mixnets-based PaV protocol, we even provide the first proof of vote privacy.

An interesting direction for future work is to further tighten the bound on the number of ballots, possibly characterizing properties enjoyed by voting protocols. We also foresee to show similar reduction results for other properties of e-voting, such as verifiability. Given that the result is stated in a symbolic model, we also plan to investigate if the result can be transposed to a computational model.