Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Controlling and limiting energy consumption is crucial for datacenters, both, ecologically and economically. Minimizing energy consumption is key to achieving both, green IT and higher datacenter density [17]. To support the achievement of energy-consumption goals, software-based energy measurement features have been introduced to CPUs by various vendors, e.g., by Intel [21, Chap. 14.9].

While the potential benefits of software-based energy measurement are clear [17], its security implications are not yet well-understood. To clarify such implications is our goal. More concretely, we focus on side channels that attackers might establish using software-based energy measurement. In a side-channel attack, an attacker extracts secrets, like cryptographic keys, from execution characteristics of a program, like running time [4, 11, 22], cache behavior [8, 32, 52], or power consumption [23, 24, 37]. Prior work on power-consumption side channels required specialized hardware or required the device under attack to use a battery.

In this article, we investigate the danger of side channels introduced by software-based energy measurement. We also evaluate the effectiveness of two candidate countermeasures against such side channels. To make things concrete, we focus on Intel RAPL, an energy measurement feature in Intel CPUs [21].

We perform our experiments on an Intel i5-4590 desktop CPU. In our experiments, we measure the energy consumption of a victim program purely in software, using Intel RAPL. Based on our measurements, we evaluate qualitatively whether an attacker can learn secret information and then quantify this threat using statistical methods on a concrete decision procedure. Subsequently, we evaluate the effectiveness of countermeasures based on information theory.

Our main finding is that an attacker can distinguish between RSA secret keys purely by using software-based energy measurement. More concretely, the attacker can distinguish which secret key is used in the RSA implementation from the popular cryptographic library Bouncy Castle. We show that 7 observations suffice to guess the key correctly with a probability above 99%. This number of required observations is surprisingly low and the detected weakness in Bouncy Castle RSA is, hence, a serious concern. While it is clear that CPU features for increasing performance are common sources of side channels (see, e.g., caches [43] or branch prediction [1]), CPU features for controlling energy were not in the focus of research on side channels so far. Our results show that CPU features for controlling energy do introduce side channels and that these side channels are severe. That clarifies the security implications of green IT in this domain.

We investigate two candidate countermeasures against software-based energy side channels, namely the program transformations cross-copying [2] and conditional assignment [40]. We evaluate their effectiveness by the reduction in side-channel capacity that they achieve in our experiments. While cross-copying only reduces capacity by 8%, conditional assignment reduces capacity by 99%. Thus, conditional assignment could be a suitable basis for hardening security-critical implementations against software-based energy side channels.

In summary, our main contributions are (1) a qualitative and a quantitative analysis of software-based energy side channels at the example of Bouncy Castle RSA and Intel RAPL, and (2) a quantitative evaluation of the effectiveness of two candidate countermeasures against energy side channels.

2 Preliminaries

Side Channels. In 1996, Kocher showed that a naive square-and-multiply implementation of modular exponentiation is vulnerable to timing-side-channel attacks [22]. Modular exponentiation is, for example, used in RSA decryption to compute \(p = c^d \pmod n\), for ciphertext c and secret key d [45].

Fig. 1.
figure 1

Square&Multiply

A square-and-multiply implementation of modular exponentiation is given in Fig. 1. Line 5 is only executed when the condition in Line 4 evaluates to true. Execution of Line 5 takes additional time. Since the condition depends on bits from the exponent, the execution time of the program encodes the Hamming weight of the exponent. An attacker can exploit this variation in execution times to extract the secret exponent d [22].

In the style of Millen [39], a side channel can be modeled as an information-theoretic channel [15] with random variables X and Y as the input alphabet and output alphabet. The input alphabet are all secrets a program can process, and the output alphabet are all possible side-channel observations. The worst-case side-channel leakage can be measured by the channel capacity C(X; Y) [15].

Software-Based Energy Measurement. Energy E (measured in J for joule) is the aggregation of instantaneous power consumption values p(t) (measured in W for watt) over time, i.e., \(E = \int _{t_0}^{t_1} p(t) dt\) [19]. Similar to [41], we define the energy consumption of a program as the energy consumed by the CPU and main memory during program execution (e.g., for arithmetics and accesses to data).

Running Average Power Limit (Intel RAPL) is a set of energy sensors on CPUs introduced with Intel’s Sandy Bridge processor architecture [20]. While Intel RAPL ’s primary purpose is to enforce power consumption limits [21, Chap. 14], it also exposes the energy consumption of the CPU through the model-specific register (MSR) MSR_PKG_ENERGY_STATUS, which is updated every millisecond. The measurements provided are accurate [20]. Linux exposes Intel RAPL to userspace through the msr kernel module [31] and through the Power Capping framework (powercap) [30]. Both, msr and powercap, provide energy measurements in pseudo-files. The former can be accessed with root privileges, e.g., under /dev/cpu/0/msr for the first CPU. The latter can be accessed by non-privileged users under /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj. From powercap, energy measurements in the unit \({\mu }J = 10^{-6}J\) can be obtained.

Distinguishing Experiments. In a distinguishing experiment, two distinct secret inputs are passed to a program and a side channel output is repeatedly measured for each input. For instance, Mantel and Starostin use distinguishing experiments to show that a program exhibits a timing-side-channel vulnerability [36].

Based on the empirical data from a distinguishing experiment, statistical tools can be used to quantify the side-channel leakage of the program under test. For a given attacker strategy, the success probability can be computed based on hypothesis testing. Independent of an attacker strategy, the side-channel capacity C(X; Y) of the program can be estimated with a statistical procedure (e.g., [12]).

A test of hypothesis is a tool to investigate conformance of a hypothesis \({H_0}\) with experimental data [48, p. 64]. We denote the alternative hypothesis by \({H_1}\). A test has two error cases: (a) the test wrongly accepts \({H_0}\) (a false positive), or (b) the test wrongly refutes \({H_0}\) (a false negative). The probabilities for a false positive and a false negative are denoted by \({P({H_0} | {H_1})}\) and \({P({H_1} | {H_0})}\).

The binomial distribution (or Bernoulli distribution) is the probability distribution for the number of successes in n independent experiments [48, p. 112]. The probability that in n experiments, each featuring success probability p, r successes are observed is \({P_{{n},{p}}} (r) = \left( {\begin{array}{c}r\\ n\end{array}}\right) p^r p^{n-r}\), where \(\left( {\begin{array}{c}r\\ n\end{array}}\right) = \frac{n!}{r!(n-r)!}\) is the binomial coefficient. We write \({P_{{n},{p}}} (r \le X)= \sum _{i = 0}^{X} {P_{{n},{p}}} (i)\) for the probability that at most X out of n experiments exhibit a success. Conversely, the probability that more than X out of n experiments exhibit a success is \({P_{{n},{p}}} (r > X) = 1 - {P_{{n},{p}}} (r \le X)\).

Chothia and Smirnov show in [13] how tests of hypothesis can be used to attack e-passports. Based on a simple selection criterion, their distinguishing attack tests the hypothesis that the passport under attack belongs to the victim. Using \({P({H_0} | {H_1})}\) and \({P({H_1} | {H_0})}\), they calculate the number of observations needed to distinguish passports with error rates below \(1\%\).

Program Transformations Against Side Channels. Multiple source-to-source program transformations were proposed for mitigating timing side channels, including cross-copying [2], conditional assignment [40], transactional branching [6], and unification [26]. The technique cross-copying pads branches by adding copies of the statements in one branch to the respective other branch. In the copies, dummy statements are used, which do not affect the program’s state, but require the same execution time as the respective original statements. The technique conditional assignment removes secret-dependent branching completely and replaces assignments from the respective branches by assignments that are masked by the branching condition. Both, cross copying and conditional assignment were evaluated analytically and experimentally [2, 36, 40]. For instance, they were effective against the timing side channel in an implementation of Fig. 1 [36].

RSA in Bouncy Castle. Bouncy Castle is a cryptographic library for Java [29]. A provider class allows the use of Bouncy Castle through the Java Cryptography Extension (JCE). In the form of Spongy Castle [50], Bouncy Castle is widely used on Android, e.g., in the WhatsApp messenger [32]. Side channels in Bouncy Castle are, hence, a serious security threat. Recently, it was shown that Bouncy Castle  1.5’s AES implementation is vulnerable to cache side-channel attacks [32].

Bouncy Castle contains implementations of various variants of the RSA asymmetric encryption scheme. The RSA encryption and decryption functionality is implemented in the Java class RSAEngine. RSAEngine can be used either directly or as backend in cipher modes, such as OAEP [7] and PKCS1 [46]. An RSA key can be generated using the class RSAKeyPairGenerator.

3 Our Approach

In a side-channel attack, an attacker collects sample execution characteristics of a victim program. Based on these samples, the attacker distinguishes between the candidate secrets (e.g., valid crypto keys). The core of many side-channel attacks is to distinguish between candidate secrets from a restricted set (e.g., varying only in one bit [22] or byte [3, 8]). For instance, AlFardan and Paterson [3] distinguish between two secret plaintexts based on the time that an implementation of TLS takes to decrypt them. Using distinguishing experiments [36], one can detect weaknesses in implementations that allow one to distinguish between secret inputs, e.g., as a basic step in a side-channel attack.

We define a general procedure for such experiments and use it to assess the implementation of RSA in Bouncy Castle with respect to two attacker models.

3.1 Procedure for Distinguishing Experiments

An implementation \(imp \) is assessed with respect to a particular security concern, namely the leakage of a secret input s to an attacker under an attacker model a. For instance, \(imp \) could be an RSA implementation and s could be the secret RSA key. The assessment consists of four steps, visualized in Fig. 2: input generation, sample collection, result computation, and result evaluation.

Fig. 2.
figure 2

Procedure for a distinguishing experiment

In the first step, input generation, two input vectors to the implementation \(imp \) are generated, such that all inputs are within the spectrum of valid input data. The input vectors differ only in the secret input s. For instance, to assess the leakage of a secret RSA key, two valid secret RSA keys are generated randomly.

In the sample collection step, the implementation \(imp \) is run on the two input vectors that were generated in the previous step. For both runs, the observation made under the attacker model a is recorded. This step is repeated multiple times to obtain a collection of observations for each input vector.

In the result computation step, the arithmetic means of the two collections of observations are computed. For each collection, the frequency with which each observation occurs in the collection is computed and visualized in a histogram.

The last step is the result evaluation. Based on the computed results, one can detect weaknesses in implementations (if the means are clearly distinguishable and the histograms have little overlap). In addition to such qualitative results, quantitative results can be obtained through a statistical test (see Sect. 5).

3.2 Attacker Models

The sample-collection step in a distinguishing experiment depends on the attacker model. We implement this phase for two attacker models that we call sequential and concurrent. In both models, the attacker can execute an attack procedure with standard capabilities on the machine running the victim program. On Linux, attackers under both models can access powercap ’s pseudo-files on file system /sys. The model sequential captures active attackers who can trigger runs of the victim program. The model concurrent captures passive attackers who observe existing runs of the victim program. On Linux, unprivileged attackers can access information about running processes through file system /proc.

Fig. 3.
figure 3

Measurement procedure under sequential

Fig. 4.
figure 4

Measurement procedure under concurrent

Implementation for Sequential. We implemented the measurement procedure for sequential in Python. Figure 3 shows corresponding pseudocode.

Firstly (Line 2), the attacker reads the energy-consumption counter through powercap by calling the function readCounter. Secondly, the attacker waits busily for the first change to the energy-consumption counter (Lines 3–5). Once the counter has been refreshed, the attacker invokes an execution of the victim program (Line 6) using the invocation command supplied as input to the attack procedure. After executing the victim program, the attacker queries the energy-consumption counter again (Line 7). The difference between the values of the counter before and after the victim’s execution is the attacker’s sample. If the sample is negative, that is, if there was a wraparound of the counter, the sample is discarded (Lines 8–9). Otherwise, the sample is returned (Lines 10–11).

Implementation for Concurrent. Since an attacker under concurrent cannot trigger the victim program himself, he needs to identify runs of the victim program on the system. We use Python to implement the measurement procedure under concurrent. Pseudocode for the procedure is shown in Fig. 4.

The attacker waits until the victim program is executed (Lines 2–17). He detects the invocation of a program by monitoring the /proc filesystem. He recognizes the victim program by the command that was used to invoke it (Line 11). Once the victim program is executed, the attacker measures the energy consumption as the difference in the energy-consumption counter (Lines 19–27).

4 Qualitative Results on Bouncy Castle RSA

We investigate the consequences of software-based energy measurement on software security at the example of Intel RAPL and Bouncy Castle RSA. Using a distinguishing experiment, we identify that running Bouncy Castle RSA on a system with Intel RAPL gives rise to a weakness. The energy consumption of the decryption operation allows to distinguish between secret RSA keys. In the following, we describe the setup and results of our experiment in detail.

4.1 Experimental Setup

Assessed Implementation. To assess the vulnerability of Bouncy Castle RSA, we implement a Java program, called RSA, that decrypts an RSA ciphertext using Bouncy Castle  1.53. It takes a secret key and a ciphertext as input. It decryptsthe ciphertext, using the secret key, and returns the resulting plaintext.

Fig. 5.
figure 5

RSA decryption

Figure 5 lists the pseudo-code of the program. Line 4 decrypts ciphertext ct using secret key (d,n). processBlock is a method from Bouncy Castle’s RSAEngine class, which implements the RSA decryption.

Machine Configuration. We conduct our experiments on a Lenovo ThinkCentre M93p featuring one RAPL-capable Intel i5-4590 CPU @ 3.30GHz with 4GB of RAM. The machine runs Ubuntu 14.10 with a Linux kernel version 3.16.0-44-generic from Ubuntu’s repository. The programs are executed using an OpenJDK 7 64-bit server Java Virtual Machine version 7u79-2.5.5-0ubuntu0.14.10.2 from Ubuntu’s repository. To simulate a server machine that is shared between attacker and victim, we disable the X-server.

Parameters and Sampling. We generate two RSA keys \({k1}\) and \({k2}\) to supply as input to our RSA decryption program during our distinguishing experiment. First, we randomly select two 1536 bit primes p and q to calculate the 3072 bit modulus \(n = p * q\) shared by our keys. To select private exponents for the two keys \({k1}\) and \({k2}\), we exploit that \(d * e \equiv 1 \pmod {(p - 1) * (q - 1)}\) must hold for valid RSA keys [45]. For \({k1}\), we randomly generate a public exponent \(e_{k1} \) and calculate the corresponding private exponent \(d_{k1} \). For \({k2}\), we fix the public exponent to \(e_{k2} = {65537}\) and calculate the corresponding private exponent \(d_{k2} \). The secret exponents that we obtain for \({k1}\) and \({k2}\) have Hamming weight 1460 and 1514, respectively. In addition to the keys, we randomly select a ciphertext \(c < n\) to decrypt with both keys.

In our distinguishing experiments, we utilize our measurement procedures to collect 100000 samples per secret key under the attacker models sequential and concurrent. For the attacker model concurrent, under which an attacker cannot trigger executions of the victim program himself, we invoke the victim program after random delays between 100 ms and 1000 ms.

We reject outliers that lie further than six median absolute deviations from the median. For \({k1}\), we reject 1.24% of the samples under sequential, and 10.78% of the samples under concurrent. For \({k2}\), we reject 1.11% of the samples under sequential, and 11.01% of the samples under concurrent. We plot the collected samples for each key and attacker model as histograms.

4.2 Results for Sequential

The samples collected in our distinguishing experiment under sequential are depicted in Fig. 6. One histogram of energy-consumption samples is given per input. The histograms are colored based on the input: The blue (left) histogram corresponds to the samples for \({k1}\) with Hamming weight 1460, and the red (right) histogram corresponds to the samples for \({k2}\) with Hamming weight 1514.

Fig. 6.
figure 6

Results for sequential

Fig. 7.
figure 7

Results for concurrent

The estimated mean energy consumption for \({k1}\) is 5.07J, and for \({k2}\) the estimated mean energy consumption is 5.14J. The peaks of the histograms and the mean energy consumptions for the inputs are clearly distinct.

Based on the histograms, an attacker under the model sequential can distinguish between the two secret RSA keys. Hence, there is a weakness in Bouncy Castle RSA in the presence of the Intel RAPL feature.

4.3 Results for Concurrent

Figure 7 shows the histograms of the samples per key under concurrent. Again, the blue (left) histogram corresponds to \({k1}\) (Hamming weight 1460) and the red (right) histogram corresponds to \({k2}\) (Hamming weight 1514).

The mean energy consumptions are 7.20J and 7.32J for the keys with Hamming weights 1460 and 1514, respectively. The peaks of the two histograms are clearly distinct. Interestingly, the overlap of the histograms is even a bit smaller compared to the overlap of the histograms under sequential. We will get back to this peculiarity in Sect. 5.

The mean energy consumptions and the histograms for the two RSA keys are clearly distinct. This means that the weakness we detected in Bouncy Castle RSA is even exposed to the weaker attacker model concurrent, under which an attacker only passively observes an RSA decryption.

Remark 1

Note that, the energy consumption measured under concurrent increased significantly by 2.13J and 2.18J, respectively, compared the observations under sequential. This increase is due to the attacker actively monitoring the /proc filesystem to identify termination of the RSA process.

Overall, we identify a weakness in Bouncy Castle RSA that is exposed to attackers under, both, sequential and concurrent. For both attacker models, the mean energy consumption of the decryption differs significantly across the two RSA keys. Based on the histograms from our distinguishing experiments, an attacker is able to clearly distinguish between the two secret keys if he collects enough samples. In the following section, we quantify exactly how many samples an attacker needs in order to be successful.

5 Quantification of the Weakness

The results of our distinguishing experiments show that it is intuitively possible that an attacker can distinguish RSA keys by exploiting a weakness in Bouncy Castle RSA via Intel RAPL. We further investigate the likelihood of an attacker to distinguish keys. To this end, we devise a test procedure that allows an attacker to guess which of the two RSA key is used during decryption. Based on the false positive and false negative rates of the test procedure, we compute how many measurements an attacker requires to correctly guess the key in 99% of all cases.

5.1 A Distinguishing Test

Side-channel attacks, e.g., [8, 13], can be mounted in two phases. In the first phase, the attacker collects a set of offline observations through the side channel as reference point, possibly on a different machine with the same software and hardware setup as the machine he shares with the victim. During the second phase, the attacker collects a set of online observations on the machine he shares with the victim. By relating his online side-channel observations with the offline observations, the attacker deduces information about the secret being processed.

For our distinguishing experiment setting, the offline observations are the collected energy-consumption characteristics of the RSA decryption operation for both, \({k1}\) and \({k2}\). The online observations would be side-channel observations collected to identify which key is used during a system run. To guess which key the system is using, the attacker compares how likely the learned energy-consumption characteristics allow him to explain the online observations. We model the guess by a statistical test to distinguish between the keys.

Fig. 8.
figure 8

Example of a distinguishing test

The attacker’s distinguishing test works as follows: Given two keys, \({k1}\) and \({k2}\), with mean energy consumptions of \(m_{k1} \) and \(m_{k2} \), where \(m_{k1} < m_{k2} \), the attacker determines a distinguishing point \(dp = \frac{m_{k1} + m_{k2} }{2}\). If the attacker observes an energy consumption less than \(dp\), he guesses \({k1}\). Otherwise, he guesses \({k2}\). A false positive is: \({k2}\) was used but the attacker guesses \({k1}\). A false negative is: \({k1}\) was used but the attacker guesses \({k2}\).

A visualization of an example for the test is given in Fig. 8. In the example, the distributions of energy consumptions for \({k1}\) and \({k2}\) follow the normal distributions \(\mathcal {N}(4.5 J, 0.81)\) and \(\mathcal {N}(5.5 J, 0.49)\). Thus, the decision point is at 5J. The area under the curve \({k2}\) to the left of \(dp\) corresponds to the false positive probability \({P({k1} | {k2})} = 23.75\%\). Conversely, the area under the curve \({k1}\) to the right of \(dp\) corresponds to the false negative probability \({P({k2} | {k1})} = 28.93\%\).

The attacker can use majority voting to increase his chances of guessing the correct key. For this, he observes multiple decryption operations and uses his test on each observation. Based on the individual guesses, he chooses the key on which the majority of guesses agreed. Let n be the number of observations the attacker makes. Then the false positive probability is \(p_{{P({k1} | {k2})}}^{n} = {P_{{n},{{P({k1} | {k2})}}}} (r > \lfloor \frac{n}{2}\rfloor )\) and the false negative probability is \(p_{{P({k2} | {k1})}}^{n} = {P_{{n},{{P({k2} | {k1})}}}} (r > \lfloor \frac{n}{2}\rfloor )\). Based on \({P({k1} | {k2})}\) and \({P({k2} | {k1})}\), one can determine the number n of observations needed for the attacker to distinguish \({k1}\) and \({k2}\) with 99% success rate, i.e., with \(p_{{P({k1} | {k2})}}^{n} < 1\%\) and \(p_{{P({k2} | {k1})}}^{n} < 1\%\). In the example from Fig. 8, \({P({k1} | {k2})} = 23.75\%\), so that 17 observations lead to a false positive rate \(p_{{P({k1} | {k2})}}^{17} = 0.87\% < 1\%\). Conversely, \({P({k2} | {k1})} = 28.93\%\), so that 29 observations lead to a false negative rate below 1%, namely \(p_{{P({k2} | {k1})}}^{29} = 0.81\%\). We conclude that the attacker requires 29 observations to distinguish \({k1}\) and \({k2}\) successfully in 99% of all cases.

5.2 Quantitative Results

For a quantitative evaluation of the weakness in Bouncy Castle RSA, we need to know the false positive and false negative probabilities of the distinguishing test. We estimate the probabilities based on the energy consumption characteristics collected offline by the attacker on his reference system. To estimate \({P({k1} | {k2})}\), we count the number of offline observations below \(dp\) of decryption samples with \({k2}\) and divide them by the total number of offline observations for \({k2}\). Conversely, to estimate the false negative probability we count the number of offline observations above \(dp\) of decryption samples of \({k1}\) and divide them by the total number of offline observations for \({k1}\). Formally, the probabilities can be estimated as follows. Let \(O_{k1} \) be the set of all offline observations for decryption operations with \({k1}\) and let \(O_{k2} \) be the set of all offline observations for \({k2}\).

$$\begin{aligned} {P({k1} | {k2})} = \frac{|\{ x | x \in O_{k2} \wedge x < dp \}|}{|O_{k2} |} \qquad&{P({k2} | {k1})} = \frac{|\{ x | x \in O_{k1} \wedge x \ge dp \}|}{|O_{k1} |} \end{aligned}$$
Fig. 9.
figure 9

False-positive and false-negative rates for attackers

We evaluate the weakness for the attacker models sequential and concurrent, using our distinguishing test. For sequential, the distinguishing point is at \(dp = {5.10}J\), due to the means for \({k1}\) and \({k2}\) being 5.07J and 5.14J, respectively (see Sect. 4.2). For concurrent, the distinguishing point is at \(dp = {7.26}J\), due to the means for \({k1}\) and \({k2}\) being 7.20J and 7.32J, respectively (see Sect. 4.3).

The table in Fig. 9 lists the false positive and false negative probabilities \(p_{{P({k1} | {k2})}}^{n} \) and \(p_{{P({k1} | {k2})}}^{n} \) that result from n online observations for a given n under the two attacker models, respectively. Note that, the following equations hold: \({P({k1} | {k2})} \,=\,p_{{P({k1} | {k2})}}^{1} \) and \({P({k2} | {k1})} = p_{{P({k2} | {k1})}}^{1} \). In addition to \(p_{{P({k1} | {k2})}}^{1} \) and \(p_{{P({k2} | {k1})}}^{1} \), we only list the cases in which one of the probabilities falls below 1% for the first time. We highlight the first value below 1% for each of the probabilities by printing it in bold face.

The false positives for 1 observation range from 13.69% for concurrent to 24.75% for sequential. The false negatives for 1 observation range from 13.39% for concurrent to 19.77% for sequential. For 7 online observations, the false positive and false negative probabilities fall below 1% for concurrent. For sequential, the false negative probability falls below 1% at 13 observations and the false positive probability falls below 1% at 19 observations.

The distinguishing tests show that, in the worst case, only 19 observations are required to distinguish key \({k1}\) from key \({k2}\) in 99% of all cases. In this case of 19 observations, concurrent ’s test exhibits false negative and false positive probabilities below 0.01% each. This means that, given only 19 decryption observations, concurrent can distinguish both keys in 99.99% of all cases. Moreover, to distinguish both keys in 99% of all cases, concurrent requires only 7 observations. The finding that concurrent, our weakest attacker model, can distinguish both keys with high likelihood at 7 observations and, even worse, with near certainty at 19 observations, gives us reason to classify the weakness we discovered as severe.

Remark 2

A comparison across the two attacker models yields the surprising result that concurrent requires fewer observations than sequential to distinguish both keys in 99% of the cases. The 7 observations required by an attacker under concurrent are less than half of the 19 observations required by an attacker under sequential. Intuitively, an attacker under sequential should be able to distinguish the keys more easily than an attacker under concurrent, due to sequential ’s ability to trigger victim executions and, hence, to measure more precisely.

After investigating the histograms from Sect. 4 again, our explanation is as follows. For both attacker models, sequential and concurrent, the overlap between both histograms seems to be roughly 0.25J wide. The estimated means differ by 0.07J, and 0.12J, respectively. While the width of the overlap remains similar with decreasing attacker capabilities, the means move further apart, decreasing the likelihood to observe an energy consumption value that lies in the overlap. Hence, the likelihood of an error in the distinguishing test decreases from sequential to concurrent, which is also shown by our quantitative results.

6 A Security Evaluation of Candidate Countermeasures

As we have shown in the previous sections, software-based energy side channels are a serious threat. Restricting access to software-based energy measurement features like Intel RAPL would seriously limit green IT. In contrast, software-level countermeasures would provide more flexibility, allowing energy measurement while mitigating information leakage through energy side channels.

We investigate two candidate software-level countermeasures, namely cross-copying [2] and conditional assignment [40]. Both are countermeasures against timing side channels, which ensure that equal or equivalent statements are executed across every pair of secret-dependent branches, independently of the guard. Intuitively, equal or equivalent statements should consume equivalent amounts of energy. Thus, we consider both techniques promising candidates for mitigating software-based energy side channels. In the following, we evaluate their effectiveness, using experiments and information theory.

6.1 Case Study

To investigate whether cross-copying or conditional assignment can help to mitigate leakage through software-based energy side channels, we quantify their effectiveness on a benchmark program. Motivated by the weakness that we detected in the Bouncy Castle RSA implementation, we use a benchmark that is relevant for RSA. More concretely, we focus on an implementation of square-and-multiply modular exponentiation (Fig. 1).

Fig. 10.
figure 10

Cross-copied version

Fig. 11.
figure 11

Conditional-assignment version

We first check that software-based energy-side-channel leakage is a concern for this benchmark implementation. To this end, we approximate the channel capacity for the implementation. In the next step, we check whether the candidate countermeasures mitigate this threat. To this end, we approximate the channel capacity of a cross-copied version of the implementation and of a conditional-assignment version of the implementation. We evaluate the effectiveness of each countermeasure by the reduction in channel capacity that it causes.

The cross-copied implementation, shown in Fig. 10, contains a dummy assignment (Line 6) in the else-branch that is equivalent to the assignment in the then-branch. The conditional-assignment version replaces the branching by assignments masked by the branching condition (Fig. 11, Line 4 and 5).

6.2 Experimental Setup

For brevity, we call the unmitigated square-and-multiply implementation Baseline, the cross-copied implementation CC, and the conditional-assignment version CA. For experimental evaluation, we use [36]’s Java implementation of Baseline, CC, and CA. We adapt the implementations to log the energy consumption measured through powercap. We disable the network and all but the first CPU core to reduce noise in the measurements. We disable the just-in-time (JIT) compiler of the Java VM to prevent optimizations from interfering with our results. To avoid zero energy consumption results due to execution times below 1 ms, we repeat the computation \(1.31 \times 10^{5}\) times. This results in approximately 100 updates of the energy-consumption counter for a single execution of Baseline. We estimate the channel capacity using an iterative Blahut-Arimoto algorithm [5, 10] based on the samples collected during a distinguishing experiment.

For the distinguishing experiment, we use two input vectors that share \(n = {4096}\) and \(c = {1234567890}\). One secret exponent with Hamming weight 5 (\(d = {2080374784}\)) and one secret exponent with Hamming weight 25 (\(d = {33554431}\)) are used as the first and second value of the secret input, respectively. We follow [36] and collect 10000 samples per input. We reject outliers that lie further than six median absolute deviations from the median. This results in a rejection between 1.07% and 2.73% of all samples for each implementation and each input.

6.3 Experimental Results and Interpretation

The table in Fig. 12 shows the results of our experiments. The mean energy consumptions and channel capacities are given with 95% confidence intervals.

The mean energy consumption for the first input to Baseline is roughly 15373.73nJ. The mean energy consumption for the second input to Baseline is roughly 18934.13nJ. These means are clearly distinguishable. Hence, there is a clear security concern already in the benchmark.

To quantify the severity of the security threat, we determine the channel capacity. Since we consider a scenario in which the attacker tries to distinguish between two inputs, the secret is 1 bit, namely the choice of the input. For Baseline, C(X; Y) is 0.9922 bits/symbol. That is, one attacker observation reveals almost the entire secret under the worst-case prior distribution of inputs.

Next, we investigate the results for CC. Here, the mean energy consumptions for the two inputs are roughly 20372.21nJ and 21040.05nJ, respectively. The channel capacity is approximately 0.9171 bits/symbol.

Intuitively, the mean energy consumptions of CC are still clearly distinguishable. The quantification of the security concern by the channel capacity of CC confirms that the concern is still substantial. CC can still leak 91% of the secret under the worst-case prior input distribution. This shows that [36]’s cross-copying implementation does not mitigate the energy side channel significantly.

We can only speculate why cross-copying is not effective against the energy side channel in our experiments. The difference of data dependencies introduced by the branches might be responsible. In the else branch (Fig. 10, Line 6), the result is written to \(r_{dummy}\) instead of r. This might cause a subtle difference in energy consumption, for example, due to different patterns of pipeline stalling.

Next, we investigate the results for CA. The mean energy consumptions of CA for the two inputs are roughly 32670.41nJ and 32630.73nJ, respectively. The channel capacity is approximately 0.0075 bits/symbol.

The mean energy consumptions for the two inputs to CA are almost identical and, hence, not easy to distinguish. The channel capacity is reduced almost to zero. That is, in our example, conditional assignment effectively reduces the security concern by 99%, almost eliminating the software-based energy side channel.

Fig. 12.
figure 12

Statistical results for modular exponentiation

The successful reduction of channel capacity from Baseline to CA gives us hope that an effective countermeasure against software-based energy side channels can be designed. In particular, conditional assignment is a promising starting point in the design of such countermeasures.

7 Related Work

7.1 Power-Consumption Side Channels

Power-consumption side channels are exploited, e.g., by the techniques Simple Power Analysis (SPA) and Differential Power Analysis (DPA). These techniques were introduced by Kocher, Jaffe, and Jun in attacks on smartcards implementing the DES cryptosystem [23]. In both techniques, traces of the power consumption of a circuit are measured and analyzed. SPA is a direct interpretation of power traces and can yield information about a device’s secret key during crypto computations [23, 24]. DPA is a statistical method to identify correlations between data processed and power consumption [23, 24]. Variations of power analysis have been used in attacks on implementations of cryptography, e.g., of DES [23, 28, 47], of RSA [23, 24, 37, 42], and of AES [24, 34, 44]. All these attacks obtain traces of power consumption from measurements with dedicated hardware.

Recently, power-consumption side channels were exploited without dedicated hardware on mobile devices using batteries [38, 51]. We briefly give an overview on Michalevsky et al.’s work on tracking Android devices through power analysis [38]. They measure the power consumption of a device using its battery monitoring unit. By their measurements, they can, e.g., track users in real-time.

Our work on software-based energy side channels differs from the previously described work on power analysis in the two following aspects.

(a) We investigate a fundamentally weaker attacker model. Our attacker is only able to measure the energy consumption, which is the aggregate of instantaneous power consumption. As a result, the observations required for an attack through software-based energy side channels are more coarse-grained.

(b) On the technical side, we use software-based measurement techniques available on machines without battery, e.g., on desktop and server machines. Software-based techniques allow an attacker to conduct his attack without dedicated hardware and without physical access to the device under attack. Thus, the observations required to exploit software-based energy side channels are easier to obtain than power traces and might be obtainable remotely in the cloud.

Overall, we think that software-based energy side channels are an interesting target for future security research because they use more coarse-grained observations that are easier to obtain.

7.2 Quantitative Side-Channel Analysis

Side channels have been the focus of many research projects since their first appearance in Kocher’s work in 1996 [22]. A multitude of work focuses on exploiting side channels, e.g., [3, 4, 8, 11, 22, 32, 52]. In addition, analysis of side channels using information-theoretic methods has become an area of focus. Köpf and Basin propose a model to analyze adaptive side-channel attacks using information theory [25]. More concretely, they quantify the attacker’s uncertainty about a secret based on the number of side-channel measurements the attacker obtained. CacheAudit [18] by Doychev, Köpf, Mauborgne, and Reineke is a tool employing program analysis and information theory to give upper bounds on information leakage through cache side channels in x86 binaries. Other work on analysis of side channels using information theory includes [9, 27, 33, 35, 49].

The mentioned works are foremost of analytic nature. On the empirical analysis of side channels, we are aware of only few works, e.g., [14, 16, 36]. Mantel and Starostin evaluate the practical effectiveness of program transformations to mitigate timing side channels [36]. For their evaluation, they consider the capacity of the timing side channel in a program. They introduce the idea of distinguishing experiments to obtain experimental results on the side-channel capacity.

We apply [36]’s concept of distinguishing experiments to show software-based energy side channels exist. Following [36]’s approach, we use channel capacity to evaluate the effectiveness of side-channel countermeasures. In summary, we build on [36]’s techniques, but apply them to a novel type of side channel.

Our distinguishing test to quantitatively evaluate the weakness in Bouncy Castle RSA is a variant of [13]’s test to distinguish e-passports. Distinguishing e-passports is done through sending a random message and a replayed message to a passport to obtain the difference in response times. Using a normal distribution as a model of response times and a manually selected distinguishing point, Chothia and Smirnov calculate the number of observations needed to distinguish passports in 98% of all cases. We transfer the test to our setting. Unlike Chothia and Smirnov, we estimate error probabilities based on offline samples alone, because our observations do not follow a normal distribution.

Like the distinguishing attack in [3] and the distinguishing experiments in [16, 36] we focus on distinguishing between two secrets in our qualitative and quantitative evaluation. We take care to use two representative secrets by following standard random key generation procedures (OpenSSL’s default public exponent, criteria in [45]). A notable work that distinguishes between more than two secrets is [13], which considers ten different e-passports.

8 Conclusion

Software-based energy measurement features facilitate the optimization of energy consumption, which is crucial in datacenters. We showed, at the example of Intel RAPL and Bouncy Castle RSA, that these important features also introduce a security issue. Based on only 7 energy samples measured with Intel RAPL, an attacker can distinguish between two RSA secret keys with 99% success probability. Overall, our results show that software-based energy side channels are a serious security concern.

To protect against the security issues without excluding a large fraction of programs from the optimization of energy consumption, fine-grained countermeasures are needed. We have identified conditional assignment as a promising starting point for designing such countermeasures. In our quantitative experimental evaluation of candidate countermeasures, conditional assignment was effective in the protection of our benchmark program.

Interesting directions for future work will be to derive key-recovery attacks against Bouncy Castle RSA from our results and to investigate the effect of just-in-time compilation. We hope that our approach using distinguishing experiments will also be helpful for the timely detection of side-channel vulnerabilities in other security-critical implementations.