1 Introduction

Proximity services are a special type of location-based service (LBS) where the user is informed about nearby people of interest and their distance rather than their exact location in an attempt to protect location privacy. To that end, queries are sent to the proximity service including the location of the user, the search radius and possibly some information about the target. Unfortunately, when the exact distance to a user is revealed by the proximity service it is possible to retrieve the exact location of the user. Examples include claims that Egyptian authorities leveraged dating apps to track down gay users [6], and others attempted to find locations of Tinder users [19]. Consequently, the need to provide rigorous privacy guarantees in proximity services is evident.

In view of these threats, some effort has been devoted to the development of privacy-preserving proximity testing protocols [8, 12, 15]. These solutions allow two users to learn whether they are within a certain distance of each other but no further information about their location or distance is revealed to the other user. Some of these protocols rely on a trusted third party to handle users’ locations while others get rid of this third party and operate in a decentralized way.

In general, these solutions focus on partially static models, where attackers can change their location freely but victims do not. Capturing the behavior of an optimal attacker in this setting when the victim is static is already hard although some progress has been made in recent years [9, 13]. However, this situation is quite unrealistic and motivated us to investigate the expected effort for adversaries to localize users that move in a particular mobility pattern.

Although there has been extensive study in mobility models [2], attack strategies [11, 13, 19], location privacy protection mechanisms [8, 9, 18] and location privacy quantification metrics [16, 17], there is very limited research on location prediction based on sequential spatio-temporal data using a probabilistic approach. Given that location data acquired from an LBS platform are sequential spatio-temporal data, one can obtain information on the moving patterns/behaviors of the user by analyzing the trajectory dataset. Hence, we are interested in quantifying the effort of an arbitrary attacker issuing proximity queries in finding a user under certain models. In other words: how quickly can an attacker locate a user based on queries to the LBS?

Our main contributions are as follows:

  1. 1.

    Given any attacker strategy, assuming the victim follows a random walk, we establish a formula for calculating the probability of the attacker finding the location of the victim after any number of queries.

  2. 2.

    We give upper and lower bounds on the minimum number of queries an attacker needs to issue to locate the victim with probability \(\frac{1}{2}\) (generalizable to other probabilities). In particular, for a search space of size M and assumptions on victim’s initial location and mobility, we show that an optimal attacker needs at most \(\frac{M}{2}\) queries to locate a victim with probability \(\frac{1}{2}\).

  3. 3.

    We implement the linear jump strategy from the proof, and show empirically that its effort falls within the theoretical bounds. We find that for non-uniform initial distributions of victims, the strategy performs worse. To address this, we propose a greedy updating attacker strategy.

  4. 4.

    We then use the strategies to estimate attacker effort for more general settings (e.g., limited knowledge known by the attacker, other mobility models).

  5. 5.

    We derive a transition matrix representing real-world mobility patterns from a large dataset [21, 22], and show how that mobility pattern influences attacker effort compared to previous results.

The rest of this work is organized as follows. In Sect. 2, we present our mathematical modeling of search spaces and mobility patterns. Section 3 summarizes the problem statement. We present the mathematical analysis for calculating the probability of an attacker locating a victim using location proximity queries in Sect. 4. In Sect. 5, we present our applied linear jumping and greedy attacker strategies, and evaluate them empirically in Sect. 6. Related work and conclusions are discussed in Sects. 7 and 8, respectively.

2 Preliminaries

2.1 Search Domain

In our model, users are able to move in a finite space that can be divided into discrete locations. The granularity of the locations is limited by the maximum precision of the positioning device or privacy considerations [5].

Space. We consider two cases. In the one-dimensional case, the search space is divided into n locations, each of which has two adjacent locations, left and right, except for the corners. We call this space \(\mathcal {S}^n\). For the two-dimensional case, the search space can be divided into \(m\times n\) locations. Each point typically has four adjacent locations, except for the corners. We call this space \(\mathcal {S}^{M}\), with \(M=mn\).

Time. We assume discrete time steps \(k\in \{0,1,2,\dots \}\). In each time step, the user \(\mathcal {X}\) can move once. The movement of users is represented as a transition in our model. Let \(\mathcal {X}_k\) denote the position of user \(\mathcal {X}\) at time k.

Location. A user \(\mathcal {X}\) located in the space \(\mathcal {S}^M\) at time k is denoted \(\mathcal {X}_k\), with \(k\in {\mathbb Z}_{\ge 0}\) and \(\mathcal {X}_k \in \{0, 1,\ldots , M-1\}\). The location of \(\mathcal {X}\) at time 0 is called the initial location of \(\mathcal {X}\). The set of possible values for \(\mathcal {X}_{k+1}\) will be determined by the particular mobility pattern of the user.

2.2 Mobility Models

We now describe some common mobility patterns (see Fig. 1) to give a better intuition of our modeling of search spaces and descriptions of entities moving in the search space. We assume an honest user like Bob follows a realistic mobility pattern where subsequent locations are contiguous to each other. We will use mobility patterns and mobility models interchangeably throughout this paper. Two-dimensional search spaces can be projected to a one-dimensional space.

Fig. 1.
figure 1

Mobility models

Static Mobility Model. An entity \(\mathcal {X}\) that follows a static mobility model starts at a random initial position r in the search space \(\mathcal {S}^M\) and does not move afterwards. For example, a person who stays at home or in the office. With a fixed \(r\in \mathcal {S}^M\):

$$ \mathcal {X}_{k+1} = \mathcal {X}_{k}, \ \mathcal {X}_0 = r,\ k\in {\mathbb Z}_{\ge 0} $$

Linear Mobility Model. An entity \(\mathcal {X}\) that follows a linear mobility model starts at an arbitrary initial position \(\mathcal {X}_0\) within the search space \(\mathcal {S}^M\) \((M=mn\) for dimension two and \(M=n\) for dimension one) and keeps moving such that the following conditions are satisfied:

  1. 1.

    \(\{\mathcal {X}_{iM}, \mathcal {X}_{iM+1},\dots , \mathcal {X}_{iM+M-1}\}=\{0,1,\dots , M-1\}\) for all \(i\in {\mathbb Z}_{\ge 0}\);

  2. 2.

    \(\mathcal {X}_{k+1}\in \{\mathcal {X}_k+1,\mathcal {X}_k-1\}\) for a one dimensional search space of size n \((k\in {\mathbb Z}_{\ge 0})\).

This model may apply when a person is driving a car on the road.

Random Walk Mobility Model. An entity \(\mathcal {X}\) that follows a random walk mobility model starts at a random position and decides its next move uniformly at random from all the positions in its vicinity. Hence the following mathematical expressions hold for a one-dimensional search space of size n (similar expressions can be derived for the two-dimensional case).

$$\begin{aligned}&{\mathrm{Pr}\left( \mathcal {X}_{k+1}=\mathcal {X}_k+1\right) }={\mathrm{Pr}\left( \mathcal {X}_{k+1}=\mathcal {X}_k-1\right) }=\frac{1}{2} \text {, if } 1\le \mathcal {X}_k\le n-2; \nonumber \\&{\mathrm{Pr}\left( \mathcal {X}_{k+1}=1\right) }=1 \text {, if }\mathcal {X}_k=0; \\&{\mathrm{Pr}\left( \mathcal {X}_{k+1}=n-2\right) }=1 \text {, if }\mathcal {X}_k=n-1, \end{aligned}$$

where \({\mathrm{Pr}\left( \cdot \right) }\) denotes probability hereinafter. Later, our theoretical probabilistic derivation will start with a random walk mobility model. A person sightseeing or shopping could fit this model.

Random Jump. An entity \(\mathcal {X}\) that follows this strategy can arbitrarily move to any other position. In other words, the next position is sampled freshly from a uniform distribution. An attacker, for example, who can fake his/her location as a means to perform attacks, such as trilateration, could be described using the random jump model.

3 Location Proximity Attacks

Based on the modeling of search spaces and possible mobility patterns from Sect. 2, we proceed to give a formal mathematical description of the problem statement for this paper. Afterwards, we will derive general analytical bounds on the expected location effort in dimension one.

3.1 System and Attacker Model

Let Alice (\(\mathcal {A}\)) be the attacker and Bob (\(\mathcal {B}\)) be a user whose location is of interest to Alice. Bob uses a LBS that will disclose Bob’s presence at location \(\mathcal {B}_k\) to other users that claim to be at the same location. Bob (and Alice) can only make one location claim per discrete time step k. Each time k, Bob will move once (and update its location on the LBS), and Alice will thus be able to perform one query to the LBS to verify if Bob is at Alice’s claimed location \(\mathcal {A}_k\). Alice sends the first query at time \(k=0\), and thus k is the time of the \((k+1)\)-th query.

The goal of Alice is to minimize the number of queries that she needs to send to LBS to be able to verify Bob’s location. Conversely, Bob does not have a particular goal except to use the service privately. He is not even aware of being tracked. Alice on the other hand is assumed to have a priori information about Bob’s probability distribution obtained from past observations, external sources, geographic features of the terrain or a combination thereof. Later in Sect. 6.2, we will discuss a real data example, where the attacker can obtain some information from historical trajectory data of a victim.

While we use a third party LBS in this system model for simplicity, similar scenarios could be constructed if Alice and Bob engage in a privacy-preserving proximity protocol that is initiated by Alice and where the inputs of both of them remain private (e.g., the protocol discussed in [8]).

3.2 Problem Statement and Formalization

We now present our problem statement and the underlying formalization.

Problem Statement. We are interested in finding \(k_{O,p}\): the number of queries required by an optimal attacker strategy to locate Bob with probability of at least p. Formally, we define \(k_{O,p}\) as follows:

$$\begin{aligned} k_{O,p}:=\min _\mathcal {A}k_{\mathcal {A},p}, \end{aligned}$$
(1)

with \(k_{\mathcal {A},p}\) being the number of steps required by a specific attacker strategy \(\mathcal {A}\) to locate Bob with probability \(0\le p\le 1\). We can find that number as follows:

$$ k_{\mathcal {A},p}:=\min \{k:{\mathrm{Pr}\left( E_k\right) }\ge p\}, $$

with \({\mathrm{Pr}\left( E_k\right) }\) being the cumulative probability that Bob was located within k steps, that is, \(k+1\) queries.

Attacker Strategies. For a fixed attacker strategy \(\mathcal {A}=\mathcal {A}_0,\mathcal {A}_1,\dots \), we are interested in the probability of two events: \(E_k\), and \(F_j\). The event \(E_k\) is the event that Alice locates Bob within k steps:

$$ E_k:=\{\exists i\le k\text { s.t. }\mathcal {A}_i=\mathcal {B}_i\} $$

\(F_j\) is the event that Alice locates Bob exactly at step j:

$$ F_j:=\{\mathcal {A}_j=\mathcal {B}_j\}. $$

Before we can derive the probabilities of those events (\({\mathrm{Pr}\left( E_k\right) }\) and \({\mathrm{Pr}\left( F_j\right) }\)), we now show how to compute the probabilistic locations of Bob.

Probabilistic Locations. Consider a search space \(\mathcal {S}^M\). We assume the mobility model of Bob can be described by a transition matrix P where each entry of P is a transition probability \(p_{ij}\) representing the probability of Bob moving from location i to location j in one step. Furthermore, we assume the probability of Bob moving from location i to location j is the same at any step. More precisely:

$$ {\mathrm{Pr}\left( \mathcal {B}_{k}=j|\mathcal {B}_{k-1}=i\right) }=p_{ij}, \ \forall k\in {\mathbb Z}_{\ge 1} \text { and }i,j\in \mathcal {S}^M. $$

Thus, it is straightforward to calculate the probability of Bob being at a particular location after k steps by simply taking the kth power of the transition matrix.

Let \(B^{(k)}\) \((k\in {\mathbb Z}_{\ge 0})\) be a vector representing the probability of Bob being at each location j \((j\in \{0,1,\ldots , M-1\})\) after k steps, i.e. \(B^{(k)}_j={\mathrm{Pr}\left( \mathcal {B}_k=j\right) }\). Assume that Bob is at location i after k steps, then \(B^{(k+1)}\) \((k\in {\mathbb Z}_{\ge 0})\) can be calculated as follows:

figure a

Upper Bound for \({\mathrm{Pr}\left( E_k\right) }\). By definition,

$$\begin{aligned} {\mathrm{Pr}\left( E_k\right) }={\mathrm{Pr}\left( F_0\cup F_1\dots \cup F_k\right) }, \end{aligned}$$

which gives the following upper bound for \({\mathrm{Pr}\left( E_k\right) }\) since the probability of finding Bob at step i is at most equal to the probability of Bob’s most likely location at that step:

$$\begin{aligned} {\mathrm{Pr}\left( E_k\right) }\le {\mathrm{Pr}\left( F_0\right) }+{\mathrm{Pr}\left( F_1\right) }+\dots +{\mathrm{Pr}\left( F_k\right) }\le \sum _{i=0}^k\max _{j}B^{(i)}_j. \end{aligned}$$
(2)

Note that the above upper bound on \({\mathrm{Pr}\left( E_k\right) }\) holds for any attacker strategy \(\mathcal {A}\).

Lower Bound on k. We define

$$\begin{aligned} k_{lower,p}:=\min \left\{ k:\sum _{i=0}^k\max _{j}B^{(i)}_j\ge p\right\} . \end{aligned}$$
(3)

In view of Eq. 2, \(k_{lower,p}\le k_{O,p}\) is a lower bound of \(k_{O,p}\).

4 Quantifying Attacker Effort

Next we present the theoretical analysis on the minimal number of steps required by an optimal attack strategy to locate a particular victim. First, we derive the formula for calculating \({\mathrm{Pr}\left( E_k\right) }\) and then we consider the case of a victim following a random walk mobility model. We choose a random walk model to give a rigorous mathematical analysis. In more complex moving patterns, the problem may not have analytical or closed solutions because the transition matrix is indefinable or needs to be recursively updated. Our mathematical derivation will be presented for the one-dimensional case only as we can always project two-dimensions to a one-dimensional space.

4.1 Attacker’s Effort Computation

Given the transition matrix P and the vector \(B^{(0)}\) of initial position probabilities, \(B^{(k)}=B^{(0)}P^k\) gives the probabilities of Bob being at each different position after k steps. More precisely, \(B^{(k)}_j={\mathrm{Pr}\left( \mathcal {B}_k=j\right) }\) is the probability of Bob at position j after k steps.

Let \(P^i_{j_1 j_2}\) denote the \((j_1,j_2)\)-entry of the matrix \(P^i\). It gives the probability of Bob going from position \(j_1\) to position \(j_2\) in i steps, i.e. for any \(k\in {\mathbb Z}_{\ge 0}\),

$$ {\mathrm{Pr}\left( \mathcal {B}_{i+k}=j_2|\mathcal {B}_k=j_1\right) }= P^i_{j_1 j_2}. $$

Fixing an attacker strategy \(\mathcal {A}\), let \(\mathcal {A}_i\) denote the position of Alice at step i. For any positive integers \(i_1<i_2\), the probability of Alice locating Bob at both steps \(i_1\) and \(i_2\) is equal to the probability of Bob being at position \(\mathcal {A}_{i_1}\) at step \(i_1\) multiplied by the probability of Bob reaching position \(\mathcal {A}_{i_2}\) in \(i_2-i_1\) steps, i.e.

$$\begin{aligned} {\mathrm{Pr}\left( F_{i_1}\cap F_{i_2}\right) }&={\mathrm{Pr}\left( \mathcal {B}_{i_1}=\mathcal {A}_{i_1} \cap \mathcal {B}_{i_2}=\mathcal {A}_{i_2}\right) }\end{aligned}$$
(4)
$$\begin{aligned}&={\mathrm{Pr}\left( \mathcal {B}_{i_1}=\mathcal {A}_{i_1}\right) } {\mathrm{Pr}\left( \mathcal {B}_{i_2}=\mathcal {A}_{i_2}|\mathcal {B}_{i_1}=\mathcal {A}_{i_1}\right) }=B^{(i_1)}_{\mathcal {A}_{i_1}}P^{i_2-i_1}_{\mathcal {A}_{i_1}\mathcal {A}_{i_2}}. \end{aligned}$$
(5)

To get a general formula for \({\mathrm{Pr}\left( E_k\right) }\), we first note

$$\begin{aligned} {\mathrm{Pr}\left( E_k\right) }&={\mathrm{Pr}\left( F_0\cup F_1\cup \dots \cup F_k\right) } = {\mathrm{Pr}\left( (F_0 \cup F_1 \cup \cdots \cup F_{k-1})^c \cap F_k\right) } \nonumber \\&= {\mathrm{Pr}\left( F_0\right) } + {\mathrm{Pr}\left( F_0^c \cap F_1\right) } + {\mathrm{Pr}\left( (F_0 \cup F_1)^c \cap F_2\right) } + \cdots \nonumber \\&\quad +\,{\mathrm{Pr}\left( (F_0 \cup F_1 \cup \cdots \cup F_{k-1})^c \cap F_k\right) } \nonumber \\&= \sum _{0 \le i \le k} {\mathrm{Pr}\left( (F_0 \cup F_1 \cup \cdots \cup F_{i-1})^c \cap F_i\right) }, \end{aligned}$$
(6)

where \(F^c\) is the complement of F, i.e., \(F_j^c\) means Alice does not successfully locates Bob at step j. Our GUAS attacker strategy uses this result for aggregate success computation (see Sect. 5.2). From probability theory we can also write

$$\begin{aligned} {\mathrm{Pr}\left( E_k\right) }&={\mathrm{Pr}\left( F_0\cup F_1\cup \dots \cup F_k\right) } \nonumber \\&={\sum _{m=0}^k(-1)^{m+1}\left( \sum _{0\le i_1<\dots <i_m\le k}{\mathrm{Pr}\left( F_{i_1}\cap \dots \cap F_{i_m}\right) }\right) }. \end{aligned}$$
(7)

By combining Eq. 4 with Eq. 7, we have the following general formula for \({\mathrm{Pr}\left( E_k\right) }\):

$$\begin{aligned}&\sum _{m=0}^kB_{\mathcal {A}_m}^{(m)}\left( 1+\sum _{\ell =1}^{k-m}(-1)^\ell \sum _{m< i_1<i_2<\dots<i_\ell \le k}P_{\mathcal {A}_m\mathcal {A}_{i_1}}^{i_1-m}\dots P_{\mathcal {A}_{i_{\ell -1}}\mathcal {A}_{i_\ell }}^{i_{\ell }-i_{\ell -1}}\right) \nonumber \\&\qquad \qquad =B^{(0)}_{\mathcal {A}_0}\left( 1-\sum _{i=1}^kP^i_{\mathcal {A}_0\mathcal {A}_i}+\sum _{1\le i_1<i_2\le k}P^{i_1}_{\mathcal {A}_0\mathcal {A}_{i_1}}P^{i_2-i_1}_{\mathcal {A}_{i_1}\mathcal {A}_{i_2}}-\dots \right) \nonumber \\&\qquad \quad +\,B^{(1)}_{\mathcal {A}_1}\left( 1-\sum _{i=2}^k P_{\mathcal {A}_1\mathcal {A}_i}^{i-1}+\sum _{2\le i_1<i_2\le k}P^{i_1-1}_{\mathcal {A}_1\mathcal {A}_{i_1}}P^{i_2-i_1}_{\mathcal {A}_{i_1}\mathcal {A}_{i_2}}-\dots \right) +\dots +B_{\mathcal {A}_k}^{(k)}.\quad \end{aligned}$$
(8)

If the attacker strategy is not deterministic, then for one fixed sequence of positions \(a_0,a_1,a_2,\dots ,a_k\), \({\mathrm{Pr}\left( E_k|\mathcal {A}_0=a_0 \cap \mathcal {A}_1=a_1 \cap \dots \cap \mathcal {A}_k=a_k\right) }\) is given by

$$\begin{aligned} \sum _{m=0}^kB_{a_m}^{(m)}\left( 1+ \sum _{\ell =1}^{k-m}(-1)^\ell \sum _{m< i_1<i_2<\dots <i_\ell \le k}P_{a_m a_{i_1}}^{i_1-m}P_{a_{i_1}a_{i_2}}^{i_2-i_1}\dots P_{a_{i_{\ell -1}}a_{i_\ell }}^{i_{\ell }-i_{\ell -1}}\right) , \end{aligned}$$

and we have the following formula for \({\mathrm{Pr}\left( E_k\right) }\)

$$\begin{aligned} \sum _{1\le a_0,a_1,\dots ,a_k\le n} {\mathrm{Pr}\left( \mathcal {A}_0=a_0 \cap \dots \cap \mathcal {A}_k=a_k\right) } \cdot {\mathrm{Pr}\left( E_k|\mathcal {A}_0=a_0 \cap \dots \cap \mathcal {A}_k=a_k\right) } \end{aligned}$$
(9)

Note that when the attacker strategy is deterministic, for one sequence of positions \(a_0,a_1,a_2,\dots ,a_k\) we have \({\mathrm{Pr}\left( \mathcal {A}_0=a_0,\dots ,\mathcal {A}_k=a_k\right) }=1\) and we get the formula in (8).

For the next subsection, we assume Bob follows a random walk strategy that can be represented as a Markovian process with transition probabilities \(p_{ij}\), consistent with the provisions of the random walk mobility model in Sect. 2.2. We give our evaluations for the case \(p=0.5\). To simplify the notations, we define

$$\begin{aligned}&k_\mathcal {A}:=k_{\mathcal {A},0.5}=\min \{k:{\mathrm{Pr}\left( E_k\right) }\ge 0.5\},\\ {}&k_O:=k_{O,0.5}=\min _\mathcal {A}k_\mathcal {A},\\&k_{lower}:=k_{lower,0.5}. \end{aligned}$$

Thus if Alice follows strategy \(\mathcal {A}\), \(k_\mathcal {A}\) (resp. \(k_\mathcal {A}+1\)) is the number of steps (resp. number of queries) needed for Alice to locate Bob with a probability of at least 0.5. \(k_O\) (resp. \(k_O+1\)) is the minimum number of steps (resp. minimum number of queries) needed for Alice to locate Bob with a probability of at least 0.5 independently of strategy used. In addition, \(k_{lower}\le k_O\) is a lower bound of \(k_O\).

4.2 Random Walk Example

In this section, we derive upper and lower bounds on \(k_O\) by analyzing the matrices \(B^{(k)}\). We show that “for a search space of size n, \(\lfloor \frac{n}{3}\rfloor -1\le k_O\le \lfloor \frac{n}{2}\rfloor \)” (see Corollary 1). To achieve this goal, we first estimate the values of \(B^{(k)}_j\). For a one-dimensional search space of size n, Bob follows a transition matrix P with initial position vector \(B^{(0)}\):

$$ B^{(0)}=\left[ \frac{1}{n},\frac{1}{n},\dots ,\frac{1}{n}\right] ,~P=\begin{bmatrix} 0&1&0&\dots&0&0 \\ \frac{1}{2}&0&\frac{1}{2}&\dots&0&0 \\ 0&\frac{1}{2}&0&\dots&0&0 \\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots \\ 0&0&0&\dots&0&\frac{1}{2} \\ 0&0&0&\dots&1&0 \end{bmatrix} $$

Then the probabilities of Bob in each position at step i is given by \(B^{(i)}=B^{(0)}P^i\):

$$\begin{aligned}&B^{(1)}=\left[ \frac{1}{2n},\frac{3}{2n},\frac{1}{n},\dots ,\frac{1}{n},\frac{3}{2n},\frac{1}{2n}\right] ,\nonumber \\&B^{(2)}=\left[ \frac{3}{4n},\frac{1}{n},\frac{5}{4n},\frac{1}{n},\dots ,\frac{1}{n},\frac{5}{4n},\frac{1}{n},\frac{3}{4n}\right] ,\nonumber \\&B^{(3)}=\left[ \frac{1}{2n},\frac{11}{8n},\frac{1}{n},\frac{9}{8n},\frac{1}{n},\dots ,\frac{1}{n},\frac{9}{8n},\frac{1}{n},\frac{11}{8n},\frac{1}{2n}\right] ,\nonumber \\&B^{(4)}=\left[ \frac{11}{16n},\frac{1}{n},\frac{5}{4n},\frac{1}{n},\frac{17}{16n},\frac{1}{n},\dots ,\frac{1}{n},\frac{17}{16n},\frac{1}{n},\frac{5}{4n},\frac{1}{n},\frac{11}{16n}\right] ,\end{aligned}$$
(10)

We have the following observation:

Lemma 1

For \(k\in {\mathbb Z}_{\ge 0}\),

$$ {\left\{ \begin{array}{ll} \frac{1}{n}\le B_j^{(k)}\le \frac{3}{2n}&{}1\le j\le n-2\\ \frac{1}{2n}\le B_j^{(k)}\le \frac{3}{4n}&{}j=0,n-1 \end{array}\right. }. $$

Proof

We prove the above claims by mathematical induction. For \(B^{(0)}\), the claim is true. We assume it is true for \(B^{(k)}\), then for \(B^{(k+1)}\)

  1. 1.

    \(2\le j \le n-3\), \(B^{(k+1)}_j=\frac{1}{2}\left( B^{(k)}_{j-1}+B^{(k)}_{j+1}\right) \), by induction hypothesis

    $$ \frac{1}{n}=\frac{1}{2}\left( \frac{1}{n}+\frac{1}{n}\right) \le B^{(k+1)}_j\le \frac{1}{2}\left( \frac{3}{2n}+\frac{3}{2n}\right) =\frac{3}{2n}. $$
  2. 2.

    \(j=1,n-2\), \(B^{(k+1)}_1=B^{(k)}_0+\frac{1}{2}B_2^{(k)}\), by induction hypothesis

    $$ \frac{1}{n}=\frac{1}{2n}+\frac{1}{2}\cdot \frac{1}{n}\le B^{(k+1)}_2\le \frac{3}{4n}+\frac{1}{2}\cdot \frac{3}{2n}=\frac{3}{2n};\quad \frac{1}{n}\le B_{n-2}^{(k+1)}\le \frac{3}{2n}. $$
  3. 3.

    \(j=0,n-1\), \(B^{(k+1)}_0=B_1^{(k)}\frac{1}{2}\), by induction hypothesis

    $$ \frac{1}{2n}=\frac{1}{2}\cdot \frac{1}{n}\le B_1^{(k+1)}\le \frac{1}{2}\cdot \frac{3}{2n}=\frac{3}{4n};\quad \frac{1}{2n}\le B_{n-1}^{(k+1)}\le \frac{3}{4n}. $$

The above bounds on \(B_j^{(k)}\) helps us to get upper and lower bounds on \(k_O\). Next we show a lower bound of \(k_{lower}\), which gives a lower bound for \(k_O\).

Proposition 1

$$ k_O\ge k_{lower}\ge \lfloor \frac{n}{3}\rfloor -1. $$

Proof

By Lemma 1, for any k,

$$ \sum _{i=0}^k\max _j B_j^{(i)}\le \frac{3}{2n}\cdot (k+1)=\frac{3k+3}{2n}. $$

If \(k<\lfloor \frac{n}{3}\rfloor -1\), \(\sum _{i=0}^k\max _j B_j^{(i)}<\frac{1}{2}\). Thus by definition, we must have \(k_{lower}\ge \lfloor \frac{n}{3}\rfloor -1\).

By the definition of \(k_O\), for any given attacker strategy \(\mathcal {A}\), \(k_O\le k_\mathcal {A}\). Next, we construct a specific attacker strategy \(\mathcal {A}_{jp}\) and prove an upper bound for \(k_{\mathcal {A}_{jp}}\), to obtain an upper bound for \(k_O\).

Lemma 2

For any \(n\ge 4\), there exists a strategy \(\mathcal {A}_{jp}\) such that \(k_{\mathcal {A}_{jp}}\le \lfloor \frac{n}{2}\rfloor \).

Proof

First we notice that \(P_{ij}^\ell =0\) if \(j-i>\ell \) due to our construction of the random walk P.

  1. 1.

    Let \(n\ge 4\) be even, consider the attacker strategy \(\mathcal {A}_{jp}\) with the first \(\frac{n}{2}\) positions given by \(\mathcal {A}_0=0,\mathcal {A}_1=2,\mathcal {A}_2=4,\dots ,\mathcal {A}_m=2m,\dots ,\mathcal {A}_{\frac{n}{2}-1}=n-2.\) For all \(0\le i<j\le \frac{n}{2}-1\),

    $$ \mathcal {A}_j-\mathcal {A}_i=2(j-i)>j-i. $$

    Then for \(0\le i<j\le \frac{n}{2}-1\),

    $$\begin{aligned} {\mathrm{Pr}\left( F_i\cap F_j\right) }= & {} {\mathrm{Pr}\left( \mathcal {B}_i=\mathcal {A}_i\text { and }\mathcal {B}_j=\mathcal {A}_j\right) }={\mathrm{Pr}\left( \mathcal {B}_i=\mathcal {A}_i\right) }{\mathrm{Pr}\left( \mathcal {B}_j=\mathcal {A}_j|\mathcal {B}_i=\mathcal {A}_i\right) }\\= & {} B_{\mathcal {A}_i}^{i}P_{\mathcal {A}_i\mathcal {A}_j}^{j-i}=0. \end{aligned}$$

    Together with Lemma 1 we have

    $$\begin{aligned} {\mathrm{Pr}\left( E_{\frac{n}{2}-1}\right) }= & {} \sum _{i=0}^{\frac{n}{2}-1}{\mathrm{Pr}\left( F_i\right) }=B^{(0)}_0+B^{(1)}_2+\dots +B_{n-2}^{\left( \frac{n}{2}-1\right) }\ge \sum _{i=0}^{\frac{n}{2}-1}\frac{1}{n}=\frac{1}{n}\frac{n}{2}=\frac{1}{2}. \end{aligned}$$

    Hence \(k_{\mathcal {A}_{jp}}\le \frac{n}{2}-1\).

  2. 2.

    Let \(n\ge 5\) be odd, consider the attacker strategy \(\mathcal {A}_{jp}\) with the first \(\lfloor \frac{n}{2}\rfloor +1\) positions given by \(\mathcal {A}_0=0,\mathcal {A}_1=2,\mathcal {A}_2=4,\dots ,\mathcal {A}_m=2m,\dots ,\mathcal {A}_{\lfloor \frac{n}{2}\rfloor }=n-1\). Similarly we can prove \({\mathrm{Pr}\left( F_i\cap F_j\right) }=0\) for \(0\le i<j\le \lfloor \frac{n}{2}\rfloor \). Together with Lemma 1 we have

    $$\begin{aligned} {\mathrm{Pr}\left( E_{\lfloor \frac{n}{2}\rfloor }\right) }= & {} \sum _{i=0}^{\lfloor \frac{n}{2}\rfloor }{\mathrm{Pr}\left( F_i\right) }=B^{(0)}_0+B^{(1)}_2+\dots +B_{n-1}^{\left( \lfloor \frac{n}{2}\rfloor \right) }\\\ge & {} \frac{1}{2n}+\sum _{i=0}^{\lfloor \frac{n}{2}\rfloor -1}\frac{1}{n}=\frac{1}{n}\left( \left\lfloor \frac{n}{2}\right\rfloor +\frac{1}{2}\right) =\frac{1}{2}, \end{aligned}$$

    and hence \(k_{\mathcal {A}_{jp}}\le \lfloor \frac{n}{2}\rfloor \).

Recall the notation:

$$ k_\mathcal {A}=\min \{k:{\mathrm{Pr}\left( E_k\right) }\ge \frac{1}{2}\},\qquad k_O=\min _\mathcal {A}k_\mathcal {A}$$

Corollary 1

For a one-dimensional search space of size n, \(\lfloor \frac{n}{3}\rfloor -1\le k_O\le \lfloor \frac{n}{2}\rfloor \).

We used the bounds in Lemma 1 to approximate \(\sum _{i=0}^k\max _{0\le j\le n-1}B^{(i)}_j\) in Proposition 1. We also derived the lower bound of \(k_{lower}\), which then gave lower bounds on \(k_O\). From Eq. (10), we can see \(\max _{0\le j\le n-1}B_j^{(k)}\) decreases when k increases, which means the upper bounds on \(B_j^{(k)}\) derived in Lemma 1 can be tightened for larger values of k. Thus we expect the lower bound for \(k_O\) in Corollary 1 to be higher than \(\lfloor \frac{n}{3}\rfloor -1\).

Similarly, the upper bound \(\lfloor \frac{n}{2}\rfloor \) for \(k_O\) was derived through a lower bound on \(B_j^{(k)}\). However, from Eq. (10) we see that \(B_j^{(k)}\) can achieve higher values than the lower bounds, which suggests that the specific attacker strategy \(\mathcal {A}_{jp}\) described in Lemma 2, \(k_{\mathcal {A}_{jp}}\) is strictly smaller than \(\lfloor \frac{n}{2}\rfloor \). These observations motivate the calculations of exact values of \(\sum _{i=0}^k\max _{0\le j\le n-1}B^{(i)}_j\) and \(k_{\mathcal {A}_{jp}}\), which yield tighter bounds on \(k_O\).

5 Attacker Strategies

In the previous section we derived theoretical bounds for the optimal attacker strategy. Unfortunately, the lower bound estimate does not help to find the optimal strategy. Conversely, the upper bound provides us with a constructive strategy, which we briefly discuss in this section and name it the Linear Jumping Strategy (LJS). We also propose a greedy strategy, which we call the Greedy Updating Attack Strategy (GUAS), which leverages estimations of Bob’s positions to determine next queries.

5.1 Linear Jumping Strategy

The linear jumping strategy sequentially selects every second location in B. Given uniform initial distributions and a random walk mobility model of the victim, it is expected to meet the upper bound cost as discussed in the previous section. The strategy was implemented by us for evaluation on different cases. Algorithm 1 in Appendix A summarizes the implementation in pseudocode. The runtime of LJS is linear in n.

5.2 Greedy Updating Attack Strategy

While we could implement an exhaustive search to find optimal strategies for any given setting, that approach is impractical and computationally expensive. In particular, for n locations, we need to evaluate the \(k_{\mathcal {A},p}\) for \(n^{\lceil \frac{n}{2}\rceil }\) different strategies (n potential choices per step, we expect to be done within \(\lceil \frac{n}{2}\rceil \) steps). This is infeasible even for moderate values of n. Instead, we introduce a (locally) greedy strategy which is computationally cheaper and allows us to perform simulations on a larger range of settings. We call it the Greedy Updating Attack Strategy (GUAS).

Suppose Alice has some assumed initial location of Bob \(\tilde{B}^{(0)}\) (for example, Bob’s actual initial location distribution \(B^{(0)}\)), and transition matrix P. At each time step i, Alice keeps her current estimate of Bob’s location as \(\tilde{B}^{(i)}\). In particular, Alice can use \(\tilde{B}^{(i)}\) to keep track of locations she checked previously, and found to be empty. That means that \(\tilde{B}^{(i)}\) depends on the actual choices of the attacker, while \(B^{(i)}\) just depends on \(B^{(0)}\) and P.

In query \((i+1)\), Alice checks location \(\max \limits _{j} \tilde{B}_j^{(i)}\), that is, the most likely location of Bob at that timeFootnote 1. If Alice succeeds, then we are done; if Alice does not find Bob, she updates \(\tilde{B}^{(i)}\) by setting the probability of the location checked in the current query to be 0 and re-normalizing \(\tilde{B}^{(i)}\) to ensure that \(\sum \tilde{B}^{i}=1\). Thus, the following values of \(\tilde{B}^{(i+1)}\) will be computed under the condition that the victim was not at the location tested in query i and earlier. The algorithm is described as pseudocode in Algorithm 2 in Appendix A. The runtime of the GUAS is quadratic in n: for each of the up to \(\lceil \frac{n}{2}\rceil \) queries, we need to find the minimal value of \(\tilde{B}^{(i-1)}\) (which is of size n). Thus, we were able to run this strategy for a search space size \(n=2000\) without issues.

In Sect. 6, we evaluate both LJS and GUAS for a random walk transition pattern and a sparse high-dimensional transition matrix derived from a real dataset. For the real dataset, we also investigate the attacker’s performance if \(\tilde{B}^{(0)}\ne {B^{(0)}}\), i.e., attacker has no information about victim’s initial location distribution.

6 Evaluation

Based on the assumption that the attacker is aware of the victims probability distribution for initial location and the transition matrix, we simulated LJS and GUAS for different values of the initial location of Bob and the transition matrix. In addition, we tested our strategy on a transition matrix derived from a real dataset. All experiments were performed on a Core i3-4005U CPU@1.70 GHz with 8 GB RAM.

6.1 Random Walk Transition Matrix

In this experiment, we used the transition matrix that represents a random walk (i.e., same P as in Sect. 4.2) and performed simulations with various initial distributions for Bob, \(B^{(0)}\), and for different search spaces with sizes varying from 100 to 2000. We assume the attacker knows the initial distribution of the victim.

Specifically, we consider space sizes of 100, 500 and 2000. We run 100 simulations for each of these sizes and represent the initial location probability distribution for \(B^{(0)}\) by a Dirichlet distribution with concentration parameter \(\alpha \). With increasing \(\alpha \) values, \(B^{(0)}\) is closer to being uniformly distributed, so a positive \(\alpha \) value of almost zero indicates an initial distribution with very high likelihood in one location, and almost zero in all others. The results represent the effort of the attacker to successfully locate Bob with a probability of 0.5 for GUAS and LJS strategies under various initial distributions for \(B^{(0)}\) regardless of the search space size. In Fig. 2 we provide a box-plot representation of some representative results (see Table 3 in Appendix B for full results). The upper and lower borders of the box represent the upper and lower quartiles, and the bar in the box is the median. The upper and lower whiskers represent the maximum and minimum number of queries in all simulations, excluding outliers, which are more than 1.5 times beyond the upper and lower quartiles.

Fig. 2.
figure 2

GUAS and LJS results for random walk mobility for different search space sizes with success probability of 0.5.

We observed that with a large concentration parameter \(\alpha \), which leads to an almost uniform distribution of the initial position, LJS achieves the optimal lower bound, while GUAS is unable to do so. With decreasing concentration parameter value, the initial distribution becomes less uniformly distributed. Assuming that the attacker is aware of the exact initial distribution, GUAS becomes more effective for such non-uniform initial distribution. This is true regardless of the size of the search space, as shown in Fig. 2.

6.2 Real-World Mobility Pattern

We also performed simulations with a transition matrix we derived from the T-Drive dataset [21, 22] released by Microsoft. The dataset comprises GPS trajectories of 10357 taxis between Feb. 2 and Feb. 8 2008 within the city of Beijing, around 15 million points and a total trajectories distance of sums up to 9 million kilometers. The average sampling interval is around 177 s with a distance of 623 m.

In our simulation, we take a subset of data within the third ring of Beijing. We discretize that area in grid cells of 500 m \(\times \) 500 m, which corresponds to a lat-long unit of around 0.005. This results in 884 different locations. The area is mapped into a one-dimensional space using the following projection:

$$\begin{aligned} proj (i, j)&= j\times n + i, \end{aligned}$$

where (ij) represents the coordinate of a grid cell in lat-long plane, and n is the total number of partitions in longitude. Based on the aforementioned discretization and projection method, we compute the probability of transitioning from one grid cell to another and build up the aggregated transition matrix, \(P_{\text {taxi}}\), in the following way. For each taxi, we check which grid cell the taxi is moving to by reading the data in chronological order on a daily basis. Whenever there is a transit from, say location (or grid cell) x to y, we record this transit. After processing and recording all the transits for the first taxi, we can normalize the resulting matrix and build up a transition matrix only for this taxi. We do this for all the taxis, aggregate all the transition matrices and then normalize them to arrive at \(P_{\text {taxi}}\).

Fig. 3.
figure 3

Attack strategies performance on T-Drive dataset for unknown and known initial distributions with success probabilities of 0.5 and 0.8.

Results. We simulated different “true” initial distributions for the victim, and checked on both the cases where the attacker does not know and knows about Bob’s initial distribution. If the attacker is unaware of the initial distributions for the victim, she assumes a uniform initial distribution. See Fig. 3a and b for results calculated over 100 simulations. In these cases, the success is determined by how close the victim’s distribution is to uniform. While for larger values of \(\alpha \), the attacker’s guess is close enough to not decrease performance significantly. If the victim’s initial distribution is less random (\(\alpha \le 0.01\)), the attacker without knowledge on the initial distribution will perform worse, while the attacker with knowledge on initial B can leverage this in the GUAS strategy to decrease the required number of guesses.

Comparing performance of GUAS and LJS in Fig. 3a and b, we observe that for a less structured/dynamic mobility pattern, which for example could be modeled by some highly sparse transition matrix, GUAS performs better than LJS. Our interpretation of this result is that a realistic transition matrix leads to a set of locations that are significantly more likely than others. For LJS, it does not matter if the attacker knows the initial distribution since LJS does not utilize knowledge on the victim. See Tables 1 and 2 in Appendix B for detailed results.

7 Related Work

An extensive body of research has been devoted to describing and understanding mobility models for mobile ad hoc and other types of networks, such as vehicular networks [3, 10]. Typically, these papers describe mobility models where mobile nodes are independent of or dependent on each other, namely entity mobility models or group mobility models. The goal is usually to study the behavior of individual entity mobility models and help researchers decide which model is most suitable. Random mobility models are the models of choice of most authors.

Similarly, other authors have studied the Rendezvous Problem [20], which consists of finding an optimal strategy for two or more mobile entities who are unaware of each others’ location, to meet. This problem and some slight variations of it has attracted much attention from the research community because of its potential application to many engineering problems like the one we are considering in this paper. However, as far as we are concerned, this problem remains open [1, 4]. The main difference between rendezvous search problems and the one we are tackling in this paper is that rather than having two entities trying to find each other, here one of the entities is a victim that might not even aware of the existence of the attacker trying to find him/her.

Another line of work studies the probability for n independent entities to meet while they all follow random walk trajectories in dimension two [7, 14]. For \(n=2\), this is related to the special case when Alice chooses a random walk strategy. The difference is that we assume the entities take a step after every fixed length of time interval, while in the aforementioned papers they consider an entity takes a step after a time interval whose length follows a Poisson distribution.

In the realm of location privacy, the effort an attacker needs to locate a victim has been studied in different settings [9, 13, 16, 17]. In [9, 13], the focus was on static victim. A particular attacker model was used against moving target in [9]. Although this attacker model was shown to be optimal for a static victim, its efficiency as an attacker model for moving victim was not discussed. This is precisely the focus of our paper.

In [16, 17], the focus is on the quantification of users’ location privacy by analyzing location-based applications and location-privacy preserving mechanism (LPPM). They assume an obfuscated trace (obtained by an LPPM) of the victim is available to the attacker and the goal of the attacker is to find out the real trace [17] or the real location [16] of the victim. In contrast, in this paper the attacker is only aware of the mobility model of the victim, without any knowledge of the victim’s trajectory.

8 Conclusions

In this paper, we have presented a framework to reason about the expected effort of attackers attempting to locate moving targets using proximity testing. We first provide mathematical analysis for asymptotic bounds (on the search space) on the best attacker strategies under a random walk mobility model for the moving victim. We then derive two concrete strategies, and evaluate their performance over a range of parameters. The LJS strategy is found to work well for random walk mobility and close to uniform initial distribution of Bob, while the GUAS strategy requires less queries for less uniform initial distributions of Bob (which are known to the attacker). We then derive a realistic mobility model from a real dataset consisting of spatio-temporal trajectory data, and analyze the performance of our strategies. In that setting, we find that the GUAS strategy consistently requires less than \(\frac{N}{6}\) queries (\(p=0.5\)), while LJS requires more than 0.75N, where N is the search space size. Summarizing, we have shown theoretically and practically that (using a strategy suitable to the setting), an attacker is able to locate a victim with 50% probability with at most \(\frac{N}{2}\) steps. For example, using GUAS in our Beijing dataset the attacker could localize a victim with 50% probability in 134 steps (6.6 h), within an area of 0.25 km\(^2\).

Proposed Countermeasures. To prevent the demonstrated attack, we propose the following countermeasures: (a) The LBS could probabilistically return a wrong result (e.g., with 20% chance), (b) the LBS could verify that the sequence of attacker’s location claims confirms to some assumed transition matrix P (i.e., the likelihood of a claimed trajectory is above some threshold \(\tau \)), and (c) the LBS could impose limitations on the number of queries or the speed/frequency of queries by the requester. We leave the evaluation of these countermeasures for future work.