1 Introduction

Organizations generate data daily that could be used to improve their performance, yet many of them lack the in-house expertise required to analyse these data. This lack of expertise is particularly prominent in resource-constrained environments such as rural or remote developing-world regions, where limited access to computational power, reliable electricity and the Internet poses a further challenge. A cost-effective solution is to outsource the data to a professional third-party data analytics service provider.

A study [1] carried out in technologically resource-constrained environments revealed that collected crime data are usually not studied or analysed to support crime resolution. A possible reason for this is the lack of the necessary in-house expertise, both in terms of human capital and computational processing power [5, 15, 24]. This deprives policy makers in these regions of the benefits that could be derived through data analytics. A possible solution is to involve a third-party data analytics service provider [1, 2]. However, because of the sensitive nature of crime data, the outsourced data must be protected from all unauthorized access, including that of an honest-but-curious data mining service provider. This paper therefore focuses on developing a test-bed framework for preserving privacy during real-time information sharing, using the crime domain as an application scenario. It is important to stress, however, that the ideas and approaches considered in this study are equally applicable to other domains.

2 Related Work

A naive approach to preserving privacy or anonymity in data is to exclude explicit identifiers such as the name and/or identification number. However, linking attacks aimed at de-anonymising the data can be mounted successfully by combining non-explicit identifiers (such as date of birth, address and sex) with external or publicly available data [3, 25, 26]. To illustrate how a linking attack works, consider Fig. 1, which shows two data compartments. The upper compartment contains a portion of a publicly available table in which "name" is an explicit identifier attribute, while the lower compartment shows a portion of a data stream that has been sanitized to exclude the explicit identifier (name) in order to disguise the identities of the individuals associated with the data. However, when a join is performed on the attributes common to both compartments, the supposedly anonymized individual is successfully re-identified as Ade, who lives at 10 Pope Street, and her sensitive information, namely that she has been a victim of rape, is revealed.

Fig. 1. Illustration of linking attack
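The attack can be reproduced in a few lines of code. The following is a minimal Python sketch (not part of the study's implementation) that joins a public table with a sanitized stream on shared quasi-identifiers using pandas; the attribute names and records are illustrative, loosely following Fig. 1.

import pandas as pd

public_table = pd.DataFrame([
    {"name": "Ade",  "dob": "1990-05-01", "sex": "F", "address": "10 Pope Street"},
    {"name": "Bola", "dob": "1985-11-23", "sex": "M", "address": "3 Main Road"},
])

sanitized_stream = pd.DataFrame([          # explicit identifier (name) removed
    {"dob": "1990-05-01", "sex": "F", "address": "10 Pope Street", "crime": "rape"},
    {"dob": "1985-11-23", "sex": "M", "address": "3 Main Road",    "crime": "burglary"},
])

# Joining on the non-explicit identifiers re-attaches names to sensitive values.
linked = public_table.merge(sanitized_stream, on=["dob", "sex", "address"])
print(linked[["name", "address", "crime"]])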

According to Sweeney [18, 19], 87% of the population of the United States could be uniquely identified by combining non-explicit identifiers such as gender, zip code and date of birth from the 1990 census dataset using a linking attack. Sweeney et al. therefore proposed a better approach, named k-anonymity, which anonymizes data in a manner that minimizes linking attacks.

K-anonymity preserves privacy by hiding each individual in a cluster that contains at least k individuals, so that an adversary cannot learn information about a specific individual but only about a group of k individuals [18, 20]. To understand how k-anonymity works, assume an attacker tries to identify a friend in a k-anonymized table, but the only information he has is her date of birth and gender. K-anonymity makes it difficult for the adversary to identify the individual by guaranteeing that at least k people share the same date of birth and gender, thus bounding the success probability of a linking attack to at most 1/k. K-anonymity algorithms can generally be grouped into two categories, namely hierarchy-based generalization and hierarchy-free generalization [3]. In hierarchy-based generalization, anonymization requires a generalization tree as an input to guide the process, whereas in hierarchy-free generalization no generalization tree is required; instead, the algorithm relies on clustering concepts and heuristics.
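As a concrete illustration of the definition above, the following minimal Python sketch (our own illustration, not the authors' algorithm) checks whether a table satisfies k-anonymity by requiring every equivalence class over the chosen quasi-identifiers to contain at least k records; the toy records are assumptions made purely for illustration.

from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    # records: list of dicts; quasi_identifiers: attribute names; k: threshold.
    class_sizes = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(size >= k for size in class_sizes.values())

rows = [
    {"dob": "199*", "sex": "F", "crime": "rape"},
    {"dob": "199*", "sex": "F", "crime": "assault"},
    {"dob": "198*", "sex": "M", "crime": "burglary"},
    {"dob": "198*", "sex": "M", "crime": "theft"},
]
print(is_k_anonymous(rows, ["dob", "sex"], k=2))  # True: each class has 2 records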

The evolution of k-anonymity has led to the birth of newer privacy models that address its inherent limitations. Two popular models that extend k-anonymity are \(\ell \)-diversity and t-closeness [29, 30]. The main purpose of \(\ell \)-diversity is to address the homogeneity attack to which k-anonymity is vulnerable; it does this by requiring that each cluster in a k-anonymized table contains at least \(\ell \) distinct sensitive values [29]. T-closeness further complements \(\ell \)-diversity by requiring that the distribution of sensitive values in each cluster is close to that of the entire anonymized table [30].
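The two extensions can be checked in the same spirit. The sketch below (again our own illustration) tests \(\ell \)-diversity by counting distinct sensitive values per equivalence class, and approximates the t-closeness test with total variation distance as a simple stand-in for the Earth Mover's Distance used in the original t-closeness proposal.

from collections import Counter, defaultdict

def equivalence_classes(records, qi):
    # Group records by their quasi-identifier values.
    grouped = defaultdict(list)
    for r in records:
        grouped[tuple(r[a] for a in qi)].append(r)
    return grouped.values()

def is_l_diverse(records, qi, sensitive, l):
    # Every class must contain at least l distinct sensitive values.
    return all(len({r[sensitive] for r in ec}) >= l
               for ec in equivalence_classes(records, qi))

def is_t_close(records, qi, sensitive, t):
    # Every class's sensitive-value distribution must stay within distance t
    # of the overall distribution (total variation distance used here).
    def dist(rs):
        counts = Counter(r[sensitive] for r in rs)
        return {v: c / len(rs) for v, c in counts.items()}
    overall = dist(records)
    for ec in equivalence_classes(records, qi):
        local = dist(ec)
        tvd = 0.5 * sum(abs(local.get(v, 0.0) - overall.get(v, 0.0))
                        for v in set(overall) | set(local))
        if tvd > t:
            return False
    return True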

An equally fast-growing privacy-preservation technique is differential privacy. Differential privacy achieves anonymization by altering the data with the addition of mathematical "noise" [13]; in other words, privacy is preserved through the "difference" between the data supplied and the noise added to it. Interestingly, recent research [16, 17] has shown that the use of t-closeness with k-anonymity can yield privacy results similar to those of differential privacy. In this research we focus on k-anonymity and its complementary techniques because of the simplicity [29], effectiveness [19] and high utility [31] they offer, especially when compared to an evolving counterpart such as differential privacy. In addition, recent research [31, 32] has shown that differential privacy is achieved as long as a dataset is anonymized using k-anonymity and t-closeness. This paper therefore focuses on the use of k-anonymity, \(\ell \)-diversity and t-closeness to achieve anonymization.
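For contrast, the tiny sketch below shows how differential privacy perturbs a query answer with calibrated Laplace noise. It is illustrative only, since this paper does not adopt differential privacy, and the counting query and epsilon value are assumptions.

import random

def laplace_noise(scale):
    # A Laplace(0, scale) sample as the difference of two i.i.d. exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_count(true_count, epsilon, sensitivity=1.0):
    # Laplace mechanism for a counting query (sensitivity 1).
    return true_count + laplace_noise(sensitivity / epsilon)

print(noisy_count(42, epsilon=0.5))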

Adapting k-anonymity and its complementary techniques to data streams (real-time data) has led current research to integrate a buffering (or sliding-window) mechanism and a delay constraint into data stream anonymization [3, 4, 22, 25,26,27]. The buffer holds a portion of the data stream at each instant of time, after which an anonymization algorithm is applied to the data in the buffer. The delay constraint puts a check on each tuple so that it does not stay in the buffer beyond a pre-defined deadline. In spite of this, many of the existing algorithms adapted for the anonymization of data streams face the following challenges:

  • First, buffering according to delay constraints can result in certain records being held in the buffer for long periods [3, 8, 10]. When such records are time-sensitive or need to be processed in real time, this delay usually results in high levels of information loss. Since a key requirement of a good anonymization scheme is high data utility, high levels of information loss due to expired tuples or dropped (suppressed or unanonymizable) records are undesirable.

  • Second, building on the first problem, we note that many of the existing data stream anonymization schemes based on k-anonymity and its derivatives do not take the distribution of future data streams into consideration during anonymization [4]. The implication is that a record that could be anonymized with lower information loss in a future sliding window is instead anonymized with the current one. Therefore, there is a need for a model that can predict the best sliding window (or stream) with which a record should be anonymized.

Therefore, the focus of this paper is to present a data-stream anonymization framework that addresses the aforementioned challenges inherent in existing frameworks. A more detailed literature review can be found in [14].

3 Data Stream Anonymization Framework

Figure 2 presents an overview of our data stream anonymization framework, using the crime domain as an application scenario. Users make crime reports electronically and the reports are anonymized in real time at the anonymization layer. The results from the anonymization layer are transferred to a third party for data mining at the application layer.

Fig. 2. System overview

3.1 Users Layer: Crime Reporting Layer

As noted in previous sections, this research considers the crime domain as an application scenario for achieving data stream anonymization. However, the ideas in this research extend to any other domain that requires real-time anonymization of sensitive data. To create an application that allows people to report crime in a secure and covert way, we converted the existing paper-based crime reporting system of a university campus in South Africa into a digitized crime reporting system. We chose mobile devices as the reporting platform because they provide a good security platform for crime reporting [11, 23].

In order to ensure that our mobile crime reporting system is acceptable and usable in real life, we applied an iterative user-centered design methodology. To this end, we interviewed key stakeholders within law enforcement agencies as well as crime victims, and went through several design iterations until we arrived at a final prototype acceptable to all stakeholders. Figure 3 presents screenshots of our final prototype. More details about the development and deployment of the crime reporting application, CryHelp, can be found in [5].

Fig. 3. Screenshot of our crime reporting application

3.2 Anonymization Layer: Data Stream Anonymization

In our proposed crime reporting application scenario, data arrive in the form of streams and contain information that is analyzed for statistical or data mining predictions. These data are stored temporarily in a buffer so that anonymization can take place. The buffer holds portions of the continuous data stream subject to delay constraints that specify how long tuples may remain in the buffer before anonymization takes place. As illustrated in Fig. 4, the buffer optimizer uses a time-based sliding window and Poisson probability to monitor the data in the buffer, ensuring that tuples are anonymised before the expiry time threshold is reached, while Fig. 5 illustrates the data anonymizer, which uses k-anonymity, \(\ell \)-diversity and t-closeness for privacy preservation. We opted for the Poisson distribution because it models the number of successes that occur within a given unit of measure. This property allows the arrival of reported crime data to be viewed as a series of events occurring within a fixed time interval at an average rate that is independent of the time since the last event [6]. Only one parameter needs to be known: the rate at which the events occur, which in our case is the rate at which crime reports arrive.
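A minimal sketch of the buffering idea, under an assumed in-memory data model (it is not the framework's actual code), is shown below: tuples are timestamped on arrival and the buffer is flushed to the anonymizer before the oldest tuple reaches the expiry threshold.

import time
from collections import deque

class StreamBuffer:
    def __init__(self, expiry_threshold_s):
        self.expiry_threshold_s = expiry_threshold_s
        self.tuples = deque()          # (arrival_time, record) pairs

    def add(self, record):
        self.tuples.append((time.time(), record))

    def must_flush(self):
        # Flush when the oldest tuple is about to reach its deadline.
        if not self.tuples:
            return False
        oldest_arrival, _ = self.tuples[0]
        return time.time() - oldest_arrival >= self.expiry_threshold_s

    def flush(self):
        batch = [record for _, record in self.tuples]
        self.tuples.clear()
        return batch                   # handed to the data anonymizer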

Fig. 4. Details of anonymization layer: buffer optimizer

Fig. 5. Details of anonymization layer: data anonymizer

Fig. 6. Overview of dynamic buffer sizing process

4 Adaptive Buffer Re-sizing Scheme

As illustrated in Fig. 6, a "sliding window (buffer)", \(\text {sw}_i\), is a subset of the data stream, DS, where \(\text {DS} = \{\text {sw}_{1},\, \text {sw}_{2},\, \text {sw}_{3},\ldots ,\, \text {sw}_{m}\}\) implies that the data stream consists of m sliding windows. The sliding windows obey a total ordering such that for \(i<j\), \(\text {sw}_{i}\) precedes \(\text {sw}_{j}\). Each sliding window \(\text {sw}_i\) only exists for a specific period of time T and consists of a finite and varying number of records, n, so that \(\text {sw}_{i}=\{R_{0},\ldots ,R_{n-1}\}\).

Our adaptive buffer re-sizing scheme, as illustrated in Fig. 7, is organized into six phases; a detailed explanation of how each phase works is given in our earlier publication [12]. We summarize them as follows. First, we set the size of the buffer to some initial threshold value. Given the time-sensitivity of the data, we set the size of the sliding window, \(\text {sw}_i\), to a value T, where T is a time value bounded by a lower bound \(t_l\) and an upper bound \(t_u\). The anonymization algorithm is applied to the data collected in the sliding window \(\text {sw}_i\) during the period T, so essentially the duration of \(\text {sw}_{i}\) is T. All records from \(\text {sw}_i\) that are not anonymizable are either included in a subsequent sliding window, say \(\text {sw}_{i+1}\), or incorporated into already anonymized clusters with similar content.

Fig. 7. Phases of adaptive buffer re-sizing scheme

In order to determine whether or not an unanonymizable record can be included in a subsequent sliding window, say \(\text {sw}_{i+1}\), we compute its expiry time \(T_{E}\) and compare its value to the bounds for acceptable sliding window sizes, \([t_l,t_u]\). We compute \(T_{E}\) as follows:

$$\begin{aligned} T_{E} = \text {sw}_{i} - T_{S} - T_{A} \end{aligned}$$
(1)

where \(T_{E}\) is the time to expiry of a record, \(T_{S}\) is the time for which the record was stored in \(\text {sw}_i\), and \(T_{A}\) is the time required to anonymize the data in \(\text {sw}_i\).
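A small sketch of Eq. (1) under the definitions above; all arguments are assumed to be in the same time unit (e.g. milliseconds), and the sample values are illustrative.

def time_to_expiry(window_duration, time_stored, time_to_anonymize):
    # Eq. (1): T_E = sw_i - T_S - T_A, all in the same time unit.
    return window_duration - time_stored - time_to_anonymize

# e.g. a 5000 ms window, record buffered for 1200 ms, anonymization takes 300 ms
print(time_to_expiry(5000, 1200, 300))   # 3500 ms left before the record expires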

Starting with the unanonymizable record, \(R_i\), that has the lowest \(T_{E}\) whose value falls within the acceptable bound \([t_l,t_u]\), we count the other unanonymizable records, n, that belong to the same data anonymization group as \(R_i\). We then find the rate of arrival, \(\lambda \), of data in that anonymization group, computing the expected number of records of that group arriving within time \(T_{E}\) as follows:

$$\begin{aligned} \lambda = \frac{n}{\text {sw}_{i}} \times T_{E} \end{aligned}$$
(2)

The expected arrival rate, \(\lambda \), is used to determine the probability that at least the required number of records arrives, which guarantees that delaying the anonymization of the unanonymizable record, \(R_i\), to the next sliding window \(\text {sw}_{i+1}\) will not adversely increase information loss. We next determine the minimum number of records, m, required to guarantee the anonymization of \(R_i\) in the next sliding window, \(\text {sw}_{i+1}\). When the decision is to include \(R_i\) in the next sliding window \(\text {sw}_{i+1}\), we then need to compute the optimal size for \(\text {sw}_{i+1}\) in order to minimize information loss from record expiry. We achieve this by finding the probability that m records will actually arrive in the data stream within time \(T_{E}\) to anonymize the unanonymizable record \(R_i\). We use Eq. 3 to compute the probability of having exactly \(i\) records, for \(i = 0, \ldots, m-1\), arrive in the stream within \(T_{E}\):

$$\begin{aligned} pr(i) = \frac{\lambda ^{i}\, e^{-\lambda }}{i!} \end{aligned}$$
(3)

where \(\lambda \) is the expected arrival rate, e is the base of the natural logarithm (i.e. \(e \approx 2.71828\)) and \(i\) is the number of records under observation.

Therefore the probability of having m or more records arrive in the stream within time \(T_{E}\) is

$$\begin{aligned} 1 - \sum _{i=0}^{m-1} pr(i) \end{aligned}$$
(4)

where \(pr(i)\) is the probability computed using Eq. 3.

If the result of Eq. 4 is greater than a preset probability threshold, \(\delta \), we set the size of the subsequent sliding window, \(\text {sw}_{i+1}\), to the expiry time of the unanonymizable record under consideration. We then mark the unanonymizable record for inclusion in the subsequent sliding window, along with other unanonymizable records whose \(T_{E}\) falls within the bounds for acceptable sliding window sizes \([t_l,t_u]\). In the event that the probability for all unanonymizable records is below the preset probability threshold, we set the subsequent sliding window size to a random value within the time bound \([t_l,t_u]\). A more detailed explanation of the adaptive buffer scheme can be found in our previous work [12].
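Putting Eqs. (2)-(4) together, the following sketch shows one possible reading of the decision rule (not the authors' code); the variable names follow the paper, while the sample values for n, the window duration, m and \(\delta \) are assumptions made for illustration.

import math
import random

def expected_arrivals(n_group, window_duration, time_to_expiry):
    # Eq. (2): lambda = (n / sw_i) * T_E, the expected number of same-group
    # records arriving before the unanonymizable record expires.
    return (n_group / window_duration) * time_to_expiry

def prob_at_least(m, lam):
    # Eqs. (3)-(4): P(at least m arrivals) = 1 - sum of the first m Poisson terms.
    pmf = lambda i: math.exp(-lam) * lam ** i / math.factorial(i)
    return 1.0 - sum(pmf(i) for i in range(m))

def next_window_size(T_E, m, lam, delta, t_l, t_u):
    # Size the next window to the record's expiry time if enough helper records
    # are likely to arrive; otherwise fall back to a random size in [t_l, t_u].
    if t_l <= T_E <= t_u and prob_at_least(m, lam) > delta:
        return T_E
    return random.uniform(t_l, t_u)

lam = expected_arrivals(n_group=4, window_duration=5000, time_to_expiry=3500)
print(prob_at_least(2, lam))                                   # about 0.77
print(next_window_size(3500, 2, lam, delta=0.7, t_l=2000, t_u=5000))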

5 Experiments and Results

This section presents the implementation and results of the crime-reporting application, CryHelp, and of the adaptive buffering scheme algorithm.

5.1 CryHelp Application Evaluation

In order to evaluate the usability of our mobile crime reporting application, CryHelp, we developed a questionnaire based on the IBM CSUQ [23]. The advantage of the IBM CSUQ is that it divides responses into scores for specific categories: System Overall, System Usefulness, Information Quality and Interface Quality. These categories allow each component of the system to be evaluated individually, to gauge which aspects perform well or poorly on average. The results directly address the question of whether a mobile device can be used to send a crime report effectively and securely.

Fig. 8. The chart of the questionnaire score breakdown, with standard deviation 0.05

Figure 8 shows the result for each component of the system. It can be seen that overall the system was well received, with an overall system score of 77.06%, suggesting that users found the system very usable; the standard deviation across the contributing scores (System Usefulness, Information Quality and Interface Quality) was 0.05. It is not surprising that interface quality (78.33%), though marginally, is the most appreciated aspect of the system, as the design process was centered on the users. These results bode very well for the feasibility of a mobile solution for crime reporting.

5.2 Experiments on Anonymization

Our feasibility study and the experiment conducted on the prototype crime data collection application, CryHelp [5], informed the generation of additional datasets for the second phase of experiments. The additional crime data were generated using a random data generator and a pseudo-random algorithm based on a Gaussian distribution to populate the crime data stream, based on ground truth provided by the users, the UCT Campus Protection Service and the South African Police Service. The data comprise two sets containing 1,000 and 10,000 records respectively, which is a reasonable bound for daily average crime report rates in South Africa [5].
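A hedged sketch of how a Gaussian-based pseudo-random generator might populate such a synthetic crime stream is given below; the field names, categories and distribution parameters are illustrative and not those used in the actual study.

import random

CRIME_TYPES = ["theft", "assault", "burglary", "robbery"]
LOCATIONS = ["campus residence", "library", "parking lot", "sports field"]

def synth_record(rng):
    # One synthetic crime report; the age is drawn from a clipped Gaussian.
    return {
        "age": max(16, int(rng.gauss(mu=24, sigma=6))),
        "sex": rng.choice(["F", "M"]),
        "location": rng.choice(LOCATIONS),
        "crime": rng.choice(CRIME_TYPES),
    }

rng = random.Random(42)                                   # reproducible stream
dataset_1 = [synth_record(rng) for _ in range(1000)]      # 1,000-record set
dataset_2 = [synth_record(rng) for _ in range(10000)]     # 10,000-record set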

This section therefore discusses the gains obtained by using the Poisson probability distribution to predict how long a sliding window should exist, while ensuring that records do not expire, that the number of unanonymizable (or suppressed) records is minimal, and that privacy is maintained using k-anonymity, \(\ell \)-diversity and t-closeness. The gains are explained in the following sub-sections.

Recovered Unanonymizable Tuples: During anonymization there is usually a trade-off between the rates of information loss (IL), suppression and generalization. If an equivalence class (cluster) is unable to satisfy the privacy requirement, the class is either merged with another class or all its records are suppressed. A higher suppression rate implies that vital information is likely to be concealed from the recipient of the anonymized table, while merging classes increases IL, which has the drawback of lower data utility. To curb this, the Poisson probability distribution predicts the chance of such unanonymizable (suppressed) records undergoing anonymization in the next sliding window, in a manner that preserves privacy and maximizes data utility while minimizing delay or the expiration of records.

Figures 9 and 10 show the rate at which unanonymizable records were anonymized again, based on the predictions of the Poisson probability distribution. It is evident from the figures that many unanonymizable records were recovered and passed on for anonymization again. It was also observed that the probability threshold influenced the number of unanonymizable records recovered: the higher the probability threshold, the lower the probability that unanonymizable records are given a chance to be reconsidered for anonymization in subsequent sliding window(s). The implication is that fewer records are given the chance of another round of anonymization when higher threshold values are used. Another observation is that with a higher threshold value there are fewer changes or movements of records between sliding windows.

Fig. 9. Poisson probability threshold versus recovered tuples for dataset 1

Fig. 10. Poisson probability threshold versus recovered tuples for dataset 2

Privacy Value/Level Versus Recovered Unanonymizable Tuples: "Privacy level" simply means the degree of anonymity offered, while unanonymizable tuples are those tuples that belong to an equivalence class whose size is less than k. To accommodate sliding windows that start with a small number of tuples, the minimum privacy level threshold was set at k = 2 and the maximum at k = 15; the \(\ell \)-diversity value, \(\ell \), was varied between 3 and 5, and the t-closeness value, t, was alternated between 0.1 and 0.15 for the two datasets.

As illustrated in Figs. 11 and 12, it was observed that as the privacy value/level increases, the number of unanonymizable records that can be recovered using the Poisson probability prediction decreases. The main reason is that as the privacy level increases, achieving anonymization becomes increasingly challenging, which in turn lowers the chances of anonymizing unanonymizable records. To better understand the decline in the rate of recovered unanonymizable records, assume the privacy level is set to k = 3 and an equivalence class, \(\text {EC}_i\), has two records; this implies that at least one more record is needed for \(\text {EC}_i\) to satisfy k-anonymity. Using the Poisson probability, the adaptive buffer re-sizing model predicts the chance that at least one record belonging to \(\text {EC}_i\) arrives within time t in the next sliding window. If k is set to four, the model must instead predict the chance of at least two records arriving in the next sliding window, which is a more demanding event than the arrival of just one record. This explains why the model shows a drop in recovered unanonymizable records as the privacy level increases. Therefore, the rate at which unanonymizable records in a current sliding window can be anonymized in a subsequent window depends mainly on the privacy value.
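The worked example can be checked numerically. The sketch below compares the probability of at least one versus at least two further arrivals for the same expected arrival rate, confirming that the latter is always smaller; the rate used here is an assumed value chosen purely for illustration.

import math

def prob_at_least(m, lam):
    # P(at least m Poisson(lam) arrivals) = 1 - sum of the first m pmf terms.
    pmf = lambda i: math.exp(-lam) * lam ** i / math.factorial(i)
    return 1.0 - sum(pmf(i) for i in range(m))

lam = 1.5                             # assumed expected arrivals within T_E
print(prob_at_least(1, lam))          # k = 3, two records held: about 0.78
print(prob_at_least(2, lam))          # k = 4, two records held: about 0.44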

Fig. 11. Relationship between privacy level and recovered tuples for dataset 1

Fig. 12. Relationship between privacy level and recovered tuples for dataset 2

5.3 Benchmarking: Comparing the Poisson Solution with Non-Poisson Solutions

As a baseline for evaluating our proposed adaptive buffering scheme, we implemented proactive-FAANST and passive-FAANST. These algorithms are a good comparison benchmark because they represent the current state of the art in streaming data anonymization and reduce IL with minimal delay and few expired tuples [3]. Proactive-FAANST decides whether an unanonymizable record will expire if included in the next sliding window, while passive-FAANST searches for unanonymizable records that have already expired. A major drawback of both variants is that there is no way of deciding whether such unanonymizable records would actually be anonymizable during the next sliding window, which is necessary to avoid repeatedly cycling a tuple that has a low chance of anonymization in subsequent sliding window(s). Moreover, these algorithms do not consider that the flow or speed of a data stream could change. These weaknesses of proactive-FAANST and passive-FAANST are what we address by using the Poisson probability distribution to predict whether such tuples would be anonymizable in subsequent sliding window(s), taking into consideration the arrival rate of records, the success rate of anonymization per sliding window, the time a tuple can exist and the rate of suppressed records.

Expired Tuples and Information Loss in Delay: A tuple expires when it remains in the system for longer than a pre-specified threshold called the delay [3, 10]. To decide whether a tuple has exceeded its time-delay constraint, additional attributes such as arrival time, expected waiting time and entry time were included. As a heuristic, the choice of delay values, \(t_l = 2000\, \text {ms}\) and \(t_u = 5000\, \text {ms}\), was guided by delay values used in published experimental results [3].

In general, our approach produces fewer expired tuples than the passive-FAANST and proactive-FAANST solutions. This is because, before our Poisson prediction transfers suppressed records to another sliding window, it checks the likelihood of their anonymization. In the other solutions, there is no mechanism to check the likelihood that a suppressed record is anonymizable before allowing it to go to the next sliding window/round; as a result, such tuples are sent to the next sliding window and have a high tendency to expire eventually. Our solution also shows that the lower the k-value, the higher the number of expired tuples. This is because the outcome of the Poisson prediction is lower for higher k-values; as a result, there are fewer sliding-window changes as the k-value increases, which means a lower possibility of expired tuples.

One of the main goals of our solution is to reduce IL in delay (i.e. to lower the number of expired tuples). Figure 13 shows that our solution achieves this goal: the IL (delay) of our solution is lower than that of the passive and proactive solutions. To determine the total number of expired records, a simple count function retrieved all records that had remained in the buffer longer than the upper limit threshold, \(t_u\). To determine the average number of expired records, we summed the expired records over all experiments and divided by the total number of experiments.
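A minimal sketch of this bookkeeping (the record layout is an assumption made for illustration): count, per experiment, the records whose residence time exceeded \(t_u\), then average across experiments.

def count_expired(records, t_u_ms):
    # Records that stayed in the buffer beyond the upper delay bound t_u.
    return sum(1 for r in records if r["time_in_buffer_ms"] > t_u_ms)

def average_expired(experiments, t_u_ms=5000):
    counts = [count_expired(records, t_u_ms) for records in experiments]
    return sum(counts) / len(counts)

runs = [
    [{"time_in_buffer_ms": 5400}, {"time_in_buffer_ms": 3100}],
    [{"time_in_buffer_ms": 4800}, {"time_in_buffer_ms": 6200}],
]
print(average_expired(runs))   # 1.0 expired record on average across the two runs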

Fig. 13. Privacy level versus expired tuples for Poisson solution, passive-FAANST and proactive-FAANST

Data Utility and Information Loss in Record: An important factor considered in anonymization is the degree to which anonymized data remain usable for data analysis or data mining tasks [28]. We therefore compared the degree of IL in records of our solution with that of passive-FAANST and proactive-FAANST. Our results, as illustrated in Fig. 14, show that at the minimal level of privacy enforcement the information loss of our solution is on par with the other two schemes, while at the maximal level our solution offers better data utility.

Fig. 14. Privacy level versus information loss for Poisson solution, passive-FAANST and proactive-FAANST

6 Conclusions

We started this paper by noting that resource-constrained environments lack the data analytics expertise needed to analyse and mine crime data in real time. Anonymization is important because it enables a third party with this expertise to carry out the analysis in a timely fashion without compromising privacy. We adopted k-anonymity, \(\ell \)-diversity and t-closeness as our anonymization scheme because of their simplicity, efficiency and applicability in real life. However, the current literature on integrating these techniques with data streams has issues in terms of performance and privacy; the performance issue concerns information loss in terms of delay and running cost.

To address the challenge of ensuring that delay is optimal during the anonymization process, we adaptively re-sized the buffer, using the Poisson distribution, to handle intermittent flows of crime reporting traffic optimally. Results from our prototype implementation demonstrate that, in addition to preserving the privacy of the data, our proposed scheme outperforms the others, with an information loss rate of 1.95% compared to 12.7% when varying the privacy level of the crime report data records.