Abstract
Crime analysis is an approach for identifying patterns and trends in crime events, while information extraction is the task of extracting relevant information from unstructured data. When crime reports are not directly available to the public, a possible solution is to derive crime information from newspaper articles.
This paper aims at extracting, localizing, deduplicating, and visualizing crime events from online news articles. It demonstrates how crime-related information can be obtained from newspapers and used to build a consistent database of crime events through an automatic process. The approach employs a Named Entity Recognition (NER) algorithm to retrieve locations, organizations, and persons, and a mapping phase to link these entities to Linked Data resources. The date of the event is retrieved by extracting and normalizing temporal expressions. For duplicate detection, the approach analyses and combines crime category, description, location, and event date to identify which news articles refer to the same event. The approach has been successfully applied in the Modena province (Italy), focusing on eleven types of crime that have occurred from 2011 onwards. Its flexibility allows it to be easily adapted to other cities, regions, or countries, and also to other domains.
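To make the duplicate-detection idea concrete, the following is a minimal sketch in Python of how the four signals named above (crime category, description, location, and event date) might be combined. It is not the authors' implementation: the `CrimeEvent` record, the word-level Jaccard similarity, the haversine distance, and all thresholds (`max_days`, `max_km`, `min_text_sim`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class CrimeEvent:
    """Hypothetical record for one crime event extracted from a news article."""
    category: str
    description: str
    lat: float
    lon: float
    event_date: date


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lower-cased word sets of two texts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def distance_km(e1: CrimeEvent, e2: CrimeEvent) -> float:
    """Great-circle (haversine) distance between two events, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (e1.lat, e1.lon, e2.lat, e2.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def same_event(e1: CrimeEvent, e2: CrimeEvent,
               max_days: int = 1, max_km: float = 1.0,
               min_text_sim: float = 0.5) -> bool:
    """Treat two events as duplicates only if all four signals agree.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return (
        e1.category == e2.category
        and abs((e1.event_date - e2.event_date).days) <= max_days
        and distance_km(e1, e2) <= max_km
        and jaccard(e1.description, e2.description) >= min_text_sim
    )
```

Requiring all four conditions at once is a deliberately conservative choice for this sketch; a weighted score over the same signals would be an equally plausible way to combine them.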
Notes
- 1. On 21 May 2020, data for March 2020 could be downloaded from the police open data portal https://data.police.uk/.
- 9. European Data Portal: https://www.europeandataportal.eu/it.
- 10. The code of both applications is open source and available in a GitHub repository: https://github.com/federicarollo/Crime-event-localization-and-deduplication.
- 11. An example is available at https://gazzettadimodena.gelocal.it/ricerca?query=furti, where furti (theft) is a type of crime.
- 15. We use the function OpenStreetMapUtils.getInstance().getCoordinates(location), where location is a string built from the municipality and the address, or from the entity name retrieved from the DB. The function returns the latitude and longitude of the location. Its success depends on how the address is stored in OpenStreetMap and how the location is reported in the news.
- 18. Crime Visualization App: https://dbgroup.ing.unimore.it/crimemap.
- 19. The test was performed on a Microsoft Windows 10 Pro machine with 16 GB of RAM.
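As an illustration of the geocoding step described in note 15, the sketch below shows in Python how a location string might be assembled and turned into a geocoding query. The internals of the authors' `getCoordinates` function are not given here, so everything in the sketch is an assumption: the helper names `build_location` and `geocoding_url` are hypothetical, the "address, municipality" ordering is a guess, and the use of the public Nominatim search endpoint is one possible way to query OpenStreetMap data, not necessarily the one the authors use.

```python
from typing import Optional
from urllib.parse import urlencode

# Public Nominatim search endpoint (assumed here; the source only names OpenStreetMap).
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_location(municipality: Optional[str], address: Optional[str],
                   entity_name: Optional[str] = None) -> str:
    """Build the location string from address and municipality,
    falling back to the entity name when neither is available."""
    parts = [p.strip() for p in (address, municipality) if p and p.strip()]
    if parts:
        return ", ".join(parts)
    return (entity_name or "").strip()


def geocoding_url(location: str) -> str:
    """Return a search URL asking for the single best JSON match."""
    query = urlencode({"q": location, "format": "json", "limit": 1})
    return f"{NOMINATIM_SEARCH}?{query}"


url = geocoding_url(build_location("Modena", "Via Emilia Centro"))
```

The sketch stops before the network request. As the note points out, whether coordinates are found depends on how the address is stored in OpenStreetMap and how the news article reports it, so a caller would need to handle an empty result.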
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Rollo, F., Po, L. (2020). Crime Event Localization and Deduplication. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_23
DOI: https://doi.org/10.1007/978-3-030-62466-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62465-1
Online ISBN: 978-3-030-62466-8
eBook Packages: Computer Science (R0)