Crime Event Localization and Deduplication

  • Conference paper
The Semantic Web – ISWC 2020 (ISWC 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12507)

Abstract

Crime analysis identifies patterns and trends in crime events, while information extraction is the task of extracting relevant information from unstructured data. When crime reports are not directly available to the public, one possible solution is to derive crime information from newspaper articles.

This paper aims to extract, localize, deduplicate, and visualize crime events from online news articles. It demonstrates how crime-related information can be obtained from newspapers and used to build a consistent database of crime events through an automatic process. The approach employs a Named Entity Recognition (NER) algorithm to retrieve locations, organizations, and persons, and a mapping phase to link entities to Linked Data resources. The date of the event is retrieved through the extraction and normalization of temporal expressions. For duplicate detection, the approach analyses and combines crime category, description, location, and event date to identify which news articles refer to the same event. The approach has been successfully applied in the Modena province (Italy), focusing on eleven types of crime that have occurred from 2011 to the present. Its flexibility allows it to be easily adapted to other cities, regions, or countries, and also to other domains.
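As a rough illustration of the duplicate-detection idea described above, the following sketch treats two reports as the same event when they share crime category and municipality, their event dates are close, and their descriptions overlap sufficiently. It is not the authors' implementation: the `CrimeReport` and `DuplicateSketch` classes, the use of Jaccard similarity on word tokens, and both thresholds are assumptions made only for this example.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical container for one crime report extracted from a news article. */
final class CrimeReport {
    final String category;      // crime type, e.g. "theft"
    final String municipality;  // where the event took place, e.g. "Modena"
    final LocalDate eventDate;  // normalized date of the event
    final String description;   // free-text summary taken from the article

    CrimeReport(String category, String municipality, LocalDate eventDate, String description) {
        this.category = category;
        this.municipality = municipality;
        this.eventDate = eventDate;
        this.description = description;
    }
}

final class DuplicateSketch {
    // Illustrative thresholds (assumptions, not values from the paper).
    static final long MAX_DAYS_APART = 1;
    static final double MIN_TEXT_SIMILARITY = 0.5;

    /** Splits a text into a set of lower-cased tokens made of letters and digits. */
    static Set<String> tokens(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        Set<String> result = new HashSet<>(Arrays.asList(parts));
        result.remove(""); // split may yield an empty leading token
        return result;
    }

    /** Jaccard similarity: size of the token intersection divided by size of the union. */
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    /** Combines category, location, date, and description into one yes/no decision. */
    static boolean sameEvent(CrimeReport x, CrimeReport y) {
        if (!x.category.equalsIgnoreCase(y.category)) {
            return false;
        }
        if (!x.municipality.equalsIgnoreCase(y.municipality)) {
            return false;
        }
        long daysApart = Math.abs(ChronoUnit.DAYS.between(x.eventDate, y.eventDate));
        if (daysApart > MAX_DAYS_APART) {
            return false;
        }
        return jaccard(x.description, y.description) >= MIN_TEXT_SIMILARITY;
    }
}
```

Checking the cheap fields (category, municipality, date) before the text comparison keeps the number of description comparisons small. A real system would likely compare coordinates or addresses, not municipality names, and would tune the thresholds on labelled data.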

Notes

  1. On the police open data portal https://data.police.uk/, it was possible to download data about March 2020 on the 21st May 2020.

  2. https://www.istat.it/en/.

  3. https://emm.newsexplorer.eu/.

  4. https://emm.newsbrief.eu/.

  5. https://www.crunchbase.com/organization/opencalais.

  6. https://eventregistry.org/.

  7. https://data.cityofchicago.org/Public-Safety/Crimes-Map/dfnk-7re6.

  8. http://aperto.comune.torino.it/.

  9. European Data Portal https://www.europeandataportal.eu/it.

  10. The code of both applications is open source and is available in a github repository https://github.com/federicarollo/Crime-event-localization-and-deduplication.

  11. An example is available at https://gazzettadimodena.gelocal.it/ricerca?query=furti where furti (theft) is a type of crime.

  12. http://tint.fbk.eu/.

  13. http://dbpedia.org/resource/.

  14. https://nominatim.openstreetmap.org/.

  15. We use the function OpenStreetMapUtils.getInstance().getCoordinates(location) where location is a string that can be generated by the municipality and the address, or the entity name retrieved from the DB. This function provides the latitude and the longitude of the location. The success of this function depends on how the address is stored in Open Street Map and how the location is reported in the news.

  16. https://github.com/tdebatty/java-string-similarity.

  17. http://dati.istat.it/Index.aspx?QueryId=25097&lang=en.

  18. Crime Visualization App - https://dbgroup.ing.unimore.it/crimemap.

  19. The test has been performed on a Microsoft Windows 10 Pro with 16 GB RAM.
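Note 15 describes building a location string from the municipality and the address and passing it to a geocoding function. The sketch below shows how such a string could be turned into a request URL for the Nominatim search endpoint cited in note 14. It does not reproduce the internals of `OpenStreetMapUtils`, which are not shown in the source: the class `GeocodingQuerySketch` and its two helper methods are hypothetical, and the sketch only builds the URL without sending a request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that prepares a Nominatim search request for a location. */
final class GeocodingQuerySketch {
    static final String NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search";

    /** Joins address and municipality; falls back to the municipality alone. */
    static String locationString(String address, String municipality) {
        if (address == null || address.trim().isEmpty()) {
            return municipality.trim();
        }
        return address.trim() + ", " + municipality.trim();
    }

    /** Builds a free-text search URL asking for at most one JSON result. */
    static URI searchUri(String location) {
        // URLEncoder applies form encoding: spaces become '+', commas become %2C.
        String q = URLEncoder.encode(location, StandardCharsets.UTF_8);
        return URI.create(NOMINATIM_SEARCH + "?q=" + q + "&format=json&limit=1");
    }
}
```

Each object in the JSON response carries `lat` and `lon` fields holding the coordinates. As the note points out, whether a result comes back at all depends on how the address is stored in OpenStreetMap and how the news article spells it, so a caller should be prepared for an empty result list.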

Author information

Corresponding author

Correspondence to Federica Rollo.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Rollo, F., Po, L. (2020). Crime Event Localization and Deduplication. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_23

  • DOI: https://doi.org/10.1007/978-3-030-62466-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62465-1

  • Online ISBN: 978-3-030-62466-8

  • eBook Packages: Computer Science, Computer Science (R0)