A Graph Database Representation of Portuguese Criminal-Related Documents

Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articl...


Bibliographic Details
Main Authors: Gonçalo Carnaz, Vitor Beires Nogueira, Mário Antunes
Format: Article
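Language: English
Published: MDPI AG 2021-06-01
Series: Informatics
Subjects: knowledge representation; graph databases; natural language processing; criminal-related documents; cybersecurity; criminal domain, police reports
Online Access: https://www.mdpi.com/2227-9709/8/2/37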
author Gonçalo Carnaz
Vitor Beires Nogueira
Mário Antunes
collection DOAJ
description Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of the data and knowledge in such documents is essential to reduce the burden of manual analysis and to automate the discovery of relationships between the names and entities that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named entities and relations in Portuguese criminal reports and documents, and to represent the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-Measure of 0.73 and 5W1H information extraction with an F-Measure of 0.65.
first_indexed 2024-03-10T10:42:33Z
format Article
id doaj.art-0254df6d6983408a8ea2ad7f64a72b29
institution Directory Open Access Journal
issn 2227-9709
language English
last_indexed 2024-03-10T10:42:33Z
publishDate 2021-06-01
publisher MDPI AG
record_format Article
series Informatics
spelling doaj.art-0254df6d6983408a8ea2ad7f64a72b29
2023-11-21T22:49:02Z
eng
MDPI AG
Informatics
2227-9709
2021-06-01
volume 8, issue 2, article 37
10.3390/informatics8020037
A Graph Database Representation of Portuguese Criminal-Related Documents
Gonçalo Carnaz (Informatics Department, University of Évora, 7002-554 Évora, Portugal)
Vitor Beires Nogueira (Informatics Department, University of Évora, 7002-554 Évora, Portugal)
Mário Antunes (Computer Science and Communication Research Centre (CIIC), School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal)
https://www.mdpi.com/2227-9709/8/2/37
knowledge representation
graph databases
natural language processing
criminal-related documents
cybersecurity
criminal domain, police reports
title A Graph Database Representation of Portuguese Criminal-Related Documents
topic knowledge representation
graph databases
natural language processing
criminal-related documents
cybersecurity
criminal domain, police reports
url https://www.mdpi.com/2227-9709/8/2/37