An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model
Clustering crime reports is crucial for identifying and preventing criminal activities that occur frequently in society. In the proposed work, named entities in a report are recognized to extract the crime-related phrases, and the phrases are then preprocessed by applying stopword removal...
Main Authors: | Aparna Pramanik, Asit Kumar Das, Danilo Pelusi, Janmenjoy Nayak |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-01-01 |
Series: | Mathematics |
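Subjects: | crime report analysis, named entity recognition, universal encoder-based feature embedding, graph-based clustering, overlapping clusters, fuzzy theory |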
Online Access: | https://www.mdpi.com/2227-7390/11/3/611 |
_version_ | 1797623819779702784 |
---|---|
author | Aparna Pramanik, Asit Kumar Das, Danilo Pelusi, Janmenjoy Nayak |
collection | DOAJ |
description | Clustering crime reports is crucial for identifying and preventing criminal activities that occur frequently in society. In the proposed work, named entities in a report are recognized to extract the crime-related phrases, and the phrases are then preprocessed by applying stopword removal and lemmatization. Next, the transformer module of the universal sentence encoder model is applied to the extracted phrases of the report to obtain a sentence embedding for each associated sentence; aggregating these embeddings gives the vector representation of the report. An innovative and efficient graph-based clustering algorithm consisting of splitting and merging operations is proposed to cluster the crime reports. The proposed algorithm generates overlapping clusters, which indicates that some reports cover multiple crime types. Fuzzy theory is used to assign each report a score expressing its membership in the different clusters, and the reports are accordingly labelled with multiple categories. The efficiency of the proposed method has been assessed on different datasets and compared with other state-of-the-art approaches using various performance metrics. |
first_indexed | 2024-03-11T09:34:10Z |
format | Article |
id | doaj.art-37309f3c93fe47479612178d18a682d8 |
institution | Directory of Open Access Journals |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-11T09:34:10Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-37309f3c93fe47479612178d18a682d8; 2023-11-16T17:22:05Z; eng; MDPI AG; Mathematics; ISSN 2227-7390; 2023-01-01; vol. 11, no. 3, article 611; DOI 10.3390/math11030611; An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model; Aparna Pramanik (0), Asit Kumar Das (1), Danilo Pelusi (2), Janmenjoy Nayak (3); (0) Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India; (1) Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India; (2) Department of Communication Sciences, University of Teramo, 64100 Teramo, Italy; (3) Post Graduate Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo (MSCB) University, Baripada 757003, Odisha, India; https://www.mdpi.com/2227-7390/11/3/611; keywords: crime report analysis, named entity recognition, universal encoder-based feature embedding, graph-based clustering, overlapping clusters, fuzzy theory |
title | An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model |
topic | crime report analysis, named entity recognition, universal encoder-based feature embedding, graph-based clustering, overlapping clusters, fuzzy theory |
url | https://www.mdpi.com/2227-7390/11/3/611 |
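
The description in this record outlines two steps that a small example may help make concrete: aggregating per-sentence embeddings into one report vector, and scoring a report's membership in several clusters so that it can carry more than one label. The sketch below is illustrative only. The record does not state the paper's aggregation rule or membership formula, so mean pooling and the standard fuzzy c-means membership formula are assumed here, and the function names, toy 2-D vectors, cluster names, and the 0.3 threshold are all hypothetical.

```python
import math


def mean_pool(sentence_vectors):
    """Average equal-length sentence embeddings into one report vector.

    Assumption: mean pooling stands in for the unspecified aggregation step.
    """
    n = len(sentence_vectors)
    dim = len(sentence_vectors[0])
    return [sum(v[i] for v in sentence_vectors) / n for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_memberships(report, centroids, m=2.0):
    """Fuzzy c-means style memberships of one report; the values sum to 1.

    Assumption: the standard FCM formula with fuzzifier m > 1, not
    necessarily the scoring rule used in the paper.
    """
    dists = [euclidean(report, c) for c in centroids]
    # A report lying exactly on a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** (-power) for d in dists]
    total = sum(inverse)
    return [w / total for w in inverse]


def multi_label(memberships, names, threshold=0.3):
    """Keep every cluster whose membership reaches the (hypothetical) threshold."""
    return [name for name, u in zip(names, memberships) if u >= threshold]


# Toy 2-D vectors stand in for real sentence-encoder outputs.
sentence_vectors = [[1.0, 0.0], [0.0, 1.0]]
report = mean_pool(sentence_vectors)

names = ["theft", "assault", "fraud"]
centroids = [[1.0, 0.0], [0.0, 1.0], [-2.0, -2.0]]

u = fuzzy_memberships(report, centroids)
labels = multi_label(u, names)
print(report, [round(x, 3) for x in u], labels)
```

In this toy case the report sits midway between the first two centroids and far from the third, so it receives two labels, which mirrors the overlapping clusters the description mentions.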