An Effective Fuzzy Clustering of Crime Reports Embedded by a Universal Sentence Encoder Model

Bibliographic Details
Main Authors: Aparna Pramanik, Asit Kumar Das, Danilo Pelusi, Janmenjoy Nayak
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/11/3/611
collection DOAJ
description Crime report clustering is crucial for identifying and preventing criminal activities that frequently occur in society. In the proposed work, named entities in a report are recognized to extract crime-related phrases, and these phrases are then preprocessed by stopword removal and lemmatization. Next, the transformer module of the Universal Sentence Encoder model is applied to the extracted phrases to obtain a sentence embedding for each associated sentence; aggregating these embeddings gives the vector representation of the report. An innovative and efficient graph-based clustering algorithm consisting of splitting and merging operations is proposed to obtain clusters of crime reports. The algorithm generates overlapping clusters, which indicate the existence of reports involving multiple crime types. Fuzzy set theory is used to assign each report a score expressing its membership in different clusters, and the reports are accordingly labelled with multiple categories. The efficiency of the proposed method has been assessed on different datasets and compared with other state-of-the-art approaches using various performance metrics.
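As a rough illustration of the final steps summarized in this description (pooling sentence embeddings into one report vector, then giving each report a fuzzy membership score per cluster and labelling it with every sufficiently strong category), the following Python sketch assumes mean pooling and fuzzy c-means style memberships computed against cluster centroids. Neither choice is confirmed as the authors' method, and their graph-based splitting and merging algorithm is not reproduced here. The function names, the toy 2-D vectors (real encoder outputs are much higher-dimensional), the crime labels, and the 0.3 labelling threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(sentence_embeddings: Sequence[Vector]) -> Vector:
    """Average per-sentence embeddings into one report vector (assumed aggregation)."""
    if not sentence_embeddings:
        raise ValueError("a report needs at least one sentence embedding")
    n = len(sentence_embeddings)
    dim = len(sentence_embeddings[0])
    return [sum(vec[i] for vec in sentence_embeddings) / n for i in range(dim)]


def fuzzy_memberships(
    x: Vector, centroids: Dict[str, Vector], m: float = 2.0
) -> Dict[str, float]:
    """Fuzzy c-means style membership of x in every cluster; the values sum to 1.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the Euclidean
    distance from x to centroid j and m > 1 is the fuzzifier.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = {label: math.dist(x, c) for label, c in centroids.items()}
    # A report sitting exactly on a centroid gets full membership there.
    exact = [label for label, d in dists.items() if d == 0.0]
    if exact:
        return {label: (1.0 / len(exact) if label in exact else 0.0) for label in dists}
    power = 2.0 / (m - 1.0)
    return {
        label: 1.0 / sum((d_j / d_k) ** power for d_k in dists.values())
        for label, d_j in dists.items()
    }


def multi_labels(memberships: Dict[str, float], threshold: float = 0.3) -> List[str]:
    """Return every cluster label whose membership reaches the (hypothetical) threshold."""
    return sorted(label for label, u in memberships.items() if u >= threshold)


# Hypothetical example: two sentence embeddings and three illustrative crime clusters.
report = mean_pool([[1.0, 0.0], [0.0, 1.0]])
centroids = {"theft": [0.0, 0.0], "assault": [1.0, 1.0], "fraud": [5.0, 5.0]}
scores = fuzzy_memberships(report, centroids)
print(multi_labels(scores))  # ['assault', 'theft']
```

Here the report lies equally close to the "theft" and "assault" centroids, so it receives two labels, much as an overlapping cluster assignment would suggest. Raising the threshold yields fewer, more confident labels per report; lowering it lets a report carry more crime types.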
id doaj.art-37309f3c93fe47479612178d18a682d8
institution Directory of Open Access Journals
issn 2227-7390
doi 10.3390/math11030611
volume 11
issue 3
article 611
affiliation Aparna Pramanik: Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India
Asit Kumar Das: Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India
Danilo Pelusi: Department of Communication Sciences, University of Teramo, 64100 Teramo, Italy
Janmenjoy Nayak: Post Graduate Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo (MSCB) University, Baripada 757003, Odisha, India
topic crime report analysis
named entity recognition
universal encoder-based feature embedding
graph-based clustering
overlapping clusters
fuzzy theory