Exploration of the Hidden Influential Factors on Crime Activities: A Big Data Approach

Crime has long been a major concern for every country. Crime data analysis is a key part of discovering crime patterns and reducing crime, yet it remains a considerable challenge. In recent years, with the development of data collection and data mining techniques, many big data...

Bibliographic Details
Main Authors: Jianming Zhou, Zheng Li, Jack J. Ma, Feifeng Jiang
Format: Article
Language: English
Published: IEEE 2020-01-01
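Series: IEEE Access
Subjects: Big data techniques; feature analysis; felony assault; gradient boost decision tree; machine learning; recursive feature elimination
Online Access: https://ieeexplore.ieee.org/document/9143124/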
_version_ 1818917697658814464
author Jianming Zhou
Zheng Li
Jack J. Ma
Feifeng Jiang
collection DOAJ
description Crime has long been a major concern for every country. Crime data analysis is a key part of discovering crime patterns and reducing crime, yet it remains a considerable challenge. In recent years, with the development of data collection and data mining techniques, many big data-related studies have analyzed crime data. Studying the numerical influential factors is an important yet challenging problem, especially for indirect features. Although a number of studies have analyzed the influential factors of crime activities, most have limitations in the era of “big data”. Some adopted linear statistical methods, whose basic assumption conflicts with the non-linear real world. Some limited the factors they studied to one or two aspects. Some overlooked the importance of ranking the influence of factors. To fill these research gaps, this paper proposes a big data approach to analyzing the influential factors on crime activities and tests it on New York City. More than 1515 different factors spanning demographics, housing, education, economy, society, and city planning were considered and analyzed. The proposed framework combines non-linear machine learning algorithms with a geographical information system (GIS) to study the spatial determinants of crime. Recursive feature elimination (RFE) is used to select the optimum feature set. The performance of gradient boost decision tree (GBDT), logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), and random forest (RF) models is compared to identify the optimum model. Important impact factors were then investigated using GBDT and GIS. The experimental results demonstrate that the combined GBDT and GIS model can identify the most important factors of crime rate with high efficiency and accuracy.
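The recursive feature elimination step named in the description can be sketched as follows. This is a minimal illustration, not the authors' implementation: it swaps in a small linear model fitted by gradient descent in place of GBDT so that it runs with the standard library only, and the synthetic data and the helper names `fit_linear` and `rfe` are assumptions made for the example.

```python
import random


def fit_linear(X, y, epochs=300, lr=0.1):
    """Fit least-squares weights (plus a bias) by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, row)) + b - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the weakest feature, repeat."""
    active = list(range(len(X[0])))  # original column indices still in play
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = fit_linear(X_sub, y)
        # Features share one scale here, so |weight| serves as the importance.
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


# Hypothetical synthetic data: only columns 0 and 2 drive the target.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3.0 * r[0] - 2.0 * r[2] + rng.gauss(0, 0.1) for r in X]

selected = rfe(X, y, n_keep=2)
print(selected)
```

The key design point is that the model is refitted after every removal, so importances are always judged relative to the features that remain. A setup closer to the one the description names would likely use a tree-based estimator's feature importances in place of the linear weights, for example scikit-learn's `RFE` wrapped around a gradient boosting model.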
first_indexed 2024-12-20T00:38:11Z
format Article
id doaj.art-ce6c3dec559544aa995976439d991a18
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-20T00:38:11Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-ce6c3dec559544aa995976439d991a18
2022-12-21T19:59:41Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
vol. 8, pp. 141033-141045
10.1109/ACCESS.2020.3009969
9143124
Exploration of the Hidden Influential Factors on Crime Activities: A Big Data Approach
Jianming Zhou (China Unicom Guangzhou Branch, Guangzhou, China)
Zheng Li (https://orcid.org/0000-0002-5893-3167; Department of Research and Development, Big Bay Innovation Research and Development Limited, Hong Kong)
Jack J. Ma (https://orcid.org/0000-0002-7826-0449; Department of Research and Development, Big Bay Innovation Research and Development Limited, Hong Kong)
Feifeng Jiang (Department of Architecture and Civil Engineering, City University of Hong Kong, Hong Kong)
https://ieeexplore.ieee.org/document/9143124/
title Exploration of the Hidden Influential Factors on Crime Activities: A Big Data Approach
topic Big data techniques
feature analysis
felony assault
gradient boost decision tree
machine learning
recursive feature elimination
url https://ieeexplore.ieee.org/document/9143124/