Automatic Extraction of Research Themes in Epidemiological Criminology From PubMed Abstracts From 1946 to 2020: Text Mining Study

Bibliographic Details
Main Authors: George Karystianis, Paul Simpson, Wilson Lukmanjaya, Natasha Ginnivan, Goran Nenadic, Iain Buchan, Tony Butler
Format: Article
Language: English
Published: JMIR Publications 2023-09-01
Series: JMIR Formative Research
Online Access: https://formative.jmir.org/2023/1/e49721
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Background: The emerging field of epidemiological criminology studies the intersection between public health and justice systems. To increase the value of and reduce waste in research activities in this area, it is important to perform transparent research priority setting that considers the needs of research beneficiaries and end users, along with a systematic assessment of existing research activities to address gaps and harness opportunities.

Objective: In this study, we aimed to examine published research outputs in epidemiological criminology to assess gaps between published outputs and current research priorities identified by prison stakeholders.

Methods: A rule-based method was applied to 23,904 PubMed epidemiological criminology abstracts to extract the study determinants and outcomes (ie, “themes”). These were mapped against the research priorities identified by Australian prison stakeholders to assess the differences from research outputs. The income level of the affiliation country of the first authors was also identified to compare the ranking of research priorities in countries categorized by income level.

Results: On an evaluation set of 100 abstracts, the identification of themes returned an F1-score of 90%, indicating reliable performance. More than 53.3% (11,927/22,361) of the articles had at least 1 extracted theme; the most common was substance use (1533/11,814, 12.97%), followed by HIV (1493/11,814, 12.64%). The infectious disease category (2949/11,814, 24.96%) was the most common research priority category, followed by mental health (2840/11,814, 24.04%) and alcohol and other drug use (2433/11,814, 20.59%). A comparison between the extracted themes and the stakeholder priorities showed an alignment for mental health, infectious diseases, and alcohol and other drug use. Although behavior- and juvenile-related themes were common, they did not feature as prison priorities. Most studies were conducted in high-income countries (10,083/11,814, 85.35%), while countries with the lowest income status focused half of their research on infectious diseases (47/91, 52%).

Conclusions: The identification of research themes from PubMed epidemiological criminology research abstracts is possible through the application of a rule-based text mining method. The frequency of the investigated themes may reflect historical developments concerning disease prevalence, treatment advances, and the social understanding of illness and incarcerated populations. The differences between income status groups are likely to be explained by local health priorities and immediate health risks. Notable gaps between stakeholder research priorities and research outputs concerned themes that were more focused on social factors and systems and may reflect publication bias or self-publication selection, highlighting the need for further research on prison health services and the social determinants of health. Different jurisdictions, countries, and regions should undertake similar systematic and transparent research priority-setting processes.
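
The abstract states only that a rule-based method extracted study determinants and outcomes and reached an F1-score of 90% on 100 evaluation abstracts; the rules themselves are not part of this record. As a rough illustration of how dictionary-style rules and an F1 evaluation typically fit together, the Python sketch below tags abstracts with themes using hypothetical regular-expression patterns and scores them against hand-annotated theme sets. The names `THEME_PATTERNS`, `extract_themes`, and `micro_f1`, the patterns, and the toy abstracts are all illustrative assumptions, not the authors' code or data. Micro-averaging over (abstract, theme) pairs is also an assumption; the record does not say how the 90% figure was averaged.

```python
import re
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL theme rules: illustrative only, NOT the rules used in the study,
# which this record does not describe.
THEME_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "HIV": re.compile(r"\bHIV\b|\bhuman immunodeficiency virus\b", re.IGNORECASE),
    "substance use": re.compile(r"\bsubstance (?:use|misuse|abuse)\b", re.IGNORECASE),
    "mental health": re.compile(r"\bmental (?:health|illness|disorders?)\b", re.IGNORECASE),
}


def extract_themes(abstract: str) -> Set[str]:
    """Return every theme whose pattern matches anywhere in the abstract."""
    return {theme for theme, pattern in THEME_PATTERNS.items() if pattern.search(abstract)}


def micro_f1(predicted: List[Set[str]], gold: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over (abstract, theme) pairs."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)  # themes found and annotated
        fp += len(pred - ref)  # themes found but not annotated
        fn += len(ref - pred)  # themes annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up evaluation set with hand-assigned gold themes.
    abstracts = [
        "Prevalence of HIV among people in prison.",
        "Mental health and substance use after release from custody.",
        "Drug misuse among young offenders.",
    ]
    gold = [{"HIV"}, {"mental health", "substance use"}, {"substance use"}]
    predicted = [extract_themes(text) for text in abstracts]
    print(micro_f1(predicted, gold))
```

Running the script prints `(1.0, 0.75, 0.8571428571428571)`: the third toy abstract says "drug misuse", which the hypothetical substance-use pattern does not cover, so recall drops while precision stays perfect. This illustrates the usual trade-off of rule-based extraction, where recall depends on how exhaustively the rules enumerate the surface forms of each theme.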

ISSN: 2561-326X
Volume: 7
Article: e49721
DOI: 10.2196/49721
Author ORCID iDs:
George Karystianis: https://orcid.org/0000-0003-3491-361X
Paul Simpson: https://orcid.org/0000-0002-1947-8923
Wilson Lukmanjaya: https://orcid.org/0000-0002-7747-4648
Natasha Ginnivan: https://orcid.org/0000-0002-8581-6812
Goran Nenadic: https://orcid.org/0000-0003-0795-5363
Iain Buchan: https://orcid.org/0000-0003-3392-1650
Tony Butler: https://orcid.org/0000-0002-2679-2769