Comparison of Statistical and Machine-Learning Models on Road Traffic Accident Severity Classification

Portugal has the sixth-highest road fatality rate among European Union members. This is a multidimensional problem with serious consequences for people’s lives. This study analyses daily data from police and government authorities on road traffic accidents that occurred between 2016 and 2019 i...

Bibliographic Details
Main Authors: Paulo Infante, Gonçalo Jacinto, Anabela Afonso, Leonor Rego, Vitor Nogueira, Paulo Quaresma, José Saias, Daniel Santos, Pedro Nogueira, Marcelo Silva, Rosalina Pisco Costa, Patrícia Gois, Paulo Rebelo Manuel
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Computers
Subjects: injury, logistic regression, machine learning, road traffic accidents, severity of victims
Online Access: https://www.mdpi.com/2073-431X/11/5/80
author Paulo Infante
Gonçalo Jacinto
Anabela Afonso
Leonor Rego
Vitor Nogueira
Paulo Quaresma
José Saias
Daniel Santos
Pedro Nogueira
Marcelo Silva
Rosalina Pisco Costa
Patrícia Gois
Paulo Rebelo Manuel
author_sort Paulo Infante
collection DOAJ
description Portugal has the sixth-highest road fatality rate among European Union members. This is a multidimensional problem with serious consequences for people’s lives. This study analyses daily data from police and government authorities on road traffic accidents that occurred between 2016 and 2019 in a district of Portugal. It looks for the determinants of the existence of victims in road traffic accidents, as well as the determinants of fatalities and/or serious injuries in accidents with victims. We fit logistic regression models and compare their results with those of machine-learning models. For the severity model, where the response variable indicates whether the accident resulted only in property damage or in casualties, we used a large sample with a small imbalance. For the serious injuries model, where the response variable indicates whether or not an accident with victims involved serious injuries and/or fatalities, we used a small sample with very imbalanced data. Empirical analysis supports the conclusion that, with a small sample of imbalanced data, machine-learning models generally do not perform better than statistical models; however, the two approaches perform similarly when the sample is large and has a small imbalance.
first_indexed 2024-03-10T03:05:56Z
format Article
id doaj.art-9908e8c173f54987a3dddd11de49724d
institution Directory Open Access Journal
issn 2073-431X
language English
last_indexed 2024-03-10T03:05:56Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Computers
spelling doaj.art-9908e8c173f54987a3dddd11de49724d
2023-11-23T10:33:43Z
eng
MDPI AG
Computers
2073-431X
2022-05-01
11(5), 80
10.3390/computers11050080
Comparison of Statistical and Machine-Learning Models on Road Traffic Accident Severity Classification
Paulo Infante (CIMA, IIFA, University of Évora, 7000-671 Évora, Portugal)
Gonçalo Jacinto (CIMA, IIFA, University of Évora, 7000-671 Évora, Portugal)
Anabela Afonso (CIMA, IIFA, University of Évora, 7000-671 Évora, Portugal)
Leonor Rego (Department of Mathematics, ECT, University of Évora, 7000-671 Évora, Portugal)
Vitor Nogueira (Algoritmi Research Centre, University of Évora, 7000-671 Évora, Portugal)
Paulo Quaresma (Algoritmi Research Centre, University of Évora, 7000-671 Évora, Portugal)
José Saias (Algoritmi Research Centre, University of Évora, 7000-671 Évora, Portugal)
Daniel Santos (Department of Informatics, ECT, University of Évora, 7000-671 Évora, Portugal)
Pedro Nogueira (ICT, IIFA, University of Évora, 7000-671 Évora, Portugal)
Marcelo Silva (ICT, IIFA, University of Évora, 7000-671 Évora, Portugal)
Rosalina Pisco Costa (CICS.NOVA.UEVORA, IIFA, University of Évora, 7000-208 Évora, Portugal)
Patrícia Gois (Department of Visual Arts and Design, EA, University of Évora, 7000-208 Évora, Portugal)
Paulo Rebelo Manuel (CIMA, IIFA, University of Évora, 7000-671 Évora, Portugal)
https://www.mdpi.com/2073-431X/11/5/80
title Comparison of Statistical and Machine-Learning Models on Road Traffic Accident Severity Classification
topic injury
logistic regression
machine learning
road traffic accidents
severity of victims
url https://www.mdpi.com/2073-431X/11/5/80