Variable Selection Algorithm for a Mixture of Poisson Regression for Handling Overdispersion in Claims Frequency Modeling Using Telematics Car Driving Data

In automobile insurance, it is common to adopt a Poisson regression model to predict the number of claims as part of the actuarial pricing process. The Poisson assumption can rarely be justified, often due to overdispersion, and alternative modeling is often considered, typically zero-inflated model...


Bibliographic Details
Main Authors: Jennifer S. K. Chan, S. T. Boris Choy, Udi Makov, Ariel Shamir, Vered Shapovalov
Format: Article
Language: English
Published: MDPI AG 2022-04-01
Series: Risks
Subjects: mixture Poisson regression; variable selection; telematics
Online Access: https://www.mdpi.com/2227-9091/10/4/83
collection DOAJ
description In automobile insurance, it is common to adopt a Poisson regression model to predict the number of claims as part of the actuarial pricing process. The Poisson assumption can rarely be justified, often because of overdispersion, so alternative models are frequently considered, typically zero-inflated models, which are special cases of finite mixture distributions. Finite mixture regression modeling of telematics data is challenging to implement because the huge number of covariates makes the essential variable selection, needed to attain good predictive power without overfitting, computationally prohibitive. This paper aims to devise an algorithm that can carry out variable selection in the presence of a large number of covariates. This is achieved by generating sub-samples of the data corresponding to each component of the Poisson mixture; within each sub-sample, variable selection is applied after the Poisson assumption has been enhanced by controlling the number of zero claims. The resulting algorithm is assessed by measuring the out-of-sample AUC (Area Under the Curve), a machine learning measure of predictive power. Finally, the algorithm is demonstrated on claim history data and telematics data describing driving behavior. It transpires that, unlike alternative algorithms related to Poisson regression, the proposed algorithm is both implementable and achieves an improved AUC (0.71). The proposed algorithm allows more accurate pricing in an era where telematics data is used for automobile insurance.
first_indexed 2024-03-09T04:14:54Z
format Article
id doaj.art-3a64456869ea473a8684308953e01b57
institution Directory Open Access Journal
issn 2227-9091
language English
last_indexed 2024-03-09T04:14:54Z
spelling doaj.art-3a64456869ea473a8684308953e01b57 2023-12-03T13:55:55Z eng
citation MDPI AG, Risks, ISSN 2227-9091, 2022-04-01, vol. 10, no. 4, art. 83, doi:10.3390/risks10040083
affiliation Jennifer S. K. Chan: School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
S. T. Boris Choy: Discipline of Business Analytics, The University of Sydney, Sydney, NSW 2006, Australia
Udi Makov: Actuarial Research Center, University of Haifa, Haifa 3498838, Israel
Ariel Shamir: Efi Arazi School of Computer Science, Reichman University, Herzliya 4610101, Israel
Vered Shapovalov: Actuarial Research Center, University of Haifa, Haifa 3498838, Israel
topic mixture Poisson regression
variable selection
telematics
url https://www.mdpi.com/2227-9091/10/4/83