Vehicle claims in the south of Minas Gerais: an approach using classification models

Authors

  • Luiz Otávio de Oliveira Pala, Universidade Federal de Lavras
  • Marcela de Marillac Carvalho, Universidade Federal de Lavras
  • Paulo Henrique Sales Guimarães, Universidade Federal de Lavras
  • Thelma Sáfadi, Universidade Federal de Lavras

DOI:

https://doi.org/10.5433/1679-0375.2020v41n1p79

Keywords:

Random forest, Random over sampling examples, Logistic regression.

Abstract

As risk patterns change, new insurance products become available on the market. Consequently, pricing models are restructured to manage risk levels and to set premiums that preserve the financial health of insurers. This work analyzed logistic regression and random forest models for classifying total loss events in the south of Minas Gerais, using both the original samples and artificial samples built with the ROSE (Random Over-Sampling Examples) resampling method, a procedure that constructs artificial samples through a smoothed bootstrap. A vehicle is considered a total loss when the repair costs for a single event exceed a percentage established in the contract. The models fitted to the artificial data achieved a higher balanced accuracy on the unbalanced data.
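The two ideas named in the abstract, smoothed-bootstrap resampling and balanced accuracy, can be illustrated with a minimal Python sketch. It is not the authors' implementation (the reference list points to R and the ROSE package); the function names `rose_like_sample` and `balanced_accuracy`, the Gaussian kernel, and the normal-reference bandwidth are assumptions made here for illustration, and the sketch handles numeric features only.

```python
import random
import statistics


def rose_like_sample(rows, labels, n_new, seed=0):
    """Draw n_new synthetic observations in the spirit of a smoothed bootstrap.

    Hypothetical helper: each draw picks a class with equal probability,
    picks one observed row of that class, and perturbs it with Gaussian noise.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    d = len(rows[0])
    by_class = {c: [r for r, y in zip(rows, labels) if y == c] for c in classes}

    # Per-class, per-feature kernel width (normal-reference rule; an assumption).
    widths = {}
    for c, members in by_class.items():
        n = len(members)
        factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        widths[c] = [
            factor * statistics.stdev([m[j] for m in members]) if n > 1 else 0.0
            for j in range(d)
        ]

    new_rows, new_labels = [], []
    for _ in range(n_new):
        c = rng.choice(classes)          # classes are equally likely
        centre = rng.choice(by_class[c])  # bootstrap step: pick an observed row
        new_rows.append(
            [rng.gauss(x, h) if h > 0 else x for x, h in zip(centre, widths[c])]
        )
        new_labels.append(c)
    return new_rows, new_labels


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# A classifier that never predicts the rare class looks good on plain accuracy
# (95 of 100 correct) but scores only 0.5 on balanced accuracy.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Because each synthetic observation is drawn from either class with equal probability, the artificial sample is roughly balanced even when total loss events are rare in the original data, which is the setting in which balanced accuracy is more informative than plain accuracy.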

Author Biographies

Luiz Otávio de Oliveira Pala, Universidade Federal de Lavras

PhD student, Program in Statistics and Agricultural Experimentation, UFLA, Lavras, MG, Brazil

Marcela de Marillac Carvalho, Universidade Federal de Lavras

PhD student, Program in Statistics and Agricultural Experimentation, UFLA, Lavras, MG, Brazil

Paulo Henrique Sales Guimarães, Universidade Federal de Lavras

Prof. Dr., Dept. of Statistics, UFLA, Lavras, MG, Brazil

Thelma Sáfadi, Universidade Federal de Lavras

Prof. Dr., Dept. of Statistics, UFLA, Lavras, MG, Brazil

Published

2020-06-20

How to Cite

Pala, L. O. de O., Carvalho, M. de M., Guimarães, P. H. S., & Sáfadi, T. (2020). Vehicle claims in the south of Minas Gerais: an approach using classification models. Semina: Ciências Exatas e Tecnológicas, 41(1), 79–86. https://doi.org/10.5433/1679-0375.2020v41n1p79

Section

Original Article