Vehicle claims in the south of Minas Gerais: an approach using classification models



  • Luiz Otávio de Oliveira Pala, Universidade Federal de Lavras
  • Marcela de Marillac Carvalho, Universidade Federal de Lavras
  • Paulo Henrique Sales Guimarães, Universidade Federal de Lavras
  • Thelma Sáfadi, Universidade Federal de Lavras



Keywords: random forest, Random Over-Sampling Examples (ROSE), logistic regression.


As risk patterns change, new insurance products reach the market. Pricing models are consequently restructured to manage risk levels and set premiums that preserve the financial health of insurers. This work evaluated logistic regression and random forest models for classifying total loss events in the south of Minas Gerais, using both original samples and artificial samples built with the ROSE (Random Over-Sampling Examples) resampling method, a smoothed bootstrap procedure for generating artificial samples. A vehicle is considered a total loss when the repair costs for a single event exceed a percentage established in the contract. The models trained on artificial data achieved higher balanced accuracy on the imbalanced data.
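The two ideas named in the abstract, a smoothed bootstrap for building artificial samples and balanced accuracy as the evaluation metric, can be illustrated with a minimal Python sketch. This is not the authors' code (the reference list points to R packages); the function names `rose_sample` and `balanced_accuracy`, the 0/1 label coding (1 = total loss), the Gaussian kernel, and the Silverman-type bandwidth rule are all assumptions made for illustration.

```python
import math
import random
import statistics


def rose_sample(X, y, n_new, p_positive=0.5, seed=0):
    """Draw n_new artificial rows with a smoothed bootstrap (ROSE-style sketch).

    Assumes binary labels coded 0/1 and at least two rows per class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(X, y):
        by_class[label].append(row)
    d = len(X[0])
    # Per-class, per-feature Gaussian bandwidths.
    # Assumption: Silverman-type rule h = (4 / ((d + 2) n))^(1 / (d + 4)) * sd.
    bandwidth = {}
    for label, rows in by_class.items():
        n = len(rows)
        c = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        bandwidth[label] = [c * statistics.stdev(col) for col in zip(*rows)]
    X_new, y_new = [], []
    for _ in range(n_new):
        # 1) pick a class, 2) pick a real row of that class,
        # 3) draw a new point from a kernel centered on that row.
        label = 1 if rng.random() < p_positive else 0
        center = rng.choice(by_class[label])
        X_new.append([rng.gauss(m, h) for m, h in zip(center, bandwidth[label])])
        y_new.append(label)
    return X_new, y_new


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for 0/1 labels."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy imbalanced data (hypothetical): 95 non-total-loss rows, 5 total-loss rows.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(5)]
    y = [0] * 95 + [1] * 5
    X_art, y_art = rose_sample(X, y, n_new=200)
    print("share of positives in artificial sample:", sum(y_art) / len(y_art))
    # A classifier that always predicts "no total loss" is 95% accurate here,
    # yet its balanced accuracy exposes that it never detects a total loss.
    print("balanced accuracy of all-zero predictor:", balanced_accuracy(y, [0] * 100))
```

Balanced accuracy averages the per-class recall, so it does not reward a model for simply predicting the majority class, which is why it suits rare events such as total losses.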



Author Biographies

Luiz Otávio de Oliveira Pala, Universidade Federal de Lavras

PhD student, Program in Statistics and Agricultural Experimentation, UFLA, Lavras, MG, Brazil

Marcela de Marillac Carvalho, Universidade Federal de Lavras

PhD student, Program in Statistics and Agricultural Experimentation, UFLA, Lavras, MG, Brazil

Paulo Henrique Sales Guimarães, Universidade Federal de Lavras

Prof. Dr., Dept. of Statistics, UFLA, Lavras, MG, Brazil

Thelma Sáfadi, Universidade Federal de Lavras

Prof. Dr., Dept. of Statistics, UFLA, Lavras, MG, Brazil






How to Cite

Pala, L. O. de O., Carvalho, M. de M., Guimarães, P. H. S., & Sáfadi, T. (2020). Vehicle claims in the south of Minas Gerais: an approach using classification models. Semina: Ciências Exatas E Tecnológicas, 41(1), 79–86.



Original Article

