Vehicle claims in the south of Minas Gerais: an approach using classification models
DOI:
https://doi.org/10.5433/1679-0375.2020v41n1p79
Keywords:
Random forest, Random over sampling examples, Logistic regression.
Abstract
As risk patterns change, new insurance products become available on the market. Consequently, pricing models are restructured to manage risk levels and to set premiums that preserve the financial health of insurers. This work analyzed logistic regression and random forest models for classifying total loss events in the south of Minas Gerais, using both original samples and artificial samples built with the ROSE resampling method, a procedure that constructs artificial samples through a smoothed bootstrap. A vehicle is considered a total loss when the repair costs for a single event exceed a percentage established by contract. The models trained on artificial data achieved a higher balanced accuracy on the imbalanced data.
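To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation (the study itself relied on R packages). It shows why balanced accuracy is preferred to plain accuracy when total losses are rare, and a simplified ROSE-style smoothed bootstrap. The function names, the toy features, the equal class probability of 1/2, and the rule-of-thumb Gaussian bandwidth are all assumptions made for illustration.

```python
import random
import statistics


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def smoothed_bootstrap(X, y, n_new, rng, shrink=1.0):
    """Simplified ROSE-style sampler (illustrative assumption, not the ROSE package).

    For each artificial row: pick a class with probability 1/2, pick one
    observed row of that class at random, then draw a new point from a
    Gaussian kernel centred on that row.
    """
    by_class = {
        0: [row for row, label in zip(X, y) if label == 0],
        1: [row for row, label in zip(X, y) if label == 1],
    }
    d = len(X[0])
    # Per-class, per-feature bandwidth from a Gaussian rule of thumb (assumed here).
    bandwidth = {}
    for label, rows in by_class.items():
        n = len(rows)
        factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        bandwidth[label] = [shrink * factor * statistics.stdev(col) for col in zip(*rows)]

    X_new, y_new = [], []
    for _ in range(n_new):
        label = rng.choice([0, 1])
        seed_row = rng.choice(by_class[label])
        X_new.append([rng.gauss(v, h) for v, h in zip(seed_row, bandwidth[label])])
        y_new.append(label)
    return X_new, y_new


# Hypothetical toy data: 95 ordinary claims (0) and 5 total losses (1),
# each with two made-up numeric features.
X = [[float(i % 10), 20.0 + i % 7] for i in range(95)]
X += [[12.0 + i, 5.0 + i] for i in range(5)]
y = [0] * 95 + [1] * 5

# A classifier that never predicts a total loss is 95% accurate,
# yet its balanced accuracy is only 0.5.
always_zero = [0] * len(y)
print(balanced_accuracy(y, always_zero))  # 0.5

# The artificial sample is roughly balanced between the two classes.
X_art, y_art = smoothed_bootstrap(X, y, n_new=1000, rng=random.Random(0))
print(sum(y_art) / len(y_art))
```

In this sketch a classifier would be fitted on `X_art, y_art` and then evaluated with `balanced_accuracy` on held-out original data, which mirrors the comparison between original and artificial samples described in the abstract.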
License
Copyright for articles published in this journal belongs to the authors. Because manuscripts are published in an open access journal, they are free to use, with proper attribution, in educational and non-commercial applications. The Journal reserves the right to make changes to the original document regarding linguistic norms, orthography, and grammar, in order to maintain the standard norms of the language and the credibility of the Journal. It will, however, respect the authors' writing style. When necessary, conceptual changes, corrections, or suggestions will be forwarded to the authors. In such cases, the manuscript will undergo a new evaluation after revision. Responsibility for the opinions expressed in the manuscripts lies entirely with the authors.
This journal is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.