Credit scoring: Does XGBoost outperform logistic regression? A test on Italian SMEs

Zedda, Stefano
First author
2024-01-01

Abstract

Logistic regression, a long-established technique, is still the most widely used method for credit scoring. Recent developments have introduced new tools from machine learning, including random forests. In this paper, we tested the efficiency of logistic regression and XGBoost for default forecasting on a sample of 35,535 cases from 7 business sectors of Italian SMEs, using 28 banking variables and 55 balance sheet ratios, to verify which approach better supports lending decisions. To this end, we developed an efficiency index that measures each model's capability to correctly select good borrowers, balancing the different effects of refusing a loan to a good customer and lending to a defaulter. We also computed the balancing spread to quantify each model's efficiency in terms of credit costs for the borrowing firms. The results differ across sectors. In general, however, the two methods show similar capabilities, while the cutoff setting can make a substantial difference when the models are actually used for lending decisions.
2024
Credit scoring; Logistic regression; XGBoost; Bank lending; SMEs
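
The abstract's point about the cutoff can be illustrated with a minimal sketch. It is not the paper's efficiency index or balancing spread, which this record does not define. The cost weights `cost_reject_good` and `cost_accept_bad`, the function names, and the toy scores are all hypothetical, chosen only to show how the preferred cutoff shifts when refusing a good borrower and lending to a defaulter are weighted differently.

```python
def misclassification_cost(scores, defaults, cutoff,
                           cost_reject_good=1.0, cost_accept_bad=5.0):
    """Average cost per applicant when loans are refused at score >= cutoff.

    scores   : predicted default probabilities (from any model)
    defaults : 1 if the borrower actually defaulted, 0 otherwise
    The two cost weights are hypothetical, not values from the paper.
    """
    total = 0.0
    for p, d in zip(scores, defaults):
        refused = p >= cutoff
        if refused and d == 0:
            total += cost_reject_good   # good borrower turned away
        elif not refused and d == 1:
            total += cost_accept_bad    # loan granted to a defaulter
    return total / len(scores)


def best_cutoff(scores, defaults, cutoffs, **costs):
    """Return the candidate cutoff with the lowest average cost."""
    return min(cutoffs,
               key=lambda c: misclassification_cost(scores, defaults, c, **costs))


# Toy data, invented for illustration only.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
defaults = [0, 0, 0, 1, 0, 0, 1, 1]
candidates = [0.3, 0.5, 0.7]

# Lending to a defaulter weighted 5x: a strict cutoff is preferred.
strict = best_cutoff(scores, defaults, candidates)
# Equal weights: a more lenient cutoff is preferred.
lenient = best_cutoff(scores, defaults, candidates,
                      cost_reject_good=1.0, cost_accept_bad=1.0)
print(strict, lenient)
```

On this toy data the same scores lead to different preferred cutoffs under the two weightings, which is the sense in which the cutoff choice can matter as much as the choice of model.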
Files in this record:
Does xgboost RIBAF published.pdf (open access). Description: Full paper as published. Type: publisher's version (VoR). Format: Adobe PDF. Size: 516.17 kB.


Use this identifier to cite or link to this document: https://hdl.handle.net/11584/404323