An entropy based algorithm for credit scoring
Saia, Roberto; Carta, Salvatore Mario
2016-01-01
Abstract
The demand for effective credit scoring models has grown over recent decades with the expansion of consumer lending. The objective of these models is to divide loan applicants into two classes, reliable or unreliable, on the basis of the available information. Linear discriminant analysis is one of the most common techniques used to build such models, but this simple parametric statistical method does not overcome certain problems, the most important of which is the imbalanced class distribution of the data: the number of default cases is much smaller than the number of non-default ones, a scenario that reduces the effectiveness of machine learning approaches such as neural networks and random forests. The Difference in Maximum Entropy (DME) approach proposed in this paper leads to two interesting results: on the one hand, it evaluates new loan applications in terms of the maximum-entropy difference between their features and those of past non-default cases, using only these cases for model training and thus overcoming the imbalanced-learning issue; on the other hand, it operates proactively, overcoming the cold-start problem. Our model has been evaluated on two real-world data sets with imbalanced class distributions, comparing its performance to that of the best-performing state-of-the-art approach: random forests.
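The core idea described in the abstract can be illustrated with a minimal sketch: measure how much the entropy of the non-default population's feature distributions shifts when a new applicant is added, and treat a small shift as evidence that the applicant resembles the reliable class. Note that this is only a rough illustration of the entropy-difference principle, not the paper's exact formulation; the histogram binning, the per-feature absolute-difference aggregation, and the function names (`shannon_entropy`, `dme_score`) are assumptions introduced here.

```python
import numpy as np

def shannon_entropy(values, bins=10):
    """Shannon entropy (in bits) of a feature's empirical histogram distribution."""
    counts, _ = np.histogram(values, bins=bins)
    probs = counts / counts.sum()
    probs = probs[probs > 0]          # ignore empty bins (0 * log 0 := 0)
    return -np.sum(probs * np.log2(probs))

def dme_score(non_default, applicant):
    """Sum, over all features, of the absolute entropy change caused by
    appending the applicant's feature value to the non-default cases.
    Lower scores mean the applicant looks more like the reliable class."""
    score = 0.0
    for j in range(non_default.shape[1]):
        base = shannon_entropy(non_default[:, j])
        extended = shannon_entropy(np.append(non_default[:, j], applicant[j]))
        score += abs(extended - base)
    return score
```

An applicant whose features fall inside the typical non-default range barely perturbs the histograms and scores near zero, while an anomalous applicant stretches the histogram range and produces a larger entropy difference; in the spirit of the paper, a threshold on this score would then separate reliable from unreliable applications. Only the non-default cases are ever used, which is how the imbalanced-class issue is sidestepped.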
File | Description | Type | Size | Format
---|---|---|---|---
confenis2016.pdf | Main article | Pre-print version | 123.52 kB | Adobe PDF

Access: archive managers only (request a copy).
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.