A Fourier spectral pattern analysis to design credit scoring models

Saia, Roberto; Carta, Salvatore
2017-01-01

Abstract

The increase in consumer credit has made it necessary to research increasingly effective credit scoring models. Such models are usually trained on past loan applications and then evaluate new ones on the basis of certain criteria. Although the state of the art offers several different approaches for defining them, this process remains a hard challenge for several reasons. The most important are the data imbalance between default and non-default cases, which reduces the effectiveness of almost all techniques, and the data heterogeneity, which makes it difficult to define a model able to effectively evaluate all new loan applications. The approach proposed in this paper addresses these problems by moving the evaluation process from the canonical time domain to a frequency domain, using a model based on past non-default loan applications. This allows us to overcome the data imbalance problem by exploiting only one class of data, while also defining a model that is less influenced by data heterogeneity. The experiments show interesting results: the proposed approach achieves performance close to or better than that of random forests, one of the best state-of-the-art credit scoring approaches, even though it operates proactively, exploiting only past non-default cases.
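The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Each application's feature vector is treated as a discrete sequence and moved to the frequency domain with a discrete Fourier transform; the one-class "model" is simply the magnitude spectra of past non-default applications; a new application is scored by its mean cosine similarity to those spectra and accepted when the score reaches a threshold. The cosine-similarity comparison, the averaging, the function names (`magnitude_spectrum`, `build_model`, `score`, `is_reliable`) and the 0.95 threshold are all hypothetical choices made for illustration.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(features: Sequence[float]) -> List[float]:
    """|DFT| of one application's features, treated as a discrete sequence.

    Assumption of this sketch: features are already in a fixed order and scale.
    """
    n = len(features)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(features)))
        for k in range(n)
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two spectra (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_model(non_default: Sequence[Sequence[float]]) -> List[List[float]]:
    """One-class model: spectra of past non-default applications only."""
    return [magnitude_spectrum(app) for app in non_default]


def score(model: List[List[float]], application: Sequence[float]) -> float:
    """Hypothetical score: mean spectral similarity to the non-default spectra."""
    spectrum = magnitude_spectrum(application)
    return sum(cosine_similarity(spectrum, ref) for ref in model) / len(model)


def is_reliable(model: List[List[float]], application: Sequence[float],
                threshold: float = 0.95) -> bool:
    # Hypothetical threshold; in practice it would be tuned on held-out data.
    return score(model, application) >= threshold


if __name__ == "__main__":
    # Toy non-default applications (illustrative data, not from the paper).
    model = build_model([[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]])
    print(is_reliable(model, [2.0, 4.0, 6.0, 8.0]))    # True
    print(is_reliable(model, [1.0, -1.0, 1.0, -1.0]))  # False
```

Because only non-default examples are needed to build the model, the class imbalance never enters training, which is the property the abstract emphasizes. Note that a DFT is sensitive to the order of the features, so a fixed feature ordering and some normalization are implicit assumptions of this sketch.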
Year: 2017
ISBN: 9781450352437
Keywords: Business intelligence; Classification; Credit scoring; Imbalanced datasets; Metrics; Human-computer interaction; Computer networks and communications; Software
File in this item: iml2017.pdf (post-print version, Adobe PDF, 238.68 kB; access restricted to archive managers)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/257023
Citations: Scopus 11; ISI Web of Science 11