Unbalanced data classification in fraud detection by introducing a multidimensional space analysis
Saia, Roberto
2018-01-01
Abstract
Fraud is becoming an increasingly important problem in the e-commerce age, where an enormous number of financial transactions are carried out with electronic payment instruments such as credit cards. In this scenario, human-driven solutions are not feasible because of the huge number of operations involved, so the only viable approach is to adopt automatic solutions able to distinguish legitimate transactions from fraudulent ones. For this reason, the development of techniques capable of carrying out this task efficiently is now a very active research field that involves a large number of researchers around the world. Unfortunately, this is not an easy task, since the definition of effective fraud detection approaches is made difficult by a series of well-known problems, the most important being the unbalanced class distribution of the data, which significantly reduces the performance of machine learning approaches. This limitation is addressed by the approach proposed in this paper, which exploits three different similarity metrics in order to define a three-dimensional evaluation space. Its main objective is a better characterization of financial transactions in terms of the two possible target classes (legitimate or fraudulent), countering the information asymmetry that gives rise to the aforementioned problem. A series of experiments conducted on real-world data of different sizes and imbalance levels demonstrates the effectiveness of the proposed approach with respect to state-of-the-art solutions.

File | Access | Type | Size | Format
---|---|---|---|---
iotbds2018-paper.pdf | Archive managers only | Pre-print version | 195.39 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.