The problem of frauds is becoming increasingly important in this E-commerce age, where an enormous number of financial transactions are carried out by using electronic instruments of payment such as credit cards. In this scenario it is not possible to adopt human-driven solutions due to the huge number of involved operations. The only approach is therefore to adopt automatic solutions able to discern the legitimate transactions from the fraudulent ones. For this reason, today the development of techniques capable of carrying out this task efficiently represents a very active research field that involves a large number of researchers around the world. Unfortunately, this is not an easy task, since the definition of effective fraud detection approaches is made difficult by a series of well-known problems, the most important of them being the non-balanced class distribution of data that leads towards a significant reduction of the machine learning approaches performance. Such limitation is addressed by the approach proposed in this paper, which exploits three different metrics of similarity in order to define a three-dimensional space of evaluation. Its main objective is a better characterization of the financial transactions in terms of the two possible target classes (legitimate or fraudulent), facing the information asymmetry that gives rise to the problem previously exposed. A series of experiments conducted by using real-world data with different size and imbalance level, demonstrate the effectiveness of the proposed approach with regard to the state-of-the-art solutions.

Unbalanced data classification in fraud detection by introducing a multidimensional space analysis

Saia, Roberto
2018-01-01

Abstract

The problem of frauds is becoming increasingly important in this E-commerce age, where an enormous number of financial transactions are carried out by using electronic instruments of payment such as credit cards. In this scenario it is not possible to adopt human-driven solutions due to the huge number of involved operations. The only approach is therefore to adopt automatic solutions able to discern the legitimate transactions from the fraudulent ones. For this reason, today the development of techniques capable of carrying out this task efficiently represents a very active research field that involves a large number of researchers around the world. Unfortunately, this is not an easy task, since the definition of effective fraud detection approaches is made difficult by a series of well-known problems, the most important of them being the non-balanced class distribution of data that leads towards a significant reduction of the machine learning approaches performance. Such limitation is addressed by the approach proposed in this paper, which exploits three different metrics of similarity in order to define a three-dimensional space of evaluation. Its main objective is a better characterization of the financial transactions in terms of the two possible target classes (legitimate or fraudulent), facing the information asymmetry that gives rise to the problem previously exposed. A series of experiments conducted by using real-world data with different size and imbalance level, demonstrate the effectiveness of the proposed approach with regard to the state-of-the-art solutions.
2018
9789897582967
Business intelligence; Data imbalance; Fraud detection; Metrics; Pattern mining; Software; Computer networks and communications
File in questo prodotto:
File Dimensione Formato  
iotbds2018-paper.pdf

Solo gestori archivio

Tipologia: versione pre-print
Dimensione 195.39 kB
Formato Adobe PDF
195.39 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/257031
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? ND
social impact