Similarity and diversity: two sides of the same coin in the evaluation of data streams

Saia, Roberto
2016-03-07

Abstract

Information systems are the primary instrument of growth for companies operating in the e-commerce environment. The data streams generated by users who interact with their websites are the primary source for defining user behavioral models. Main examples of services integrated into these websites are Recommender Systems, which exploit these models to recommend items of potential interest to users; User Segmentation Systems, which use them to group users on the basis of their preferences; and Fraud Detection Systems, which use them to determine the legitimacy of a financial transaction. Although the literature regards diversity and similarity as two sides of the same coin, almost all approaches take them into account in a mutually exclusive manner rather than jointly. The aim of this thesis is to demonstrate that considering both sides of this coin is instead essential to overcome some well-known problems that afflict the state-of-the-art approaches used to implement these services, and thereby to improve their performance. Its contributions are the following. For recommender systems, detecting the diversity in a user profile is used to discard incoherent items, improving accuracy, while exploiting the similarity of the predicted items is used to re-rank the recommendations, improving their effectiveness. For user segmentation systems, detecting the diversity overcomes the problem of unreliable data sources, while exploiting the similarity reduces the problems of understandability and triviality of the obtained segments. Lastly, for fraud detection systems, the joint use of diversity and similarity in the evaluation of a new transaction overcomes the problems of data scarcity and of non-stationary and unbalanced class distribution.
7-mar-2016
Keywords: pattern mining; recommender systems; semantic analysis; user profiling; user segmentation
Files in this item:
PhD_Thesis_Saia.pdf (open access)
Type: Doctoral thesis
Size: 753.04 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/266878