A Big Data framework based on Apache Spark for Industry-specific Lexicon Generation for Stock Market Prediction

Reforgiato Recupero D.; Stanciu M. M.
2021-01-01

Abstract

Research on automatic stock market price prediction indicates that published news is an important asset for solving this problem. We further elaborate on an NLP-based approach that generates industry-specific lexicons from news documents, exploiting the distributed technology of Apache Spark, with a focus on identifying, on a day-to-day scale, the correlation between significant stock price variations and the words collected from press releases. We then apply a binary classification algorithm that builds upon our newly generated lexicons to predict the magnitude of stock price fluctuations. By processing a large collection of news articles from the most prestigious press agencies, we validate our approach in an experiment on the market history of the US companies belonging to the Standard & Poor's 500 index. We also test the performance of the algorithm in a multilingual setting, focusing in particular on the Italian stock market and the Italy 40 (FTSE MIB) index. The final classification results let us assess the mutual dependence between terms and prices and help us evaluate the predictive power of the lexicons we created.
2021
9781450387347
Apache Spark
Big Data
Financial Technology
Machine Learning
Natural Language Processing
Stock Market Forecasting

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/334853