UNICA IRIS Institutional Research Information System

A Big Data framework based on Apache Spark for Industry-specific Lexicon Generation for Stock Market Prediction

Angioni S.; Carta S.; Consoli S.; Reforgiato Recupero D.; Stanciu M. M.

2021-01-01

Abstract

Research on the automatic prediction of stock market prices indicates that published news is an important asset for solving this problem. We further elaborate on an NLP-based approach that generates industry-specific lexicons from news documents by exploiting the distributed technology of Apache Spark, with a focus on identifying, on a day-to-day scale, the correlation between significant stock price variations and the words collected from press releases. We then apply a binary classification algorithm that builds on the newly generated lexicons to predict the magnitude of stock price fluctuations. By processing a large collection of news articles from the most prestigious press agencies, we validate our approach through an experiment on the market history of the US companies in the Standard & Poor's 500 (S&P 500) index. We also test the performance of the algorithm in a multilingual setting, focusing in particular on the Italian stock market and the Italy 40 (FTSE MIB) index. The final classification results let us assess the mutual dependence between terms and prices and help us evaluate the predictive power of the generated lexicons.
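The lexicon-plus-classifier pipeline summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation and does not use Apache Spark. The scoring rule (mean daily price variation over the significant-move days on which a word appears), the 1% significance threshold, the "up"/"down" class labels, the toy data, and the names `tokenize`, `build_lexicon` and `classify` are all hypothetical assumptions chosen for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

def build_lexicon(news, variations, threshold=0.01):
    """Hypothetical scoring rule: for each word, the mean price variation
    over the significant-move days on which it appears in an article."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, text in news:
        delta = variations.get(day)
        # Ignore days with no price data or a variation below the threshold.
        if delta is None or abs(delta) < threshold:
            continue
        # Count each word at most once per article.
        for word in set(tokenize(text)):
            totals[word] += delta
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

def classify(text, lexicon):
    # Sum the scores of words found in the lexicon; unknown words add 0.
    score = sum(lexicon.get(word, 0.0) for word in tokenize(text))
    # Binary decision; a zero score (e.g. no known words) defaults to "up".
    return "up" if score >= 0 else "down"

# Toy data: (date, article text) pairs and daily relative price variations.
news = [
    ("2021-01-04", "Profits beat expectations"),
    ("2021-01-05", "Lawsuit hits profits"),
    ("2021-01-06", "Quiet trading day"),
]
variations = {"2021-01-04": 0.03, "2021-01-05": -0.02, "2021-01-06": 0.001}

lexicon = build_lexicon(news, variations)
print(classify("Strong profits beat forecasts", lexicon))  # up
print(classify("New lawsuit filed", lexicon))  # down
```

In a Spark setting such as the one the abstract refers to, the per-word aggregation in `build_lexicon` is the step that would most naturally be distributed, since each (word, variation) pair can be summed independently; the sketch keeps it in plain dictionaries for readability.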


Year: 2021
ISBN: 9781450387347
Keywords: Apache Spark; Big Data; Financial Technology; Machine Learning; Natural Language Processing; Stock Market Forecasting
Type: 4.1 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/334853

Warning

The data displayed have not been validated by the university.
