Frame-Based Detection of Figurative Language in Tweets

Diego Reforgiato Recupero;Davide Buscaldi;Farideh Tavazoee
2019-01-01

Abstract

This paper analyzes the problem of figurative language detection on social media, focusing on the use of semantic features to identify irony and sarcasm. Semantic features were extracted from text using Framester, a novel resource that acts as a hub between FrameNet, WordNet, VerbNet, BabelNet, DBpedia, Yago, DOLCE-Zero, and other resources. These features enrich the tweet representations with event information, using frames and word senses in addition to lexical units. The experimental dataset contains tweets drawn from different corpora and includes both figurative (ironic and sarcastic) and non-figurative language. Two main tasks were performed: (i) detecting figurative language in a dataset containing both figurative and non-figurative tweets, and (ii) classifying figurative tweets as ironic or sarcastic. A 10-fold cross-validation experiment shows that accuracy on both tasks increases significantly when semantic features such as linguistic frames and word senses are used in addition to lexical units, suggesting that these features may be important clues for figurative language. The approach was built on top of Apache Spark so that it scales easily to much larger volumes of data, allowing for real-time analysis.
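The abstract does not spell out the feature pipeline or the classifier. What follows is therefore only a minimal sketch, built on these assumptions:

- Each tweet becomes a bag of lexical-unit features, optionally enriched with frame and word-sense features.
- Accuracy is estimated with k-fold cross-validation.
- The lexicons `TOY_FRAMES` and `TOY_SENSES` are hypothetical stand-ins for Framester lookups; no Framester API is used.
- The multinomial Naive Bayes classifier is a swapped-in choice, not necessarily the authors' model.
- The code runs in plain Python rather than on Apache Spark.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy lexicons standing in for Framester lookups (NOT a Framester API).
TOY_FRAMES = {"love": "Experiencer_focus", "waiting": "Waiting", "monday": "Calendric_unit"}
TOY_SENSES = {"love": "love.v.01", "waiting": "wait.v.01", "monday": "monday.n.01"}


def extract_features(tweet, use_semantics=True):
    """Bag of lexical-unit features, optionally enriched with frame and sense features."""
    feats = Counter()
    for tok in tweet.lower().split():
        tok = tok.strip(".,!?#@")
        if not tok:
            continue
        feats["lu:" + tok] += 1
        if use_semantics:
            if tok in TOY_FRAMES:
                feats["frame:" + TOY_FRAMES[tok]] += 1
            if tok in TOY_SENSES:
                feats["sense:" + TOY_SENSES[tok]] += 1
    return feats


def train_nb(examples):
    """Multinomial Naive Bayes (add-one smoothing) over (Counter, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        feat_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, feat_counts, vocab


def predict_nb(model, feats):
    """Return the label with the highest log-posterior; features unseen in training are ignored."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(feat_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for f, c in feats.items():
            if f in vocab:
                score += c * math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(tweets, labels, k=10, use_semantics=True, seed=0):
    """Mean accuracy over k shuffled (non-stratified) folds."""
    if k < 2 or len(tweets) < k:
        raise ValueError("need k >= 2 and at least k examples")
    data = [(extract_features(t, use_semantics), y) for t, y in zip(tweets, labels)]
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        model = train_nb([data[i] for i in idx if i not in held_out])
        correct = sum(predict_nb(model, data[i][0]) == data[i][1] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Made-up toy examples; these are not the corpora used in the paper.
    figurative = ["I just love waiting on Monday", "Oh great, I love waiting in traffic"] * 5
    literal = ["The meeting moved to Monday", "Traffic was light this morning"] * 5
    tweets = figurative + literal
    labels = ["figurative"] * 10 + ["literal"] * 10
    for sem in (False, True):
        name = "lexical + semantic" if sem else "lexical only"
        print(name, cross_validate(tweets, labels, use_semantics=sem))
```

Calling `cross_validate` once with `use_semantics=False` and once with `use_semantics=True` mirrors the lexical-only versus lexical-plus-semantic comparison described above. Prefixing each feature with `lu:`, `frame:`, or `sense:` keeps the three feature families in one sparse vocabulary without name collisions.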
2019
Keywords: Social networking (online); Semantics; Feature extraction; Linguistics; Cluster computing; Real-time systems
Files in this item:
pubb6_Reforgiato_Recupero_Diego_ords_31D_1220_01B1.pdf (open access; type: publisher's version (VoR); Adobe PDF, 1.2 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/305865
Citations
  • PMC: not available
  • Scopus: 15
  • ISI: 13