Frame-Based Detection of Figurative Language in Tweets
Diego Reforgiato Recupero;Davide Buscaldi;Farideh Tavazoee
2019-01-01
Abstract
This paper analyzes the problem of figurative language detection on social media, with a focus on the use of semantic features for identifying irony and sarcasm. Framester, a novel resource that acts as a hub between FrameNet, WordNet, VerbNet, BabelNet, DBpedia, Yago, DOLCE-Zero and others, has been used to extract semantic features from text. These features enrich the representations of tweets with event information, using frames and word senses in addition to lexical units. The dataset used for the experiments contains tweets taken from different corpora, including both figurative language (irony and sarcasm) and non-figurative language. Two major tasks were performed: (i) detecting figurative language in a dataset containing both figurative and non-figurative tweets, and (ii) classifying tweets containing irony and sarcasm. A 10-fold cross-validation experiment shows that the accuracy obtained for both tasks increases significantly when semantic features such as linguistic frames and word senses are used in addition to lexical units, indicating that they may be important clues for figurative language. The approach was developed on top of Apache Spark so that it scales easily to much higher volumes of data, allowing for real-time analysis.
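The feature enrichment and 10-fold evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup tables `TOY_FRAMES` and `TOY_SENSES`, the feature prefixes, and the helper names `enrich` and `k_fold_indices` are all hypothetical. They stand in for a real Framester query and a Spark pipeline, neither of which is shown here.

```python
from collections import Counter

# Hypothetical toy lookups standing in for a real Framester query.
# The frame and sense labels are illustrative, not actual Framester output.
TOY_FRAMES = {
    "love": ["frame:Experiencer_focus"],
    "waiting": ["frame:Waiting"],
}
TOY_SENSES = {
    "love": ["sense:love.v.01"],
    "waiting": ["sense:wait.v.01"],
}


def enrich(tweet):
    """Return a bag of features: lexical units plus any frames and senses found."""
    tokens = tweet.lower().split()
    features = Counter("lex:" + t for t in tokens)
    for t in tokens:
        features.update(TOY_FRAMES.get(t, []))
        features.update(TOY_SENSES.get(t, []))
    return features


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

In this sketch a classifier would be trained once on lexical features alone and once on the enriched bags, with accuracy averaged over the ten folds in each setting, which mirrors the comparison reported in the abstract.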