Natural language processing for data analysis: an application on the well-being
Boi, Samuele; Tedesco, Nicola; Salaris, Luisa
2026-01-01
Abstract
This article presents a case study on the analysis of semi-structured interviews conducted in Italy, using a small dataset. Two supervised approaches are applied to identify key questions to retrieve information and compare responses to the same questions across different respondents. The first approach is based on a bag-of-words model, while the second relies on embeddings. These approaches are compared with two topic modeling methods (LDA and BERTopic). The results highlight the differences between the methods: key-question-based approaches seem to be more suitable when the goal is to compare responses to specific questions, whereas topic modeling techniques are better suited for identifying latent topics.

| File | Access | Type | Size | Format |
|---|---|---|---|---|
| s11135-026-02894-9.pdf | open access | publisher's version (VoR) | 3.13 MB | Adobe PDF |


