Towards Effective Automatic Evaluation of Generated Reflections for Motivational Interviewing

Reforgiato Recupero D.; Riboni D.
2023-01-01

Abstract

Reflection is an essential counselling skill where the therapist communicates their understanding of the client’s words to the client. Recent studies have explored language-model-based reflection generation, but automatic quality evaluation of generated reflections remains under-explored. In this work, we investigate automatic evaluation on one fundamental quality aspect: coherence and context-consistency. We test a range of automatic evaluators/metrics and examine their correlations with expert judgement. We find that large language models (LLMs) as zero-shot evaluators achieve the best performance, while other metrics correlate poorly with expert judgement. We also demonstrate that diverse LLM-as-evaluator configurations need to be explored to find the best setup.
Year: 2023
ISBN: 9798400703218
Keywords: automatic evaluation; motivational interviewing; reflection

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/390605