Student success models might be prone to develop weak spots, i.e., examples hard to accurately classify due to insufficient representation during model creation. This weakness is one of the main factors undermining users' trust, since model predictions could for instance lead an instructor to not intervene on a student in need. In this paper, we unveil the need of detecting and characterizing unknown unknowns in student success prediction in order to better understand when models may fail. Unknown unknowns include the students for which the model is highly confident in its predictions, but is actually wrong. Therefore, we cannot solely rely on the model's confidence when evaluating the predictions quality. We first introduce a framework for the identification and characterization of unknown unknowns. We then assess its informativeness on log data collected from flipped courses and online courses using quantitative analyses and interviews with instructors. Our results show that unknown unknowns are a critical issue in this domain and that our framework can be applied to support their detection. The source code is available at https://github.com/epfl-ml4ed/unknown-unknowns.

Do Not Trust a Model because It is Confident: Uncovering and Characterizing Unknown Unknowns to Student Success Predictors in Online-Based Learning

Galici R.;Fenu G.;Marras M.
2023-01-01

Abstract

Student success models might be prone to develop weak spots, i.e., examples hard to accurately classify due to insufficient representation during model creation. This weakness is one of the main factors undermining users' trust, since model predictions could for instance lead an instructor to not intervene on a student in need. In this paper, we unveil the need of detecting and characterizing unknown unknowns in student success prediction in order to better understand when models may fail. Unknown unknowns include the students for which the model is highly confident in its predictions, but is actually wrong. Therefore, we cannot solely rely on the model's confidence when evaluating the predictions quality. We first introduce a framework for the identification and characterization of unknown unknowns. We then assess its informativeness on log data collected from flipped courses and online courses using quantitative analyses and interviews with instructors. Our results show that unknown unknowns are a critical issue in this domain and that our framework can be applied to support their detection. The source code is available at https://github.com/epfl-ml4ed/unknown-unknowns.
2023
Fairness
Machine Learning
Student Success
Trust
Uncertainty
Unknown Unknowns
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/432658
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? ND
social impact