Algorithms deployed in education can shape the learning experience and success of a student. It is therefore important to understand whether and how such algorithms might create inequalities or amplify existing biases. In this paper, we analyze the fairness of models which use behavioral data to identify at-risk students and suggest two novel pre-processing approaches for bias mitigation. Based on the concept of intersectionality, the first approach involves intelligent oversampling on combinations of demographic attributes. The second approach does not require any knowledge of demographic attributes and is based on the assumption that such attributes are a (noisy) proxy for student behavior. We hence propose to directly oversample different types of behaviors identified in a cluster analysis. We evaluate our approaches on data from (i) an open-ended learning environment and (ii) a flipped classroom course. Our results show that both approaches can mitigate model bias. Directly oversampling on behavior is a valuable alternative, when demographic metadata is not available. Source code and extended results are provided in https://github.com/epfl-ml4ed/behavioral-oversampling.

Protected Attributes Tell Us Who, Behavior Tells Us How: A Comparison of Demographic and Behavioral Oversampling for Fair Student Success Modeling

Marras M.;
2023-01-01

Abstract

Algorithms deployed in education can shape the learning experience and success of a student. It is therefore important to understand whether and how such algorithms might create inequalities or amplify existing biases. In this paper, we analyze the fairness of models which use behavioral data to identify at-risk students and suggest two novel pre-processing approaches for bias mitigation. Based on the concept of intersectionality, the first approach involves intelligent oversampling on combinations of demographic attributes. The second approach does not require any knowledge of demographic attributes and is based on the assumption that such attributes are a (noisy) proxy for student behavior. We hence propose to directly oversample different types of behaviors identified in a cluster analysis. We evaluate our approaches on data from (i) an open-ended learning environment and (ii) a flipped classroom course. Our results show that both approaches can mitigate model bias. Directly oversampling on behavior is a valuable alternative, when demographic metadata is not available. Source code and extended results are provided in https://github.com/epfl-ml4ed/behavioral-oversampling.
2023
Behavioral data
Bias
Fairness
Machine Learning
Oversampling
Student success
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/432659
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact