The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition

Atzori, A.; Cosseddu, P.; Fenu, G.; Marras, M.

doi:10.1007/978-3-031-91907-7_17

Over the recent years, the advancements in deep face recognition have fueled an increasing demand for large and diverse datasets. Nevertheless, the authentic data acquired to create those datasets is typically sourced from the web, which, in many cases, can lead to significant privacy issues due to the lack of explicit user consent. Furthermore, obtaining a demographically balanced, large dataset is even more difficult because of the natural imbalance in the distribution of images from different demographic groups. In this paper, we investigate the impact of demographically balanced authentic and synthetic data, both individually and in combination, on the accuracy and fairness of face recognition models. Initially, several generative methods were used to balance the demographic representations of the corresponding synthetic datasets. Then a state-of-the-art face encoder was trained and evaluated using (combinations of) synthetic and authentic images. Our findings emphasized two main points: (i) the increased effectiveness of training data generated by diffusion-based models in enhancing accuracy, whether used alone or combined with subsets of authentic data, and (ii) the minimal impact of incorporating balanced data from pre-trained generative methods on fairness (in nearly all tested scenarios using combined datasets, fairness scores remained either unchanged or worsened, even when compared to unbalanced authentic datasets). Source code and data are available at https://cutt.ly/AeQy1K5G for reproducibility.

The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition

Atzori A.;Cosseddu P.;Fenu G.;Marras M.

2025-01-01

Abstract

Over the recent years, the advancements in deep face recognition have fueled an increasing demand for large and diverse datasets. Nevertheless, the authentic data acquired to create those datasets is typically sourced from the web, which, in many cases, can lead to significant privacy issues due to the lack of explicit user consent. Furthermore, obtaining a demographically balanced, large dataset is even more difficult because of the natural imbalance in the distribution of images from different demographic groups. In this paper, we investigate the impact of demographically balanced authentic and synthetic data, both individually and in combination, on the accuracy and fairness of face recognition models. Initially, several generative methods were used to balance the demographic representations of the corresponding synthetic datasets. Then a state-of-the-art face encoder was trained and evaluated using (combinations of) synthetic and authentic images. Our findings emphasized two main points: (i) the increased effectiveness of training data generated by diffusion-based models in enhancing accuracy, whether used alone or combined with subsets of authentic data, and (ii) the minimal impact of incorporating balanced data from pre-trained generative methods on fairness (in nearly all tested scenarios using combined datasets, fairness scores remained either unchanged or worsened, even when compared to unbalanced authentic datasets). Source code and data are available at https://cutt.ly/AeQy1K5G for reproducibility.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Codice ISBN
	
				9783031919060
9783031919077
			
	Parole chiave
	
				Biometrics
Face Recognition
Fairness
Synthetic Data
			
	Tipologia:
	
				4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
2409.02867v1 AAM.pdf Open Access dal 13/05/2026 Tipologia: versione post-print (AAM) Dimensione 1.16 MB Formato Adobe PDF Visualizza/Apri	1.16 MB	Adobe PDF	Visualizza/Apri

I metadati presenti in IRIS UNICA sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono protetti da diritto d'autore, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/459096

Citazioni

ND

2

1

ND

UNICA IRIS Institutional Research Information System

The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition

Atzori A.;Cosseddu P.;Fenu G.;Marras M.

2025-01-01

Abstract

Scheda breve

Scheda completa

Scheda completa (DC)

Citazioni

social impact

UNICA IRIS Institutional Research Information System

The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition

Atzori A.;Cosseddu P.;Fenu G.;Marras M.

2025-01-01

Abstract

Scheda breve Scheda completa Scheda completa (DC)

Informazioni

Citazioni

social impact

Conferma cancellazione

Scheda breve

Scheda completa

Scheda completa (DC)