Learning to judge motion: likelihood-based evaluation with normalizing flows
Floris, Alessandro;
2025-01-01
Abstract
Human motion generation aims to synthesize realistic and diverse body movements from inputs such as text, poses, or environmental cues. Effective evaluation of generated motion requires metrics that capture realism, accuracy, and diversity. Existing approaches tend to overemphasize similarity to ground truth, assume Gaussian distributions, ignore temporal structure, or rely heavily on subjective assessments. To overcome these limitations, we introduce a novel evaluation method based on Conditional Glow normalizing flows, which model the full data distribution without assuming Gaussianity and without requiring paired reference samples. Trained on two contrasting datasets, AMASS, which contains high-fidelity motion capture performed by real actors, and Mixamo, a synthetic dataset with retargeted animations, our model provides exact likelihood estimates that quantify the degree to which a motion sequence aligns with real or synthetic distributions. This enables not just binary classification, but a nuanced evaluation of physical and kinematic plausibility. We conduct extensive experiments across several state-of-the-art generative models, demonstrating that our likelihood-based metric offers an interpretable tool for motion validation.

| File | Description | Size | Format | Access |
|---|---|---|---|---|
| pub - Learning_to_Judge_Motion.pdf | Editorial version (VoR) | 5.51 MB | Adobe PDF | Archive managers only — View/Open, Request a copy |
| 469454_AAM.pdf | Post-print version (AAM) | 6.25 MB | Adobe PDF | Open access — View/Open |
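The abstract describes scoring motion via the exact log-likelihood of a normalizing flow. As a minimal illustration of that idea — not the paper's Conditional Glow, which stacks many invertible coupling layers — the sketch below uses a single affine flow, where the change-of-variables formula already gives an exact log-likelihood that can rank in-distribution samples above off-distribution ones. All names (`AffineFlow`, the toy dimensions and data) are illustrative assumptions.

```python
import numpy as np

class AffineFlow:
    """Toy invertible map x = z * exp(log_scale) + shift, with base z ~ N(0, I)."""
    def __init__(self, dim, rng):
        self.log_scale = rng.normal(0, 0.1, dim)
        self.shift = rng.normal(0, 0.1, dim)

    def log_likelihood(self, x):
        # Invert the map: z = (x - shift) * exp(-log_scale)
        z = (x - self.shift) * np.exp(-self.log_scale)
        # Change of variables: log p(x) = log N(z; 0, I) + log |det dz/dx|
        log_pz = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)
        log_det = -np.sum(self.log_scale)  # Jacobian term of the inverse map
        return log_pz + log_det

rng = np.random.default_rng(0)
dim = 6  # stand-in for a flattened window of joint parameters
flow = AffineFlow(dim, rng)

# Samples drawn from the flow's own distribution score a higher exact
# log-likelihood than far-off-distribution samples, so the per-sequence
# log-likelihood can serve as a realism score.
in_dist = flow.shift + np.exp(flow.log_scale) * rng.normal(0, 1, (4, dim))
out_dist = rng.normal(10.0, 1.0, (4, dim))
print(flow.log_likelihood(in_dist).mean() > flow.log_likelihood(out_dist).mean())
```

A Glow-style model replaces the single affine map with a deep composition of invertible layers (and conditions on context), but the scoring rule — evaluate the exact log-likelihood of a candidate sequence under the trained flow — is the same.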
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


