
Learning to judge motion: likelihood-based evaluation with normalizing flows

Floris, Alessandro;
2025-01-01

Abstract

Human motion generation aims to synthesize realistic and diverse body movements from inputs such as text, poses, or environmental cues. Effective evaluation of generated motion requires metrics that capture realism, accuracy, and diversity. Existing approaches tend to overemphasize similarity to ground truth, assume Gaussian distributions, ignore temporal structure, or rely heavily on subjective assessments. To overcome these limitations, we introduce a novel evaluation method based on Conditional Glow Normalizing Flows, which model the full data distribution without assuming Gaussianity and without requiring paired reference samples. Trained on two contrasting datasets, AMASS, which contains high-fidelity motion capture performed by real actors, and Mixamo, a synthetic dataset with retargeted animations, our model provides exact likelihood estimates that quantify the degree to which a motion sequence aligns with real or synthetic distributions. This enables not just binary classification, but a nuanced evaluation of physical and kinematic plausibility. We conduct extensive experiments across several state-of-the-art generative models, demonstrating that our likelihood-based metric offers an interpretable tool for motion validation.
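The exact likelihoods the abstract refers to come from the change-of-variables formula that normalizing flows are built on: log p_x(x) = log p_z(f(x)) + log|det df/dx|. The following is a minimal sketch of that idea, not the paper's model: a single affine "flow" with hypothetical learned scale and shift parameters stands in for the stacked Glow coupling layers, but the likelihood computation has the same form.

```python
import numpy as np

# Minimal sketch of likelihood-based motion evaluation with a flow.
# Hypothetical setup: a single affine map z = (x - b) / s sends motion
# features x to a standard-normal latent z. Real Glow stacks many
# coupling layers, but the exact-likelihood formula is identical:
#   log p_x(x) = log p_z(f(x)) + log |det df/dx|

rng = np.random.default_rng(0)
D = 6                              # toy feature dimension per frame
s = np.exp(rng.normal(size=D))     # per-dim scale (assumed learned)
b = rng.normal(size=D)             # per-dim shift (assumed learned)

def log_likelihood(x):
    z = (x - b) / s                                       # forward pass
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi)).sum(-1)    # N(0, I) prior
    log_det = -np.log(s).sum()                            # log|det J|
    return log_pz + log_det

# A sequence drawn from the modeled distribution scores higher than one
# far outside it, which is the basis for using likelihood as a metric.
in_dist = b + s * rng.normal(size=D)
outlier = b + s * 10.0
print(log_likelihood(in_dist) > log_likelihood(outlier))  # True
```

In the paper's setting the same comparison is made between the real-motion (AMASS) and synthetic-motion (Mixamo) distributions, so a generated sequence gets a graded plausibility score rather than a binary label.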
2025
979-8-3315-7515-1
979-8-3315-7516-8
Measurement; Training; Visualization; Manuals; Information processing; Probabilistic logic; Data models; Motion capture; Reliability; Synthetic data
Files in this record:

pub - Learning_to_Judge_Motion.pdf
Description: VoR
Type: publisher's version (VoR)
Size: 5.51 MB
Format: Adobe PDF
Access: restricted (repository managers only; copy on request)

469454_AAM.pdf
Description: AAM
Type: post-print (AAM)
Size: 6.25 MB
Format: Adobe PDF
Access: open access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/469454
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: n/a