Learning to judge motion: likelihood-based evaluation with normalizing flows
Floris, Alessandro;
2025-01-01
Abstract
Human motion generation aims to synthesize realistic and diverse body movements from inputs such as text, poses, or environmental cues. Effective evaluation of generated motion requires metrics that capture realism, accuracy, and diversity. Existing approaches tend to overemphasize similarity to ground truth, assume Gaussian distributions, ignore temporal structure, or rely heavily on subjective assessments. To overcome these limitations, we introduce a novel evaluation method based on Conditional Glow normalizing flows, which model the full data distribution without assuming Gaussianity and without requiring paired reference samples. Trained on two contrasting datasets, AMASS, which contains high-fidelity motion capture performed by real actors, and Mixamo, a synthetic dataset with retargeted animations, our model provides exact likelihood estimates that quantify the degree to which a motion sequence aligns with real or synthetic distributions. This enables not just binary classification, but a nuanced evaluation of physical and kinematic plausibility. We conduct extensive experiments across several state-of-the-art generative models, demonstrating that our likelihood-based metric offers an interpretable tool for motion validation.

| File | Description | Size | Format | Access |
|---|---|---|---|---|
| pub - Learning_to_Judge_Motion.pdf | Editorial version (VoR) | 5.51 MB | Adobe PDF | Archive managers only — View/Open, Request a copy |
| 469454_AAM.pdf | Post-print version (AAM) | 6.25 MB | Adobe PDF | Open access — View/Open |
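The abstract describes scoring motion via the exact log-likelihood of a normalizing flow. As a minimal illustration of that idea — not the paper's Conditional Glow, which stacks many invertible coupling layers — the sketch below uses a single affine flow, where the change-of-variables formula already gives an exact log-likelihood that can rank in-distribution samples above off-distribution ones. All names (`AffineFlow`, the toy dimensions and data) are illustrative assumptions.

```python
import numpy as np

class AffineFlow:
    """Toy invertible map x = z * exp(log_scale) + shift, with base z ~ N(0, I)."""
    def __init__(self, dim, rng):
        self.log_scale = rng.normal(0, 0.1, dim)
        self.shift = rng.normal(0, 0.1, dim)

    def log_likelihood(self, x):
        # Invert the map: z = (x - shift) * exp(-log_scale)
        z = (x - self.shift) * np.exp(-self.log_scale)
        # Change of variables: log p(x) = log N(z; 0, I) + log |det dz/dx|
        log_pz = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)
        log_det = -np.sum(self.log_scale)  # Jacobian term of the inverse map
        return log_pz + log_det

rng = np.random.default_rng(0)
dim = 6  # stand-in for a flattened window of joint parameters
flow = AffineFlow(dim, rng)

# Samples drawn from the flow's own distribution score a higher exact
# log-likelihood than far-off-distribution samples, so the per-sequence
# log-likelihood can serve as a realism score.
in_dist = flow.shift + np.exp(flow.log_scale) * rng.normal(0, 1, (4, dim))
out_dist = rng.normal(10.0, 1.0, (4, dim))
print(flow.log_likelihood(in_dist).mean() > flow.log_likelihood(out_dist).mean())
```

A Glow-style model replaces the single affine map with a deep composition of invertible layers (and conditions on context), but the scoring rule — evaluate the exact log-likelihood of a candidate sequence under the trained flow — is the same.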
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


