3D differential decomposition for video deepfake detection with identity suppression
Jie Gao; Marco Micheletto; Giulia Orrù; Gian Luca Marcialis
2026-01-01
Abstract
Detecting deepfake videos remains challenging, especially when the manipulation method is unknown or the data distribution is unseen. Most existing video deepfake detection methods rely on high-level semantic features, which often leads to overfitting to facial identity information and poor transferability. In this work, we explore a novel perspective by modeling videos through 3D differential operations along the temporal and spatial dimensions. To exploit the spatial–temporal variation of the video content, the proposed approach decomposes videos into single-axis 1D differential signals, which are then transformed into 2D representations for efficient learning. This procedure enables the use of lightweight 2D CNNs while retaining directional forgery cues. Our experiments, which analyze whether these differential signals capture patterns that discriminate real from fake content, show that the proposed method achieves strong intra-dataset performance and reveals complementary information across dimensions. These findings suggest that differential signals could support generalization when integrated into broader detection frameworks.

| File | Access | Type | Size | Format |
|---|---|---|---|---|
| 1-s2.0-S0923596526000482-main.pdf | Open access | Publisher's version (VoR) | 4.2 MB | Adobe PDF |
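The abstract describes decomposing a video into single-axis differential signals and turning them into 2D representations. The sketch below is only an illustration of that general idea, not the authors' method: it assumes a grayscale volume shaped `(T, H, W)`, uses a plain first-order finite difference (`numpy.diff`) as the differential operator, and flattens the two remaining axes into columns as one possible 2D layout. The names `axis_differences` and `to_2d` are hypothetical.

```python
import numpy as np


def axis_differences(video):
    """First-order finite differences of a (T, H, W) volume along each axis.

    Assumption: a simple np.diff stands in for the differential operator;
    the paper may use a different operator or preprocessing.
    """
    video = np.asarray(video, dtype=np.float32)
    if video.ndim != 3:
        raise ValueError("expected a grayscale volume shaped (T, H, W)")
    # Axis 0 is time, axis 1 is vertical, axis 2 is horizontal.
    return {name: np.diff(video, axis=axis)
            for axis, name in enumerate(("t", "y", "x"))}


def to_2d(diff, axis):
    """Lay a difference volume out as a 2D array.

    Assumption: the differentiated axis becomes the rows and the two
    remaining axes are flattened into the columns.
    """
    moved = np.moveaxis(diff, axis, 0)
    return moved.reshape(moved.shape[0], -1)


# Toy input: 8 random frames of 16x16 pixels (stand-in for a face crop).
video = np.random.default_rng(0).random((8, 16, 16)).astype(np.float32)
diffs = axis_differences(video)
maps = {name: to_2d(d, axis) for axis, (name, d) in enumerate(diffs.items())}

for name, m in maps.items():
    print(name, m.shape)
```

Each entry of `maps` is a 2D array that could be fed to an ordinary 2D CNN, one per direction, which is one way to keep the directional information separate while avoiding 3D convolutions.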
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


