3D differential decomposition for video deepfake detection with identity suppression

Jie Gao; Marco Micheletto; Giulia Orrù; Gian Luca Marcialis
2026-01-01

Abstract

Detecting deepfake videos remains a challenging task, especially in scenarios involving unknown manipulation methods or unseen data distributions. Most existing video deepfake detection methods rely on high-level semantic features, which often leads to overfitting to facial identity information and poor transferability. In this work, we explore a novel perspective by modeling videos through 3D differential operations along the temporal and spatial dimensions. To exploit the spatial–temporal variation in the video content, the proposed approach decomposes videos into single-axis 1D differential signals, which are then transformed into 2D representations for efficient learning. This procedure enables the use of lightweight 2D CNNs while retaining directional forgery cues. Our experiments analyze whether these differential signals capture discriminative patterns useful for distinguishing real from fake content. They show that the proposed method achieves strong intra-dataset performance and that the individual dimensions carry complementary information. These findings suggest that differential signals could support generalization when integrated into broader detection frameworks.
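The abstract does not spell out how the decomposition is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a grayscale clip stored as a NumPy array of shape (T, H, W), uses first-order finite differences as the differential operation, and builds each 2D representation by stacking the 1D signals that run along one axis as rows. The names `AXES`, `single_axis_differences`, `stack_signals`, and `differential_maps` are hypothetical.

```python
import numpy as np

# Hypothetical axis names for a clip of shape (T, H, W).
AXES = {"t": 0, "y": 1, "x": 2}


def single_axis_differences(video: np.ndarray) -> dict:
    """First-order finite differences of a (T, H, W) clip along each axis."""
    video = np.asarray(video, dtype=np.float32)
    if video.ndim != 3:
        raise ValueError("expected a grayscale clip of shape (T, H, W)")
    # np.diff shortens the differenced axis by one sample.
    return {name: np.diff(video, axis=axis) for name, axis in AXES.items()}


def stack_signals(diff_volume: np.ndarray, axis: int) -> np.ndarray:
    """Arrange the 1D signals that run along `axis` as rows of a 2D array."""
    # Assumption: one row per 1D differential signal; the paper may use
    # a different layout for its 2D representations.
    moved = np.moveaxis(diff_volume, axis, -1)
    return moved.reshape(-1, moved.shape[-1])


def differential_maps(video: np.ndarray) -> dict:
    """One 2D map per axis, each usable as input to a 2D CNN."""
    diffs = single_axis_differences(video)
    return {name: stack_signals(diffs[name], AXES[name]) for name in AXES}


# Toy clip: 4 frames of 3 x 5 pixels with linearly increasing values.
clip = np.arange(4 * 3 * 5).reshape(4, 3, 5)
maps = differential_maps(clip)
shapes = {name: m.shape for name, m in maps.items()}
```

Keeping one map per axis preserves the directional information the abstract refers to: each map only contains variation along a single axis, so a classifier trained on it cannot mix temporal and spatial cues.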
Files in this record:
1-s2.0-S0923596526000482-main.pdf (open access; publisher's version, VoR; 4.2 MB; Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/485365