Robust deepfake detection in compressed videos with scalable network strategies
Perelli G.; Micheletto M.; Puglisi G.; Luca Marcialis G.
2026-01-01
Abstract
Deepfakes leverage artificial intelligence to generate highly realistic but falsified visual content, raising concerns for security and trust in digital media. Detecting such manipulations becomes more challenging when videos are compressed, as compression algorithms introduce artifacts that obscure forensic evidence. One possible solution is to train separate models for different compression levels; however, this approach increases computational costs and limits scalability. To address this challenge, we introduce a unified framework designed to improve robustness against varying degrees of video compression. Our approach combines (i) a dedicated MPEG-based augmentation strategy tailored for compressed videos, and (ii) two architectural designs, the Multi-Head Network (MHN) and the Multi-Branch Network (MBN). The MHN extends a standard backbone by appending lightweight output layers, or "heads", that jointly predict deepfake likelihood and compression level, enabling compression-aware detection with minimal architectural changes. The MBN combines multiple MHNs into a modular, parallel architecture, offering an alternative to conventional depth-based model scaling. Experiments on the FaceForensics++ and Celeb-DF datasets show that both MHN and MBN improve detection performance in compressed scenarios. Notably, MHN applied to a lightweight backbone outperforms deeper and more complex models without the multi-head extension, making the proposed solution well-suited for deployment in resource-constrained settings.

| File | Size | Format |
|---|---|---|
| 1-s2.0-S0957417426006743-main.pdf (open access; type: publisher's version, VoR) | 8.01 MB | Adobe PDF |
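
The multi-head idea summarized in the abstract (one shared backbone feature vector feeding two lightweight heads, one for deepfake likelihood and one for compression level) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the class name `MultiHeadSketch`, the choice of three compression levels, the weighted-sum objective, and the `comp_weight` parameter are all assumptions made for illustration, and the feature vector stands in for the output of whatever backbone is used.

```python
import math
import random


def linear(features, weights, bias):
    """Apply one dense layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(probs[target])


class MultiHeadSketch:
    """Two lightweight heads sharing one feature vector (hypothetical layout)."""

    def __init__(self, feature_dim, num_compression_levels, seed=0):
        rng = random.Random(seed)

        def init(rows):
            return [[rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
                    for _ in range(rows)]

        # Head 1: real vs. fake (two classes).
        self.fake_w, self.fake_b = init(2), [0.0, 0.0]
        # Head 2: compression level (number of classes is an assumption).
        self.comp_w = init(num_compression_levels)
        self.comp_b = [0.0] * num_compression_levels

    def forward(self, features):
        """Return class probabilities from both heads for the same features."""
        fake_probs = softmax(linear(features, self.fake_w, self.fake_b))
        comp_probs = softmax(linear(features, self.comp_w, self.comp_b))
        return fake_probs, comp_probs


def joint_loss(fake_probs, comp_probs, fake_label, comp_label, comp_weight=0.5):
    """Assumed objective: deepfake loss plus a weighted compression-level loss."""
    return (cross_entropy(fake_probs, fake_label)
            + comp_weight * cross_entropy(comp_probs, comp_label))


# Stand-in for the feature vector a backbone would produce for one face crop.
features = [0.2, -0.4, 0.7, 0.1]
model = MultiHeadSketch(feature_dim=4, num_compression_levels=3)
fake_probs, comp_probs = model.forward(features)
loss = joint_loss(fake_probs, comp_probs, fake_label=1, comp_label=2)
```

Because both heads read the same features, the compression-level term in this sketch would push a trainable backbone toward compression-aware representations while adding only one extra dense layer, which matches the "minimal architectural changes" framing of the abstract; the actual loss and head design used in the paper may differ.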
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


