Data drift in Android malware detection

Minnei, Luca; Eddoubi, Hicham; Sotgiu, Angelo; Pintor, Maura; Demontis, Ambra; Biggio, Battista
2025-01-01

Abstract

Android malware detectors are now widely implemented with machine learning algorithms trained on large datasets of goodware and malware applications gathered at a fixed moment in time. However, as recent work has shown, this domain is not stationary, so detector performance degrades over time. While recent work pinpoints the presence of such drift, little has been done to isolate and understand its causes. In this work, we show which features cause the data drift, i.e., which new features appear and which old ones become unreliable. Our experimental evaluation highlights that particular feature groups cause the data drift. However, we also show that removing these highly variable features from the feature set is insufficient to achieve good classification performance.
2025
Machine Learning; Android; Cybersecurity
Use this identifier to cite or link to this document: https://hdl.handle.net/11584/469666