UNICA IRIS Institutional Research Information System

Understanding Regression in Continual Learning for Malware Detection

Daniele Angioni; Angelo Sotgiu; Maura Pintor; Battista Biggio

2025-01-01

Abstract

The evolving nature of malware poses significant challenges for machine learning-based detectors, which require frequent updates to handle new threats. Because keeping all historical data is impractical due to storage constraints, Continual Learning (CL) algorithms help by incrementally updating the detectors without retraining on all previously collected data. Unfortunately, updating the model can introduce inconsistencies: the new model may raise false positives on goodware that was previously classified correctly, and malware detected by the previous model may go undetected by the new one. This issue, referred to as security regression, is often overlooked in current work but can undermine user trust even when overall detection performance improves. In this work, we address this issue by proposing a learning strategy that combines a replay-based CL method with a regression-aware penalty to preserve the correct decisions of earlier models. Specifically, we adapt the Positive Congruent Training (PCT) strategy to a CL setting, presenting the first regression-aware CL algorithm. Experiments conducted on the ELSA Android dataset demonstrate that this approach significantly reduces security regression while keeping up with data drift, maintaining high detection performance over time.
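To make the notion of security regression concrete, the following is a minimal illustrative sketch, not the authors' implementation, of how negative flips between two model versions could be counted. The function name `negative_flips`, the label convention (0 = goodware, 1 = malware), and the toy predictions are assumptions introduced here for illustration only.

```python
def negative_flips(y_true, old_pred, new_pred):
    """Count samples the old model got right and the new model gets wrong.

    Assumed label convention (hypothetical): 0 = goodware, 1 = malware.
    Returns the overall negative flip rate and the per-class counts.
    """
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if len(y_true) == 0:
        return {"rate": 0.0, "goodware_flips": 0, "malware_flips": 0}

    goodware_flips = 0  # goodware newly flagged as malware (new false positives)
    malware_flips = 0   # malware newly missed (new false negatives)
    for y, old, new in zip(y_true, old_pred, new_pred):
        if old == y and new != y:
            if y == 0:
                goodware_flips += 1
            else:
                malware_flips += 1

    rate = (goodware_flips + malware_flips) / len(y_true)
    return {
        "rate": rate,
        "goodware_flips": goodware_flips,
        "malware_flips": malware_flips,
    }


def accuracy(y_true, pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for y, p in zip(y_true, pred) if y == p) / len(y_true)


# Toy data (made up): both models have the same accuracy,
# yet the update still breaks two previously correct decisions.
y_true = [0, 0, 1, 1, 1, 0]
old_pred = [0, 0, 1, 1, 0, 1]
new_pred = [0, 1, 1, 0, 1, 0]

result = negative_flips(y_true, old_pred, new_pred)
print(accuracy(y_true, old_pred), accuracy(y_true, new_pred), result)
```

In this toy case the two models are equally accurate, so an aggregate metric alone would hide the regression; counting negative flips per class separates newly introduced false positives on goodware from newly missed malware.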

Year: 2025
Keywords: Android Malware; Continual Learning; Negative Flips; Regression Testing
Type: 4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
itasec2025.pdf (open access; publisher's version of the article, Version of Record)	1.12 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/444025
