UNICA IRIS Institutional Research Information System

While machine-learning algorithms have demonstrated a strong ability in detecting Android malware, they can be evaded by sparse evasion attacks crafted by injecting a small set of fake components, e.g., permissions and system calls, without compromising intrusive functionality. Previous work has shown that, to improve robustness against such attacks, learning algorithms should avoid overemphasizing few discriminant features, providing instead decisions that rely upon a large subset of components. In this work, we investigate whether gradient-based attribution methods, used to explain classifiers’ decisions by identifying the most relevant features, can be used to help identify and select more robust algorithms. To this end, we propose to exploit two different metrics that represent the evenness of explanations, and a new compact security measure called Adversarial Robustness Metric. Our experiments conducted on two different datasets and five classification algorithms for Android malware detection show that a strong connection exists between the uniformity of explanations and adversarial robustness. In particular, we found that popular techniques like Gradient*Input and Integrated Gradients are strongly correlated to security when applied to both linear and nonlinear detectors, while more elementary explanation techniques like the simple Gradient do not provide reliable information about the robustness of such classifiers.

Do gradient-based explanations tell anything about adversarial robustness to android malware?

Melis M.;Scalas M.;Demontis A.;Maiorca D.;Biggio B.;Giacinto G.;Roli F.

2022-01-01

Abstract

While machine-learning algorithms have demonstrated a strong ability in detecting Android malware, they can be evaded by sparse evasion attacks crafted by injecting a small set of fake components, e.g., permissions and system calls, without compromising intrusive functionality. Previous work has shown that, to improve robustness against such attacks, learning algorithms should avoid overemphasizing few discriminant features, providing instead decisions that rely upon a large subset of components. In this work, we investigate whether gradient-based attribution methods, used to explain classifiers’ decisions by identifying the most relevant features, can be used to help identify and select more robust algorithms. To this end, we propose to exploit two different metrics that represent the evenness of explanations, and a new compact security measure called Adversarial Robustness Metric. Our experiments conducted on two different datasets and five classification algorithms for Android malware detection show that a strong connection exists between the uniformity of explanations and adversarial robustness. In particular, we found that popular techniques like Gradient*Input and Integrated Gradients are strongly correlated to security when applied to both linear and nonlinear detectors, while more elementary explanation techniques like the simple Gradient do not provide reliable information about the robustness of such classifiers.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2022
			
	Parole chiave
	
				Adversarial machine learning; Adversarial robustness; Android malware; Explainable artifcial intelligence; Interpretability
			
	Tipologia:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
Melis2021_Article_DoGradient-basedExplanationsTe.pdf Solo gestori archivio Descrizione: articolo Early Access Tipologia: versione editoriale (VoR) Dimensione 2.35 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	2.35 MB	Adobe PDF	Visualizza/Apri Richiedi una copia
melis_20_pp.pdf Open Access dal 25/10/2022 Tipologia: versione post-print (AAM) Dimensione 1.25 MB Formato Adobe PDF Visualizza/Apri	1.25 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/322408

Citazioni

ND

21

15

social impact