UNICA IRIS Institutional Research Information System

Is feature selection secure against training data poisoning?

Xiao, Huang; Biggio, Battista; Brown, Gavin; Fumera, Giorgio; Eckert, Claudia; Roli, Fabio

2015-01-01

Abstract

Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.

Year: 2015
ISBN: 9781510810587
Keywords: Human-Computer Interaction; Computer Science Applications; Computer Vision and Pattern Recognition
Type: 4.1 Contribution in conference proceedings

Files in this record:

File	Access	Type	Size	Format
xiao15.pdf	open access	publisher's version (VoR)	1.57 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/177533
