UNICA IRIS Institutional Research Information System

Fuzzy optimized V-detector algorithm on Apache Spark for class imbalance issue of intrusion detection in big data

Kourid A.; Chikhi S.; Reforgiato Recupero Diego

2023-01-01

Abstract

Big data brings new challenges for network intrusion detection, because it yields large-scale data containing a variety of sophisticated attacks (e.g., malware, advanced persistent threats (APTs), zero-day attacks). Consequently, demand for new tools and approaches specialized in big data analytics is increasing. In addition, the false alarm rate of anomaly-based intrusion detection systems (IDS) is a major concern. Most existing methods for large-scale network intrusion detection suffer from a high false-positive rate (FPR) due to the class imbalance of large-scale intrusion datasets, which can affect the network. The critical challenge is therefore to reduce the FPR with the smallest possible decrease in the true-positive rate (TPR), so that detection quality stays at a feasible level. To address these challenges, we propose a new network intrusion detection system for big network data based on the negative selection principle and big data frameworks. One of the promising negative selection methods of the artificial immune system (AIS) for network intrusion detection is the variable-sized detector (V-detector) algorithm. Unfortunately, this algorithm cannot analyze big datasets, because generating the radius of each detector depends on the self-space and becomes more expensive as the self-space grows. Furthermore, the search for new detectors is random, and the generated detectors do not achieve maximum coverage of the self and non-self space. To overcome these shortcomings, we propose an extended V-detector algorithm that is built using clonal selection and fuzzy rules and is implemented on Apache Spark. The proposed algorithm is scalable and more efficient when applied to large-scale imbalanced datasets. The proposed framework is implemented on a fully distributed cluster of Apache Spark workers and evaluated on the KDDcup99 benchmark dataset, on the large, up-to-date CICIDS2017 dataset, and on large-scale synthetic datasets.
Results show that the proposed algorithm outperforms state-of-the-art baselines, achieving high detection accuracy of 0.9984 and 0.9994 and very low false-positive rates of 0.0002 and 0.0001, with comparable detection rates, on the KDDcup99 dataset and the imbalanced CICIDS2017 dataset, respectively. Moreover, it improves scalability and execution time, which are key for real-time intrusion detection analysis on big data.
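
For orientation, the baseline idea that the abstract builds on can be sketched in a few lines of Python. This is a minimal illustration of a basic variable-sized detector (V-detector) under negative selection, not the extended algorithm of the paper: it has no clonal selection, no fuzzy rules, and no Apache Spark distribution. The function names (`generate_detectors`, `is_anomalous`, `tpr_fpr`), the unit-hypercube feature space, and all parameter values are assumptions made for the example.

```python
import math
import random


def generate_detectors(self_samples, self_radius, max_detectors, max_tries, rng):
    """Basic V-detector generation (illustrative sketch, not the paper's method).

    Candidate centers are drawn uniformly at random from the unit hypercube.
    Each accepted detector gets a radius equal to its distance to the nearest
    self sample minus self_radius, so it never covers a self sample.
    """
    dim = len(self_samples[0])
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= max_detectors:
            break
        center = tuple(rng.random() for _ in range(dim))
        # Full scan of the self set: this per-candidate cost grows with the
        # size of the self-space, which is the bottleneck the abstract names.
        nearest = min(math.dist(center, s) for s in self_samples)
        radius = nearest - self_radius
        if radius <= 0:
            continue  # candidate lies inside the self region
        if any(math.dist(center, c) <= r for c, r in detectors):
            continue  # candidate is already covered by an existing detector
        detectors.append((center, radius))
    return detectors


def is_anomalous(point, detectors):
    """A point is flagged as non-self if any detector covers it."""
    return any(math.dist(point, c) <= r for c, r in detectors)


def tpr_fpr(labels, predictions):
    """True-positive and false-positive rates (1 = attack, 0 = normal)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical toy data: "normal" traffic clustered in one corner of a
# 2-D feature space, with a handful of attack points elsewhere.
rng = random.Random(0)
normal = [(0.1 + 0.2 * rng.random(), 0.1 + 0.2 * rng.random()) for _ in range(50)]
attacks = [(0.8, 0.8), (0.9, 0.2), (0.5, 0.9)]

detectors = generate_detectors(normal, self_radius=0.05, max_detectors=30,
                               max_tries=500, rng=rng)
points = normal + attacks
labels = [0] * len(normal) + [1] * len(attacks)
predictions = [int(is_anomalous(p, detectors)) for p in points]
print(tpr_fpr(labels, predictions))
```

Because each radius is strictly smaller than the distance to the nearest self sample, no training self sample is ever flagged, which is what keeps the FPR low on seen data. The random search for centers and the full scan over the self set are the two weaknesses the abstract attributes to the basic algorithm.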

Publication year
2023

Keywords
Apache Spark
Artificial immune systems (AIS)
Big data
Big data analytics
Clonal selection
Cybersecurity
Fuzzy logic
Intrusion detection
MapReduce
Negative selection
V-detector

Type
1.1 Journal article

Files in this record:

File	Access	Type	Size	Format
s00521-023-08783-8.pdf	Archive managers only	Publisher's version (VoR)	4.61 MB	Adobe PDF

The metadata in IRIS UNICA are released under the Creative Commons CC0 1.0 Universal license, while the publication files are protected by copyright unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/390747
