Fuzzy optimized V-detector algorithm on Apache Spark for class imbalance issue of intrusion detection in big data

Reforgiato Recupero Diego
2023-01-01

Abstract

Big data brings new challenges for network intrusion detection because it provides large-scale data containing a variety of sophisticated attacks (e.g., malware, advanced persistent threats (APTs), zero-day attacks). As a result, the demand for new tools and approaches specialized in big data analytics is increasing. In addition, the false alarm rate of anomaly-based intrusion detection systems (IDS) is a major concern. Most existing methods for large-scale network intrusion detection exhibit a high false-positive rate (FPR) due to the class imbalance of large-scale intrusion datasets, which can affect the network. Consequently, the critical challenge is to reduce the FPR with the smallest possible decrease in the true-positive rate (TPR), so that detection quality remains at a feasible level. To address these challenges, we propose a new network intrusion detection system for large-scale network intrusion data, based on the negative selection principle and big data frameworks. One of the promising negative selection methods of the artificial immune system (AIS) for network intrusion detection is the variable-sized detector (V-detector) algorithm. Unfortunately, this algorithm cannot analyze big datasets, because the radius of each detector is generated in relation to the self-space, and this generation becomes more complex when the self-space is too big. Furthermore, the search for new detectors is random, and the generated detectors do not achieve maximum coverage of the self and non-self space. To overcome these shortcomings, we propose an extended V-detector algorithm that is built using clonal selection and fuzzy rules and is implemented on Apache Spark. The proposed algorithm is scalable and more efficient when applied to large-scale imbalanced datasets. The proposed framework is implemented in a fully distributed cluster of Apache Spark workers and evaluated on the KDDcup99 benchmark dataset, on the large, up-to-date CICIDS2017 dataset, and on large-scale synthetic datasets.
Results reveal that the proposed algorithm outperforms state-of-the-art baselines, achieving high detection accuracy of 0.9984 and 0.9994 and very low false-positive rates of 0.0002 and 0.0001, with comparable detection rates, for the KDDcup99 dataset and the imbalanced CICIDS2017 dataset, respectively. Moreover, it improves scalability and execution time, which are key for real-time intrusion detection analysis on big data.
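To make the limitation described in the abstract concrete, the following is a minimal sketch of a classic V-detector style generator. It is not the extended algorithm proposed in the paper: it contains no clonal selection, no fuzzy rules, and no Apache Spark code. The function names, the unit-hypercube feature space, the Euclidean distance, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def generate_detectors(self_samples, self_radius, max_detectors, max_tries, rng):
    """Baseline V-detector sketch: random centres with variable radii.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    detectors = []  # each detector is a (centre, radius) pair
    dim = len(self_samples[0])
    for _ in range(max_tries):
        if len(detectors) >= max_detectors:
            break
        # Candidates are drawn at random, so coverage is not optimized.
        candidate = [rng.random() for _ in range(dim)]
        # Skip candidates already covered by an existing detector.
        if any(math.dist(candidate, c) <= r for c, r in detectors):
            continue
        # The radius depends on the nearest self sample. This scan touches
        # every self sample, so its cost grows with the size of the self-space.
        nearest_self = min(math.dist(candidate, s) for s in self_samples)
        radius = nearest_self - self_radius
        if radius > 0:
            detectors.append((candidate, radius))
    return detectors


def is_anomalous(point, detectors):
    """A point is flagged as non-self if any detector covers it."""
    return any(math.dist(point, c) <= r for c, r in detectors)


# Toy self-space (assumed): a 5 x 5 grid of normal samples in one corner
# of the unit square.
self_samples = [(0.1 + 0.05 * i, 0.1 + 0.05 * j) for i in range(5) for j in range(5)]
detectors = generate_detectors(
    self_samples, self_radius=0.05, max_detectors=50, max_tries=1000,
    rng=random.Random(0),
)
```

In this sketch every candidate requires a full pass over the self samples to fix its radius, and candidates are proposed blindly. These are the two costs the abstract attributes to the original algorithm.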
2023
Apache Spark
Artificial immune systems (AIS)
Big data
Big data analytics
Clonal selection
Cybersecurity
Fuzzy logic
Intrusion detection
MapReduce
Negative selection
V-detector
Use this identifier to cite or link to this document: https://hdl.handle.net/11584/390747