Anyone working in the field of network intrusion detection has been able to observe how it involves an everincreasing number of techniques and strategies aimed to overcome the issues that affect the state-of-the-art solutions. Data unbalance and heterogeneity are only some representative examples of them, and each misclassification made in this context could have enormous repercussions in different crucial areas such as, for instance, financial, privacy, and public reputation. This happens because the current scenario is characterized by a huge number of public and private network-based services. The idea behind the proposed work is decomposing the canonical classification process into several sub-processes, where the final classification depends on all the sub-processes results, plus the canonical one. The proposed Training Data Decomposition (TDD) strategy is applied on the training datasets, where it applies a decomposition into regions, according to a defined number of events and features. The reason that leads this process is related to the observation that the same network event could be evaluated in a different manner, when it is evaluated in different time periods and/or when it involves different features. According to this observation, the proposed approach adopts different classification models, each of them trained in a different data region characterized by different time periods and features, classifying the event both on the basis of all model results, and on the basis of the canonical strategy that involves all data.
Decomposing Training Data to Improve Network Intrusion Detection Performance
Saia R.;Podda A. S.;Fenu G.;Balia R.
2021-01-01
Abstract
Anyone working in the field of network intrusion detection has been able to observe how it involves an everincreasing number of techniques and strategies aimed to overcome the issues that affect the state-of-the-art solutions. Data unbalance and heterogeneity are only some representative examples of them, and each misclassification made in this context could have enormous repercussions in different crucial areas such as, for instance, financial, privacy, and public reputation. This happens because the current scenario is characterized by a huge number of public and private network-based services. The idea behind the proposed work is decomposing the canonical classification process into several sub-processes, where the final classification depends on all the sub-processes results, plus the canonical one. The proposed Training Data Decomposition (TDD) strategy is applied on the training datasets, where it applies a decomposition into regions, according to a defined number of events and features. The reason that leads this process is related to the observation that the same network event could be evaluated in a different manner, when it is evaluated in different time periods and/or when it involves different features. According to this observation, the proposed approach adopts different classification models, each of them trained in a different data region characterized by different time periods and features, classifying the event both on the basis of all model results, and on the basis of the canonical strategy that involves all data.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.