About

Research Motivations

Since the Stuxnet worm in 2010, Advanced Persistent Threats (APTs) and cyber-warfare against Cyber-Physical Systems (CPS) have been a major concern in cybersecurity.
Many known incidents illustrate how critical the situation is, and more remain undisclosed.

Year	Attack Name	Target/System Affected
2010	Stuxnet Worm	Iranian Nuclear CPS
2015	BlackEnergy Malware	Ukrainian Power Grid
2016	CrashOverride Malware	Ukrainian Power Grid
2017	Triton Malware	Petrochemical plant in Saudi Arabia
2017	NotPetya Ransomware	Cyber-physical systems of Maersk and Merck
2021	Oldsmar APT	Chemical levels in the water plant supply (US)
2021	Colonial Pipeline Ransomware	Fuel pipelines in the US
2021	Water Sector Attacks	Chemical levels in water treatment facilities (US)
2021	Iranian Railway System Attack	Iran’s railway system

Intrusion Detection Systems (IDS) are systems designed to detect cyber-attacks, including APTs.
State-of-the-art IDS are largely driven by AI and Big Data.
The following picture is adapted from the original to highlight the most relevant stages in how researchers propose new IDS technologies or improve existing ones. We only added dotted-line boxes and some labels to depict the stages more clearly.

Paper: “Learn-IDS: Bridging Gaps between Datasets and Learning-Based Network Intrusion Detection” https://doi.org/10.3390/electronics13061072

In short, researchers use public datasets containing some form of network information in which attacks occur. A raw-data pre-processing stage then conditions the different datasets and converts them to a uniform format for the following stages. The next stage is Data Customizing, in which features are extracted in a tabular, time-series, array, or graph format. These features are then fed into the IDS AI model, which is trained and evaluated.
As the following picture shows, research efforts target new technologies and algorithms for the Pre-processing stage and the Model-related stages:
- Novel data-pipeline techniques are being proposed in the pre-processing stage.
- Novel feature selection methods, novel deep-learning and/or deep-reinforcement-learning models, and novel optimization techniques are being proposed for the modelling stage.
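
The stages described above (dataset, pre-processing, Data Customizing, model training and evaluation) can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions: the records in `RAW_RECORDS`, the single bytes-per-packet feature, and the threshold "model" are all hypothetical and are not taken from the Learn-IDS paper or from any real dataset. A real IDS would use far richer features and a learned model.

```python
from statistics import mean

# Hypothetical raw records (not from any real dataset): one dict per network flow.
RAW_RECORDS = [
    {"bytes": "1200", "packets": "10", "label": "benign"},
    {"bytes": "900", "packets": "9", "label": "benign"},
    {"bytes": "50000", "packets": "40", "label": "attack"},
    {"bytes": "64000", "packets": "50", "label": "attack"},
]


def preprocess(records):
    """Pre-processing: convert raw string fields into a uniform numeric format."""
    return [
        {"bytes": int(r["bytes"]), "packets": int(r["packets"]), "label": r["label"]}
        for r in records
    ]


def extract_features(records):
    """Data Customizing: build tabular rows of (bytes per packet, is_attack)."""
    return [(r["bytes"] / r["packets"], r["label"] == "attack") for r in records]


def train(rows):
    """Training: put a threshold midway between the two class means.

    Assumes (for this toy data only) that attacks have the larger feature value.
    """
    attack_mean = mean(x for x, is_attack in rows if is_attack)
    benign_mean = mean(x for x, is_attack in rows if not is_attack)
    return (attack_mean + benign_mean) / 2


def evaluate(threshold, rows):
    """Evaluation: fraction of rows the threshold classifies correctly."""
    correct = sum((x > threshold) == is_attack for x, is_attack in rows)
    return correct / len(rows)


if __name__ == "__main__":
    rows = extract_features(preprocess(RAW_RECORDS))
    threshold = train(rows)
    # Evaluated on the training rows themselves, for illustration only.
    print(threshold, evaluate(threshold, rows))
```

The point of the sketch is the separation of stages: each function corresponds to one box in the pipeline, so a new pre-processing technique or a new model can replace a single function without touching the others.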

However, despite advancements in intrusion detection systems, researchers continue to face significant challenges due to limitations in the datasets being used:
- Most existing datasets focus on a single data type, such as network traffic or PCAP files, which limits the comprehensiveness of the analysis.
- Attacks within these datasets are often isolated, lacking the complex, multi-stage correlations found in Advanced Persistent Threat (APT) attacks. In APTs, every action across the network or system carries a strategic significance, unlike in isolated attack scenarios.
- Dataset labeling remains a time-consuming and labor-intensive process, requiring extensive manual effort to ensure accuracy.
- Real-world APT datasets are generally unavailable due to non-disclosure policies enforced by affected organizations, making it difficult to develop solutions based on real, high-impact incidents.

Our approach is therefore to create our own dataset by performing our own complex APT attacks over our own virtual network: first over a mini-network and, later, over a larger and more complex network.

Description

The network implementation is distributed over three Linux-based hosts, each hosting different elements.

Server	External IP Address	Elements being hosted	Network
ITM2	114.71.51.40	Ubuntu 22.04 server & Windows 2022 server	192.168.1.0/24
ITM4	114.71.51.42	Windows 10 & Windows 11 clients	192.168.2.0/24
ITMX	114.71.51.XX	Event-data collector based on Elasticsearch framework	192.168.3.0/24

Each server will host different sub-networks composed of virtual machines, virtual switches, and virtual routers.
To let these sub-networks communicate with each other and reach the internet, the WAN interface of each virtual router should be set to ‘Bridge’ mode.
For ‘Bridge’ mode to work properly, each WAN interface must be assigned an IPv4 address from the same network range as the hosts’ IPv4 addresses, that is, 114.71.51.0/24.
An alternative is to set the router’s WAN interface to ‘NAT’ mode, but this requires more complex configuration. ‘Bridge’ mode is the simplest reliable way to give the sub-networks external-network access.
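
The addressing constraint for ‘Bridge’ mode can be checked with a short script before configuring a router. The following is a minimal sketch using Python's standard `ipaddress` module. The helper names `check_wan_address` and `overlapping_subnets` and the candidate address `114.71.51.50` are hypothetical, chosen only for illustration. The host range, host addresses, and sub-networks come from the table above; ITMX is left out of the host addresses because its address is not fully specified.

```python
import ipaddress

# Host range and known host addresses, taken from the table above.
HOST_NETWORK = ipaddress.ip_network("114.71.51.0/24")
HOST_ADDRESSES = {
    "ITM2": ipaddress.ip_address("114.71.51.40"),
    "ITM4": ipaddress.ip_address("114.71.51.42"),
}

# Internal sub-networks hosted on each server.
SUBNETS = {
    "ITM2": ipaddress.ip_network("192.168.1.0/24"),
    "ITM4": ipaddress.ip_network("192.168.2.0/24"),
    "ITMX": ipaddress.ip_network("192.168.3.0/24"),
}


def check_wan_address(candidate: str) -> list[str]:
    """Return a list of problems with a candidate WAN address (empty if none)."""
    problems = []
    address = ipaddress.ip_address(candidate)
    if address not in HOST_NETWORK:
        problems.append(f"{address} is outside {HOST_NETWORK}")
    elif address in (HOST_NETWORK.network_address, HOST_NETWORK.broadcast_address):
        problems.append(f"{address} is reserved in {HOST_NETWORK}")
    for host, host_address in HOST_ADDRESSES.items():
        if address == host_address:
            problems.append(f"{address} is already used by host {host}")
    return problems


def overlapping_subnets() -> list[tuple[str, str]]:
    """Return pairs of servers whose internal sub-networks overlap."""
    names = sorted(SUBNETS)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if SUBNETS[a].overlaps(SUBNETS[b])
    ]


if __name__ == "__main__":
    print(check_wan_address("114.71.51.50"))  # hypothetical candidate address
    print(check_wan_address("192.168.1.1"))   # internal address, not in the host range
    print(overlapping_subnets())
```

This check only covers addresses listed in the table. It cannot tell whether another device on 114.71.51.0/24 already uses the candidate address, so that still has to be confirmed with whoever administers the range.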