Saturday, April 20, 2013

GANIDS (beta 0.9) - Genetic Algorithms for Deriving Network Intrusion Rules

    For the past month, from late March 2013 to today (20 April 2013), I have been developing a genetic algorithm that derives rules for signature-based Network Intrusion Detection Systems (e.g. Snort, Bro). It is written in Python 2.7.3 with DEAP 0.9 (a Python evolutionary algorithm library) and uses the DARPA dataset as training and testing data.

"In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic (also sometimes called a metaheuristic) is routinely used to generate useful solutions to optimization and search problems" - Wikipedia

    By following the papers of Wei Li, Ren Hui Gong, and Brian E. Lavender, I was able to implement my own version with many modifications, several add-ons, and optimizations.

    Li proposed using a GA in an IDS for anomaly detection and provided a fitness function and a chromosome structure. Li promised to deliver the code, but it was never published.
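    To make "chromosome structure" and "fitness function" concrete, here is a minimal sketch in plain Python. It is not Li's code (which was never published) and not GANIDS itself. It assumes a rule is a tuple of genes where -1 is a wildcard and the last gene is the predicted outcome, and it assumes a weighted support-confidence score as the fitness. The field names, the weights w1 and w2, and the helper names are all hypothetical.

```python
# Hypothetical chromosome layout: one gene per connection attribute,
# with the last gene holding the outcome (1 = attack, 0 = normal).
# These field names are illustrative only, not taken from any paper.
WILDCARD = -1
FIELDS = ("src_ip_octet", "dst_port", "protocol", "attack")


def matches(condition, attributes):
    """Return True if every non-wildcard gene equals the record's value."""
    return all(g == WILDCARD or g == a for g, a in zip(condition, attributes))


def fitness(rule, records, w1=0.2, w2=0.8):
    """Score the rule 'if condition then outcome' against labelled records.

    Assumed formula (not confirmed by the post):
        support    = |A and B| / N
        confidence = |A and B| / |A|
        fitness    = w1 * support + w2 * confidence
    where A is "condition matches" and B is "outcome matches".
    """
    condition, outcome = rule[:-1], rule[-1]
    n = len(records)
    n_a = sum(1 for rec in records if matches(condition, rec[:-1]))
    n_ab = sum(
        1 for rec in records
        if matches(condition, rec[:-1]) and rec[-1] == outcome
    )
    if n == 0 or n_a == 0:
        return 0.0  # a rule that matches nothing earns no credit
    support = float(n_ab) / n
    confidence = float(n_ab) / n_a
    return w1 * support + w2 * confidence


# Tiny made-up audit sample, in the same layout as FIELDS.
records = [
    (10, 80, 6, 1),
    (10, 80, 6, 1),
    (10, 22, 6, 0),
    (20, 80, 6, 0),
]
rule = (10, WILDCARD, 6, 1)  # "src octet 10, any port, protocol 6 => attack"
score = fitness(rule, records)
```

    In this toy sample the rule's condition matches three of the four records and two of those are attacks, so support is 2/4 and confidence is 2/3. The wildcard is what lets one rule generalize over many connections.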

    Gong then built on Li's approach, providing pseudocode and a class diagram that clarified parts of the evolutionary process. However, there was little guidance on how the selection, crossover, and mutation operators could be implemented. Gong suggested using the ECJ Java library to code the genetic algorithm, but his code was never published either.

    Brian E. Lavender was the first person to successfully implement a genetic algorithm for this approach, following the guidelines of the first two. Brian also provided a clearer, modified version of the pseudocode and detailed guidance on how to build the selection, crossover, and mutation operators. He is currently the only one who has published his code, which appears in his project report. His program is called netGA.
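    For readers new to GAs, the three operators mentioned above can be sketched as follows. This is a generic illustration in plain Python, assuming tournament selection, one-point crossover, and per-gene mutation over list chromosomes with -1 as a wildcard. It is not netGA's or GANIDS's actual implementation, and every function name and parameter here is hypothetical.

```python
import random

WILDCARD = -1  # assumed "match anything" gene value


def tournament_select(population, fitnesses, k=3, rng=random):
    """Pick k distinct individuals at random and return a copy of the fittest."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return list(population[best])


def one_point_crossover(a, b, rng=random):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    cut = rng.randint(1, len(a) - 1)  # never cut at the very ends
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(chromosome, gene_ranges, rate=0.1, rng=random):
    """With probability `rate` per gene, replace the gene with either a
    wildcard or a random value from that gene's (low, high) range."""
    result = list(chromosome)
    for i, (low, high) in enumerate(gene_ranges):
        if rng.random() < rate:
            if rng.random() < 0.5:
                result[i] = WILDCARD
            else:
                result[i] = rng.randint(low, high)
    return result


# Usage with a seeded generator so runs are repeatable.
rng = random.Random(0)
population = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 0, 0]]
fitnesses = [0.1, 0.9, 0.5, 0.2]
parent = tournament_select(population, fitnesses, k=4, rng=rng)
child_a, child_b = one_point_crossover([1, 1, 1, 1], [2, 2, 2, 2], rng=rng)
mutant = mutate([5, 5, 5], [(0, 9), (0, 9), (0, 9)], rate=1.0, rng=rng)
```

    Passing the random generator in as a parameter keeps the operators easy to test, and a larger tournament size k puts more selection pressure on the population.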

    However, while netGA meets its functional requirements and can generate rules with optimized fitness values, it lacks extensibility. It was modeled to run on only one sample of the DARPA audit training and testing dataset. Many options and optimizations would need to be added for it to run well on other datasets, and that is what I plan to implement and improve on.

    Nonetheless, Brian laid a stepping stone for me, one that conclusively demonstrates that Network Intrusion Detection Systems and Genetic Algorithms can be integrated. He has also been providing help and advice in the emails we have been exchanging, so I'd like to thank Brian here.

    At the moment I call my GA program GANIDS (since renamed to AceGA). It works well on different DARPA datasets but still needs revisions.

I'll be sure to update the details and write documentation for it soon.

Please feel free to give it a test run and offer constructive criticism.

