The evaluation of algorithms and techniques to implement intrusion detection systems heavily rely on the existence of well designed datasets. In the last years, a lot of efforts have been done toward building these datasets. Yet, there is still room to improve. In this paper, a comprehensive review of existing datasets is first done, making emphasis on their main shortcomings. Then, we present a new dataset that is built with real traffic and up-to-date attacks. The main advantage of this dataset over previous ones is its usefulness for evaluating IDSs that consider long-term evolution and traffic periodicity. Models that consider differences in daytime/nighttime or weekdays/weekends can also be trained and evaluated with it. We discuss all the requirements for a modern IDS evaluation dataset and analyze how the one presented here meets the different needs.
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.