Z. Ji and D. Dasgupta (2004) "Augmented Negative Selection Algorithm with Variable-Coverage Detectors." Proceedings of the Congress on Evolutionary Computation.

Ayara, M., J. Timmis, R. de Lemos, L. N. de Castro, and R. Duncan (2002) "Negative Selection: How to Generate Detectors." Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS), pp. 89-98.

6.1.1 Generating self data

As stated earlier, the 8-bit data used were randomly generated. The pseudorandom number generator of the Java 2 Platform (Standard Edition, version 1.3) API was used to generate integers between 0 and 255, which were then converted to 8-bit binary strings. The experiments required self sets of different sizes, so a separate file was created for each self-set population size.

6.1.2 Setting the matching threshold

The affinity between these binary strings (for the self set, detector set, and test data) was determined using the r-contiguous bits matching rule. The optimal value of the matching threshold r had to be found by varying r from 1 to l (the string length), in order to obtain the combined correct and incorrect classification rates of the detectors generated with each threshold. The correct classification value is the sum of the true positive rate (non-self correctly detected) and the true negative rate (self correctly not detected), while the incorrect classification value is the sum of the false positive rate (self incorrectly detected) and the false negative rate (non-self not detected). Both values are used to select an appropriate value of r. This differs from the approach used by (Kim and Bentley 2001) as well as the method suggested in (D'haeseleer, Forrest et al. 1996). In (Kim and Bentley 2001), the value of r was determined from the equations in (Forrest, Perelson et al. 1994), which yielded poor matching thresholds for the corresponding data, while (D'haeseleer, Forrest et al. 1996) proposed an approach based on a greedy algorithm. Both approaches show that there is no hard-and-fast rule for setting this parameter; rather, various values can be tested in order to select the optimal one. Accordingly, detectors were generated for each candidate value of r and the resulting classification rates were compared; a sketch of this kind of sweep is given below.
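To make 6.1.1 and 6.1.2 concrete, here is a minimal Java sketch of the same pipeline: random 8-bit self strings, an r-contiguous bits matcher, negative-selection generation of detectors, and a sweep of r from 1 to l. It is an illustration, not the authors' exact procedure: the self-set size, candidate budget, detector-set limit, and random seed are arbitrary placeholders, counts stand in for rates, and it targets a modern JDK rather than the Java 1.3 API named above.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RContiguousSweep {
    static final int L = 8; // bit-string length l

    // Integer in [0, 255] converted to a zero-padded 8-bit binary string.
    static String randomBits(Random rng) {
        return String.format("%8s", Integer.toBinaryString(rng.nextInt(256)))
                .replace(' ', '0');
    }

    // 6.1.1: the self set is a collection of random 8-bit strings.
    static Set<String> generateSelf(int size, Random rng) {
        Set<String> self = new HashSet<>();
        while (self.size() < size) self.add(randomBits(rng));
        return self;
    }

    // r-contiguous bits rule: two strings match iff they agree in at
    // least r contiguous bit positions.
    static boolean match(String a, String b, int r) {
        int run = 0;
        for (int i = 0; i < L; i++) {
            run = (a.charAt(i) == b.charAt(i)) ? run + 1 : 0;
            if (run >= r) return true;
        }
        return false;
    }

    static boolean matchesAny(Iterable<String> strings, String s, int r) {
        for (String x : strings) if (match(x, s, r)) return true;
        return false;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);               // placeholder seed
        Set<String> self = generateSelf(50, rng); // placeholder self-set size

        for (int r = 1; r <= L; r++) {            // 6.1.2: sweep r from 1 to l
            // Negative selection: keep random candidates that match no self string.
            List<String> detectors = new ArrayList<>();
            for (int t = 0; t < 2000 && detectors.size() < 30; t++) {
                String cand = randomBits(rng);
                if (!matchesAny(self, cand, r)) detectors.add(cand);
            }
            // Classify all 256 possible strings; ground truth: non-self = not in self.
            int tp = 0, tn = 0, fp = 0, fn = 0;
            for (int v = 0; v < 256; v++) {
                String s = String.format("%8s", Integer.toBinaryString(v))
                        .replace(' ', '0');
                boolean detected = matchesAny(detectors, s, r);
                if (detected) { if (self.contains(s)) fp++; else tp++; }
                else          { if (self.contains(s)) tn++; else fn++; }
            }
            System.out.printf("r=%d detectors=%d correct=%d incorrect=%d%n",
                    r, detectors.size(), tp + tn, fp + fn);
        }
    }
}

Printing the combined correct (TP+TN) and incorrect (FP+FN) counts per threshold mirrors the selection criterion described above: the value of r with the best trade-off is chosen empirically.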
J. Gomez, F. Gonzalez, and D. Dasgupta (2003) "An Immuno-Fuzzy Approach to Anomaly Detection." Proceedings of the 12th IEEE International Conference on Fuzzy Systems, Vol. 2, pp. 1219-1224.

KDD Cup 99

This data set is a version of the 1998 DARPA intrusion detection evaluation data set prepared and managed by MIT Lincoln Labs. Experiments were conducted with the ten percent subset that is available at the University of California, Irvine (UCI) Machine Learning Repository. Each record of the 10% data set is composed of forty-two attributes that characterize network traffic behavior (thirty-four of them numerical), and the 10% subset contains a very large number of records (492021).

1) Experimental settings: We generated a reduced version of the 10% data set including only the numerical attributes, i.e., the categorical attributes were removed. The reduced 10% data set is therefore composed of thirty-three attributes. The attributes were normalized between 0 and 1 using the maximum and minimum values found. 80% of the normal samples were picked randomly and used as the training data set, while the remaining 20% were used along with the abnormal samples as the testing set. Five fuzzy sets were defined for each of the 33 attributes. To reduce the time complexity of the ERD algorithm, 1% of the normal data set (randomly selected) was used as the training data set.

C. Darpa 99

This data set is also obtained from the MIT Lincoln Lab [28]. It represents both normal and abnormal information collected in a test network where simulated attacks were performed. The data set is composed of network traffic data (tcpdump, inside and outside network traffic), audit data (BSM), and file system data. We used the outside tcpdump network data for a specific computer (hostname: marx) and then applied the tool tcpstat to obtain traffic statistics. The first week's data (attack free) was used for training, and the second week's data (which includes some attacks) for testing. We considered only the network attacks in our experiments.

1) Experimental Settings: Three parameters were selected (bytes per second, packets per second, and ICMP packets per second) to detect some specific types of attacks. These parameters were sampled each minute (using tcpstat) and normalized. Because each parameter can be seen as a time-series function, the features were extracted using a sliding overlapping window of size n = 3. Two sets of 9-dimensional feature vectors were therefore generated: one as the training data set and the other as the testing data set. Ten fuzzy sets were defined for each extracted feature. The windowing step is sketched below.
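A minimal Java sketch of that windowing step, under the stated setup: three parallel per-minute series (bytes/s, packets/s, ICMP packets/s), already normalized, combined with an overlapping window of n = 3 consecutive samples into 9-dimensional vectors. The concatenation order of the three series within a vector is not specified in the excerpt, so the ordering here is an assumption.

import java.util.ArrayList;
import java.util.List;

public class SlidingWindowFeatures {
    // Build 9-dimensional feature vectors from three parallel time series
    // using an overlapping window of size n = 3 (stride 1).
    static List<double[]> windowFeatures(double[] bytesPerSec,
                                         double[] pktsPerSec,
                                         double[] icmpPerSec) {
        int n = 3;
        List<double[]> features = new ArrayList<>();
        for (int t = 0; t + n <= bytesPerSec.length; t++) {
            double[] v = new double[3 * n];
            for (int k = 0; k < n; k++) {
                // Assumed ordering: all samples of one series, then the next.
                v[k]         = bytesPerSec[t + k];
                v[n + k]     = pktsPerSec[t + k];
                v[2 * n + k] = icmpPerSec[t + k];
            }
            features.add(v);
        }
        return features;
    }

    public static void main(String[] args) {
        // Toy per-minute values, already normalized to [0, 1].
        double[] bps  = {0.10, 0.12, 0.11, 0.50, 0.52};
        double[] pps  = {0.20, 0.21, 0.19, 0.60, 0.61};
        double[] icmp = {0.00, 0.01, 0.00, 0.30, 0.29};
        for (double[] v : windowFeatures(bps, pps, icmp)) {
            System.out.println(java.util.Arrays.toString(v));
        }
    }
}

Because the window overlaps (stride 1), a series of m minutes yields m - n + 1 vectors, so consecutive vectors share two of their three time steps per parameter.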
The authors cite [28]: MIT Lincoln Labs, 1999 DARPA Intrusion Detection Evaluation, http://www.ll.mit.edu/IST/ideval/index.html, 1999. The page asserts:

------

Intrusion detection systems monitor network state looking for unauthorized usage, denial of service, and anomalous behavior. Such systems have never been formally evaluated ... until now. The Information Systems Technology Group (IST) of MIT Lincoln Laboratory, under Defense Advanced Research Projects Agency (DARPA ITO) and Air Force Research Laboratory (AFRL/SNHS) sponsorship, has collected and distributed the first standard corpora for evaluation of computer network intrusion detection systems. We have also coordinated, with the Air Force Research Laboratory, the first formal, repeatable, and statistically significant evaluations of intrusion detection systems. Such evaluation efforts have been carried out in 1998 and 1999. These evaluations measure probability of detection and probability of false alarm for each system under test. These evaluations are contributing significantly to the intrusion detection research field by providing direction for research efforts and an objective calibration of the current technical state of the art. They are of interest to all researchers working on the general problem of workstation and network intrusion detection. The evaluation is designed to be simple, to focus on core technology issues, and to encourage the widest possible participation by eliminating security and privacy concerns, and by providing data types that are used commonly by the majority of intrusion detection systems.

Downloads: Off-line data sets are available to provide researchers with extensive examples of attacks and background traffic. Two data sets are the result of the DARPA Intrusion Detection Evaluations:

1998 DARPA Intrusion Detection Evaluation Data Sets
1999 DARPA Intrusion Detection Evaluation Data Sets

Three additional data sets are the result of experiments run in 2000 to address specific scenarios:

2000 DARPA Intrusion Detection Scenario Specific Data Sets

=====

http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

Abstract: This is the data set used for the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

Class labels: back, buffer_overflow, ftp_write, guess_passwd, imap, ipsweep, land, loadmodule, multihop, neptune, nmap, normal, perl, phf, pod, portsweep, rootkit, satan, smurf, spy, teardrop, warezclient, warezmaster.

Attributes:
duration: continuous.
protocol_type: symbolic.
service: symbolic.
flag: symbolic.
src_bytes: continuous.
dst_bytes: continuous.
land: symbolic.
wrong_fragment: continuous.
urgent: continuous.
hot: continuous.
num_failed_logins: continuous.
logged_in: symbolic.
num_compromised: continuous.
root_shell: continuous.
su_attempted: continuous.
num_root: continuous.
num_file_creations: continuous.
num_shells: continuous.
num_access_files: continuous.
num_outbound_cmds: continuous.
is_host_login: symbolic.
is_guest_login: symbolic.
count: continuous.
srv_count: continuous.
serror_rate: continuous.
srv_serror_rate: continuous.
rerror_rate: continuous.
srv_rerror_rate: continuous.
same_srv_rate: continuous.
diff_srv_rate: continuous.
srv_diff_host_rate: continuous.
dst_host_count: continuous.
dst_host_srv_count: continuous.
dst_host_same_srv_rate: continuous.
dst_host_diff_srv_rate: continuous.
dst_host_same_src_port_rate: continuous.
dst_host_srv_diff_host_rate: continuous.
dst_host_serror_rate: continuous.
dst_host_srv_serror_rate: continuous.
dst_host_rerror_rate: continuous.
dst_host_srv_rerror_rate: continuous.

---

training_attack_types (a list of intrusion types and their attack categories):

back dos
buffer_overflow u2r
ftp_write r2l
guess_passwd r2l
imap r2l
ipsweep probe
land dos
loadmodule u2r
multihop r2l
neptune dos
nmap probe
perl u2r
phf r2l
pod dos
portsweep probe
rootkit u2r
satan probe
smurf dos
spy r2l
teardrop dos
warezclient r2l
warezmaster r2l

The first data record:

0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.
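The record layout above is enough to sketch the preprocessing described in the KDD Cup 99 experimental settings: drop the symbolic attributes and min-max normalize the rest. The symbolic positions below are read off the attribute list; note that removing those seven leaves 34 continuous attributes, whereas the excerpt reports a reduced set of 33, so the authors apparently dropped one further attribute that the excerpt does not identify. A minimal Java sketch, with the sample record above hard-coded:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KddPreprocess {
    // 0-based positions of the symbolic attributes in a 41-feature record
    // (the 42nd field is the class label), per the attribute list above:
    // protocol_type, service, flag, land, logged_in, is_host_login, is_guest_login.
    static final Set<Integer> SYMBOLIC =
            new HashSet<>(Arrays.asList(1, 2, 3, 6, 11, 20, 21));

    // Keep only the continuous attributes of one CSV record.
    static double[] continuousFeatures(String record) {
        String[] f = record.split(",");
        double[] out = new double[f.length - 1 - SYMBOLIC.size()];
        int j = 0;
        for (int i = 0; i < f.length - 1; i++) {   // skip the trailing label
            if (!SYMBOLIC.contains(i)) out[j++] = Double.parseDouble(f[i]);
        }
        return out;
    }

    // Min-max normalization to [0, 1], per attribute, over the whole set.
    static void normalize(List<double[]> rows) {
        int d = rows.get(0).length;
        for (int a = 0; a < d; a++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] r : rows) { min = Math.min(min, r[a]); max = Math.max(max, r[a]); }
            double range = max - min;
            for (double[] r : rows) r[a] = range == 0 ? 0.0 : (r[a] - min) / range;
        }
    }

    public static void main(String[] args) {
        String rec = "0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,"
                   + "8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,"
                   + "0.00,0.00,0.00,0.00,0.00,normal.";
        List<double[]> rows = new ArrayList<>();
        rows.add(continuousFeatures(rec));   // 34 continuous values
        // Normalization is only meaningful over many records; with a single
        // record every attribute collapses to 0. The call shows the mechanics.
        normalize(rows);
        System.out.println(Arrays.toString(rows.get(0)));
    }
}

In a real run, every record of the 10% data set would be parsed into rows before normalize is called, so each attribute's minimum and maximum are taken over the full set, as the experimental settings describe.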