\documentstyle[twocolumn,eclepsf,IEEEtran]{article}
%%\textheight 250mm \topmargin -2mm
%%\textwidth 180mm
%%\oddsidemargin -7mm
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\def\bs{-0.5ex}
\newtheorem{theorem}{Theorem}
\newtheorem{excersize}{Problem}
\newtheorem{algorithm}{Algorithm}
\begin{document}
\title{How Many Parachutists will be Needed to Find \\ a Needle in a Pastoral?}
\author{{\bf Akira Imada} \\ \normalsize Brest State Technical University \\ [\bs] \normalsize Moskowskaja 267 Brest 224017 Belarus \\ [\bs] \normalsize akira@bstu.by \\ [\bs]}
\pagestyle{empty}
\maketitle
\begin{abstracttext}
\noindent This article is a reflection on computer network intrusion detection using artificial neural networks or other machine learning techniques. We assume that an intrusion into a network is like a needle in a haystack, not like a member of a family of iris flowers, and we consider how an attack could be detected in an intelligent way, if at all.
\end{abstracttext}
\bigskip
\section{Introduction}
\begin{quotation}
\begin{small}
\noindent {\it The parachute drop went smoothly ... slithering down the chute and out into space ... Flick landed perfectly, with her knees bent and her arms tucked into her sides as she fell to the ground ... She folded her parachute into a neat bundle, then set out to find the other Jackdaws. -- ``Jackdaws'' by Ken Follett. }
\end{small}
\end{quotation}
\noindent Most banks nowadays provide ATMs (automated teller machines) through which we can access a personal account with a PIN code, usually a four-digit decimal numeral. For security reasons, if we fail to enter the PIN correctly more than three times in a row, the PIN loses its validity thereafter.
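\noindent Before formalizing, a quick numerical check is instructive: guessing a PIN uniformly at random, with replacement, succeeds after a geometrically distributed number of trials, with a mean of $10^{4}$ for a four-digit decimal PIN. The following is a minimal simulation sketch in Python (the function name and its parameters are ours, purely for illustration):

```python
import random

def guesses_until_match(p=4, base=10, seed=None):
    """Count uniformly random guesses (with replacement)
    until a hidden p-digit PIN in the given base is matched."""
    rng = random.Random(seed)
    pin = tuple(rng.randrange(base) for _ in range(p))
    trials = 0
    while True:
        trials += 1
        guess = tuple(rng.randrange(base) for _ in range(p))
        if guess == pin:
            return trials

# The trial count is geometric with success probability 1/base**p,
# so its mean is base**p, e.g. about 100 for a 2-digit decimal PIN.
mean = sum(guesses_until_match(p=2, seed=s) for s in range(300)) / 300
```

Guessing with replacement is the pessimistic case; enumerating all $10^4$ codes without replacement needs at most $10^4$ trials, and $(10^4+1)/2$ on average.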
What we are then curious about is, ``How many trials would be needed for random challenges to reveal the secret PIN if an infinite number of trials were permitted?'' Let's formalize this problem.
\bigskip
\begin{excersize}[Breaking a PIN] Assuming a $p$-digit octal numeral\footnote{You will see the reason why {\it ``octal''} not {\it ``decimal''} later in the sub-section concerning {\it ``intron''} in the section {\tiny EXPERIMENTS}.} is employed to construct a PIN, only one out of the $8^{p}$ possible combinations is the secret PIN. No one except the owner of the PIN knows it. The question, then, is, ``How many trials and errors, on average, will be needed for a non-owner to discover the PIN under a specific strategy?'' \end{excersize}
\bigskip
\noindent This might be reminiscent of the famous problem called {\it a~needle~in~a~haystack}, originally proposed by Hinton \& Nowlan in 1987~\cite{hint}. The needle in their proposal was exactly one configuration of a 20-bit binary string; that is, the search space is made up of $2^{20}$ points and only one point is the needle to be searched for. No information is available, such as how close the current search point is to the needle, or how likely a point is to be the needle. See Figure~1.\\
\begin{figure}[hb]
\begin{center}
\epsfile{file=a-needle-0.eps,width=8cm,height=1.5cm}
\caption{A fictitious sketch of the fitness landscape of {\it a needle in a haystack}.
The haystack here is drawn as a 2-dimensional flat plane of fitness zero.}
\end{center}
\end{figure}
\noindent We assume that TCP connections to a computer network are represented by $n$-dimensional vectors, and that those representing intrusions are like {\it needles} among a huge amount of normal transactions which might look like a {\it haystack} or a {\it pastoral.}
\bigskip
\section{Network Intrusion Detection}
\begin{quotation}
\begin{small}
\noindent {\it Those highly qualified hackers who provide security services to companies during the daytime and then go home at night to conduct totally illegal hacking are the ones who are the most dangerous. -- by Enis Senerdem from Turkish Daily News on 29 March 2006}.
\end{small}
\end{quotation}
\noindent When we are to design a network intrusion detection system, which is one of the hottest topics these days, by means of so-called {\it soft computing} techniques such as {\it artificial immune systems}, {\it fuzzy logic}, {\it evolutionary computation}, or {\it neural networks}, whatever it might be, we need a set of sample data to train the system and to test it afterwards.
\bigskip
\subsection{When a Family of Iris Flower is Normal Then are Others Abnormal? --- Where is an Outlier?}
\noindent Fisher's iris flower database\footnote{University of California Irvine Machine Learning Repository. ics.uci.edu: pub/machine-learning-databases.} is a frequently used dataset in pattern recognition/classification, data mining, etc. As such, there have been a fair number of studies in which this iris flower database is employed as a dataset to train and to test an intrusion detection system.\\
\noindent The total of 150 samples consists of three species: {\it setosa, versicolor} and {\it virginica}, each of which includes 50 samples.
Each sample is a four-dimensional vector representing four attributes of the iris flower, that is, {\it sepal length, sepal width, petal length}, and {\it petal width}.\\
\noindent Let us take an example where this iris flower dataset was employed. Castellano et al.~\cite{cast} assumed one family to be normal whilst the other two to be abnormal. The whole dataset was divided into 10 parts, each of which has 15 samples uniformly drawn from the three classes. In each fold, the system is trained on the remaining 135 samples, and the held-out 15 samples are used to test the result. After this 10-fold cross validation, the authors concluded that the system shows an abnormal detection rate of 96\% with a false alarm rate of 0.6\%. How nice, isn't it? In reality, however, it is not so simple. It might not be difficult at all for a hacker to find an unlearned region which could be exploited to invade the system.\\
\noindent We now look at Figure 2 to see how the three species are distributed in the whole search space. The figure is depicted by the Sammon Mapping.\\
\noindent The Sammon Mapping maps a set of points in a high-dimensional space to the 2-dimensional space with the distance relations preserved as much as possible; or equivalently, the distances in the $n$-dimensional space are approximated by distances in the 2-dimensional space with a minimal error.\\
\noindent Just a brief look at the figure reveals that an enormously wide unlearned region remains for outliers.
\begin{figure}[h]\label{fig:iris}
\begin{center}
\epsfile{file=iris.eps,width=8cm,height=5cm}
\caption{A 2-dimensional visualization of the iris flower data by Sammon Mapping.
Three different families of iris flower, each containing 50 samples, are represented in the figure with circles, triangles and squares.}
\end{center}
\end{figure}
\bigskip
\subsection{Intrusion Might Look Like a Needle in a Hay!}
\noindent The other type of dataset, naturally more often employed in the context of network intrusion, is the KDD-cup-1999 dataset, which was prepared by MIT Lincoln Laboratory as a dataset for the 1998 DARPA intrusion detection evaluation \cite{stol}. This dataset has been, and still is, a common benchmark for the evaluation of intrusion detection techniques.\\
\noindent The KDD dataset, besides {\it Normal} data, covers four major categories of {\it attacks}: (i) {\it Probing} attacks, which probe the network for vulnerabilities; (ii) {\it Denial-of-Service (DoS)} attacks, which cause legitimate requests to a system to be denied; (iii) {\it User-to-Root (U2R)} attacks, which attempt unauthorized access to a local super-user or root account; and (iv) {\it Remote-to-Local (R2L)} attacks, which are unauthorized local accesses from a remote machine. These four categories of attacks include a total of 32 different attack types.\\
\noindent The dataset consists of two sub-datasets. One is provided as {\it training} data and contains 4,898,430 records, each of which is labeled as either normal or attack, indicating one specific attack out of the 32 types.\footnote{The labeled training dataset includes 972,780 Normals, 41,102 Probes, 3,883,370 DoSs, 52 U2Rs, and 1,126 R2Ls.} The other is unlabeled and contains 311,029 records, and is provided as {\it testing} data.\\
\noindent What a huge dataset! In fact, the Sammon Mapping we tried on the iris dataset above wouldn't work any more. Therefore, many have tried various approaches to reduce the dimension.
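\noindent For the curious reader, the Sammon Mapping mentioned above can be realized in a few lines. The sketch below is our own minimal gradient-descent version (the initialization, step size, and names are our choices, not Sammon's original iterative scheme), meant only to convey the idea:

```python
import numpy as np

def _pairwise(Z):
    # full matrix of Euclidean distances between the rows of Z
    return np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)

def sammon_stress(X, Y):
    """Sammon's stress: normalized squared mismatch between the pairwise
    distances of X (input) and Y (2-D projection); assumes distinct rows."""
    D, d = _pairwise(X), _pairwise(Y)
    iu = np.triu_indices(len(X), k=1)
    return np.sum((D[iu] - d[iu]) ** 2 / D[iu]) / D[iu].sum()

def sammon(X, n_iter=300, lr=0.3, seed=0):
    """Project rows of X to 2-D by plain gradient descent on Sammon's
    stress (a deliberately simple, non-optimized version)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D = _pairwise(X)
    np.fill_diagonal(D, 1.0)          # placeholder: diagonal terms vanish below
    iu = np.triu_indices(n, k=1)
    scale = D[iu].sum()
    Y = X[:, :2] + 0.1 * rng.normal(size=(n, 2))   # crude initialization
    for _ in range(n_iter):
        d = _pairwise(Y)
        np.fill_diagonal(d, 1.0)
        coef = (d - D) / (d * D)      # exactly zero on the diagonal
        grad = (2.0 / scale) * (coef[:, :, None]
                                * (Y[:, None] - Y[None, :])).sum(axis=1)
        Y -= lr * grad                # descend the stress gradient
    return Y
```

On the $150 \times 4$ iris data this runs in seconds; on the millions of KDD records, the $O(n^2)$ distance matrix is exactly what makes the method infeasible, as noted above.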
Let's start our small literature survey with this topic of {\it dimension reduction.}\\
\noindent Kuchimanchi et al.~\cite{kuch} used {\it principal component analysis} (PCA), and calculated the 19 most important attributes.\footnote{They are {\it src\_bytes, dst\_bytes, duration, is\_guest\_login, is\_host\_login, srv\_diff\_host\_rate, diff\_srv\_rate, service, flag, protocol\_type, num\_root, hot, num\_compromised, dst\_host\_same\_srv\_rate, dst\_host\_count, rerror\_rate, srv\_count}, and {\it dst\_host\_srv\_diff\_host\_rate}.} Then they evaluated the result of this dimension reduction by providing both the original 41-dimensional data and the 19-dimensional data reduced by PCA to a {\it decision-tree classifier} independently, comparing {\it detection accuracies} and {\it false positive}\footnote{I.e., recognizing normal as attack.} rates. They showed that the {\it detection accuracy} and {\it false positive} rate were 99.92\% and 0.26\%, respectively, on the 19-dimensional PCA data, and 99.94\% and 0.23\%, respectively, on the original 41-dimensional data.\footnote{This was not the main purpose of the paper. The authors rather exploited other methods of dimension reduction such as {\it neural-network-PCA} or {\it nonlinear-component-analysis}, expecting more efficiency and higher accuracy. The evaluation was carried out not only with a {\it decision-tree classifier} but also with a {\it non-linear classifier}.} What a successful result! However, 19 dimensions is still rather large, is it not?\\
\noindent Let's see one more example.
Joshi et al.~\cite{josh} wrote, {\it ``Exploiting only 5 out of the 41 attributes,\footnote{I.e., {\it src\_bytes, dst\_bytes, duration, is\_host\_login}, and {\it is\_guest\_login}.} the best result was 79\% accuracy in correctly detecting attacks, with the remaining 21\% accounted for by the {\it false positive} rate plus the {\it false negative}\footnote{I.e., recognizing attack as normal.} rate.''}~\footnote{Most of the phrases cited hereafter in this article like {\it ``$\cdots$''} are paraphrased, more or less, by the author of this article. As such, if there are some incorrect expressions, it is the author of this article who is responsible for them, not the original authors.}\\
\noindent Though it might not be as successful as the above result by Kuchimanchi et al., if we consider that it uses only {\it 5 out of 41 attributes}, it is amazing. Wow!\\
\noindent Anyway, it is good to know that we can reduce the dimension of the original KDD-cup-1999 dataset to at least about half with the results remaining intact.\\
\noindent Then, our next interest will be, ``Are all of the attack types in the KDD-cup-1999 dataset equally willing to wait to be detected?'' Some of the reports were from this point of view. Let us name a few.\\
\noindent Pan et al.~\cite{pan1} exploited a {\it three-layer (70-14-6) feed-forward neural network} with a sigmoid transfer function, trained by back-propagation using scaled conjugate gradient descent, to detect five typical types of attacks -- neptune, portsweep, satan, buffer\_overflow, and guess\_passwd -- as well as normal samples.\\
\noindent Let's see what they observed. The authors wrote, {\it ``The test result indicates that 99.6\% of the normal examples were recognized correctly, and for the three attacks of neptune, satan, and portsweep, we obtained an average detection rate of 96.6\% and a false positive rate of 0.049\%. However, for all the five kinds of attacks, we only obtained an average detection rate of 64.9\% and a false positive rate of 26.7\%.
This is because all buffer\_overflow and guess\_passwd attacks failed to be classified by this back-propagation neural network. Then we tried an expert system, and found that buffer\_overflow and guess\_passwd attacks can be detected more accurately by this rule-based detector than by the neural network.''} They then concluded, {\it ``The model based on both the neural network and the expert system finally achieved an average detection rate of 93.28\% and a false positive rate of 0.2\% for all of these five attack types.''}\\
\noindent We, however, would be rather more interested in why this neural network failed to classify the {\it buffer\_overflow} and {\it guess\_passwd} attacks than in the performance improvement obtained by using the rule-based detector.\\
\noindent Pan et al.~reported yet another result in a different article~\cite{pan2}. With the same neural network architecture and the same target of five attacks as above, they reported that the rates correctly predicted for (normal, neptune, satan, portsweep, buffer\_overflow, guess\_passwd) by this back-propagation neural network were (73.3\%, 99.2\%, 94.6\%, 94.2\%, 0.0\%, 0.0\%). And they concluded, {\it ``The back-propagation network can't detect the buffer\_overflow and guess\_passwd attacks.''} This sounds like a realistic assertion, and the one we want.\footnote{Again they reported a successful improvement of this result by a hybridization with C4.5.}\\
\noindent More careful conclusions like these have begun to appear in the recent literature. For example, Stibor et al.~\cite{stib} wrote, {\it ``The real-valued negative selection with variable-sized detectors has poor classification performance on the high-dimensional KDD dataset.''}\\
\noindent When this artificial immune system based detector was proposed by Ji et al.~\cite{ji}, the result of applying it to the {\it iris dataset} was not that bad.
That is, the correct detection rate for ({\it setosa}, {\it versicolor}, {\it virginica}) was (99.98\%, 85.95\%, 81.87\%), while the false alarm rates were all zero!\\
\noindent As another example of such an {\it implicit} report of failure, Dam et al.~\cite{dam} claimed, {\it ``The evolutionary classifier system, devised to improve on the performance of the traditional one, resulted in a detection rate of (95.7\%, 49.1\%, 93.0\%, 8.5\%, 3.9\%) for (normal, DoS, Probe, U2R, R2L).''}\\
\noindent Again, we are rather more interested in why the detection rate is so low for U2R and R2L than in whether the result is satisfactory or not.\\
\noindent Finally, it would be interesting to take a look at what Sabhnani et al.~\cite{sabh} reported. See Table 1 to have a bird's eye view of the results above.\\
\begin{table}[h]
\begin{center}
\caption{Detection rates (\%) of 9 different machine learning techniques for the 4 attack categories. From Sabhnani et al.~\cite{sabh}.}
\begin{small}
\hspace*{1in}\\
\begin{tabular}{l c c c c}
\hline
\hspace*{.1in} & Probe & DoS & U2R & R2L \\
\hline
Multi-layer Perceptron & 88.7 & 97.2 & 13.2 & 5.6 \\
Gaussian Classifier & 90.2 & 82.4 & 22.8 & 9.6 \\
$K$-means Clustering & 87.6 & 97.3 & 29.8 & 6.4 \\
Nearest Cluster Algorithm & 88.8 & 97.1 & 2.2 & 3.4 \\
Radial Basis Function & 93.2 & 73.0 & 6.1 & 5.9 \\
Leader Algorithm & 83.8 & 97.2 & 6.6 & 1.0 \\
Hypersphere Algorithm & 84.8 & 97.2 & 8.3 & 1.0 \\
Fuzzy ARTMAP & 77.2 & 97.0 & 6.1 & 3.7 \\
C4.5 Decision Tree & 80.8 & 97.0 & 1.8 & 4.6 \\
\hline
\end{tabular}
\end{small}
\end{center}
\end{table}
\noindent Also note that the KDD-cup-1999 winner's detection rate for (Probe, DoS, U2R, R2L) was (83.3\%, 97.1\%, 13.2\%, 8.4\%).\\
\bigskip
\noindent {\bf Our Conjecture}\\
Here, we conjecture that those occasionally observed poor results arise because some of the attack data are like needles in a haystack of a huge amount of normal data.
If we were able to fully visualize such a large set of normal samples together with a few data points picked from the abnormal samples, the latter might look like needles in a haystack of the former, as in Figure~3. Though we are not yet ready, we plan to present a visualization of this assumption elsewhere, to study the conjecture further in detail.\\
\begin{figure}[hb]
\begin{center}
\epsfile{file=rnd-500-5.eps,width=6cm,height=4cm}
\caption{We conjecture that some attack data (filled circles) are like needles in a hay of normal data (empty circles). Plots in this figure are all fictitious.}
\end{center}
\end{figure}
\bigskip
\noindent To summarize this section, we ask the readers:
\begin{excersize}[A Challenge in the KDD-cup-99 Dataset] Design an intrusion detection system which has 41 inputs corresponding to the attributes of the KDD-cup-1999 dataset, and 5 YES/NO outputs indicating that the input is either normal, Probe, DoS, U2R, or R2L. The question is, ``Is such a design possible or not?'' \end{excersize}
\bigskip
\noindent Also see Figure 4 to get an image of a real implementation by a neural network, as an example.
\begin{figure}[hb]
\begin{center}
\epsfile{file=b-box-kdd.eps,width=8cm,height=4cm}
\caption{A simple architecture of the neural network we desire to design to classify the KDD-cup-99 dataset.}
\end{center}
\end{figure}
\bigskip
\section{Experiments}
\begin{quotation}
\begin{small}
\noindent {\it Flick remembered the legend of the Jackdaw of Rheims, the bird that stole the bishop's ring. The monks couldn't figure out who had taken it, so the bishop cursed the unknown thief. Next thing they knew, the jackdaw appeared all bedraggled, and they realized he was suffering from the effects of the curse, and must be the culprit. Sure enough they found the ring in his nest. -- ``Jackdaws'' by Ken Follett.
}
\end{small}
\end{quotation}
\bigskip
\noindent Assuming our conjecture that real attack samples are like needles in a haystack of normal samples, we now look at how easy or difficult it is to find them. Let's start with a random search. Note that some proposed algorithms which were reported as successes were actually not as good as asserted, and were sometimes found to be worse than a random search.
\bigskip
\subsection{Random Fall of Parachutists}
\begin{algorithm}[Random Fall] (1) Create a $p$-digit octal PIN at random. (2) Create one $3p$-bit binary string at random. (3) Translate the string into a $p$-digit octal code. (4) Check if the translated code matches the PIN. (5) If it matches, end the run. Otherwise go back to 2. \end{algorithm}
\bigskip
\noindent Let us use a metaphor here. We assume only one needle in a pastoral, and parachutists fall from an airplane onto the pastoral one by one; our interest is then in how likely it is for a parachutist to fall exactly on the needle. This may be taken as a random search, and will be our criterion of comparison hereafter.\\
\noindent Note that one parachutist is represented by a genotype of a $3p$-bit binary string. Let's take an example of $p=4$. A genotype
\begin{displaymath}
((100) (111) (000)(010))
\end{displaymath}
maps into its phenotype (4 7 0 2).\\
\noindent At the start, one $p$-digit octal PIN is created which we assume no one knows {\it a priori}. With $p$ increasing from~2, we count the number of randomly created genotypes until a phenotype exactly matches the hidden PIN. The average number, over 1024 runs, of parachutists needed until we found the parachutist who fell on the needle just by chance, for $p = 2, 3, 4, 5, 6, 7$, was, respectively,
\begin{equation}
(66, 512, 3951, 32154, 254673, 2058527).
\end{equation}
See Figure 5.
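\noindent Algorithm 1 can be rendered in a few lines of Python (an illustrative sketch of ours; the names are arbitrary). Since each trial independently hits the needle with probability $8^{-p}$, the expected number of falls is $8^{p}$, in line with the measured averages above ($8^2 = 64$ vs.~66, $8^3 = 512$, and so on):

```python
import random

def decode(bits, p):
    """Translate a 3p-bit string into p octal digits,
    e.g. [1,0,0, 1,1,1] -> [4, 7]."""
    return [bits[3*i] * 4 + bits[3*i + 1] * 2 + bits[3*i + 2]
            for i in range(p)]

def random_fall(p, seed=None):
    """Algorithm 1 (Random Fall): count independent random
    genotypes drawn until the hidden p-digit octal PIN is hit."""
    rng = random.Random(seed)
    pin = [rng.randrange(8) for _ in range(p)]
    trials = 0
    while True:
        trials += 1
        bits = [rng.randrange(2) for _ in range(3 * p)]
        if decode(bits, p) == pin:
            return trials

# Average over independent runs; the expectation is 8**2 = 64 for p = 2.
avg = sum(random_fall(2, seed=s) for s in range(300)) / 300
```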
\bigskip
\subsection{What if Parachutists are Allowed to Walk after the Fall?}
\begin{algorithm}[Exploration after Fall] (1) Create a $p$-digit octal PIN at random. (2) Create one $3p$-bit binary string at random. (3) Translate the string into a $p$-digit octal code. (4) Check if the translated code matches the PIN. (5) If it matches, end the run. Otherwise repeatedly mutate the string by flipping each bit with a probability of $1/(3p)$\footnote{We will call this a {\it ``point-wise mutation''} hereafter.} until the translation matches the PIN, or the number of steps exceeds 1000. (6) If still none has matched, then repeat from 2. \end{algorithm}
\bigskip
\noindent What will happen if the fallen parachutist is allowed to explore, say, 1000 random steps around the spot where she falls? This might remind you of the seminal experiment once made by Hinton \& Nowlan \cite{hint}, who referred to it as {\it ``lifetime learning --- the Baldwin Effect,''} though our parachutist in this paper ends her life without creating a next generation. The results, again averaged over 1024 runs, for $p = 4, 5, 6, 7, 8$, were, respectively,
\begin{equation}
(5, 36, 308, 2436, 23087).
\end{equation}
\noindent The results are also depicted in Figure 5, together with the result for our random parachutists in the previous sub-section. In both cases, we can see that the complexity of finding the needle is of exponential order. But look! How impressively an {\it exploration-after-fall} improves the performance!\\
\noindent As you have probably noticed already, however, it is not fair just to compare the numbers of parachutists. The total number of points searched by those walking parachutists is plotted as a function of $p$ in Figure 6. We can see that the result was rather worse than that of our random parachutists, despite its superficially good-looking appearance.
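\noindent A sketch of Algorithm 2, again in our own illustrative Python. We read the mutation in step 5 as flipping each bit independently with probability $1/(3p)$; the function returns both the number of falls and the total number of points examined, i.e., the quantities plotted in Figure 5 and Figure 6:

```python
import random

def exploration_after_fall(p, walk_len=1000, seed=None):
    """Algorithm 2 (Exploration after Fall): after each random fall the
    parachutist takes up to walk_len mutation steps around the spot;
    returns (number_of_falls, total_points_examined) when the PIN is found."""
    rng = random.Random(seed)
    pin = [rng.randrange(8) for _ in range(p)]
    decode = lambda b: [b[3*i]*4 + b[3*i+1]*2 + b[3*i+2] for i in range(p)]
    falls = examined = 0
    while True:
        falls += 1
        bits = [rng.randrange(2) for _ in range(3 * p)]   # a fresh fall
        for _ in range(walk_len + 1):
            examined += 1
            if decode(bits) == pin:
                return falls, examined
            # point-wise mutation: flip each bit with probability 1/(3p)
            bits = [b ^ (rng.random() < 1.0 / (3 * p)) for b in bits]
```

Comparing the second returned value with the plain trial count of Algorithm 1 reproduces the fairness point made above: fewer falls, but not fewer examined points.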
\begin{figure}[hb]
\begin{center}
\epsfile{file=parachuters.eps,width=8cm,height=5cm}
\caption{(a) The number of random creations of candidates until the PIN is matched (filled circles), and (b) the number when each created candidate is allowed a random walk of 1000 steps (empty circles). Both are averages over 1024 runs.}
\end{center}
\end{figure}
\begin{figure}[hb]
\begin{center}
\epsfile{file=actual-how-many.eps,width=8cm,height=5cm}
\caption{Average number of points explored by all the randomly created candidates who are allowed further random walks of 1000 steps until the needle is found. Averages are taken over 1024 runs.}
\end{center}
\end{figure}
\bigskip
\subsection{Neutral Mutation}
\begin{algorithm}[Walk by Neutral Mutations] (1) Create a PIN at random. (2) Create one genotype at random. (3) Apply a point-wise mutation to the genotype such that the result maps into the same phenotype as the one before the mutation. (4) Assess all possible single-mutation neighbors of the new genotype to determine whether any new phenotype is discovered. (5) Steps 3 to 4 are repeated until the phenotype matches the PIN, or until a pre-fixed number of steps is reached. \end{algorithm}
\bigskip
\noindent This is a paraphrase of the algorithm proposed by Shipman et al.~\cite{ship}, who called step 3 a {\it neutral mutation} (note that the mutation in step 4 is a standard one). Its efficiency was studied in their paper by applying it to a random Boolean network and to telecommunication networks. But why not explore a simpler example first, if it is to work universally?\\
\noindent To apply this to our problem of searching for the needle, that is, a $p$-digit octal PIN, we design our genotype as a $15p$-bit binary string such that the {\it number of 1s (mod 8)} in each of the 15-bit blocks of the string maps into one digit of the corresponding octal code.\footnote{A simple consideration might give us the idea that 7 bits for each octal digit would be enough.
However, we implemented it in this way so that each octal digit from 0 to 7 is created uniformly at random.} For example,
\begin{displaymath}
((100011000000100) (111111111111111) (111110001101010))
\end{displaymath}
maps into (4 7 1).\\
\noindent The average number, over 1024 runs, of parachutists needed until we found the one who first reached the needle, for $p = 2, 3, 4, 5, 6$, was, respectively,
\begin{equation}
(71728, 583593, 4930624, 36592634, 314817878).
\end{equation}
Much worse than our random search.
\bigskip
\subsection{Does Neutral Mutation on Introns Enhance the Efficiency of Search?}
\begin{algorithm}[Neutral Network] (1) Create a PIN at random. (2) Create at random an initial individual which is considered to be the winner for the next generation. (3) Carry out point-wise mutations on the winning parent to generate 4 offspring. (4) Construct a new generation with the winner and its offspring. (5) Select a winner from the current population using the following rules: (i) if any offspring has a better fitness than the parent, the one with the highest fitness becomes the winner; (ii) if all offspring have the same fitness as the parent, one offspring is randomly selected, and if the parent-offspring pair has a Hamming distance within the permitted range, the offspring becomes the winner; otherwise the parent remains the winner. (6) Go back to step 3 unless the maximum number of generations is reached, or a solution is found. \end{algorithm}
\bigskip
\noindent The description of the algorithm above is a paraphrase from Yu \& Miller~\cite{tina}.
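\noindent Applied to our PIN search (with the needle-like fitness that is 1 on the PIN and 0 elsewhere), Algorithm 4 might be sketched as below. This is our own illustrative rendering, not Yu \& Miller's code; note that with a single-bit point-wise mutation the parent-offspring Hamming distance is always 1, so every neutral move is accepted and the process reduces to neutral drift until the needle is hit:

```python
import random

def neutral_es(p, hamming_limit=2, max_gens=200000, seed=None):
    """Algorithm 4 sketch: (1+4) evolution in which a fitness-neutral
    offspring replaces the parent only if their Hamming distance lies
    within hamming_limit. Fitness is 1 on the PIN and 0 elsewhere."""
    rng = random.Random(seed)
    n_bits = 3 * p
    decode = lambda b: [b[3*i]*4 + b[3*i+1]*2 + b[3*i+2] for i in range(p)]
    pin = [rng.randrange(8) for _ in range(p)]
    fit = lambda b: 1 if decode(b) == pin else 0
    parent = [rng.randrange(2) for _ in range(n_bits)]
    for gen in range(1, max_gens + 1):
        offspring = []
        for _ in range(4):
            child = parent[:]
            child[rng.randrange(n_bits)] ^= 1    # point-wise mutation
            offspring.append(child)
        best = max(offspring, key=fit)
        if fit(best) > fit(parent):
            parent = best                        # rule (i): strictly better
        else:
            cand = rng.choice(offspring)         # rule (ii): neutral move
            if sum(a != b for a, b in zip(cand, parent)) <= hamming_limit:
                parent = cand
        if fit(parent) == 1:
            return gen                           # needle found
    return None                                  # give up
```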
As for the application of this algorithm, there was an interesting debate between Yu \& Miller's paper {\it ``Finding needles in haystacks is not hard with neutrality''} (2002) and Collins's {\it ``Finding needles in haystacks is harder with neutrality''} (2005).\\
\noindent What Yu \& Miller~\cite{tina} attacked as a type of {\it a needle in a haystack} problem was to make a genetic algorithm construct an even-$n$-parity logic circuit by employing only {\it XORs} and {\it EQs}, not {\it ANDs}, {\it ORs} and so on, which yields a peculiar fitness landscape. The even-$n$-parity logic has $n$-bit binary inputs, and returns 1 if and only if the number of 1s is even, and otherwise returns 0. Hence, we can evaluate the fitness value of any one candidate solution by feeding it all the possible configurations of 0s and 1s and counting the number of correct outputs. Thus, a priori, the fitness could be anything from 0 to $2^n$. In reality, however, we observe only three different values, that is, $2^n$, $2^{n-1}$ and 0. In other words, the output is all correct, half correct, or not correct at all. For example, candidates of even-3-parity constructed only from {\it XORs} and {\it EQs} return either 8, 4 or 0 correct outputs for the eight possible inputs (000), (001), (010), ..., (111).\\
\noindent Yu \& Miller wrote, {\it ``In the case of random creation of 4,000,000 candidates of even-12-parity, a solution (fitness 4096) never emerged, while for even-10-parity, 100,000 random creations of candidates yielded 540 solutions (fitness 1024). On the other hand, when neutral mutation was applied to the candidates of even-12-parity, 48 out of 100 runs reached a solution, each run within only 10,000 iterations.''}\\
\noindent Collins argued back, concluding, {\it ``The reported success is due to a bias of the selection''} \cite{coll}.
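\noindent Before following the debate further, the three-valued fitness structure claimed above is easy to check empirically. The sketch below (our own code, using the triple encoding of genes described later in this section) evaluates XOR/EQ circuits against even-$n$-parity; every such circuit computes an affine function (an XOR of some inputs, possibly negated), so its fitness can only be $2^n$, $2^{n-1}$, or 0:

```python
import itertools, random

def eval_circuit(genes, inputs):
    """genes: list of (op, a, b) triples with op in {'XOR', 'EQ'};
    a and b index either a primary input (0 .. n-1) or the output of
    an earlier gate (n, n+1, ...). Returns the last gate's output."""
    vals = list(inputs)
    for op, a, b in genes:
        x = vals[a] ^ vals[b]
        vals.append(x if op == 'XOR' else 1 - x)   # EQ is negated XOR
    return vals[-1]

def parity_fitness(genes, n):
    """Number of the 2**n input vectors on which the circuit agrees
    with even-n-parity (target 1 iff the number of 1s is even)."""
    score = 0
    for bits in itertools.product((0, 1), repeat=n):
        target = 1 if sum(bits) % 2 == 0 else 0
        score += eval_circuit(genes, bits) == target
    return score

def random_genes(n, length, rng):
    """A random feed-forward XOR/EQ circuit in the triple encoding."""
    return [(rng.choice(('XOR', 'EQ')),
             rng.randrange(n + g), rng.randrange(n + g))
            for g in range(length)]
```

For instance, the two-gate circuit XOR(EQ($x_0$,$x_1$), $x_2$) computes even-3-parity exactly and therefore scores the full 8, while random circuits land on 0, 4, or 8 and nothing in between.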
In another work of Collins~\cite{coll-2}, it was analytically shown that the number of possible candidates of even-12-parity is $1.315 \times 10^{139}$, of which the number of real solutions is $2.568 \times 10^{132}$, claiming {\it ``Yu \& Miller's result is, therefore, worse than a possible random search.''}\\
\noindent Again, what we want to emphasize here is that if the assertion by Yu \& Miller is universally true, it should also work in still simpler examples.\\
\noindent Before going further, let's see what the {\it intron} is that Yu \& Miller assumed to play an important role in their evolution. For example, take a look at a genotype representing an even-3-parity,
\begin{displaymath}
((EQ,A,B)(EQ,C,D)(XOR,1,E)(EQ,F,G)(EQ,3,H))
\end{displaymath}
where each gene, which corresponds to one unit, is a triple, the 1st entry being which logic function is to be used (EQ or XOR), and the 2nd and 3rd being connections either to one of the inputs or to the output of a previous unit. Note that the 2nd and 4th genes in the above example do not contribute to constructing the phenotype, since these two genes are not connected to any other unit, and hence are called {\it introns} as a biological metaphor. Any mutation on an intron has no effect on the phenotype, and as such, these mutations are called {\it neutral}. The above genotype can be interpreted as the phenotype shown in Figure 7.\\
\begin{figure}[hb]
\begin{center}
\epsfile{file=even-5-parity.eps,width=7cm,height=1.8cm}
\caption{An example of a phenotype of even-3-parity constructed only from EQs and XORs.}
\end{center}
\end{figure}
\noindent Now we try to apply this to our PIN-finding problem. This time we use 4 binary bits, instead of 3 as before, to represent one octal numeral in the candidate PIN. The translation is then into decimal, instead of octal, and when the translated decimal is larger than seven we consider it an {\it intron}.
For example,
\begin{displaymath}
((0001)(1100)(0101)(0010)(0111))
\end{displaymath}
is translated into $(1, 5, 2, 7)$, since the second gene is translated into 12 and is thus regarded as an {\it intron}.\\
\noindent The average number, over 1024 runs, of parachutists needed to first find the needle, for $p = 2, 3, 4, 5$, was, respectively,
\begin{equation}
(65, 488, 3751, 33710).
\end{equation}
Alas, if we compare this with (1), we see that the result is almost the same as for our random parachutists.\\
\bigskip
\noindent {\bf Our Second Conjecture}\\
We have no algorithm that can look for {\it a needle in a haystack} more efficiently than a random search. There is no way to find needles in a pastoral.
\bigskip
\section{Discussion}
\noindent As Laskov et al.~\cite{lask} claimed in their paper, {\it ``Labels can be extremely difficult or impossible to obtain. Analysis of network traffic or audit logs is very time-consuming and usually only a small portion of the available data can be labeled. Furthermore, in certain cases, for example at a packet level, it may be impossible to unambiguously assign a label to a data instance.''} The authors further wrote, {\it ``In a real application, one can never be sure that a set of available labeled examples covers all possible attacks. If a new attack appears, examples of it may not have been seen in training data.''} Then our next question is:\\
\begin{excersize}[Attacks by Mutants] Pick at random a set of $n$ normal samples from the KDD-cup-1999 dataset. Each of those $n$ samples is given a point-wise mutation and taken as attack data. Train your intrusion detection system using half of the normal samples and half of the attack samples (the number of each being $n/2$), then test the system using the remaining samples. Can the system detect those mutants as intrusions?
\end{excersize}
\bigskip
\subsection{Can a Sommelier be Trained without Bootlegs?}
\noindent Though we have not remarked on it so far, there remains a further difficult issue, that is, ``How can the system learn only from normal data to detect the abnormal?'' We usually have an enormous amount of normal data, but we have no information about coming attacks until it is too late.\\
\noindent Gomez et al.~\cite{gome} claimed to present {\it ``a new technique for generating a set of fuzzy rules that can characterize the non-self (abnormal) space using only self (normal) samples.''} Their experiment employed the 10\% dataset, also given as a part of the KDD-cup-1999 dataset, which reduces the number of records to 10\% of the original. Further, they removed the categorical attributes and normalized the remaining 33 numerical attributes between 0 and 1 using the maximum and minimum values found. Then 80\% of the {\it normal} samples were picked at random for training, while the remaining 20\%, along with the same number of abnormal samples, were used for testing. Gomez et al.~designed the detector with what they called an {\it ``immuno-fuzzy approach''} and a system they call {\it ``evolving fuzzy rule detectors,''} claiming, {\it ``It detects attacks with a detection rate of 98.30\% and a false alarm rate of 2.0\%.''} Really satisfactory, if it is really true.\\
\noindent The report did not mention the categories of attacks, which implies the reported success is an average over all attack types. It seems too good if we consider the results of the other not-so-happy reports mentioned above.\\
\noindent A more important thing to notice here is that the system learned {\it ``only with normal data''} to establish this success. It would be terrific if it were really true, but we are more or less suspicious.\\
\noindent This issue is something like requiring a wine-taster to recognize bootleg champagne by providing him/her only plenty of real champagne to learn from.\footnote{Or, the opposite way round.
I usually enjoy Georgian sparkling wine about once a week, but still a real champagne would be able to pass itself off as a Georgian one to me.}\\
\noindent Though this {\it training-only-with-normal} is our ultimate goal, it is not so simple to realize. To see how difficult it is, why not try the following?\\
\bigskip
\begin{excersize}[Dummy Attacks]
(1) Prepare two sub-datasets from the KDD-cup-1999 dataset: one picked from the normal samples, called $D_{\mbox{\small normal}}$, and the other from the attack samples, called $D_{\mbox{\small attack}}$. (2) Furthermore, randomly create an attack dataset -- dummy attacks -- and call it $D_{\mbox{\small dummy}}$. (3) Train your intrusion detection system only with $D_{\mbox{\small normal}}$. (4) Then try two tests, one with only $D_{\mbox{\small attack}}$ and the other with only $D_{\mbox{\small dummy}}$, avoiding any a~priori prediction.
\end{excersize}
\bigskip
\subsection{Don't We Expect the Result a priori?}
\noindent {\it ``An artificial immune system detects an attack by computer viruses!''} How fantastic it sounds. While we wish it would work, we are afraid it might be just a {\it fantasy}. So we need a placebo experiment.\\
\noindent Of the 311,029 records in the test set of KDD-cup-1999, the rates of (Normal, Probe, DoS, U2R, R2L) are (19.5\%, 1.3\%, 73.9\%, 0.07\%, 5.2\%), respectively. This suggests that even the {\it always-return-R2L strategy},\footnote{Which returns R2L whatever the input is.} for instance, would result in accurate detection rates of (Normal, Probe, DoS, U2R, R2L) = (0.0\%, 0.0\%, 0.0\%, 0.0\%, 5.2\%). Or the {\it always-return-a-random-output strategy}\footnote{Which returns either Normal, Probe, DoS, U2R, or R2L at random, regardless of the input.} would score quite highly on detecting DoS attacks.\\
\noindent The two strategies above might be more intelligent than some of the {\it artificially intelligent} techniques proposed so far.
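To make the placebo point concrete, the overall accuracy of these ignorant strategies follows directly from the class shares of the test set. A minimal sketch, using the approximate rates quoted above as illustrative assumptions:

```python
# Approximate class shares of the KDD-cup-1999 test set (illustrative figures).
rates = {"Normal": 0.195, "Probe": 0.013, "DoS": 0.739,
         "U2R": 0.0007, "R2L": 0.052}

# Always answering a fixed class c is right exactly when the true class is c,
# so its overall accuracy equals that class's share of the test set.
for c, p in rates.items():
    print(f"always-return-{c}: overall accuracy {p:.1%}")

# Answering uniformly at random over the five classes is right with
# probability 1/5 on every record, independent of the true class.
uniform = sum(p / len(rates) for p in rates.values())
print(f"uniform random guessing: overall accuracy {uniform:.1%}")
```

The constant-DoS strategy, for instance, already reaches roughly 74\% overall accuracy without looking at a single input, which is exactly why a random-reply baseline is an honest yardstick.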
Intelligent, that is, rather than {\it ignorant}.\\
\noindent We have to be careful, because we sometimes tend, {\it unconsciously} if not {\it intentionally}, to pick only those data that suit the conclusion we expected {\it a priori}.\\
\noindent It is just as a mere {\it sugar pill} sometimes proves as effective as, or more effective than, a medicine under development in curing a disease among a group of innocent volunteers. Let us conclude the discussion with the following final question.\\
\bigskip
\begin{excersize}[Placebo Experiment]
(1) Create a simple device that randomly returns one of Normal, Probe, DoS, U2R, or R2L for any input. (2) Prepare a test dataset including a sufficient number of records drawn uniformly from Normal, Probe, DoS, U2R, and R2L. (3) Compare the performance of the detector you designed with that of the random-reply machine created in step 1, feeding both the same dataset prepared in step 2.
\end{excersize}
\bigskip
\section{Concluding Remarks}
\noindent As we have described so far, the KDD-cup-1999 intrusion detection dataset has 4,898,430 records in the labeled dataset for {\it training} purposes, of which 75.6111\% are normal. On the other hand, we have 311,029 records in the unlabeled dataset for {\it testing} purposes, of which only 0.0733\% are U2R, for example. Under these circumstances, a likely interpretation is that the U2R attack patterns are like needles in a haystack of normal patterns when they undergo a test, unless we are very lucky. Considering that we have not had satisfactory results in detecting U2R attacks, we do not seem to be so lucky.\\
\noindent In addition, if we take into account that a hacker is a person who is extremely good at finding a pattern very close to normal traffic, the point located by a hacker is not a randomly located point.\\
\noindent It is said that we have two kinds of intrusion detection.
One is called {\it misuse} detection, which recognizes known attack patterns. The other is called {\it anomaly} detection, which detects unknown, non-normal patterns. We are not interested in the former. All we want is to detect unknown outliers. And an {\it outlier} usually lies not far from the normal but very close to it. We cannot be so optimistic.\\
\noindent As for using an {\it artificial immune system}, for example: since the truly sensational proposition by Forrest, Perelson et al.~\cite{forr} in 1994, which claimed, {\it ``Negative selection, a metaphor of our real biological immune system, can detect anomaly as non-self in computers,''} we have seen a tremendous number of intelligent challenges for more than two decades, but all in vain in a real sense. This topic is still not fruitful at all from a practical point of view, as far as we know.\\
\noindent Probably the most intelligent way of detecting a network intrusion is to {\it curse it and wait for the effect of the curse}.\\
\noindent Needless to say, however, this article is not meant to negate the possibility; rather, we hope it will pose a serious challenge to the intrusion detection community, out of which truly innovative ideas may emerge.
\bigskip
\begin{thebibliography}{99}
\bibitem{hint} G.~E.~Hinton, and S.~J.~Nowlan (1987) {\em ``How Learning can Guide Evolution.''} Complex Systems, 1, pp.~495--502.
\bibitem{cast} G.~Castellano, and A.~M.~Fanelli (2000) {\em ``Fuzzy Inference and Rule Extraction using a Neural Network.''} Neural Network World Journal, Vol.~3, pp.~361--371.
\bibitem{stol} S.~J.~Stolfo, F.~Wei, W.~Lee, A.~Prodromidis, and P.~K.~Chan (1999) ``KDD Cup knowledge discovery and data mining competition.'' {\small http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html}
\bibitem{kuch} G.~K.~Kuchimanchi, V.~V.~Phoha, K.~S.~Balagani, and S.~R.~Gaddam (2004) {\em ``Dimension Reduction Using Feature Extraction Methods for Real-time Misuse Detection Systems.''} Proceedings of the Workshop on Information Assurance and Security, West Point, NY, pp.~1555--1563.
\bibitem{josh} S.~S.~Joshi, and V.~V.~Phoha (2005) {\em ``Investigating Hidden Markov Models Capabilities in Anomaly Detection.''} Proceedings of the 43rd ACM Southeast Conference, Vol.~1, pp.~99--103.
\bibitem{pan1} Z.~Pan, H.~Lian, G.~Hu, and G.~Ni (2005) {\em ``An Integrated Model of Intrusion Detection Based on Neural Network and Expert System.''} Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp.~671--672.
\bibitem{pan2} Z.~Pan, S.~Chen, G.~Hu, and D.~Zhang (2003) {\em ``Hybrid Neural Network and C4.5 for Misuse Detection.''} Proceedings of the International Conference on Machine Learning and Cybernetics, pp.~2463--2467.
\bibitem{stib} T.~Stibor, J.~Timmis, and C.~Eckert (2005) {\em ``A Comparative Study of Real-valued Negative Selection to Statistical Anomaly Detection Techniques.''} Proceedings of the International Conference on Artificial Immune Systems, Lecture Notes in Computer Science, Vol.~3627, Springer, pp.~262--275.
\bibitem{ji} Z.~Ji, and D.~Dasgupta (2004) {\em ``Real-valued Negative Selection Algorithm with Variable-sized Detectors.''} Proceedings of the Genetic and Evolutionary Computation Conference, Lecture Notes in Computer Science, Vol.~3102, Springer, pp.~287--298.
\bibitem{dam} H.~H.~Dam, K.~Shafi, and H.~A.~Abbass (2005) {\em ``Can Evolutionary Computation Handle Large Dataset?''} Technical Report, The Artificial Life and Adaptive Robotics Laboratory, TR-ALAR-200507011.
\bibitem{sabh} M.~Sabhnani, and G.~Serpen (2003) {\em ``Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context.''} Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications, pp.~209--215.
\bibitem{ship} R.~Shipman, M.~Shackleton, and I.~Harvey (2000) {\em ``The Use of Neutral Genotype-phenotype Mappings for Improved Evolutionary Search.''} BT Technology Journal, Vol.~18, No.~4, pp.~103--111.
\bibitem{tina} T.~Yu, and J.~Miller (2002) {\em ``Finding Needles in Haystacks is Not Hard with Neutrality.''} Proceedings of EuroGP 2002, Lecture Notes in Computer Science, Vol.~2278, Springer, pp.~13--25.
\bibitem{coll} M.~Collins (2005) {\em ``Finding Needles in Haystacks is Harder with Neutrality.''} Proceedings of the Genetic and Evolutionary Computation Conference, pp.~1613--1618.
\bibitem{coll-2} M.~Collins (2004) {\em ``Counting Solutions in Reduced Boolean Parity.''} CD-ROM of Workshop Proceedings of the Genetic and Evolutionary Computation Conference.
\bibitem{lask} P.~Laskov, P.~D\"{u}ssel, C.~Sch\"{a}fer, and K.~Rieck (2005) {\em ``Learning Intrusion Detection: Supervised or Unsupervised?''} Proceedings of the International Conference on Image Analysis and Processing, Lecture Notes in Computer Science, Vol.~3617, Springer, pp.~50--57.
\bibitem{gome} J.~Gomez, F.~Gonzalez, and D.~Dasgupta (2003) {\em ``An Immuno-Fuzzy Approach to Anomaly Detection.''} Proceedings of the IEEE International Conference on Fuzzy Systems, Vol.~2, pp.~1219--1224.
\bibitem{forr} S.~Forrest, A.~S.~Perelson, L.~Allen, and R.~Cherukuri (1994) {\em ``Self Nonself Discrimination in a Computer.''} Proceedings of the IEEE Symposium on Research in Security and Privacy, pp.~202--212.
\end{thebibliography}
\end{document}