A Survey on Data Mining by Artificial Immune System
by Akira Imada (October 2004)
Data is not knowledge but just a set of data. Simply put,
Data Mining is the extraction of knowledge from a set of data.
To tell you the truth, I didn't even know that. All I thought was that
"Data Mining is a tool to mine the important data from an enormous
amount of mixed data, from golden ones to muddy ones."
I know Data Mining is a tool to extract knowledge from data.
But how? Actually, the title of the paper I happened to come across is
"An Artificial Immune System for Fuzzy-Rule Induction in Data Mining."
My ignorance also extended to Fuzzy Logic (FL). Once I thought
"Fuzzy Logic is a logic to fuzzify an idea when we are not so clear about it."
Well, the latter is a joke, of course, but joking aside,
it's true that I am starting this topic almost from scratch.
It will be fun, however, to start quite a new topic. Feel free to join us
on this topic.
At the moment, this document is only a framework
for describing the results of my survey of this topic.
Data-Mining based on Fuzzy-Logic by Artificial Immune System.
In his nicely organized book, Kecman [99] wrote, paraphrasing Zadeh,
the originator of Fuzzy Logic theory,
"... Fuzzy Logic is a tool for embedding human knowledge,
which is approximate rather than exact,
into computer algorithms by using
a set of IF-THEN rules as a linguistic form of structured
human knowledge..." (also paraphrased by me).
From this point of view, Fuzzy Logic is a good candidate tool for creating
a data-mining method.
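To make the IF-THEN idea a bit more concrete, here is a minimal sketch of how one fuzzy rule might be evaluated on one record. It assumes triangular membership functions and the minimum operator as the fuzzy AND; these are common choices, not necessarily those of the papers surveyed here, and the attributes, terms and rule are all hypothetical.

```python
def triangular(x, left, peak, right):
    """Membership degree (0..1) of x in a triangular fuzzy set (left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical linguistic terms (left, peak, right) for two continuous attributes.
TERMS = {
    "age": {"young": (0, 20, 40), "middle": (30, 45, 60), "old": (50, 70, 90)},
    "income": {"low": (0, 10, 30), "high": (20, 60, 100)},
}

def firing_degree(record, antecedent):
    """Degree to which a record satisfies the IF part, using min() as the fuzzy AND."""
    return min(triangular(record[attr], *TERMS[attr][term])
               for attr, term in antecedent.items())

# IF age is old AND income is low THEN class = "A"   (hypothetical rule)
rule_if, rule_then = {"age": "old", "income": "low"}, "A"
record = {"age": 65, "income": 12}
print(rule_then, firing_degree(record, rule_if))
```

A crisp rule would answer only yes or no; here the record matches the rule to degree 0.75, which is what lets a fuzzy rule express "approximate rather than exact" knowledge.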
Here, we look at two studies in which Fuzzy Logic is used for Data Mining based on
an Artificial Immune System.
The first one below is our TARGET paper for the time being.
- [0]
Alves et al. (2004)
"An Artificial Immune System for Fuzzy-Rule Induction in Data Mining."
In Proceedings of Parallel Problem Solving from Nature - VIII,
LNCS 3242, pp.1011-1020. Springer.
This paper describes the idea of creating Fuzzy Rules with an
Artificial Immune System for Data Mining
very clearly, above all else. As such, it is a good starting point for surveying
this topic.
They applied their algorithm to 6 databases from test-bed data sets
in the public domain,
which we have also put on our server.
For example, CRX includes 690 samples, each described by
6 continuous and 9 categorical attributes.
There are 2 classes, containing 307 and 383 samples, respectively.
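A quick sanity check on these numbers: the two classes add up to the 690 samples, and a trivial classifier that always predicts the larger class would already be right about 55.5% of the time. That is a useful floor when reading any accuracy reported on CRX. The class labels below are placeholders, since which class has which size is not stated here.

```python
# Class sizes of CRX as quoted in the text; the labels are placeholders.
class_sizes = {"class_1": 307, "class_2": 383}

total = sum(class_sizes.values())
majority = max(class_sizes.values())
baseline_accuracy = majority / total   # accuracy of always predicting the larger class

print(total, round(baseline_accuracy, 3))
```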
You may start with "Votes" if you are not so familiar with Fuzzy Logic,
since all the attributes in "Votes" are categorical and there's no point in
fuzzifying them.
Conversely, if your interest is fuzzification, the attributes
in "BUPA", "Wisconsin" and "Ljubljana" are all continuous. The remaining 2 data sets
contain both categorical and continuous attributes.
The performance is compared with that of C4.5Rules, a very popular data-mining
algorithm for discovering (crisp) classification rules.
If you really want to explore the idea and need hints for implementing the algorithm,
you might read my article.
See also [2] below, which is a kind of sister paper of [0].
You will find that our current target paper was heavily influenced by this paper,
so it is very helpful for understanding the background of the method
described in [0] too.
The second paper using Fuzzy Logic and an Artificial Immune System for Data Mining purposes is:
Evolution of Fuzzy Rules by Evolutionary Computations
(AIS is not employed)
Here, no Artificial Immune System is used; only Evolutionary Computations
are used to evolve Fuzzy rules.
The paper below is a good survey for this purpose.
Another good survey from this viewpoint is in
- [7]
Romao et al.
"A Genetic Algorithm for Discovering Interesting Fuzzy Prediction
Rules: applications to science and technology data"
Proceedings of Genetic and Evolutionary Computation Conf. (GECCO-2002).
In this paper the authors divide previous work into two categories:
"(a) Evolutionary Algorithms (EAs) evolving one or more aspects of membership
functions, such as the number of membership functions
(linguistic terms) for each attribute, the shape of the
membership functions, etc. and (b) EAs using user-defined membership functions, and
evolving only the combinations of attribute values
considered relevant for predicting a goal attribute."
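To see what category (b) means in practice, here is a minimal sketch of such a chromosome: the membership functions are fixed by the user, and each gene only picks a linguistic term, or "don't care", for one attribute. The attributes and terms are hypothetical. In category (a), the genome would additionally carry membership-function parameters (for instance centers and widths) that evolve as well.

```python
import random

# Category (b) sketch: membership functions are fixed by the user, and the GA
# only chooses, for each attribute, a linguistic term or "don't care".
ATTRIBUTES = ["age", "income", "debt"]   # hypothetical attributes
TERMS = ["low", "medium", "high"]        # hypothetical user-defined terms
DONT_CARE = -1

def random_chromosome(rng):
    """One integer gene per attribute: an index into TERMS, or DONT_CARE."""
    return [rng.choice([DONT_CARE] + list(range(len(TERMS)))) for _ in ATTRIBUTES]

def decode(chromosome):
    """Turn the genes into a readable IF part, skipping don't-care attributes."""
    conds = [f"{attr} is {TERMS[g]}"
             for attr, g in zip(ATTRIBUTES, chromosome) if g != DONT_CARE]
    return "IF " + " AND ".join(conds) if conds else "IF (always true)"

print(decode([2, DONT_CARE, 0]))
print(decode(random_chromosome(random.Random(0))))
```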
While the authors' preference lies with the latter category, I prefer the former, so I
list below three papers they cite as examples of the former.
- [8]
Xiong and Litz (1999).
"Generating Linguistic Fuzzy Rules for Pattern Classification with Genetic Algorithms"
PKDD-99, pp. 574-579.
- [9]
Mota, Ferreira and Rosa (1999).
"Independent and Simultaneous Evolution of Fuzzy Sleep Classifiers by Genetic Algorithms"
GECCO-99, pp. 1622-1629.
They evolved the center and width of the membership functions. Alas!
This was exactly the idea that came to me when I first read our target paper [0].
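Evolving the center and width of each membership function might look like the sketch below. It assumes symmetric triangular shapes and Gaussian mutation; [9] may well use other shapes and operators, and the genome values are hypothetical.

```python
import random

def triangular_cw(x, center, width):
    """Symmetric triangular membership function given by its center and half-width (> 0)."""
    return max(0.0, 1.0 - abs(x - center) / width)

def mutate(genome, sigma, rng):
    """Gaussian mutation of every (center, width) gene; widths are kept positive."""
    return [(c + rng.gauss(0.0, sigma), max(1e-6, w + rng.gauss(0.0, sigma)))
            for c, w in genome]

# Hypothetical genome for one normalized attribute: "low", "medium", "high".
genome = [(0.2, 0.3), (0.5, 0.3), (0.8, 0.3)]
child = mutate(genome, sigma=0.05, rng=random.Random(0))

print([round(triangular_cw(0.35, c, w), 3) for c, w in genome])
print([(round(c, 3), round(w, 3)) for c, w in child])
```

The fitness of such a genome would then come from how well the fuzzy rules classify the data once these membership functions are plugged in.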
- [10]
Mendes, Voznika, Freitas and Nievola (2001)
"Discovering fuzzy classification rules with genetic programming and co-evolution."
Principles of Data Mining and Knowledge Discovery: Proceedings of PKDD,
Lecture Notes in Artificial Intelligence 2168,
pp. 314-325. Springer-Verlag.
The proposed algorithm is based on co-evolution between two populations.
A population of fuzzy rule sets is evolved by Genetic Programming (GP),
while a population of membership function definitions is evolved by
a simple GA. Note that GP evolves tree structures, like programs in LISP, while
a GA evolves strings called chromosomes. At the moment I'm not interested in
employing GP, so the reason I cite this paper is how they evolve membership functions.
Since they adopted trapezoidal membership functions and three linguistic values
per attribute, e.g., high, medium and low, they designed the three membership functions with only
4 parameters. Note that these functions would need 8 parameters in general, but
they reduced the number, with some compromise, to avoid a huge search space.
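One way to read the count of 8 is: 2 parameters for a shoulder-shaped "low", 4 for the trapezoid "medium" and 2 for a shoulder-shaped "high". A common way to cut this to 4, assumed here and not necessarily the exact scheme of [10], is to let neighbouring terms share breakpoints so that their memberships sum to 1. A GA would then also have to keep the 4 breakpoints in order, for example by sorting them after mutation.

```python
def ramp_down(x, lo, hi):
    """1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def three_terms(x, a, b, c, d):
    """Memberships in "low", "medium", "high" from 4 shared breakpoints a <= b <= c <= d.

    Assumed layout: "low" is 1 up to a and falls to 0 at b; "medium" rises from a
    to b, stays at 1 until c and falls to 0 at d; "high" rises from c to d.
    """
    low = ramp_down(x, a, b)
    high = 1.0 - ramp_down(x, c, d)
    medium = min(1.0 - low, 1.0 - high)
    return {"low": low, "medium": medium, "high": high}

# A GA chromosome then needs only 4 numbers per attribute (hypothetical values).
for x in (15, 25, 35):
    print(x, three_terms(x, a=10, b=20, c=30, d=40))
```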
As the authors wrote, the idea of employing co-evolution comes from the following
paper. If you are specifically interested in co-evolution, I recommend reading this one instead.
- [11]
Delgado et al. (1999)
"Modular and hierarchical evolutionary design of fuzzy systems"
Proceedings of Genetic and Evolutionary Computation Conf. (GECCO-99),
pp. 180-187.
Apart from co-evolution, the topic of "designing fuzzy inference systems by GA"
goes back at least to 1991,
as the authors cite the following 2 papers:
[A] P. Thrift (1991) "Fuzzy Logic Synthesis with Genetic Algorithms."
Proceedings of 4th International Conference on Genetic Algorithms, pp. 509-513.
[B] C. L. Karr (1991) "Applying Genetics to Fuzzy Logic." AI Expert, Vol. 6, No. 3, pp. 38-43.
Those two papers might be worth including when we write a paper on this topic.
The authors summarize that the former [A] evolves
chromosomes which represent a set of fuzzy rules, the goal being
to select adequate fuzzy sets for the THEN part of each rule.
The shape of the membership functions is fixed. The latter [B], on the other hand,
evolves membership functions as well as fuzzy rules.
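Following the summary of [A], a chromosome with fixed membership functions can simply be a rule table whose cells name the output fuzzy set (THEN part) of each rule. The sketch assumes a hypothetical two-input system with 3 terms per input and one extra allele meaning "no rule"; the table size and term names are illustrative, not Thrift's exact settings.

```python
import itertools
import random

INPUT_TERMS = ["low", "medium", "high"]       # fixed input fuzzy sets (hypothetical)
OUTPUT_TERMS = ["small", "medium", "large"]   # fixed output fuzzy sets (hypothetical)
NO_RULE = len(OUTPUT_TERMS)                   # extra allele: no rule for this cell

CELLS = list(itertools.product(INPUT_TERMS, repeat=2))   # 3 x 3 = 9 rule-table cells

def random_rule_table(rng):
    """One gene per cell; each gene names an output set or NO_RULE."""
    return [rng.randrange(len(OUTPUT_TERMS) + 1) for _ in CELLS]

def decode(chromosome):
    """Translate a chromosome into readable fuzzy rules, skipping empty cells."""
    return [f"IF x1 is {t1} AND x2 is {t2} THEN y is {OUTPUT_TERMS[g]}"
            for g, (t1, t2) in zip(chromosome, CELLS) if g != NO_RULE]

for rule in decode(random_rule_table(random.Random(1))):
    print(rule)
```

Because the membership functions never change, the GA here searches only over the THEN parts; [B] would enlarge the genome with membership-function parameters.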
Also cited is
[C] Valenzuela-Rendon (1991) "The fuzzy classifier system:
A classifier system for continuously varying variables."
Proceedings of 4th International Conference on Genetic Algorithms, pp. 346-353.
In this third 1991 paper, each chromosome represents one rule, with both the
IF part and the THEN part encoded as binary genes.
Later implementations that determine the shape of membership functions
are also described. Note, however, that the oldest ones referred to in the paper above
all date from 1991.
The one that attracted me is the paper that uses a "Messy GA", in which
the length of chromosomes need not be constant, but I won't discuss it here. The paper is
[D] R. R. Yager (1993) "On a Hierarchical Structure for Fuzzy Modeling and Control."
IEEE Transactions on Systems, Man, and Cybernetics, 23(4), pp. 1189-1197.
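In a messy GA a chromosome is a variable-length list of (locus, allele) pairs, so a locus may be missing (under-specified) or appear several times (over-specified). A common convention, assumed in this sketch, lets the leftmost occurrence win and fills missing loci from a template. How [D] or [12] map such genes onto fuzzy rules is not reproduced here; the 4-gene problem is hypothetical.

```python
def express(messy_chromosome, template):
    """Decode a messy chromosome of (locus, allele) pairs into a fixed-length list.

    Over-specified loci: the first (leftmost) occurrence wins.
    Under-specified loci: the value is taken from the template.
    """
    result = list(template)
    seen = set()
    for locus, allele in messy_chromosome:
        if locus not in seen:
            result[locus] = allele
            seen.add(locus)
    return result

# Hypothetical 4-gene problem: locus 2 appears twice, locus 3 is missing.
chrom = [(2, 1), (0, 1), (2, 0), (1, 0)]
print(express(chrom, template=[0, 0, 0, 1]))
```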
Yet another approach is "Evolution of Fuzzy Rules." A somewhat old paper, but IMHO still
worth reading, is
- [12]
F. Hoffmann (2000)
"Soft Computing Techniques for the Design of Mobile Robot Behaviours"
Journal of Information Sciences, Vol. 122, No. 2-4, pp. 241-258.
The author uses a "Messy-Genetic-Algorithm" to evolve Fuzzy Rules, and
cites two papers as related works.
- [13] M. Munirul, M. Chowdury and Yun Li (1996)
"Messy Genetic Algorithm Based New Learning Method for Structurally Optimised
Neurofuzzy Controllers."
IEEE International Conference on Industrial Technology, Shanghai, China.
He writes about it:
"Chowdury et al. proposed a messy genetic algorithm which optimizes
the structure of a neuro-fuzzy controller."
The second is
- [14] D. Leitch (1995)
"A New Genetic Algorithm for the Evolution of Fuzzy Systems"
PhD thesis, University of Oxford.
He writes about it:
"Leitch developed a context dependent coding scheme for fuzzy logic controllers
in which the meaning of a gene is not determined by its absolute position but depends
on the surrounding genes."
Alternatively, a much simpler and clearer version by the same author et al. can be found
Evolution of Rules to Classify Data but NOT based on Fuzzy Logic
The paper below gives an overview of "What is Data Mining", and
its GA implementation proposes a very simple chromosome design:
each gene is made up of two continuous values (a, b),
meaning that the attribute is supposed to lie between these two values.
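Such interval genes are easy to test against a record. The minimal sketch below assumes one (a, b) gene per constrained attribute and tolerates genes whose two values ended up swapped after mutation, which is my own assumption. The attribute names, data and precision measure are hypothetical and not taken from the paper.

```python
def rule_covers(record, genes):
    """True if each constrained attribute of the record lies between its gene's two values."""
    for attr, (a, b) in genes.items():
        lo, hi = min(a, b), max(a, b)   # assumption: accept genes whose values got swapped
        if not lo <= record[attr] <= hi:
            return False
    return True

def rule_precision(records, labels, genes, predicted_class):
    """Fraction of covered records whose label equals the rule's predicted class."""
    covered = [lab for rec, lab in zip(records, labels) if rule_covers(rec, genes)]
    return sum(lab == predicted_class for lab in covered) / len(covered) if covered else 0.0

# Hypothetical rule: IF 30 <= age <= 50 AND 20 <= income <= 60 THEN class "yes".
genes = {"age": (30, 50), "income": (20, 60)}
records = [{"age": 42, "income": 35}, {"age": 55, "income": 35}, {"age": 31, "income": 59}]
labels = ["yes", "yes", "no"]
print([rule_covers(r, genes) for r in records], rule_precision(records, labels, genes, "yes"))
```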
Although our target paper [0] is fairly clearly written,
some descriptions are not at all sufficient
for a real implementation. For example, the authors describe the
"data pruning based on information gain" they employ only as "a statistical procedure".
Instead, they refer to the next paper, and it's a must
to read it in order to know "what is data pruning and how?"
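Until that paper is at hand, the quantity itself can at least be pinned down. The sketch computes the standard information gain of a categorical attribute with respect to the class. The pruning step `keep_informative` (keep attributes whose gain reaches a threshold) is a hypothetical illustration of mine, not the procedure of [0] or of the referenced paper, and the data are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(values, labels):
    """Class entropy minus the expected entropy after splitting on one categorical attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def keep_informative(columns, labels, threshold):
    """Hypothetical pruning step: keep only attributes whose gain reaches the threshold."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) >= threshold]

labels = ["A", "A", "B", "B"]
columns = {"x": ["y", "y", "n", "n"], "z": ["y", "n", "y", "n"]}   # hypothetical data
print({name: information_gain(v, labels) for name, v in columns.items()})
print(keep_informative(columns, labels, threshold=0.1))
```

Attribute "x" separates the two classes perfectly (gain of 1 bit), while "z" tells us nothing about the class (gain 0).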
Data Mining & Artificial Immune System.
Using neither Fuzzy Logic nor Evolutionary Computations,
only an Artificial Immune System.
Data Mining & Ant Colony Optimization (ACO)
When we think of ACO, it is quite natural to apply it to the Traveling Salesperson Problem,
since real ants in nature are good at finding a near-shortest path from their nest
to a food source. Here, however, ACO is applied to Data Mining. An interesting topic, isn't it?
We found the following papers on this topic.
A more detailed version of the paper above by the same authors.
The above two papers are also by Freitas's group, like [2] and [3].
It seems to be really good group work, right?
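The ant metaphor carries over to rule discovery roughly as follows: an ant builds a rule by picking terms (attribute = value) with probability proportional to pheromone times a heuristic value, and the terms of a good rule then receive more pheromone. The sketch below is a generic version of that pattern, not the exact equations of the papers above; the terms, heuristic values and rule quality are hypothetical.

```python
import random

def choose_term(pheromone, heuristic, rng):
    """Pick a rule term with probability proportional to pheromone * heuristic value."""
    terms = list(pheromone)
    weights = [pheromone[t] * heuristic[t] for t in terms]
    return rng.choices(terms, weights=weights)[0]

def reinforce(pheromone, used_terms, quality):
    """Deposit pheromone on the terms of a rule in proportion to its quality (0..1),
    then renormalize, which implicitly evaporates pheromone on unused terms."""
    for t in used_terms:
        pheromone[t] += pheromone[t] * quality
    total = sum(pheromone.values())
    for t in pheromone:
        pheromone[t] /= total

# Hypothetical candidate terms "attribute=value" with uniform initial pheromone.
pheromone = {"outlook=sunny": 1 / 3, "outlook=rain": 1 / 3, "windy=true": 1 / 3}
heuristic = {"outlook=sunny": 0.9, "outlook=rain": 0.2, "windy=true": 0.5}

rng = random.Random(42)
term = choose_term(pheromone, heuristic, rng)
reinforce(pheromone, [term], quality=0.8)
print(term, {t: round(p, 3) for t, p in pheromone.items()})
```

Over many ants, terms that keep appearing in high-quality rules accumulate pheromone, much as the shortest path to food does in nature.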
The other paper concerning Data Mining using Ant Colony Optimization is
Yet another application of ACO to Data Mining
An Artificial Immune Model for Network Intrusion Detection
The Other Topics
This is a big topic, and a separate page is provided for it.
If you are interested, click the title immediately above.
Data Mining in Finance.
Though this item might belong under a different categorization,
we add it for possible collaboration with parties
whose specialty is this topic.
Bibliography
[99] V. Kecman (2001)
"Learning and Soft Computing:
Support Vector Machines, Neural Networks, and Fuzzy Logic Models."
MIT Press.