To Start the Topic of Data-Mining
By AKIRA IMADA (September 2005)
What is Data-Mining?
Data by itself is not knowledge; it is just a collection of values. Simply put,
data mining is a technique for extracting knowledge from a set of data.
Well, as a start, to get a slightly more detailed idea of what data mining is,
try exploring the following sets of documents,
which I collected by searching the web with Google.
- From Web-documents in HTML format
- From Articles in PDF format
- Not totally different, but a more specific target --- Text-Mining
Let's give it a try!
To try a very elementary approach,
it might be good to use the "C4.5 Rule," which
is a well-established method.
- To learn what the C4.5 Rule is, explore the following Web documents in HTML format.
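To give a feel for the idea before reading those documents: C4.5 grows a decision tree by choosing, at each node, the test with the best gain ratio (information gain divided by the information of the split itself). The sketch below is only a simplified illustration of that one criterion for a single continuous attribute. The function names and the handful of petal-length values are made up for this page, and the real C4.5 adds much more (pruning, missing values, rule extraction).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary test `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left)
                                   + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values (a simplification)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(((t, gain_ratio(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])

# Made-up petal lengths (cm) for illustration only, not the real data file.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor"]

threshold, ratio = best_threshold(petal_length, species)
print(threshold, ratio)
```

On this toy sample the chosen threshold falls in the gap between the short and the long petals, which is the kind of rule ("if petal length <= t then setosa") a C4.5-style learner would report.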
Another option is the "k-means algorithm." To study what the k-means algorithm is,
the following is a good basic article.
- Then a good target data set might be the "IRIS flower data set,"
which is made up of 150 four-dimensional continuous samples
from 3 different species of iris flower.
- Now, as examples,
let's see how data-mining techniques are applied to the IRIS data set or
the Wisconsin Breast Cancer data set.
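As a small taste of the k-means option mentioned above, here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in plain Python. The nine 4-dimensional samples are illustrative values in the style of iris measurements, not the actual data file, and both the number of clusters (3) and the hand-picked starting centroids are assumptions made for this example. In practice the starting centroids are usually chosen at random.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, assignment

# Illustrative samples: (sepal length, sepal width, petal length, petal width).
samples = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (5.0, 3.4, 1.5, 0.2),
    (6.4, 3.2, 4.5, 1.5), (5.7, 2.8, 4.5, 1.3), (6.0, 2.9, 4.5, 1.5),
    (6.3, 3.3, 6.0, 2.5), (7.1, 3.0, 5.9, 2.1), (6.5, 3.0, 5.8, 2.2),
]

# Hand-picked starting centroids (an assumption, to keep the run repeatable).
start = [samples[0], samples[3], samples[6]]
centroids, assignment = kmeans(samples, start)
print(assignment)
```

Note that k-means never looks at the class labels. Comparing the clusters it finds with the known classes afterwards is one simple way to see how much structure the measurements alone carry.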
A Taxonomy by Targets:
(Still under construction, but you might explore some of the following.)
- Application to Financial Phenomena.
- General financial problems
- Stock Market
- Application to Medical Databases.
- Exploration of Wisconsin Breast Cancer Data-set.
- Basic Idea for the Application
- A Neural Network Approach.
- A Genetic Algorithm Approach.
- A Fuzzy Rule Approach.
- An Artificial Immune System Approach.
- A Genetic Programming Approach.
- An Ant Colony Optimization Approach.
- An Agent-oriented Programming Approach.
- Other Approaches
- Data Mining from a Thyroid Data-set.
- Approach to a Seismic Database --- To Predict Earthquakes.
- Approach to Network Intrusion Detection.
(Not prepared yet.)
A Taxonomy by Methods:
(Still under construction, but you might explore some of the following.)
- Ant Colony Optimization (ACO).
- If you would like to start ACO from scratch, the following
tutorial paper will be really helpful.
- Although the purpose of this page is to answer
"What should we start data-mining research with?",
let's first try a really direct application of ACO to the Traveling Salesperson
Problem (TSP).
- It would be good to start with the following paper to learn
how we apply ACO to TSP.
- Here we have an example of modest size --- 52 cities in Berlin
- If you want more of a challenge, here is a much larger
example --- 13,509 cities in the USA.
(Both data sets are from http://elib.zib.de/pub/Packages/mp-testdata/tsp/tsplib/tsplib.html)
- Or why not try cities in Belarus? Here is a map.
You might pick a fair number of cities and lay x and y axes over
the map. Then determine the (x, y) coordinates by measuring them yourself.
- Then Topics of Data-Mining:
- Very Simple Way of Ant Colony Optimization to a Data-mining Problem
- A Little More Sophisticated Implementation.
- A Little More Practical Implementation.
- Ant Colony Optimization + Fuzzy Logic.
- Applications to Real-World Problems
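To make the direct application of ACO to the TSP mentioned above a little more concrete, here is a minimal sketch of the basic Ant System idea in Python: ants build tours city by city, preferring edges that are short and carry a lot of pheromone, and after each round the pheromone evaporates and is reinforced along the tours just built. The function name, the parameter values (alpha, beta, rho, q, the number of ants and iterations) and the six city coordinates are all assumptions chosen for illustration. It is not the method of any particular paper linked here, and it assumes no two cities share the same coordinates.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def ant_system_tsp(coords, n_ants=10, n_iter=50, alpha=1.0, beta=2.0,
                   rho=0.5, q=1.0, seed=0):
    """Basic Ant System sketch; returns (best_tour, best_length)."""
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")

    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            # Each ant starts at a random city and builds a complete tour.
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                choices = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1 / distance)^beta.
                weights = [pheromone[i][j] ** alpha
                           * (1.0 / dist[i][j]) ** beta
                           for j in choices]
                j = rng.choices(choices, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour, tour_length(tour, dist)))

        # Evaporation on every edge.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= (1.0 - rho)
        # Deposit: shorter tours lay more pheromone on their edges.
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += q / length
                pheromone[b][a] += q / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len

# Six hypothetical cities on the border of a 4 x 3 rectangle.
cities = [(0, 0), (2, 0), (4, 0), (4, 3), (2, 3), (0, 3)]
best_tour, best_len = ant_system_tsp(cities)
print(best_tour, best_len)
```

For the Berlin or USA instances you would replace `cities` with the coordinates read from the data file. Expect to tune the number of ants and iterations, and for the large instance this plain version will be very slow.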
Analysis on Fitness Landscape:
- Is a Needle-in-a-Haystack Problem Not Difficult?
- The Other Analysis of Fitness Landscape
- Specific target with specific approach, but...