BSTU-semester-project-2012

2012 Semester Project

In this Semester Project we study machine learning teqniques such as Genetic Algorithm, Reinforcement Learning, Neural Network, Ant Colony Optimization and so on, by a popular benchmark simulation in a gridworld.

I. Let's look for a lucky dog! -- A navigation of a dog looking for a fresh ham. ("Cliff Walking Problem.")
General Description
First of all you should read the paper I wrote for this semester project in this year. See A Problem Description -- Cliff Walking

Reference and Examples of What Should be Done.
Via Google, you may collect articles and/or web pages related to the topics you are interested in. For example with keywords "cliff-walk+gridworld+genetic-algorithm." Below find such papers I collected and I think those are quite good.
Reinforcement Learning (RL)
A brief explanation of Cliff-Walking-Problem can be found in Cliff Walking - 1.

Also in Cliff Walking - 2.

A long but a good tutorial on RL is Tutorial.

Genetic Algorithm (GA)
Try to google search by yourself. There are many such papers.

Ant Colony Optimization (ACO)
Interesting alternative to one dog might be a big group of ants starting from the same position and seeking for a chocolate instead of ham. Those ants falling in the cliff will die.

The most populer application of ACO is to Traveling Saleseperson Problem (TSP). It would be good to start with learning ACO via TSP. See for example TSP by ANT. If you want to know ACO further in detail again long but good tutorial is here Tutorial.

Neural Networks (NN) with Sigmoid

II. A dog's navigation in a gridworld called 'T-maze' looking for a fresh ham avoiding an old rotten ham (T-maze Problem)
General Description
First of all you should read the paper I wrote for this semester project in this year. See A Problem Description -- T-maze Walking

An Example of What Should be Done.

In T-maze, three points are specified in a T-shaped gridworld. (i) Start point at the bottom region of the gridworld where a dog start from; (ii) A food-site-1 at the left wing of the top area of the gridworld where a good food like sosages are to be found; and (iii) A food-site-2 at the right wing of the top area of the gridworld where a not good food like potatos, or sometimes bad food like very old sosages, or even poisonous food are to be found.

Then try the following experiments
Random Walk ...
Only good sosages are found at the food-site-1. No food at all at the food-site-2.

Dog moves, from one step to the next, at random to up, down, right or left.

If the movement is to cross the border of the gridworld, do not move, and wait for the next step. The same holds the followin all the experiment.

Dog is not allowed to move forever. Only allowed to a limited number of steps, e.g., 100 steps for 15x15 gridworld.

Usually the dog is not lucky enough to reach the food. So try to find how many dogs should try untill we found at least one lucky dog.

Also try to study about the above by chaging the size of gridworld from 9x9 to, say, 60x60.

Genetic Algorithm
Again only good sosages are found at the food-site-1. No food at all at the food-site-2.

Dog's movement is also to up, down, right, or left from one step to the next, but the movement is according to the dog's chromosome.

So, one cromosome is made up of four genes 1, 2, 3, and 4, each corresponding to up, down, right and left. And the number of genes is total steps to allowed. For example 100 genes for 15x15 gridworld.

Fitness is the Hamming-distance to the food when dog shold stop after moving all the steps allowed.

Neural Network
In this experiment, we have good sosages at site-1, and potatos, or very old sosages (poison) at site-2.

Neural network three sensor input which detect smell of its front-left, front, front-right. And the number of output is four and follow the winner-take-all rule, for example. Each output corresponds to up, down, right and left. Also neural net work has a hidden layer with three neurons.

In both of the good smell and bad smell, the closer to the food the stronger the semll.

Starting with 100 stupid neural networks with random 21 weights specified by its cromosome, evolve them by genetic algorithm.

Reinforcement Learning
Also we have good sosages at site-1, and potatos, or very old sosages (poison) at site-2.

Movement is also to up, down, right, or left from one step to the next, but is according to Q-table (See guidlein more in detail).

Ant's Exploration
Here, sugers are at site-1 and poisons (ant-killer) are at site-2.

An infinite number of ants repeatedly start from its nest located at the start points.

In each of the grid where ant steps in, ant put a chemical called phelomone.

A probability of movement of one ant is proportinal of total amount of phelomones so far accumulated. That is, if the amount of pheolomone of up, down, right, and left is 20, 40, 10, and 30 mg, respectively. the probability of the movement to up, down, right, and left will be 0.2, 0.4, 0.1 and 0.3. So, most likely to down. (At the beginning, there's no phelomon so equal probability at random.)

A happy ant try to go back to the nest with the same rule, but ants failed to find the food or eat poison will die.

Pheromons also evaporates at a rate, say, 1mg every 100 time-steps.

After enough time passed, you may observe all ants reach the sugare site and go back to the nest efficiently.

Reference
Not yet ready. Hopefully in a day or two.

Hope you will enjoy this project. Your feedback will be welcomed.