-----------------------------------------------------------------------------------------
Spike timing dependent plasticity (STDP) to make robot navigation intelligent
-----------------------------------------------------------------------------------------
This is not a success report but a guideline of our ongoing project, a report of what we are currently planning.
-----------------------------------------------------------------------------------------
We start with a benchmark, to know whether what we are planning will work or not.
-----------------------------------------------------------------------------------------
The benchmark here is Path Planning, or Robot Navigation, where the aim is usually to minimize the route from start to goal, but...
-----------------------------------------------------------------------------------------
The task of minimizing the route on its own is not so difficult => A-star, Genetic Algorithm, Reinforcement Learning, Neural Network, Ant Colony Optimization, etc.
-----------------------------------------------------------------------------------------
Genetic Algorithm (GA) => The agent decides its actions following its chromosome, e.g. a chromosome like (4 1 1 2 1 3 2 3 3 4 2 1 3 4 3 2 3 3 2 1 1 3 2 1 3 ...)
-----------------------------------------------------------------------------------------
A random walk evolved to be minimized => fig-1 (178 steps) & fig-2 (48 steps)
-----------------------------------------------------------------------------------------
The task was easy => fig
-----------------------------------------------------------------------------------------
Let's extend it => From Minimization to Maximization => We do not care about the goal point; instead, we maximize the exploration
-----------------------------------------------------------------------------------------
A Camel in a Desert: the 52nd problem in the --- (in Latin) by Alcuin of York (732--804)
-----------------------------------------------------------------------------------------
Can a camel maximize its exploration in a desert, surviving on the grain it carries on its back?
-----------------------------------------------------------------------------------------
Modern version: From Camel to Jeep in a Desert
-----------------------------------------------------------------------------------------
Can a jeep maximize its exploration in a desert with a tank of fuel?
-----------------------------------------------------------------------------------------
Still, it is not so difficult, by GA for example => fig-1 & fig-2 (300 steps) (a small sketch follows below)
-----------------------------------------------------------------------------------------
The task was rather easy => fig => The complexity is $O(N)$
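To make the chromosome encoding and the exploration fitness above concrete, here is a minimal sketch in Python. The move coding (1=N, 2=E, 3=S, 4=W), the 20 x 20 grid, the one-step-per-unit-of-fuel budget, and the clamping at the desert border are our own illustrative assumptions, not the settings of the actual experiments.

import random

MOVES = {1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (-1, 0)}  # assumed coding: 1=N, 2=E, 3=S, 4=W
GRID = 20    # assumed 20 x 20 desert
FUEL = 300   # one unit of fuel per step, as in the 300-step example above

def exploration_fitness(chromosome, start=(10, 10)):
    """Decode a chromosome of move codes into a walk and count the distinct cells visited."""
    x, y = start
    visited = {start}
    for gene in chromosome[:FUEL]:
        dx, dy = MOVES[gene]
        x = min(max(x + dx, 0), GRID - 1)   # clamp: stay inside the desert
        y = min(max(y + dy, 0), GRID - 1)
        visited.add((x, y))
    return len(visited)

# A GA evolves a population of such chromosomes; here we only score random ones.
population = [[random.randint(1, 4) for _ in range(FUEL)] for _ in range(50)]
best = max(population, key=exploration_fitness)
print(exploration_fitness(best), "distinct cells visited by the best random individual")

For the original minimization benchmark the same decoding applies, with the fitness replaced by the number of steps needed to reach the goal.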
-----------------------------------------------------------------------------------------
Why not extend it further => Let's make it return to the point where it started => while also maximizing its exploration
-----------------------------------------------------------------------------------------
Planet Land-rover Problem (Mars) => From A (start) to B (goal), minimizing its route => From A to A, maximizing its route (This is our proposal here as a benchmark)
-----------------------------------------------------------------------------------------
Can the rover return to the base with maximal exploration, spending only the limited energy it was given at the start?
-----------------------------------------------------------------------------------------
Much more demanding => Taking GA as an example, what might the fitness be? e.g. {total length} & {how often has it crossed its previous route?} (So we need another criterion like...) => with Multi-Objective GA (MOGA) (I have tried several versions of this algorithm, but so far I have not had a satisfactory result.)
-----------------------------------------------------------------------------------------
A heuristic can create such a route => fig-1 => What heuristic do you guess? (I make the number of moves to the north and to the south equal ...)
-----------------------------------------------------------------------------------------
Topic of today's talk: What is intelligence? => Intelligence should be spontaneous, flexible, or unpredictable, more or less
-----------------------------------------------------------------------------------------
Stolle et al. (2002): ... every day we might be cooking a different breakfast, but the kitchen layout is the same from day to day.
-----------------------------------------------------------------------------------------
"I beg your pardon?" => Intelligent people try a different explanation for easier understanding => while others repeat the same expression, maybe in a louder voice
-----------------------------------------------------------------------------------------
What if your canary stops singing? The three legendary strategies in Japan: (1) Just wait until she sings again. (2) Try something so that she sings again. (3) Kill her if she does not sing any more? (This is symbolic ... the better suggestion would be ...)
-----------------------------------------------------------------------------------------
To be intelligent, the action should differ, more or less, even in an exactly identical situation
-----------------------------------------------------------------------------------------
The performance is not intelligent if we use a deterministic A-star, a GA, or an NN with fixed weights, is it?
-----------------------------------------------------------------------------------------
Let's make learning occur during behavior => Evolution of Learning
-----------------------------------------------------------------------------------------
(A notation) Hebbian learning in an NN => fig (one synapse has two neurons at its two ends; one is called the pre-synaptic neuron and the other the post-synaptic neuron) $w_{ij}(t+1) = w_{ij}(t) + \eta x_j(t) y_i(t)$ $(x_j, y_i \in [0,1])$
-----------------------------------------------------------------------------------------
Floreano's approach (2000) => Modification of $w_{ij}$ during exploration with one of four Hebbian and Hebbian-like rules
-----------------------------------------------------------------------------------------
Hebbian learning => (1) $\Delta w = (1-w)xy$
-----------------------------------------------------------------------------------------
Hebbian-type learning => Weaken if the post-synaptic neuron is active while the pre-synaptic one is not: (2) $\Delta w = w(-1+x)y + (1-w)xy$ => Weaken if the pre-synaptic neuron is active while the post-synaptic one is not: (3) $\Delta w = wx(-1+y) + (1-w)xy$
-----------------------------------------------------------------------------------------
And => Strengthen if the two have similar activity, and weaken otherwise => (4) eq
-----------------------------------------------------------------------------------------
(When we speak of) the Evolution of Learning, (it implies:) which rule, and which parameter values, should be assigned to each of the synapses? => Evolution leads the combination to the optimum. (Rules (1)-(3) are sketched in code below.)
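For reference, here is a minimal sketch of rules (1)-(3) exactly as written above; rule (4), whose equation is not reproduced in these notes, is omitted. The learning rate eta and the clipping of $w$ to $[0,1]$ are our own assumptions.

def hebb(w, x, y):
    """(1) Plain Hebbian: strengthen when pre- and post-synaptic neurons are active together."""
    return (1.0 - w) * x * y

def postsynaptic_rule(w, x, y):
    """(2) Weaken if the post-synaptic neuron is active while the pre-synaptic one is not."""
    return w * (-1.0 + x) * y + (1.0 - w) * x * y

def presynaptic_rule(w, x, y):
    """(3) Weaken if the pre-synaptic neuron is active while the post-synaptic one is not."""
    return w * x * (-1.0 + y) + (1.0 - w) * x * y

def update(w, x, y, rule, eta=0.1):
    """One learning step; x, y, w are all in [0, 1]. Eta and the clipping are assumptions."""
    return min(max(w + eta * rule(w, x, y), 0.0), 1.0)

print(update(0.5, 1.0, 0.0, presynaptic_rule))  # pre active, post silent -> depression (0.45)

In the evolution-of-learning setting, the genotype would then specify, for each synapse, which of these functions is used and with which eta.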
-----------------------------------------------------------------------------------------
Stanley's implementation (2003) => eq (with just two parameters) => Evolution acts on $(\eta_1, \eta_2)$ of each synaptic weight
-----------------------------------------------------------------------------------------
Their result => Starting anew with random weights every time, the weights are modified step by step by the rule it has learned => Every run is different, depending on the initial random weights
-----------------------------------------------------------------------------------------
More general learning (Durr et al. 2008) => $\Delta w = \eta(Axy + Bx + Cy + D)$
-----------------------------------------------------------------------------------------
An example of exploration by a recurrent NN with random weights => fig => Can it learn so that the path will finish at the start point? (We have not pursued this ... because it has become a somewhat old topic.)
-----------------------------------------------------------------------------------------
Our aim => Learning should occur during a random exploration, with a spiking neural network, seeking biological plausibility
-----------------------------------------------------------------------------------------
Learning (should be) by STDP (the spiking-neuron version of Hebbian learning) => The topic is still open!
-----------------------------------------------------------------------------------------
Meunier et al. (2005): Up to now, nobody has been able to show how it is possible to learn with STDP...
-----------------------------------------------------------------------------------------
Farries et al. (2007): Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general purpose learning.
-----------------------------------------------------------------------------------------
Why not take up the challenge?
-----------------------------------------------------------------------------------------
The architecture we are planning => fig => a.k.a. Echo State Network, Liquid State Machine, or Reservoir Computing
-----------------------------------------------------------------------------------------
Integrate-and-Fire model => fig => $u_i(t) = u_r + (u_i(t-\delta t) - u_r)\exp(-\delta t/\tau) + \sum_j w_{ij} f_j(t-\delta t)$
-----------------------------------------------------------------------------------------
What is STDP? => eq => fig ... where $\Delta t = t_{\mathrm{post}} - t_{\mathrm{pre}}$; the smaller the difference, the larger the weight change, with the sign of the change depending on the sign of $\Delta t$
-----------------------------------------------------------------------------------------
In short, potentiation occurs => when the pre-synaptic neuron fires shortly before the post-synaptic neuron, and depression => when the pre-synaptic neuron fires shortly after (a small sketch follows below)
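As a concrete reference, here is a minimal sketch of the integrate-and-fire update written above together with a pair-based STDP rule. Since the STDP equation is only sketched in these notes, the standard exponential window is assumed; all the constants (threshold, time constants, amplitudes, reset to the resting potential) are illustrative assumptions.

import math

U_REST, U_THRESH = 0.0, 1.0     # resting potential and firing threshold (assumed)
TAU_M, DT = 20.0, 1.0           # membrane time constant and time step in ms (assumed)
A_PLUS, A_MINUS = 0.01, 0.012   # STDP amplitudes (assumed)
TAU_PLUS, TAU_MINUS = 20.0, 20.0

def lif_step(u, weighted_input):
    """u_i(t) = u_r + (u_i(t - dt) - u_r) exp(-dt/tau) + sum_j w_ij f_j(t - dt)."""
    u = U_REST + (u - U_REST) * math.exp(-DT / TAU_M) + weighted_input
    fired = u >= U_THRESH
    if fired:
        u = U_REST  # reset after a spike (assumed)
    return u, fired

def stdp(delta_t):
    """Weight change for delta_t = t_post - t_pre (standard exponential window, assumed)."""
    if delta_t > 0:    # pre fires shortly before post -> potentiation
        return A_PLUS * math.exp(-delta_t / TAU_PLUS)
    if delta_t < 0:    # pre fires shortly after post -> depression
        return -A_MINUS * math.exp(delta_t / TAU_MINUS)
    return 0.0

print(stdp(5.0), stdp(-5.0))  # small potentiation, small depression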
-----------------------------------------------------------------------------------------
On its own, it is too basic to be applied to a real simulation
-----------------------------------------------------------------------------------------
Reward-modulated STDP learning (Florian 2007) => $w_{ij}(t + \delta t) = w_{ij}(t) + \gamma r(t + \delta t)\, \zeta_{ij}(t)$ where => (I want to skip the details, but) $r(t)$ is the reward the agent receives at time $t$ (As you noticed, he applied reinforcement learning)
-----------------------------------------------------------------------------------------
Reinforcement Learning (RL) => What an agent learns is a policy, and a policy is how to select an action in a given situation (state) so as to maximize the total rewards it will occasionally receive from the environment
-----------------------------------------------------------------------------------------
The policy tells the agent => which action should be chosen in any possible situation => While in GA the agent decides actions following its chromosome, in RL the agent decides actions following its policy
-----------------------------------------------------------------------------------------
However => our world has no obstacles, no walls, no corridors, and only one goal => There is no reward anywhere except at one point (the goal) => a needle in a haystack, i.e. rewards are not likely to be encountered by a random exploration
-----------------------------------------------------------------------------------------
(For this reason) we have not succeeded yet. Nevertheless, we are thinking of a further extension of the benchmark as a bigger challenge => What if the rover carries container(s) for fuel?
-----------------------------------------------------------------------------------------
Jeep Problem, where the jeep explores a 1-D desert
-> The jeep can unload its fuel anywhere in the desert.
-> Fuel can be loaded only at the base.
-> The jeep can go back to the base $n$ times to re-fill its tank.
=> Thus, the jeep should maximize its penetration (a supplementary sketch follows the closing slide)
-----------------------------------------------------------------------------------------
An example of a successful schedule => fig (... the jeep needs 18 to go back ... so it puts 6 there)
-----------------------------------------------------------------------------------------
Let's give the rover the same condition => A very tough benchmark, with => an infinitely large number of solutions in 2-D => Good as a benchmark to check our notion of intelligence: a different action in the same environment
-----------------------------------------------------------------------------------------
Let me conclude (what I want to emphasize is:) To claim that an artificial NN is intelligent, a different action, more or less, should be taken under an identical situation. => We want to, or rather we are going to, realize this with a spiking NN learning by STDP, also to be more biologically plausible.
-----------------------------------------------------------------------------------------
SONY's AIBO => fig => It can learn excellently, but it still repeats the same action in the same situation
-----------------------------------------------------------------------------------------
Hoping for collaborations to design an agent with something like real human intelligence (this concludes my talk) => Thank You!
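-----------------------------------------------------------------------------------------
(A supplementary sketch, for reference.) If we read the 1-D Jeep Problem above as the round-trip variant, in which the jeep must end back at the base just as the rover must in our benchmark, the classical maximum penetration with $n$ full tanks is $1/2 + 1/4 + \dots + 1/(2n)$ tank-ranges, with fuel cached on the way out and picked up on the way back. The code below only evaluates this bound; it is not the rover planner itself.

from fractions import Fraction

def max_penetration(n_tanks):
    """Classical bound for the round-trip (return-to-base) Jeep Problem."""
    return sum(Fraction(1, 2 * k) for k in range(1, n_tanks + 1))

for n in range(1, 5):
    d = max_penetration(n)
    print(f"{n} tank(s): penetration {d} = {float(d):.3f} tank-ranges")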