Emmanuel Dauce, Frederic Henry, "HEBBIAN LEARNING IN LARGE RECURRENT NEURAL NETWORKS."
==================================================================================================

"... continually running simulation: there is no possibility to replay the trajectories. No resetting or reinitialization of the system (even if the system goes into a dead-end). On-line learning: the adaptation of the system parameters takes place immediately every time a new reinforcement signal occurs."

"One cannot use a global 'fitness function'; (ii) as we use continuously running simulations, there is no final reward, but a series of reinforcements that punctually occur during the experiment."

--------------------------------------------------------------------------------------------------

The experimental setup we have defined favours the use of direct reinforcement mechanisms (direct policy learning) [6], without any explicit or implicit model of the environment. Bartlett and Baxter [7] give a plausible interpretation of direct policy learning in terms of local synaptic adaptation. Their main innovation (to our knowledge) is to read the classical TD(λ) trace as a set of local synaptic traces, each storing the most recent co-activations of its pre-synaptic and post-synaptic neurons. The trace alone has no immediate effect on the synaptic weight: when a reward occurs, the weight is modified in proportion to the trace, positively or negatively depending on the sign of the reward (see the sketch at the end of these notes). We then generalize the use of synaptic traces to the case of random recurrent neural network controllers.

--------------------------------------------------------------------------------------------------

Some successful applications of this principle are given in [8]. The biological plausibility of such traces remains an open question, as their existence has not been demonstrated to date.

[6] R. J. Williams (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256. => done
[7] P. L. Bartlett and J. Baxter (1999). Hebbian synaptic modifications in spiking neurons that learn. Technical report, Research School of Information Sciences and Engineering, Australian National University. => done
[8] E. Dauce (2004). Hebbian reinforcement learning in a modular dynamic network. In Proceedings of the Eighth International Conference on Simulation of Adaptive Behavior (SAB'04), pages 305–314. => done

--------------------------------------------------------------------------------------------------

(3) the action module: (i) the motor command may be extracted from a mean activity (a mean over several neurons' activities, or a mean of one neuron's activity over time...) in order to avoid hectic outputs; see the averaging sketch at the end of these notes.
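
--------------------------------------------------------------------------------------------------

A minimal toy sketch of the reward-gated trace rule described above, in the spirit of the Bartlett and Baxter interpretation of TD(λ) [7]. The network size, transfer function, reward schedule, and constants (lam, alpha) here are illustrative assumptions, not the paper's exact model:

    import numpy as np

    rng = np.random.default_rng(0)

    N = 100                                          # neurons in the random recurrent net (assumed)
    lam = 0.9                                        # trace decay, playing the role of lambda
    alpha = 0.01                                     # learning rate applied at reward times

    W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))    # random recurrent weights
    trace = np.zeros((N, N))                         # one eligibility trace per synapse
    x = rng.uniform(-1.0, 1.0, N)                    # current activations

    for t in range(1000):                            # continually running: no reset, no replay
        x_new = np.tanh(W @ x)                       # recurrent dynamics (assumed transfer fn)
        # Local trace: decaying memory of recent post/pre co-activations.
        trace = lam * trace + np.outer(x_new, x)
        # The trace alone never changes the weights; a punctual reinforcement
        # turns it into a weight change, signed by the reward.
        r = 1.0 if (t + 1) % 100 == 0 else 0.0       # sparse reinforcement signal (assumed)
        if r != 0.0:
            W += alpha * r * trace
        x = x_new

Note that the update is entirely local (each synapse only needs its own pre/post activities and the global reward), which is what makes the on-line, no-replay setting workable.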
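
For point (3)(i), a minimal sketch of the two averaging readouts for the action module; the array shapes and window length are assumed for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.uniform(-1.0, 1.0, (50, 10))   # assumed record: 50 time steps x 10 output neurons

    # Spatial mean: average the current activity of several output neurons.
    command_spatial = y[-1].mean()
    # Temporal mean: average one output neuron over a sliding window of steps.
    command_temporal = y[-20:, 0].mean()

Either form of averaging smooths the instantaneous fluctuations of individual neurons, which is what keeps the motor command from being hectic.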