We consider here networks of stochastic leaky integrate-and-fire neurons that evolve in discrete time according to Eq. (4). The escape noise may be modeled as a bounded exponential function (Eq. 8).

The activations were normalized between 0 and 1. The input neurons fired Poisson spike trains with a firing rate proportional to the activation, between 0 and 50 Hz.

In the first experiment, the worm was controlled by a network of probabilistic integrate-and-fire neurons, for which the reinforcement learning algorithm was derived. The neurons had a membrane time constant τ = 20 ms, a reset potential Vr = 10 mV, a threshold potential θ = 16 mV, and exponential escape noise with γ = 20 ms and Δu = 0.2 mV. The synaptic weights evolved according to Eq. 9, were hard-bounded within [wmin, wmax], and were initialized with random values at the beginning of the experiment. For synapses from input neurons, wmin = −0.1 mV and wmax = 1.5 mV; for the other synapses, wmin = −0.4 mV and wmax = 1 mV. There were no axonal delays. The parameters of the learning algorithm were τz = 5 ms and a learning rate of 0.025 mV² (wmax − wmin).
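For concreteness, the following is a minimal simulation sketch of a single such neuron. It assumes a standard Euler discretization of leaky integration and an escape rate of the form ρ(u) = (1/γ) exp((u − θ)/Δu), with the per-step firing probability capped at 1 (one way to read "bounded"); the time step DT, the relaxation toward Vr, and the form of the synaptic input term are assumptions, and the exact update and noise model are those given by Eqs. 4 and 8.

```python
import numpy as np

# Neuron parameters taken from the text; DT is an assumed simulation step.
TAU = 20e-3       # membrane time constant (s)
V_RESET = 10e-3   # reset potential (V), i.e. 10 mV
THETA = 16e-3     # threshold potential (V), i.e. 16 mV
GAMMA = 20e-3     # escape-noise time constant (s)
DELTA_U = 0.2e-3  # escape-noise width (V), i.e. 0.2 mV
DT = 1e-3         # simulation time step (s) -- assumed, not given in the text

rng = np.random.default_rng(0)

def step(u, i_syn):
    """One discrete-time update of a stochastic LIF neuron (sketch of Eqs. 4 and 8).

    u     : membrane potential (V)
    i_syn : summed synaptic input arriving in this time step (V)
    Returns (new potential, spike flag).
    """
    # Leaky integration; the potential is assumed to relax back toward V_RESET
    # in the absence of input.
    u = u + DT * (-(u - V_RESET) / TAU) + i_syn
    # Exponential escape rate, with the per-step firing probability capped at 1.
    rho = (1.0 / GAMMA) * np.exp((u - THETA) / DELTA_U)
    p_fire = min(rho * DT, 1.0)
    spike = rng.random() < p_fire
    if spike:
        u = V_RESET
    return u, spike
```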
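The input encoding described above (activations normalized to [0, 1], mapped to Poisson spike trains between 0 and 50 Hz) can be sketched as follows; the example activation values and the time step are placeholders.

```python
import numpy as np

DT = 1e-3        # simulation time step (s) -- assumed
RATE_MAX = 50.0  # maximum input firing rate (Hz), from the text

rng = np.random.default_rng(1)

def encode_inputs(activations):
    """Map normalized activations in [0, 1] to Poisson spikes for one time step."""
    activations = np.clip(np.asarray(activations, dtype=float), 0.0, 1.0)
    rates = RATE_MAX * activations               # firing rate proportional to activation
    return rng.random(rates.shape) < rates * DT  # Bernoulli approximation of a Poisson train

# Example: three sensor activations for a single time step.
spikes = encode_inputs([0.0, 0.5, 1.0])
```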
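The weight update of Eq. 9 is not reproduced here. As a generic illustration only, the sketch below shows a reward-modulated eligibility-trace update of the policy-gradient (REINFORCE/OLPOMDP) family: the eligibility decays with τz = 5 ms, the learning rate is scaled by (wmax − wmin) as in the text, and hard bounds are applied after each step. The characteristic-eligibility term (s − ρ·δt)·ε/Δu and the exact scaling of the learning rate are assumptions about the rule's structure, not the paper's formula.

```python
import numpy as np

DT = 1e-3          # simulation time step (s) -- assumed
TAU_Z = 5e-3       # eligibility-trace time constant (s), from the text
DELTA_U = 0.2e-3   # escape-noise width (V)
W_MIN, W_MAX = -0.1e-3, 1.5e-3  # hard bounds for input synapses (V)
ETA = 0.025 * (W_MAX - W_MIN)   # learning rate scaled by the weight range (assumed form)

def update_weights(w, z, reward, post_spike, rho, eps_pre):
    """One reward-modulated update step (generic stand-in, not Eq. 9 itself).

    w          : synaptic weights onto one neuron (V)
    z          : eligibility traces (same shape as w)
    reward     : scalar reinforcement signal
    post_spike : 1.0 if the postsynaptic neuron fired this step, else 0.0
    rho        : postsynaptic escape rate this step (Hz)
    eps_pre    : presynaptic activity traces (contribution of each synapse to u)
    """
    # Characteristic eligibility of an exponential escape-noise neuron (REINFORCE-style).
    grad_log = (post_spike - rho * DT) / DELTA_U * eps_pre
    # Leaky accumulation of the eligibility trace with time constant TAU_Z.
    z = z * np.exp(-DT / TAU_Z) + grad_log
    # Reward-modulated weight change, then hard bounding within [W_MIN, W_MAX].
    w = np.clip(w + ETA * reward * z, W_MIN, W_MAX)
    return w, z
```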