We consider here networks of stochastic leaky integrate-and-fire neurons that evolve in discrete time according to Eq. (4). The escape noise may be modeled as a bounded exponential function (Eq. 8).

The activations were normalized between 0 and 1. The input neurons fired Poisson spike trains with a firing rate proportional to the activation, between 0 and 50 Hz.

In the first experiment, the worm was controlled by a network of probabilistic integrate-and-fire neurons, for which the reinforcement learning algorithm was derived. The neurons had a membrane time constant τ = 20 ms, a reset potential Vr = 10 mV, a threshold potential θ = 16 mV, and exponential escape noise with γ = 20 ms and Δu = 0.2 mV. The synaptic weights evolved according to Eq. 9, were hard-bounded within [wmin, wmax], and were initialized with random values at the beginning of the experiment. For synapses from input neurons, wmin = −0.1 mV and wmax = 1.5 mV; for the other synapses, wmin = −0.4 mV and wmax = 1 mV. There were no axonal delays. The parameters of the learning algorithm were τz = 5 ms and a learning rate of 0.025 mV² (wmax − wmin).
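For concreteness, the following is a minimal simulation sketch of a single such neuron. It assumes a standard Euler discretization of leaky integration and an escape rate of the form ρ(u) = (1/γ) exp((u − θ)/Δu), with the per-step firing probability capped at 1 (one way to read "bounded"); the time step DT, the relaxation toward Vr, and the form of the synaptic input term are assumptions, and the exact update and noise model are those given by Eqs. 4 and 8.

```python
import numpy as np

# Neuron parameters taken from the text; DT is an assumed simulation step.
TAU = 20e-3       # membrane time constant (s)
V_RESET = 10e-3   # reset potential (V), i.e. 10 mV
THETA = 16e-3     # threshold potential (V), i.e. 16 mV
GAMMA = 20e-3     # escape-noise time constant (s)
DELTA_U = 0.2e-3  # escape-noise width (V), i.e. 0.2 mV
DT = 1e-3         # simulation time step (s) -- assumed, not given in the text

rng = np.random.default_rng(0)

def step(u, i_syn):
    """One discrete-time update of a stochastic LIF neuron (sketch of Eqs. 4 and 8).

    u     : membrane potential (V)
    i_syn : summed synaptic input arriving in this time step (V)
    Returns (new potential, spike flag).
    """
    # Leaky integration; the potential is assumed to relax back toward V_RESET
    # in the absence of input.
    u = u + DT * (-(u - V_RESET) / TAU) + i_syn
    # Exponential escape rate, with the per-step firing probability capped at 1.
    rho = (1.0 / GAMMA) * np.exp((u - THETA) / DELTA_U)
    p_fire = min(rho * DT, 1.0)
    spike = rng.random() < p_fire
    if spike:
        u = V_RESET
    return u, spike
```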
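The input encoding described above (activations normalized to [0, 1], mapped to Poisson spike trains between 0 and 50 Hz) can be sketched as follows; the example activation values and the time step are placeholders.

```python
import numpy as np

DT = 1e-3        # simulation time step (s) -- assumed
RATE_MAX = 50.0  # maximum input firing rate (Hz), from the text

rng = np.random.default_rng(1)

def encode_inputs(activations):
    """Map normalized activations in [0, 1] to Poisson spikes for one time step."""
    activations = np.clip(np.asarray(activations, dtype=float), 0.0, 1.0)
    rates = RATE_MAX * activations               # firing rate proportional to activation
    return rng.random(rates.shape) < rates * DT  # Bernoulli approximation of a Poisson train

# Example: three sensor activations for a single time step.
spikes = encode_inputs([0.0, 0.5, 1.0])
```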
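The weight update of Eq. 9 is not reproduced here. As a generic illustration only, the sketch below shows a reward-modulated eligibility-trace update of the policy-gradient (REINFORCE/OLPOMDP) family: the eligibility decays with τz = 5 ms, the learning rate is scaled by (wmax − wmin) as in the text, and hard bounds are applied after each step. The characteristic-eligibility term (s − ρ·δt)·ε/Δu and the exact scaling of the learning rate are assumptions about the rule's structure, not the paper's formula.

```python
import numpy as np

DT = 1e-3          # simulation time step (s) -- assumed
TAU_Z = 5e-3       # eligibility-trace time constant (s), from the text
DELTA_U = 0.2e-3   # escape-noise width (V)
W_MIN, W_MAX = -0.1e-3, 1.5e-3  # hard bounds for input synapses (V)
ETA = 0.025 * (W_MAX - W_MIN)   # learning rate scaled by the weight range (assumed form)

def update_weights(w, z, reward, post_spike, rho, eps_pre):
    """One reward-modulated update step (generic stand-in, not Eq. 9 itself).

    w          : synaptic weights onto one neuron (V)
    z          : eligibility traces (same shape as w)
    reward     : scalar reinforcement signal
    post_spike : 1.0 if the postsynaptic neuron fired this step, else 0.0
    rho        : postsynaptic escape rate this step (Hz)
    eps_pre    : presynaptic activity traces (contribution of each synapse to u)
    """
    # Characteristic eligibility of an exponential escape-noise neuron (REINFORCE-style).
    grad_log = (post_spike - rho * DT) / DELTA_U * eps_pre
    # Leaky accumulation of the eligibility trace with time constant TAU_Z.
    z = z * np.exp(-DT / TAU_Z) + grad_log
    # Reward-modulated weight change, then hard bounding within [W_MIN, W_MAX].
    w = np.clip(w + ETA * reward * z, W_MIN, W_MAX)
    return w, z
```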