==================================================================================================
Robert Legenstein, Dejan Pecevski, Wolfgang Maass (2008)
"A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback"
--------------------------------------------------------------------------------------------------
Hence it also provides a possible functional explanation for trial-to-trial variability, which is characteristic of cortical networks of neurons but has no analogue in currently existing artificial computing systems.

Numerous experimental studies (see [1] for a review; [2] discusses more recent in-vivo results) have shown that the efficacy of synapses changes depending on the time difference $\delta t = t_{post} - t_{pre}$ between the firing times $t_{pre}$ and $t_{post}$ of the pre- and postsynaptic neurons. This effect is called spike-timing-dependent plasticity (STDP).

[1] Abbott L. F. and Nelson S. B. (2000) "Synaptic plasticity: taming the beast." Nat. Neurosci. 3: 1178-1183.

Corresponding spike-based rules for synaptic plasticity of the form $\frac{d}{dt} w_{ji}(t) = c_{ji}(t) d(t)$ have been proposed in [12] and [13], where $w_{ji}$ is the weight of a synapse from neuron i to neuron j, $c_{ji}(t)$ is an eligibility trace of this synapse which collects weight changes proposed by STDP, and $d(t) = h(t) - \bar{h}$ results from a neuromodulatory signal $h(t)$ with mean $\bar{h}$. It was shown in [12] that a number of interesting learning tasks in large networks of neurons can be accomplished with this simple rule in Equation 1.

[12] Izhikevich E. M. (2007) "Solving the distal reward problem through linkage of STDP and dopamine signaling." Cereb. Cortex 17: 2443-2452.

... or if one maximizes the likelihood of postsynaptic firing at desired firing times [16].

[13] Florian R. V. (2007) "Reinforcement learning through modulation of spike-timing dependent synaptic plasticity." Neural Comput. 19(6): 1468-1502.
[14] Baxter J, Bartlett PL (1999) "Direct gradient-based reinforcement learning: I. Gradient estimation algorithms." Technical report. Research School of Information Sciences and Engineering, Australian National University.
[15] Baras D, Meir R (2007) "Reinforcement learning, spike-time-dependent plasticity, and the BCM rule." Neural Comput. 19: 2245-2279.
[16] Pfister J. P., Toyoizumi T., Barber D. and Gerstner W. (2006) "Optimal spike-timing dependent plasticity for precise action potential firing in supervised learning." Neural Comput. 18: 1318-1348.
==================================================================================================
[1] L. F. Abbott and Sacha B. Nelson (2000) "Synaptic plasticity: taming the beast"
Recent experimental results suggest several novel mechanisms for regulating levels of activity in conjunction with Hebbian synaptic modification. We review three of them (synaptic scaling, spike-timing dependent plasticity and synaptic redistribution) and discuss their functional implications.
==================================================================================================
Eugene M. Izhikevich
Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling
to read
Dorit Baras and Ron Meir
Reinforcement Learning, Spike Time Dependent Plasticity and the BCM Rule
==================================================================================================
LECTURE 5
Spike timing-dependent plasticity (STDP)
In many regions of the brain, neurons are found to exhibit bidirectional plasticity in which the strength of a synapse can increase or decrease depending on the stimulus protocol [109, 110, 111, 112, 113]. Long-term potentiation (LTP) is a persistent increase in synaptic efficacy produced by high-frequency stimulation of presynaptic afferents or by the pairing of low-frequency presynaptic stimulation with robust postsynaptic depolarization.
Long-term synaptic depression (LTD) is a long-lasting decrease in synaptic strength induced by low-frequency stimulation of presynaptic afferents. More recent experimental studies suggest that both the sign and degree of synaptic modification arising from repeated pairing of pre- and postsynaptic action potentials depend on their relative timing [114, 115, 116]. Long-term strengthening of synapses occurs if presynaptic action potentials precede postsynaptic firing by no more than about 50 ms. Presynaptic action potentials that follow postsynaptic spikes produce long-term weakening of synapses. The largest changes in synaptic efficacy occur when the time difference between pre- and postsynaptic action potentials is small, and there is a sharp transition from strengthening to weakening. This phenomenon of spike timing-dependent plasticity (STDP) is illustrated in figure 1.
==================================================================================================
The Spike Response Model ... Wulfram Gerstner
A description of neuronal activity on the level of ion channels as in the Hodgkin-Huxley model leads to a set of coupled nonlinear differential equations which are difficult to analyze. ... -> a reduction of the nonlinear spike dynamics to a threshold process. ...
Spikes occur if the membrane potential $u(t)$ reaches a threshold $\theta$. The voltage response to spike input is described by the postsynaptic potential $\epsilon$. Postsynaptic potentials of several input spikes are added linearly until $u$ reaches $\theta$. The output pulse itself and the reset/refractory period which follow the pulse are described by a function $\eta$. Since $\epsilon$ and $\eta$ can be interpreted as response kernels, the resulting model is called the Spike Response Model (SRM).
(1) Hodgkin-Huxley dynamics with time-dependent input can be reproduced to a high degree of accuracy by the SRM.
(2) The simple integrate-and-fire neuron is a special case of the Spike Response Model.

On-the-complexity-of-learning-for-a-spiking-neuron.
Polychronization
Computation-with-spikes
Evolutionary-supervision-of-a-dynamical-neural-network-allows-learning-with-on-going-weights.
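
Sketch (not from the sources above): the Spike Response Model notes describe a threshold process in which postsynaptic potentials $\epsilon$ add linearly and each output spike contributes a reset/refractory kernel $\eta$. The Python fragment below is one minimal way to write that down. The kernel shapes (double-exponential $\epsilon$, exponential $\eta$), all time constants, the weights, and the names `epsilon`, `eta` and `simulate_srm` are assumptions chosen for illustration; the notes do not specify them.

```python
import math


def epsilon(s, tau_m=10.0, tau_s=2.5):
    """Postsynaptic potential kernel; the double-exponential shape is an assumption.

    s is the time in ms since the presynaptic spike; the kernel is zero for s <= 0.
    """
    if s <= 0.0:
        return 0.0
    return math.exp(-s / tau_m) - math.exp(-s / tau_s)


def eta(s, eta0=5.0, tau_r=10.0):
    """Reset/refractory kernel; the exponential after-hyperpolarization is an assumption.

    s is the time in ms since an output spike; the kernel is zero for s <= 0.
    """
    if s <= 0.0:
        return 0.0
    return -eta0 * math.exp(-s / tau_r)


def simulate_srm(input_spikes, weights, theta=1.0, t_end=100.0, dt=0.1):
    """Simulate one SRM neuron on a fixed time grid and return its output spike times.

    input_spikes: one list of presynaptic spike times (ms) per synapse.
    weights:      one weight per synapse (hypothetical values, arbitrary units).
    """
    output_spikes = []
    n_steps = int(round(t_end / dt))
    for step in range(n_steps + 1):
        t = step * dt
        # Refractory contribution of all earlier output spikes.
        u = sum(eta(t - t_out) for t_out in output_spikes)
        # Linear summation of weighted postsynaptic potentials.
        for w, train in zip(weights, input_spikes):
            u += w * sum(epsilon(t - t_pre) for t_pre in train)
        # Threshold process: emit a spike when u reaches theta.
        if u >= theta:
            output_spikes.append(t)
    return output_spikes


if __name__ == "__main__":
    # Two coincident inputs that are individually subthreshold under these kernels.
    print(simulate_srm([[10.0], [10.0]], [1.2, 1.2]))
```

With these assumed kernels a single input of weight 1.2 stays below threshold, while two coincident inputs of that weight sum linearly and trigger one output spike, after which $\eta$ keeps the neuron silent. Fixed-step evaluation is the simplest choice here; an event-driven scheme would locate threshold crossings more precisely.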