CHAPTER 7

Decorrelation Learning in the Cerebellum: Computational Analysis and Experimental Questions

Paul Dean¹, John Porrill
Department of Psychology, Sheffield University, Sheffield, United Kingdom
¹Corresponding author: Tel.: +44-114-222-6521; Fax: +44-114-276-6515, e-mail address: [email protected]


Abstract Many cerebellar models use a form of synaptic plasticity that implements decorrelation learning. Parallel fibers carrying signals positively correlated with climbing-fiber input have their synapses weakened (long-term depression), whereas those carrying signals negatively correlated with climbing input have their synapses strengthened (long-term potentiation). Learning therefore ceases when all parallel-fiber signals have been decorrelated from climbing-fiber input. This is a computationally powerful rule for supervised learning and can be cast in a spike-timing dependent plasticity form for comparison with experimental evidence. Decorrelation learning is particularly well suited to sensory prediction, for example, in the reafference problem where external sensory signals are interfered with by reafferent signals from the organism’s own movements, and the required circuit appears similar to the one found to mediate classical eye blink conditioning. However, for certain stimuli, avoidance is a much better option than simple prediction, and decorrelation learning can also be used to acquire appropriate avoidance movements. One example of a stimulus to be avoided is retinal slip that degrades visual processing, and decorrelation learning appears to play a role in the vestibulo-ocular reflex that stabilizes gaze in the face of unpredicted head movements. Decorrelation learning is thus suitable for both sensory prediction and motor control. It may also be well suited for generic spatial and temporal coordination, because of its ability to remove the unwanted side effects of movement. Finally, because it can be used with any kind of time-varying signal, the cerebellum could play a role in cognitive processing.

Keywords Cerebellum, eye blink conditioning, vestibulo-ocular reflex, spike-timing dependent plasticity, avoidance learning, long-term depression, long-term potentiation, supervised learning, reafference, least mean squares

Progress in Brain Research, Volume 210, ISSN 0079-6123, http://dx.doi.org/10.1016/B978-0-444-63356-9.00007-8 © 2014 Elsevier B.V. All rights reserved.


CHAPTER 7 Decorrelation Learning and the Cerebellum

1 INTRODUCTION

A central feature of the models published soon after the seminal description of the anatomy and physiology of the cerebellum (Eccles et al., 1967) is that they can learn (Albus, 1971; Marr, 1969). Whenever cerebellar output (modeled as Purkinje cell simple-spike firing, Fig. 1A) is in error, a climbing-fiber signal produces changes in the efficacy of synapses between parallel fibers and Purkinje cells, which in turn alter Purkinje cell firing. Eventually, according to the models, cerebellar output reaches the desired values. This procedure only works if the synaptic adjustment does in fact reduce output error. A learning rule that achieves this goal can be derived analytically (Appendix, Fig. 1B), and takes the form:

$$\delta w_i = -\beta \langle e(t)\, p_i(t) \rangle \qquad (7.1)$$

where δw_i denotes the change in weight (i.e., efficacy) of the synapse carrying the i-th parallel-fiber signal, e(t) the climbing-fiber or error signal, p_i(t) the signal on the i-th parallel fiber (both functions of time), β the learning rate, and the angled brackets denote expected or mean values. The error signal here is the difference between the actual and desired output (actual and desired Purkinje cell firing). The rule means that if the signal on a particular parallel fiber is positively correlated with the error, the efficacy of the corresponding synapse is reduced; conversely, if the correlation is negative, efficacy is increased. The rule may appear to take correlation to imply cause; in this case it does, provided cerebellar output can in fact influence the error signal appropriately. If so, the rule reduces weights on those parallel-fiber signals that increase the error, while increasing those that reduce it (Dean et al., 2002).

Albus (1971) used an informal version of this rule in which “[t]he amount of weakening of each synapse is proportional to how strongly that synapse is exciting the Purkinje cell at the time of the error signal” (pp. 44–45). It seems to have been first applied formally by Fujita (1982, equation 24), and versions of it are currently used in almost all cerebellar models that seek to simulate the role of the cerebellum in behavioral tasks (references in Dean et al., 2010). The fact that the rule is widely used in modeling makes it all the more important to find out whether it is correct, and the next section briefly describes some of the questions raised by current experimental evidence. Subsequent sections consider what the computational consequences of this learning rule would be for cerebellar function, assuming that it is in fact correct.

The learning rule shown in Eq. (7.1) has a number of names (Appendix). Here, the actual process of learning is referred to as decorrelation learning, since this draws attention to an important feature of the rule, which is that learning only stops when there is no longer a correlation between any parallel-fiber signal and the climbing-fiber signal. Thinking of cerebellar learning in decorrelation terms can help intuitions about cerebellar function in complex circuits. Finally we note that, although the decorrelation formulation presented here is very powerful, this does not rule out other complementary modes of operation (discussed further in Dean and Porrill, 2011).
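The behavior of Eq. (7.1) can be illustrated with a minimal numerical sketch. The code below is not taken from the chapter: it assumes three hypothetical parallel-fiber signals modeled as independent Gaussian noise, a hypothetical set of "desired" weights (`target_w`), a sample-by-sample version of the rule in which the expectation is replaced by repeated small updates, and an extra noise term in the error that no parallel fiber can account for. Under these assumptions the weights settle close to the desired values, the weight on the irrelevant third fiber is driven toward zero, and learning effectively stops once the error is uncorrelated with every parallel-fiber signal, even though the error itself is not zero.

```python
import random


def train_decorrelation(target_w=(0.8, -0.5, 0.0), beta=0.01,
                        n_steps=20000, noise_sd=0.5, seed=0):
    """Sample-by-sample form of Eq. (7.1): w_i <- w_i - beta * e(t) * p_i(t).

    target_w, noise_sd and the Gaussian inputs are illustrative assumptions,
    not values from the chapter.
    """
    rng = random.Random(seed)
    w = [0.3] * len(target_w)                # arbitrary starting weights
    for _ in range(n_steps):
        # Hypothetical parallel-fiber signals p_i(t): independent unit noise.
        p = [rng.gauss(0.0, 1.0) for _ in target_w]
        z = sum(wi * pi for wi, pi in zip(w, p))            # actual output
        desired = sum(ti * pi for ti, pi in zip(target_w, p))
        # Error = actual - desired, plus a component unrelated to any p_i.
        e = (z - desired) + rng.gauss(0.0, noise_sd)
        # Positive correlation of e with p_i lowers w_i; negative raises it.
        w = [wi - beta * e * pi for wi, pi in zip(w, p)]
    return w


if __name__ == "__main__":
    print([round(wi, 2) for wi in train_decorrelation()])
```

The residual scatter of the weights around their final values shrinks as the assumed learning rate `beta` is reduced, at the cost of slower learning.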


FIGURE 1 (A) Schematic diagram of basic features of the cerebellar cortical microcircuit. Mossy-fiber inputs y(t) are distributed over many granule cells, whose axons bifurcate to produce parallel fibers carrying signals p_i(t). These fibers form synapses with weights w_i on Purkinje cells. Each Purkinje cell also receives a climbing-fiber signal e(t), which in Marr–Albus models is assumed to alter the weights w_i. In these models, Purkinje cell output z(t) is assumed to be simple-spike firing, with the effects of complex spikes produced by climbing-fiber input usually neglected. Not shown in the diagram: (i) Purkinje cell output is inhibitory and acts via neurons in the deep cerebellar nuclei (and vestibular nuclei); (ii) granule cell axons also form synapses on molecular-layer interneurons (stellate and basket cells), which in turn form inhibitory synapses on Purkinje cells. In this way, granule cells influence Purkinje cells via both an excitatory direct and an inhibitory indirect pathway. (B) Systems-level equivalent of the circuit shown in A, which corresponds to an adaptive (analysis–synthesis) filter. Processing of mossy-fiber input y(t) by the granule cell layer is interpreted as analysis by a bank of causal filters G_i, so that the parallel fibers carry signals p_i = G_i[y] which form an expansion recoding of the mossy-fiber input. Purkinje cell output is modeled as a weighted sum z(t) = Σ w_i p_i(t) of its parallel-fiber inputs, so the Purkinje cell implements a linear-in-weights filter C = Σ w_i G_i. The climbing-fiber input is interpreted as a training signal e(t) that adapts synaptic weights w_i using the decorrelation learning rule derived in the Appendix. Panel (A) adapted from fig. 1A of Porrill et al. (2004) and panel (B) from fig. 1B of Porrill et al. (2004).


2 IMPLEMENTATION OF LEARNING RULE

The derivation of the learning rule in Eq. (7.1) assumes that neuronal firing rates can be treated as continuous variables (Appendix). But in order to relate the rule to experimental evidence, it has to be translated into a form suitable for signals that are carried by neuronal spike trains, that is, into a spike-timing dependent plasticity (STDP) form. The next section briefly describes one way in which the translation can be carried out (more detailed discussion of the theoretical issues involved is given in Menzies et al., 2010).

2.1 STDP Version

When both parallel-fiber and climbing-fiber signals have higher than normal values in a small time range, Eq. (7.1) requires that the weight decreases; this implies that near simultaneous climbing-fiber and parallel-fiber spikes should reduce (depress) the efficacy of the synapse between the parallel fiber and the Purkinje cell. However, it is clear that this process on its own would lead to eventual silencing of all synapses, so it must be balanced by some process that increases (potentiates) synaptic efficacy. This combination of depression and potentiation can be summarized in the spike-timing dependent plasticity (STDP) profile illustrated in Fig. 2B, in which the strong depression for nearly coincident spikes is surrounded by much weaker potentiating side lobes for parallel-fiber spikes that are not coincident with a climbing-fiber spike. It can be demonstrated that this STDP profile implements the decorrelation learning rule for suitable rate coding schemes (cf. Menzies et al., 2010).

Since parallel-fiber and climbing-fiber signals at their tonic rates carry no signal information, these rates in combination should not produce learning. To achieve this, the total amounts of depression and potentiation under the STDP profile must be exactly balanced. A remarkable consequence of such a balance can be deduced by considering climbing-fiber spikes that are “missing” from the tonic background. Parallel-fiber spikes that are close in time to these missing spikes fail to produce the expected depression; instead these climbing-fiber spike “holes” effectively drive an equal and opposite potentiation. Hence, despite its seeming emphasis on depression, the spiking learning rule is completely symmetric in practice: positive signal correlations produce depression while negative signal correlations produce equal and opposite potentiation.
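The balance requirement can be made concrete with a small sketch. The chapter does not give a functional form for the STDP profile, so the code below assumes one purely for illustration: a narrow negative Gaussian (the LTD dip) plus a wide, shallow positive Gaussian (the LTP lobes), with the LTP amplitude chosen so that the two areas cancel. All widths, amplitudes, and function names are hypothetical. For independent spike trains at constant rates, the expected weight drift per unit time is the product of the two rates and the net area under the profile, so a balanced profile gives no learning at tonic rates.

```python
import math


def stdp_kernel(T, a_ltd=1.0, sigma_ltd=0.02, sigma_ltp=0.2):
    """Weight change for a spike pair separated by T seconds (assumed shape)."""
    # Equal areas: a_ltp * sigma_ltp == a_ltd * sigma_ltd.
    a_ltp = a_ltd * sigma_ltd / sigma_ltp
    ltd = -a_ltd * math.exp(-T ** 2 / (2 * sigma_ltd ** 2))   # narrow dip
    ltp = a_ltp * math.exp(-T ** 2 / (2 * sigma_ltp ** 2))    # broad lobes
    return ltd + ltp


def net_area(t_max=2.0, dt=0.001):
    """Numerical integral of the kernel over [-t_max, t_max]."""
    n = int(round(t_max / dt))
    return sum(stdp_kernel(k * dt) for k in range(-n, n + 1)) * dt


def tonic_drift(rate_pf, rate_cf):
    """Expected weight drift per second for independent constant-rate trains."""
    return rate_pf * rate_cf * net_area()


if __name__ == "__main__":
    print(stdp_kernel(0.0) < 0, stdp_kernel(0.2) > 0)   # LTD at coincidence, LTP away from it
    print(abs(tonic_drift(10.0, 1.0)) < 1e-6)           # no net learning at tonic rates
```

With this assumed shape, an excess of coincident spike pairs gives net depression and a deficit gives net potentiation of the same size, which is the symmetry described above.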
This symmetry is very important: for example, it supplies a mechanism via which previously silent synapses can be recruited to perform a task (see below). Although Fig. 2B captures the basic shape of the STDP learning rule, it is misleading in that it assumes the error signal carried by the climbing fibers is instantaneously available for learning at the relevant synapses. However, for some regions of the cerebellum the error signal can be subject to large transmission delays (e.g., 50–100 ms for visual processing), which means that the parallel-fiber and climbing-fiber signals will not match exactly in time. This mismatch can be shown to lead to unstable learning at high frequencies (Porrill and Dean, 2007a). Since the


FIGURE 2 Spike-timing dependent plasticity (STDP) implementation of decorrelation learning. (A) Incremental changes in the weights of synapses between parallel fibers and Purkinje cells depend on the relative timing T of parallel-fiber and climbing-fiber action potentials. Positive T is chosen to represent climbing-fiber spikes arriving after parallel-fiber spikes, so that the parallel-fiber contribution to Purkinje cell input could have causally affected the component of the teaching signal carried by that climbing-fiber spike. (B) The decorrelation learning rule can be induced by the spike-timing dependent plasticity profile shown in this plot. Nearly coincident parallel-fiber and climbing-fiber spikes produce long-term depression (LTD), whereas widely separated parallel-fiber and climbing-fiber spikes produce a much smaller amount of long-term potentiation (LTP). The total amount of LTD and LTP must balance in order to produce the symmetric behavior required by the learning rule. To obtain an exact correlational rule, the dip must be infinitesimally wide and infinitely high (a delta function) and the surround LTP lobes must be infinitely wide and infinitesimally high. The more realistic smooth profile shown here will restrict learning performance by causing it to fall off at very high and very low frequencies (further details in Menzies et al., 2010). (Continued)


climbing-fiber signal cannot be advanced, unstable learning has to be prevented by delaying the effect of the parallel-fiber signal. A filter which effects such a delay is called an eligibility trace (Fig. 2C), and it can be incorporated into the STDP profile by situating the depression dip at an inter-spike time corresponding to the transmission delay (Fig. 2D).

One further implementation issue is the seeming limitation imposed by the low spike rate of climbing fibers. How can a firing rate of around 1 Hz carry useful information about movement errors with frequencies up to 5 Hz and above? The answer is that the error signal is used for learning, not for instantaneous feedback, so that weight changes can be averaged over many training events. Hence, over the relevant training period, we only require that the expected value on the right hand side of the learning rule is properly calculated. The theoretical constraint is that, over multiple trials, climbing-fiber activity should provide an unbiased estimate of the teaching signal at the relevant frequencies. It is clear that an “event-triggered average” of climbing-fiber activity over many trials can contain frequencies higher than 1 Hz, and in practice cerebellar learning rates are slow enough to provide substantial effective high-frequency content in the teaching signal.
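The averaging argument can be checked with a short simulation. The sketch below is an illustration under stated assumptions rather than a model from the chapter: a hypothetical climbing fiber fires at a mean rate of 1 Hz, its firing probability is modulated sinusoidally at 5 Hz, spikes are drawn independently in 10 ms bins, and the trial-averaged rate is projected onto the 5 Hz sinusoid to estimate the modulation depth. The trial count, modulation depth, and function name are all made up for the example.

```python
import math
import random


def recover_modulation(n_trials=5000, dt=0.01, duration=1.0,
                       base_rate=1.0, depth=0.8, freq=5.0, seed=2):
    """Average many sparse spike trains and estimate their 5 Hz modulation.

    Returns (mean rate in Hz, estimated modulation amplitude in Hz).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_bins = int(round(duration / dt))
    times = [(b + 0.5) * dt for b in range(n_bins)]          # bin centers
    rates = [base_rate + depth * math.sin(2 * math.pi * freq * t)
             for t in times]                                  # true rate (Hz)
    counts = [0] * n_bins
    for _ in range(n_trials):
        for b, rate in enumerate(rates):
            # At most one spike per bin, with probability rate * dt.
            if rng.random() < rate * dt:
                counts[b] += 1
    avg_rate = [c / (n_trials * dt) for c in counts]          # trial average
    mean_rate = sum(avg_rate) / n_bins
    # Project the averaged rate onto the 5 Hz sinusoid.
    amp = (2.0 / n_bins) * sum(r * math.sin(2 * math.pi * freq * t)
                               for r, t in zip(avg_rate, times))
    return mean_rate, amp


if __name__ == "__main__":
    print(recover_modulation())
```

On any single one-second trial the train contains about one spike, yet the average over trials follows the 5 Hz modulation, which is all that the expectation in Eq. (7.1) requires.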

FIGURE 2 (continued) (C) The symmetric STDP profile in B does not respect causality, as LTD can be produced by parallel-fiber spikes arriving after climbing-fiber spikes. The expected delay between a climbing-fiber spike and the parallel-fiber spike which could be responsible for it (about 100 ms when the teaching signal is, e.g., retinal slip) can be built into the learning rule by means of an eligibility trace, induced by the parallel-fiber spike, which describes the eligibility for learning of subsequent climbing-fiber spikes. Here, this is chosen to be causal and to peak at the expected delay. (D) The eligibility trace can be combined with the STDP profile in B to produce a causal STDP profile tuned to the expected delay in the teaching signal. The LTD dip produces maximum learning at the expected delay, and the broad dip limits high-frequency learning, reducing instabilities caused by any inaccuracy in the estimated delay.

2.2 Experimental Questions

Initial debates about the decorrelation learning rule focused on whether the cerebellum used it in any form (references in, e.g., Jacobson et al., 2008; Llinás et al., 2004). More recently attention has shifted to understanding what particular form of the rule might be implemented by the cerebellar microcircuit. The STDP profile shown in Fig. 2D suggests that the most prominent form of plasticity at synapses between parallel fibers and Purkinje cells would be depression of synaptic efficacy, produced by conjunctive stimulation of parallel and climbing fibers. This phenomenon, now known as cerebellar long-term depression or LTD, was demonstrated by Ito et al. (1982) and has since been the subject of extensive experimental investigation (for reviews see, e.g., Ito, 2001, 2012). The less prominent increase in synaptic efficacy (long-term potentiation, LTP) produced by stimulation of parallel fibers alone (Fig. 2B and D) has been described in postsynaptic


form by Lev-Ram et al. (2002), and shown to be capable of reversing the effects of LTD (Coesmans et al., 2004; Lev-Ram et al., 2003). Functional consequences of this bidirectional plasticity have been reviewed by Jörntell and Hansel (2006), including its ability to explain how parallel-fiber stimulation without conjunctive climbing-fiber stimulation can enormously increase the tactile receptive fields of Purkinje cells in the C3 zone of the cerebellum in vivo (Ekerot and Jörntell, 2003; Jörntell and Ekerot, 2002, 2011).

In general terms, it seems that the bidirectional plasticity observed at synapses between parallel fibers and Purkinje cells fits reasonably well with the STDP profile (Fig. 2D) derived from the decorrelation learning rule (Dean et al., 2010). There are, however, a number of findings that show that the detailed mechanisms underlying this fit are not well understood. For example, LTD at synapses between parallel fibers and Purkinje cells is not observed in mice with mutations that specifically prevent the internalization of AMPA receptors at those synapses (Schonewille et al., 2011). Yet these mice do not show classical cerebellar learning impairments on tasks such as eye blink conditioning or adaptation of the vestibulo-ocular reflex. Less dramatic but still of concern is the discrepancy between in vitro studies of LTD, which typically find some learning when parallel-fiber and climbing-fiber inputs arrive simultaneously (although learning may be faster when the climbing-fiber signal is delayed by 100 ms, cf. Fig. 2D), and in vivo studies of eye blink conditioning that find no learning for simultaneously presented conditioned and unconditioned stimuli (reviewed by Hesslow et al., 2013). One explanation for these discrepancies is that, because synapses between parallel fibers and Purkinje cells are extremely complex and can display numerous forms of both LTD and LTP in vitro, the actual form used in vivo has yet to be identified.
Alternatively, differences between in vitro and in vivo conditions may produce substantial differences in the properties of the same LTD process (both arguments developed further by Ito, 2012). Moreover, it is possible that in vivo new learning often starts with LTP, rather than LTD (Porrill and Dean, 2008). One computational feature of the decorrelation learning rule is that synapses for parallel fibers carrying noise or signals unrelated to climbing-fiber signals will eventually be driven to zero, and experimental evidence indicates that a high proportion (up to 98%) of parallel-fiber synapses are indeed silent (Isope and Barbour, 2002; Jörntell and Ekerot, 2002; Wang et al., 2000). Since the only way that silent synapses can take part in new learning is via LTP, it is possible that the properties of in vitro LTP rather than LTD will be relevant to in vivo learning. In fact, a further implication of relying on LTP is that there needs to be plasticity in the inhibitory pathway from granule cells to Purkinje cells via molecular-layer interneurons (stellate and basket cells), otherwise tasks such as eye blink conditioning that require a reduction in Purkinje cell firing could not be learnt (Dean et al., 2010; Ekerot and Jörntell, 2003; Porrill and Dean, 2008). Such plasticity, predicted by Albus (1971), has been described in vivo (e.g., Jirenhed et al., 2013; Jörntell and Ekerot, 2003, 2011; Jörntell et al., 2010), with general properties consistent with the decorrelation rule, that is, in the inhibitory pathway


parallel-fiber stimulation on its own gives LTD, but LTP in conjunction with climbing-fiber stimulation, the converse of the STDP required by the excitatory pathway (Dean et al., 2010). Present evidence therefore suggests that bidirectional plasticity of the general form shown in Fig. 2 could underlie in vivo electrophysiological results for synapses between parallel fibers and Purkinje cells, and in converse form could underlie results for synapses between parallel fibers and molecular-layer interneurons. However, important issues remain to be resolved, in particular, the relative roles of LTD and LTP in different cerebellar learning tasks and the precise mechanisms that mediate LTP and LTD in each of the two pathways between granule and Purkinje cells (references in, e.g., Andreescu et al., 2011; Belmeguenai et al., 2010; Boyden et al., 2006; Prestori et al., 2013; Schonewille et al., 2010). Moreover, the efficacy of the proposed learning rule will be influenced by the effects of plasticity at additional sites in cerebellar cortex, vestibular nuclei, and possibly the deep cerebellar nuclei (e.g., D’Angelo and De Zeeuw, 2009; De Zeeuw and Yeo, 2005; Gao et al., 2012; Hesslow et al., 2013; Porrill and Dean, 2007a), though these effects will not be considered further here.

3 PROPERTIES OF LEARNING RULE

The general properties of the decorrelation learning rule have been extensively studied in artificial devices, one of which, a form of adaptive filter (Fig. 1B), has a structure similar to that of the simplified microcircuit shown in Fig. 1A (Fujita, 1982). These studies have first of all demonstrated the computational power of the rule for supervised learning, that is, learning that uses a teaching or error signal (e.g., Widrow and Stearns, 1985). This power derives from the rule’s mathematical foundations (Appendix), which ensure that it is guaranteed to converge to the optimal solution given an appropriate error signal (e.g., see Widrow and Stearns, 1985). Secondly, devices that use the rule, including adaptive filters, have been employed in a wide variety of circuits for many different signal-processing and motor-control tasks (e.g., Widrow and Stearns, 1985). This is clearly appropriate for the cerebellum, where a relatively uniform microcircuit is functionally divided into many microcomplexes, each with its own unique set of external connections (e.g., Apps and Hawkes, 2009; Ito, 1970, 1997; Porrill et al., 2013). Thirdly, the learning rule is highly suitable for what are generally considered to be typical “cerebellar” tasks. Historically, on the basis of available anatomical, clinical, and lesion evidence, the cerebellum was associated with motor control (e.g., Dow and Moruzzi, 1958; Ito, 1984). For example, clinical observations indicated that cerebellar damage led to inaccurate and uncoordinated movements, with little apparent effect on standard tests of sensory processing (Glickstein et al., 2009). Subsequent investigations, however, indicated that the cerebellum was involved in active sensing, where the acquisition of sensory information was dependent on the organism’s own activities (e.g., Bower, 1997; Bower and Parsons, 2003). A central feature


of active sensing was identified as the ability to predict the sensory consequences of movement (e.g., Bastian, 2011; Imamizu, 2010; Ito, 2012; Medina, 2011; Wolpert et al., 1998). In the next sections, therefore, we examine more specifically how the decorrelation learning rule could be used by the cerebellum for both sensory prediction and movement control.

4 SENSORY PREDICTION

The predicted sensory consequences of movements can be used to help solve a variety of different sensorimotor problems. One of these occurs when the sensory signals produced by the organism’s own movement interfere with sensory signals coming from the outside world. This problem has long been recognized (for review see, e.g., Cullen, 2004), and is referred to as the reafference problem because it requires distinguishing between “reafferent” (internally generated) and “exafferent” (externally generated) signals. We focus on the reafference problem as an example of sensory prediction because it is well studied in a biological context, with reasonable evidence of cerebellar involvement.

4.1 The Reafference Problem

4.1.1 Computational Analysis

Use of decorrelation learning in artificial systems (Widrow and Stearns, 1985; Widrow et al., 1975) suggests an effective circuit for predicting the sensory consequences of movement that requires only signals that are biologically available (Fig. 3A). The signals are (i) an efference copy of the motor command r(t), and (ii) input from the relevant sensors s(t) + n(t), which is a mixture of reafferent n(t) and exafferent s(t) signals. The key to the circuit is an adaptive element (here corresponding to a cerebellar microcomplex) that takes the efference copy as its mossy-fiber input, and the difference between its output n_est(t) and the observed sensory signal s(t) + n(t) as its teaching signal. The central idea is that this signal will train the microcomplex to produce an accurate estimate n_est(t) of the sensory effects n(t) generated by motor commands, even though the dynamical links between the commands and the sensory consequences (which include neural circuits, muscle and other tissue properties, and sensor characteristics) are unknown. Since the actual sensory input to the system is the sum s(t) + n(t), where s(t) is the externally generated sensory signal, subtraction from it of the cerebellar output gives an estimate of s(t) of the form

$$s_{\mathrm{est}}(t) = s(t) + n(t) - n_{\mathrm{est}}(t) \qquad (7.2)$$

The critical computational point here is that when n_est(t) is incorrect, the sensory estimate s_est(t) will still be contaminated by the effects of motor commands, and this residual contamination will reveal itself as a correlation between the sensory estimate and the efference copy. By decorrelating these two quantities, the learning rule therefore produces an accurate estimate of external sensory input. The decorrelation


FIGURE 3 (A) Generic adaptive architecture for the reafference problem. The task is to cancel the “reafferent” noise n(t) produced by the system’s own movements that additively corrupts external (exafferent) signals of interest s(t). Motor commands r(t) produce the noise n(t) acting via an unknown dynamic process (motor plant plus sensory dynamics). An efference copy of the commands is sent as input to an adaptive element, which learns to produce an estimate n_est(t) of the noise. This estimate is used to cancel the noise, resulting in a prediction s_est(t) of the exafferent signal s(t), which also acts as teaching signal. The adaptive element learns to decorrelate r(t) from the teaching signal s_est(t). When learning is complete, the adaptive element transforms its input r(t) into n_est(t) exactly as the motor plant and sensor dynamics do, so that n_est(t) = n(t). This means that s_est(t) = s(t), so the output of the system corresponds to the uncorrupted exafferent signal, and also ceases to be correlated with r(t); that is, decorrelation learning has taken place. (B) Specific adaptive architecture for detecting novel whisker contacts. In this case, the motor commands r(t) move the rat’s whiskers back and forth (whisking), a movement that affects whisker sensors and so produces a reafferent signal n(t). An efference copy of the whisking commands is sent as mossy-fiber input to cerebellar zone A2, whose output via the dorsolateral protuberance (DLP, a part of the deep cerebellar nuclei) acts as an estimate n_est(t) of the reafferent signal. This output is sent to the superior colliculus, where it is subtracted from the observed whisker signal s(t) + n(t). Collicular output is thus an estimate s_est(t) = s(t) + n(t) − n_est(t) of external whisker contacts, which is used both to drive orienting movements of the head (not shown), and as a teaching signal to zone A2 via climbing fibers from the caudal medial accessory olive (cMAO). When learning is complete n_est(t) = n(t) and so s_est(t) = s(t). Panel (A) adapted from fig. 1 of Anderson et al. (2012) and panel (B) redrawn from fig. 8 of Anderson et al. (2012).


learning rule is thus exquisitely suited to the reafference problem. If the effects of the motor command are substantially delayed by the dynamics of the plant, then n_est(t) is a predictive estimate. It is thought the cerebellum can predict over an approximately subsecond range. The potential of the circuit in Fig. 3A for improving sensory processing has been demonstrated in a biomimetic robot that uses artificial whiskers to explore its environment (Anderson et al., 2010, 2012).

The original circuit from which Fig. 3A is derived was designed for canceling any form of noise (also termed interference) from a signal of interest (Widrow and Stearns, 1985, fig. 12.1). Thus the input to the system does not have to be a motor command, but could be, for example, another sensory signal; hence the neutral term “reference input” in Widrow and Stearns (1985) and the corresponding notation r(t). The circuit in Fig. 3A also illustrates an important theoretical point about the nature of the “error” signal, since in this case it is a copy of the desired system output, which is an estimate of the “real” (exafferent) sensory signal. Unlike a conventional error signal, this would not be expected to decay to zero. Instead, it ceases to change the weights when none of the input signals to the adaptive element are correlated with system output. Thus, decorrelation learning will have ceased even though there is a clear “error” signal. Finally, the adaptive element in the circuit is sometimes said to be learning a “forward model.” Although thinking in terms of forward and inverse internal models can be helpful, it is also sometimes a source of confusion (as discussed by, e.g., Medina, 2011; Porrill et al., 2013) and we have chosen not to adopt it here.
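The circuit of Fig. 3A and Eq. (7.2) can be sketched in a few lines of code. The example below is a toy illustration, not the implementation used in the cited robot studies. It assumes a white-noise efference copy r(t), a hypothetical "plant" that turns r(t) into reafference through a short delayed filter (`plant`), a sinusoidal exafferent signal s(t), and a tapped delay line standing in for the granule-cell filter bank. Because the adaptive element's output is subtracted at the comparator, the update carries the opposite sign to Eq. (7.1) when written in terms of the teaching signal s_est(t).

```python
import math
import random


def cancel_reafference(n_steps=40000, beta=0.002, seed=1):
    """Noise-cancellation sketch of Fig. 3A with assumed signals and plant.

    Returns (learned weights, RMS of s_est - s over the last 5000 steps).
    """
    rng = random.Random(seed)
    plant = [0.0, 0.6, 0.3, 0.0]      # hypothetical unknown dynamics r -> n
    w = [0.0] * len(plant)            # adaptive weights, one per delay tap
    taps = [0.0] * len(plant)         # r(t), r(t-1), r(t-2), r(t-3)
    n_tail, sq_err = 5000, 0.0
    for t in range(n_steps):
        taps = [rng.gauss(0.0, 1.0)] + taps[:-1]      # new efference copy
        s = math.sin(0.05 * t)                        # exafferent signal
        n = sum(g * p for g, p in zip(plant, taps))   # reafferent "noise"
        n_est = sum(wi * p for wi, p in zip(w, taps)) # adaptive-element output
        s_est = s + n - n_est                         # Eq. (7.2), teaching signal
        # Decorrelate s_est from each input; the sign is positive because
        # n_est enters the comparator with a minus sign.
        w = [wi + beta * s_est * p for wi, p in zip(w, taps)]
        if t >= n_steps - n_tail:
            sq_err += (s_est - s) ** 2
    return w, math.sqrt(sq_err / n_tail)


if __name__ == "__main__":
    weights, rms = cancel_reafference()
    print([round(wi, 2) for wi in weights], round(rms, 3))
```

In this sketch the teaching signal never decays to zero, since it converges on s(t) itself; the weights stop drifting once s_est(t) is uncorrelated with the delayed copies of r(t).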

4.1.2 Experimental Questions

Early studies suggesting a role for the cerebellum in reafference used imaging data from human subjects (e.g., Blakemore et al., 2000, 2001), and were not primarily concerned with the details of the underlying circuitry. Here, however, we focus on the experimental question of identifying neural equivalents of the structures shown in Fig. 3A. This question has two parts, one concerning which regions of the cerebellum are involved, and the second concerning how those regions connect to other parts of the brain, particularly, in the case of the reafference circuit, the part of the brain that implements the comparator. The decorrelation rule only works if cerebellar output influences climbing-fiber signals appropriately. In the circuit of Fig. 3A, this influence occurs inside the system, in the comparator. Some answers to these questions have been proposed for rat whisker movements, and primate head movements. In addition, we argue that studies of eye blink classical conditioning can be viewed in the context of the reafference circuit shown in Fig. 3A.

In the case of rat whisking (Fig. 3B, modified from Anderson et al., 2012), the proposal is that a specific region of cerebellar cortex, zone A2 (Fig. 4), receives an efference copy of whisking commands r(t) as mossy-fiber input, and sends its output n_est(t) (via a part of the deep cerebellar nuclei termed the dorsolateral protuberance) to the superior colliculus. This output is compared with whisker sensory input s(t) + n(t) in the superior colliculus, and the result s_est(t) sent both to the inferior olive (caudal medial accessory olive) as teaching signal, and to the motor system to produce orienting movements to novel whisker contacts. The implication is that zone A2


FIGURE 4 Schematic diagram of flattened cerebellar cortex in rat taken from fig. 11 of Voogd (2011). The left hand side shows the cerebellar lobules I–X (I–V anterior lobe, VI–X posterior lobe) labeled in the vermis, together with the names given to their hemispheric extensions HI–HX lateral to the vermis (details of nomenclature given in, e.g., Glickstein and Voogd, 1995; Glickstein et al., 2011). The lobules run approximately medio-laterally. The right hand side shows the organization of cerebellar zones A to D2, strips of cortex that (again approximately) run rostro-caudally. A given zone receives climbing-fiber input from a particular region of the inferior olive and projects solely to a particular region of the deep cerebellar nuclei. Moreover, these two regions are themselves interconnected, forming what appears to be a tightly organized functional subregion. Diagrams of this kind allow particular areas of cerebellar cortex to be conveniently located by their zone and lobule. For more detailed diagrams of cerebellar zones, including their relation to Purkinje cell zebrin staining, see for example, Sugihara and Shinoda (2004), Voogd and Ruigrok (2004), Apps and Hawkes (2009), and Marzban and Hawkes (2011).

learns to predict the sensory effects of whisking n(t), and thus improve the detection of exafferent vibrissal signals s(t). However, although the basic connectivity shown in Fig. 3B is consistent with experimental evidence, and the role of the superior colliculus in orienting to unexpected whisker deflection is well established (references in Anderson et al., 2012), as yet the nature of the signals conveyed by Purkinje cells in zone A2 or neurons in the dorsolateral protuberance is unknown. In the case of primate head (and body) movements, neurons in the rostral fastigial nucleus appear to code sest(t) explicitly, where s(t) itself corresponds to exafferent


vestibular and proprioceptive signals related to movement of the head and body (Brooks and Cullen, 2013). This finding directly supports a role for the cerebellum in the reafference problem, and points to a number of important questions concerning the underlying circuitry. For example, the area of the vermis that projects to these neurons, and the nature of its climbing-fiber input, do not appear to have been identified. If it is the case that the comparator in this particular circuit is the rostral fastigial nucleus itself (Brooks and Cullen, 2013), then Purkinje cells in the corresponding area of vermis should signal predicted reafference nest(t) (cf. Fig. 3A). In addition, it is unclear how the signal from the rostral fastigial nucleus is used by the circuit for suppressing the vestibular-ocular reflex described previously (e.g., Roy and Cullen, 2004). One problem here is the sophistication available for processing head movements, since it appears the system can distinguish between voluntary head movements that are carried out and those that are mechanically prevented, by comparing predicted and actual signals from neck proprioceptors (Brooks and Cullen, 2013; Roy and Cullen, 2004). Such a comparison adds yet another layer of complexity to the underlying circuit. More thoroughly studied than either of the above two neural circuits is the one that mediates delay eye blink conditioning (Fig. 5). Here, the relevant area of cerebellar cortex is located primarily in zone C3 (Fig. 4), confined to the hemispheric part

FIGURE 5 Specific adaptive architecture for sensory prediction in classical conditioning of the eye blink. The conditional stimulus r(t) in effect acts to produce a painful unconditional stimulus n(t) to the eye or surrounding tissue after an unknown delay. A copy of r(t) is sent as mossy-fiber input to zone C3 (and possibly zone D0, Mostofi et al., 2010) of lobule HVI. The output of this eye blink region acts via the anterior interpositus nucleus AIP as a prediction nest(t) of the unconditional stimulus. This prediction is sent by the nucleo-olivary pathway to part of the dorsal accessory olive where it is subtracted from the sensory signal provided by part of the trigeminal nucleus. The output of the olive is thus an estimate sest(t) of any unpredicted painful stimulus s(t) to the eye or surrounding tissue and is used as a teaching signal sent via climbing fibers to the eye blink region of cerebellar cortex. Learning in the eye blink region proceeds until nest(t) = n(t) so there is no longer any correlation between r(t) and sest(t). In typical classical conditioning experiments, s(t) is set to zero, so learning proceeds until the teaching signal sest(t) is also zero.


of lobule VI (Mostofi et al., 2010, for rabbit). In eye blink conditioning, a conditioned stimulus (often a tone) is presented shortly before an unconditioned stimulus (e.g., mild shock to the skin round the eye) that elicits an eye blink, the unconditioned response. After a number of such pairings, the tone itself elicits a blink (conditioned response) whose peak amplitude occurs approximately at the same time as the unconditioned response is delivered. Very extensive experimental investigation (reviewed in, e.g., De Zeeuw and Yeo, 2005; Hesslow and Yeo, 2002; Thompson and Steinmetz, 2009) has revealed that mossy-fiber input to the eye blink zone conveys information about the conditioned stimulus, climbing-fiber input conveys information about the unconditioned stimulus, and the output of the eye blink zone is related to the conditioned response. Although eye blink conditioning is often treated as an example of associative learning, here we consider it in the framework of sensory prediction and the reafference problem. We do so by comparing three features of the circuits shown in Figs. 3A and 5. First, as far as the adaptive element in Fig. 3A is concerned, the critical computational feature of its input is that it predicts future sensory signals. The circuit remains the same whether this predictive input is an efferent copy (Fig. 3A) or another sensory stimulus (Fig. 5). The predictive task is actually simpler in the case of eye blink conditioning than in usual reafference problems, since the dynamics linking conditioned and unconditioned stimuli are a simple delay between stimulus onsets. In fact, reafference problems in general can be solved by the most suitable mixture of efference copy and relevant sensory signals (Cullen, 2004). Thus, as far as sensory prediction is concerned, the difference between the input signals of Figs. 
3A and 5 is not fundamentally important, and reflects the generic character of the original noise-cancelation circuit (Widrow and Stearns, 1985; Widrow et al., 1975).

Secondly, although cerebellar output in eye blink conditioning is usually considered in terms of motor commands, it is well known to have a sensory predictive component. For example, in classical conditioning the unconditioned stimulus is delivered regardless of the organism’s response, yet the climbing-fiber signal to cerebellar cortex, which in eye blink conditioning is driven by inescapable periorbital shock, nonetheless diminishes as acquisition proceeds (e.g., Hesslow and Ivarsson, 1996; Rasmussen et al., 2008; Sears and Steinmetz, 1991). The climbing-fiber signal appears to be predicted shock, and models of eye blink conditioning have therefore typically used a comparator in which cerebellar output is compared with the unconditioned stimulus signal, just as in Fig. 5 (e.g., Grossberg and Schmajuk, 1989; Lepora et al., 2010; Medina et al., 2000b; Moore et al., 1989). The training signal then becomes not s(t) + n(t) but s(t) + n(t) − nest(t), and since the chances of a second painful stimulus occurring at the same time as the one administered experimentally are low, it can be treated simply as n(t) − nest(t). With this signal, acquisition ceases when nest(t) equals n(t), that is, it does not continue indefinitely as the classical conditioning paradigm suggests it might. Also, as soon as n(t) is omitted after acquisition, a signal for extinction is available. And once acquisition is complete, addition of


a second predictive signal (e.g., a light) to the first should not result in further learning for that signal, since the training signal is effectively zero. This phenomenon, known as blocking, has been observed experimentally (references in Kim et al., 1998). In general, comparator models mimic the performance of trial-level models of conditioning such as that of Rescorla and Wagner (1972), which explicitly use the unpredicted unconditioned stimulus as a teaching signal, exactly as in Fig. 5 (Lepora et al., 2010). Thirdly, a number of studies have suggested that in the specific case of eye blink conditioning an excellent candidate for the comparator is the inferior olive itself (e.g., Andersson et al., 1988; Bengtsson and Hesslow, 2006; Bengtsson et al., 2007; Jirenhed et al., 2007; Kim et al., 1998; Medina et al., 2002; Nicholson and Freeman, 2003; Rasmussen et al., 2008; Sears and Steinmetz, 1991). A subset of neurons in the relevant part of the deep cerebellar nuclei (the anterior interpositus nucleus) send inhibitory projections to the relevant region of the inferior olive (dorsal accessory olive), and these nucleo-olivary neurons would provide a natural substrate for the cerebellar signal nest(t) in Fig. 5. The issue is not settled: it is unclear whether other comparators are located elsewhere (e.g., the red nucleus), or whether an olivary comparator function has to be combined with tonic regulation of Purkinje cell simple-spike firing rates (references in Lepora et al., 2010). Nonetheless, experimental studies of eye blink conditioning have made considerable progress in identifying a vital component of the circuit shown in Fig. 3A. The resemblance between the sensory prediction circuitry of eye blink conditioning and the circuit proposed for solving the reafference problem points to the likely importance of the decorrelation learning rule in the latter task. 
It has been established that paired stimulation of mossy and climbing-fiber inputs to the C3 zone in lobule VI reduces the firing rates of Purkinje cells located there, whereas subsequent stimulation of mossy fibers alone increases them back to baseline (Jirenhed et al., 2007). This is entirely consistent with the STDP version of the decorrelation learning rule (Fig. 2), although, as discussed previously, important issues concerning details of the underlying mechanisms remain (Rasmussen et al., 2013). Purkinje cell responses during eye blink conditioning have in fact been modeled using an STDP rule apparently quite similar to that shown in Fig. 2 (Medina et al., 2000a). It is also the case that the circuit of Fig. 5 is connected so that decreases in Purkinje cell firing affect climbing-fiber input in the appropriate direction, that is, to decrease it. The general importance of this aspect of connection has been demonstrated by Badura et al. (2013), who found that mutant mice in which the projections of the inferior olive were rerouted ipsilaterally displayed much more severe ataxia than that seen after simple cerebellar inactivation.
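The consequences of using n(t) − nest(t) as the teaching signal (acquisition that stops by itself, extinction, and blocking) can be illustrated with a minimal trial-level sketch. It is not a model taken from any of the studies cited above: the function `run_trials`, the learning rate, and the reduction of each stimulus to a single weight are all assumptions made for illustration, and the sign conventions of Purkinje cell firing are abstracted away so that a larger weight simply means a larger prediction nest.

```python
import numpy as np

def run_trials(w, cs, us, n_trials, rate=0.2):
    """Trial-level comparator sketch (hypothetical parameterisation).

    w  : one weight per conditioned stimulus
    cs : 0/1 vector marking which conditioned stimuli are present
    us : size of the unconditioned stimulus n on every trial (s is taken as 0)
    """
    w = w.copy()
    teaching = []
    for _ in range(n_trials):
        n_est = w @ cs           # cerebellar prediction of the US
        teach = us - n_est       # comparator output n - n_est
        w += rate * teach * cs   # only weights of active inputs change
        teaching.append(teach)
    return w, np.array(teaching)

tone = np.array([1.0, 0.0])
tone_and_light = np.array([1.0, 1.0])

# Acquisition: tone paired with the US until the teaching signal vanishes.
w_acq, e_acq = run_trials(np.zeros(2), tone, us=1.0, n_trials=60)
# Blocking: a light is added after acquisition; it gains almost no weight.
w_blk, e_blk = run_trials(w_acq, tone_and_light, us=1.0, n_trials=60)
# Extinction: the US is omitted, so the teaching signal becomes negative.
w_ext, e_ext = run_trials(w_acq, tone, us=0.0, n_trials=60)
```

In this sketch the tone weight settles near 1 and the teaching signal near 0, the light weight stays near 0, and omitting the US drives the tone weight back toward 0, which is the trial-level behavior the text attributes to comparator models.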

4.1.3 Summary

From a computational perspective, the decorrelation rule seems ideally suited to solving the reafference problem. Application of the rule in artificial systems has led to a simple and effective circuit (Fig. 3A), shown to be useful in robotics. However, corresponding neural circuits have yet to be fully characterized. Thus,


a proposed circuit for rat whisking corresponds to the known anatomy (Fig. 3B), but the electrophysiological evidence is lacking (Anderson et al., 2012). Conversely, electrophysiological evidence implicates the primate rostral fastigial nucleus in head and body reafference, but the circuitry awaits clarification (Brooks and Cullen, 2013). It seems likely that further experimental investigation of these candidate circuits will advance our understanding of the role of the cerebellum (and as argued here the role of decorrelation learning) in dealing with the reafference problem. We have also suggested that the eye blink-conditioning circuit (Fig. 5) is similar in many respects to the theoretical circuit, even though the task it appears to be carrying out is not quite classical reafference cancelation. There are two differences, the first being that cerebellar output here labeled nest(t) is also used to produce a movement. The implications of this difference are considered further in the section on motor control below. The second is that if the inferior olive is indeed the comparator, then no signal corresponding to sest(t) is available to the rest of the system outside the cerebellum, since the inferior olive projects exclusively to cerebellar targets. As mentioned above, this might make good functional sense, because the chances of a second painful stimulus occurring at the same time as the unconditioned stimulus are low, and the important task for the organism is to use information about the predicted painful stimulus, that is, nest(t). But this reinforces the observation that the task in eye blink conditioning may not be identical to the standard reafference problem. Tantalizingly, the clearest evidence at present concerns not the cerebellum itself, but what are termed precerebellar structures. 
These structures resemble the cerebellar microcircuit in certain respects, and extensive experimental investigation has demonstrated that some do indeed adaptively remove reafferent interference from sensory signals. Moreover, they use a form of the decorrelation learning rule to do so. For example, the electrosensory lateral line lobe of the Mormyrid electric fish is able to learn to form a negative image of the electric organ discharges produced during active electrosensing in order to cancel them from the output of its passive electrosensory receptors (e.g., Bell et al., 1997, 2008; Montgomery et al., 2012; Requarth and Sawtell, 2011). In these cells, the efference copy signals m(t) are supplied on parallel-fiber inputs to the apical dendrites, whereas the mixed sensory signal n(t) + s(t) is also supplied as an input on basal dendrites. The comparator stage can thus occur within the cell itself, with neuronal output as the teaching signal, eliminating the need for a climbing fiber (Dean and Porrill, 2010; Porrill et al., 2013). The required learning rule is a homosynaptic and anti-Hebbian version (for more discussion see, e.g., Roberts and Bell, 2000; Sawtell and Williams, 2008) of the decorrelation learning rule.

The evidence concerning the role of precerebellar structures in the reafference problem can only be suggestive of a similar role for the cerebellum, but it does indicate that the basic cerebellar microcircuit has the required computational capacity, and that sensory prediction may have been an evolutionarily ancient cerebellar function.
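The reafference-cancelation scheme summarized in this section can be sketched as a standard least-mean-squares noise canceler of the kind described by Widrow and Stearns (1985). The sketch below is an illustration under stated assumptions rather than a model of any particular neural structure: the white-noise efference copy, the short impulse response `h` standing in for the "unknown process", the tapped-delay-line expansion of r(t), and the learning rate are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, taps, rate = 20000, 5, 0.01

r = rng.standard_normal(T)                # efference copy r(t), assumed white noise
h = np.array([0.0, 0.5, 0.3, -0.2, 0.1])  # hypothetical "unknown process" mapping r to n
n = np.convolve(r, h)[:T]                 # reafferent interference n(t)
s = 0.5 * rng.standard_normal(T)          # exafferent signal s(t), independent of r

w = np.zeros(taps)                        # adaptive weights
s_est = np.zeros(T)
for t in range(taps - 1, T):
    p = r[t - taps + 1:t + 1][::-1]       # p[k] = r(t - k), delayed copies of the input
    n_est = w @ p                         # predicted reafference
    s_est[t] = s[t] + n[t] - n_est        # comparator output, also the teaching signal
    w += rate * s_est[t] * p              # weights change until p and s_est are uncorrelated

residual_power = np.mean((s_est[-5000:] - s[-5000:]) ** 2)
interference_power = np.mean(n[-5000:] ** 2)
```

Because s(t) is uncorrelated with r(t), it does not bias the weights; it only adds some jitter to them, so after learning `residual_power` is a small fraction of `interference_power` and `w` lies close to `h`.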


4.2 General Sensory Prediction

Recent studies have indicated a cerebellar role in sensory prediction for a wide range of active sensing and motor control tasks (e.g., Bastian, 2011; Bhanpuri et al., 2013; Cerminara et al., 2009; Imamizu et al., 2003; Izawa et al., 2012; Knolle et al., 2012; Miall et al., 2007; Roth et al., 2013; Schlerf et al., 2012; Shmuelof et al., 2012; Tseng et al., 2007). In broad terms, this is consistent with the fact that an adaptive filter using the decorrelation learning rule can be connected effectively in many different circuits that use sensory prediction (e.g., Porrill et al., 2013; Widrow and Stearns, 1985) besides the one shown in Fig. 3A. However, as with the early experiments on reafference mentioned earlier, these studies often use imaging techniques, or examine the performance of patients with cerebellar damage. Detailed information concerning circuitry, such as the exact nature of the signals entering and leaving the particular region of cerebellar cortex, or the relative contribution of extracerebellar structures, remains unknown. In particular, we do not know how cerebellar output affects climbing-fiber input for the relevant microcomplex. At this stage, therefore, it is not possible to relate these findings in any specific way to the decorrelation learning rule. Hence the emphasis here is on the restricted topic of reafference.

5 MOTOR CONTROL

A natural extension of the reafference problem to motor control comes from the observation that some sensory signals are better avoided than predicted, for example painful stimuli that indicate tissue damage. We therefore consider first the computational issues involved in converting the predictive circuit of Fig. 3A to a circuit capable of signaled (conditioned) avoidance learning, that is, using a predictive stimulus to trigger a movement of the body that avoids the pain. Secondly, we look at an example where the need for avoidance is perhaps less obvious, which is when the “bad” stimulus is movement of the image across the retina. Such movement irreversibly degrades the visual signal, and so also requires avoidance by gaze stabilization rather than prediction.

5.1 Signaled-Avoidance Learning

5.1.1 Computational Analysis

The circuit for signaled-avoidance learning shown in Fig. 6A differs from the sensory prediction circuit of Fig. 3A in two minor ways. The input to the system r(t) is now labeled “predictive stimulus” (cf. Fig. 5), and this can be either an efference copy or an external stimulus, or a mixture of the two. In similar fashion, the “unknown process” can refer to the body’s own dynamics, or to a contingency imposed externally by an experimenter, or the physics of the environment (cf. previous discussion of eye


FIGURE 6 (A) Generic circuit for signaled avoidance. Here, a stimulus r(t) can be thought of as producing, via an unknown process or contingency, a painful stimulus n(t). A copy of r(t) is sent to the adaptive element that uses it to produce a motor command mest(t). This command acts via the appropriate motor circuitry and plant to produce what is in effect an estimate nest(t) of the painful stimulus n(t). As learning proceeds, the avoidance movement becomes more successful, until eventually n(t) is avoided completely, that is, nest(t) = n(t). In the case of signaled avoidance, the comparison between nest(t) and n(t) takes place in the external world (dotted box). The unavoided pain signal sest(t) is the sum of any unpredicted painful stimulus s(t), and the inaccuracy n(t) − nest(t) of the prediction, and is used as the teaching signal for the adaptive element. It can activate the motor circuitry on its own account, triggering an escape rather than avoidance movement (not shown). (B) Specific circuit for signaled (conditioned) eye blink avoidance. Here, r(t) is the conditioned stimulus, which predicts the arrival of the painful unconditioned stimulus n(t) after an unknown delay. A copy of r(t) is sent to the cerebellar eye blink region (lobule HVI, zone C3 possibly encroaching on D0) which produces via the anterior interpositus nucleus (AIP) a motor command mest(t). This acts via the red nucleus, accessory abducens nucleus, and nictitating membrane (NM) plant dynamics to produce the conditioned nictitating-membrane movement with temporal profile nest(t). The unavoided pain signal sest(t) = s(t) + n(t) − nest(t) is detected by corneal receptors and relayed to the trigeminal nucleus, where it can be used to drive an unconditioned NM response (not shown) and is also sent to the dorsal accessory olive to be used as a teaching signal.


blink circuit in Fig. 5). These two changes mean that the circuit in Fig. 6A is more general than that of Fig. 3A, but do not affect its basic signal-processing capability.

The remaining two differences between the circuits of Figs. 6A and 3A are more significant. First, the internal comparator in Fig. 3A has been removed because “comparison” now takes place in the external world (dotted box in Fig. 6A), in the sense that the movement produced by the circuit physically reduces the painful effects that would otherwise have occurred. One potential computational problem with the external comparator is that for certain kinds of avoidance response the system may not be able to tell if the contingency between predictor stimulus and pain has been altered, and so would not be capable of extinction. This problem is discussed in the next section on eye blink conditioning.

Secondly, the output of the adaptive element is no longer a simple estimate nest(t) of the predicted sensory signal, but a motor command mest(t) that will produce a movement with temporal profile nest(t) to avoid n(t). This command is acted on by the transfer functions B of the premotor circuitry and P of the motor plant, so that

$$ P B \, m_{\mathrm{est}}(t) = n_{\mathrm{est}}(t) \tag{7.3} $$

The complexities introduced by B and P vary in importance from system to system. In simple cases of conditioned avoidance, the cerebellar output may be able to use the reflex circuitry for escape or withdrawal already in place for the painful stimulus itself. However, for more precise movements such as those required for the VOR, this complexity assumes considerable computational importance. It is therefore considered in more detail in the section on the VOR below.
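One way to see why B and P can matter is to assume, purely for illustration, that their combined effect is a pure delay and that the adaptive element learns a single gain w. If the correct command m(t) satisfies PB m(t) = n(t), then by linearity the sensory error n(t) − nest(t) is the motor error m(t) − mest(t) passed through PB, so the input r(t) is correlated with a delayed copy of the quantity that actually needs correcting. The sketch below applies a decorrelation-style update to sinusoidal inputs; the function name `learn_gain`, the 0.1 s delay, the learning rate, and the test frequencies are all hypothetical choices, not values taken from the text.

```python
import numpy as np

def learn_gain(freq_hz, delay_s=0.1, w_true=1.0, rate=0.001,
               dt=0.001, duration_s=10.0):
    """Learn a scalar gain from sensory error when PB is modeled as a pure delay."""
    t = np.arange(0.0, duration_s, dt)
    r = np.sin(2.0 * np.pi * freq_hz * t)        # input to the adaptive element
    d = int(round(delay_s / dt))                 # assumed plant delay in samples
    w = 0.0
    for k in range(d, len(t)):
        # Sensory error: (correct gain - current gain) acting on the delayed input.
        sensory_error = (w_true - w) * r[k - d]
        # Decorrelation-style update using the undelayed input.
        w += rate * sensory_error * r[k]
    return w

w_slow = learn_gain(freq_hz=1.0)   # delay is a small fraction of the period
w_fast = learn_gain(freq_hz=5.0)   # delay is half a period: correlation sign flips
```

With the slow input the gain converges toward `w_true`; with the fast input the delay inverts the correlation between input and sensory error, and the same rule drives the gain away from the correct value. This only illustrates the problem under the stated assumptions and does not implement any of the proposed solutions.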

5.1.2 Experimental Questions

Figure 6B shows a circuit, based on Fig. 6A, proposed here for the specific case of signaled-avoidance learning using the eye blink response. To ensure that avoidance is physically possible, the unconditioned stimulus needs to be an airpuff or similar stimulus, whose effects on the cornea can in fact be avoided by closure of eyelids or nictitating membrane (NM). This contrasts with the classical conditioning circuit shown in Fig. 5, where the unconditioned stimulus is a periorbital shock that is unaffected by eye blinks. Avoidance learning is in fact usually regarded as a form of instrumental or operant conditioning, as opposed to a form of classical conditioning (e.g., Mackintosh, 1974; Moore and Gormezano, 1961).

The circuit shown in Fig. 6B assumes that the relevant regions of cerebellar cortex and the deep cerebellar nuclei used for avoidance learning are the same as those used for sensory prediction in classical conditioning of the eye blink, namely zone C3 of cortical lobule HVI, and the anterior interpositus nucleus (AIP). The latter projects to the red nucleus, which in turn projects to eye blink motoneurons in both the facial nucleus for movements of the external eyelid, and (as shown here) in the accessory abducens nucleus for movements of the NM. Early in acquisition these conditioned-response movements will be too small and perhaps wrongly timed, so some air from


the unconditioned stimulus (airpuff) will continue to reach the cornea. This painful stimulus will be registered by the trigeminal nucleus, whose output triggers an unconditioned reflex blink and also modulates the relevant region of the inferior olive, the dorsal accessory olive. The climbing-fiber teaching signal is relayed to the relevant part of cerebellar cortex. Eventually the movements become fully effective in avoiding the airpuff, so there are no painful stimuli to be signaled by the trigeminal nucleus, hence, no modulation of the inferior olive and no further decorrelation learning. This proposed circuit for eye blink avoidance learning raises a number of experimental questions. One is whether avoidance does in fact use the same basic neural machinery as classical conditioning. There is some direct evidence that the response profiles of avoidance and conditioned eyelid responses are very similar (Mauk and Ruiz, 1992), and also, much more indirectly, a general impression that the effects of neural manipulations are the same for both avoidance learning (airpuff and unrestrained eyelid) and classical conditioning (periorbital shock). This general impression is reinforced by cases where an apparent avoidance paradigm (airpuff plus unrestrained eyelids) is not regarded as importantly different from classical conditioning (for recent example see Chettih et al., 2011). However, detailed comparisons of the effects of cerebellar manipulations in the two paradigms have yet to be made. Another experimental question concerns an issue raised in the previous section, about the relation between the motor command mest(t) and its sensory effects nest(t). In the case of the NM, it appears that the temporal profile of the classically conditioned-response command is delayed and lengthened by the (first-order) dynamics of the retrobulbar muscle and the membrane itself (Lepora et al., 2007). 
However, these effects are not large compared to the variability of both conditioned-response amplitudes and peak timing (references in Lepora et al., 2010). In any case, the durations of conditioned responses (100–800 ms, depending on interval between conditioned and unconditioned stimulus onset) are often substantially greater than those of the unconditioned stimulus (typically 60 ms for periorbital shock), suggesting the system is not concerned with precisely matching the temporal characteristics of conditioned response and unconditioned stimulus. It may however be concerned with response amplitude, because evidence on the effects of eyelid restraint suggests that the amplitude of unconditioned blinks is under adaptive control by the cerebellum (Chen and Evinger, 2006; Pellegrini and Evinger, 1997). How this adaptive circuitry might be combined with the circuit proposed in Fig. 6B is not understood.

The third question is also related to an issue raised in the previous section, that of extinction. How would the circuit in Fig. 6B know if the unconditioned stimulus were omitted? This is a generic problem with signaled avoidance (Mackintosh, 1974). One obvious possibility in the present case is that when the eyes are fully closed, the airpuff can still be detected by sensors on the eyelids (see also Longley and Yeo, this volume). As far as we are aware, there is no experimental evidence either for or against this possibility.

Finally, if the same areas of cerebellar circuitry are indeed involved in both sensory prediction and avoidance learning, then the circuits shown in Figs. 5 and 6B


have to be combined. However, it is unclear how in practice the internal and external comparators would work together. Could a sufficiently strong nucleo-olivary signal prematurely stop avoidance learning, in effect fooling the system about the external world? The possibility that the comparators do in fact work together is suggested by the observation that the delay introduced by the nucleo-olivary pathway (90 ms, Best and Regehr, 2009; Hesslow, 1986) is similar to that introduced by the dynamics of the NM plant (Lepora et al., 2010). But little is known about the details of how such a collaboration could work.
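The extinction problem raised for the external comparator can be made concrete with a small sketch. It rests on an assumption not stated in the text: that with an external comparator the teaching signal is the pain actually experienced, which cannot go below zero, whereas an internal comparator (Fig. 5) delivers a signed difference. The function `train`, the single weight `w`, and the learning rate are hypothetical.

```python
def train(w, us, n_trials, rectified, rate=0.2):
    """Scalar sketch: one predictive stimulus, present on every trial."""
    for _ in range(n_trials):
        residual = us - w   # what is left of the US after the learned response
        # External comparator (assumed): only unavoided pain is sensed, never "negative pain".
        teach = max(residual, 0.0) if rectified else residual
        w += rate * teach
    return w

# Avoidance (external comparator): acquire, then omit the US.
w_avoid = train(0.0, us=1.0, n_trials=60, rectified=True)
w_avoid_ext = train(w_avoid, us=0.0, n_trials=60, rectified=True)

# Prediction (internal comparator): acquire, then omit the US.
w_pred = train(0.0, us=1.0, n_trials=60, rectified=False)
w_pred_ext = train(w_pred, us=0.0, n_trials=60, rectified=False)
```

Under these assumptions both circuits acquire the response, but only the signed teaching signal extinguishes it when the unconditioned stimulus is omitted; the rectified version leaves the learned weight unchanged because omission and successful avoidance both produce a zero teaching signal.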

5.1.3 Summary

The fact that painful stimuli are better avoided than just predicted suggests the utility of extending the sensory predictor circuit of Fig. 3A, and a candidate theoretical circuit for signaled avoidance is shown in Fig. 6A. As with sensory prediction itself, it seems that the most investigated candidate neural circuit for signaled avoidance is the one used in eye blink conditioning (Fig. 6B). However, although it seems plausible that the same cerebellar circuitry is used for both prediction and avoidance, this has yet to be firmly established, and the problems of coordinating an internal comparator with actions in the external world are little understood. Moreover, the decorrelation learning rule cannot be simply applied to motor commands in the same way it can be applied to sensory predictions. This point is examined in more detail in the next section.

Although we have focused here on eye blink circuitry as an example of signaled avoidance, other possibilities have been suggested recently. A version of the circuit shown in Fig. 6B has been proposed to account for our general ability, apparently acquired in early childhood, to move our limbs without harming ourselves (Dean et al., 2013). This version uses an efference copy of motor commands to the limbs as input, and reflex withdrawal circuits in the spinal cord to organize the output. This circuit is based on the fore- and hindlimb areas of zone C3 in lobules IV and V, and derives from previous suggestions concerning the functions of these areas (Ekerot et al., 1995, 1997; Jörntell and Ekerot, 2003). The role of signaled avoidance in preventing damaging collisions has also been explored in the context of robot locomotion (Herreros and Verschure, 2013), using a cerebellar-based control circuit to explore the possible role of an internal comparator in switching behavior from reactive mode (unconditioned reflex) to adaptive mode (conditioned reflex).
Finally, the conditioned eye blink response to visual threat, again thought to be acquired in early childhood, is compromised in patients with cerebellar degeneration (Thieme et al., 2013). Lesion-symptom mapping implicates regions in the posterior lobe, two of which are in HVI.

5.2 Gaze Stabilization

Although whole-image movement (retinal slip) is very different from pain, its degradation of visual processing is so undesirable that many species go to great lengths to stabilize the direction of gaze. Gaze stabilization is in effect driven by avoidance of


retinal slip, so it is possible that the pain avoidance circuit shown in Fig. 6A could be adapted for gaze stabilization. Here, we focus on just one aspect of gaze stabilization, namely the vestibulo-ocular reflex (VOR).

Figure 7 shows a signaled-avoidance circuit (Fig. 6A) modified for the horizontal VOR (references in, e.g., Boyden et al., 2004). The input r(t) to the system is now a vestibular signal (from the semicircular canals with some processing in the vestibular nuclei), and it signals a movement of the head that on its own would produce a retinal slip signal n(t). A copy of r(t) is sent as mossy-fiber input to the floccular region of the cerebellum (Fig. 4) where it is transformed into a motor command mest(t) that acts on premotor circuitry in the brainstem. This circuitry, which also receives the vestibular signal r(t) itself, produces an eye movement that can be considered an estimate nest(t) of the eye-rotation produced by the head movement n(t). Any inaccuracy in this eye movement gives rise to a retinal slip signal n(t) − nest(t) which is relayed via structures such as the nucleus of the optic tract (NOT) to the inferior olive (dorsal cap of Kooy) as the training signal for decorrelation learning in the flocculus. There may also be “exafferent” retinal slip, induced for example by an experimenter, but if

FIGURE 7 Signaled-avoidance circuit for the vestibulo-ocular reflex (VOR). Here, the input r(t) is a vestibular signal that is related to actual eye disturbance n(t) via a complex and unknown process involving inverse vestibular processing (to reconstruct actual head movement) and the mechanical linkage between head and eye movement. A copy of r(t) is sent to the flocculus, whose output acts in conjunction with r(t) on brainstem premotor circuitry to produce a motor command mest(t). This acts on the oculomotor plant to produce an “avoidance” eye movement with temporal profile nest(t). In the external world (dotted box), this is in effect subtracted from a notional retinal slip signal, combining predictable (n(t)) and unpredictable (s(t)) retinal slip. The result is an actual retinal slip signal sest(t) that combines any unpredictable retinal slip (s(t)) with slip that was not successfully avoided (n(t) − nest(t)). This signal is relayed via the NOT (nucleus of the optic tract) and related structures to the dorsal cap of Kooy in the inferior olive, which sends climbing fibers to the flocculus. To ensure the stable learning of accurate avoidance eye movements nest(t), the flocculus also receives as mossy-fiber input an efference copy of the motor command mest(t) (see text).


this is uncorrelated with head movement then decorrelation learning will not be affected.

The driving of the avoidance response by r(t) itself is a difference between the circuits for the VOR (Fig. 7) and signaled pain avoidance (Fig. 6A). Although such driving can occur in signaled avoidance (e.g., the alpha reflex where the CS does trigger an escape response on its own account) it is not a central feature. Here, however, the output of the adaptive element mest(t) acts together with r(t), so the function of the adaptive element can be regarded as reflex calibration. In this case, the system is avoiding current not future retinal slip, so is not acting as a predictor (though there appear to be other gaze-stabilization circuits that are predictive: see Brooks and Cullen, 2013). A second difference between the circuits of Figs. 6A and 7 is that retinal slip is a directional signal, so the problem of extinction is avoided.

The third and perhaps most important difference is that the eye movements needed for gaze stabilization must be very precise, much more so than for example eye blink CRs, which can be an order of magnitude longer than the unconditioned stimulus. In the VOR eye movement, amplitude and timing must be matched as closely as possible to the temporal profile of the head movement, a requirement that focuses attention on Eq. (7.3). This equation shows that the output of the adaptive element in Fig. 6A is not an estimate nest(t) of the painful signal to be avoided, but an estimate mest(t) of the motor command required to avoid it. The situation for the VOR is similar, though more complicated because of the influence of the vestibular input r(t) on eye movement. However, in both cases (Figs. 6A and 7) the teaching signal is still what is sometimes termed “sensory error” n(t) − nest(t), when what is needed is “motor error” m(t) − mest(t), where m(t) is the correct motor command. Since these two signals are related (as in Eq. 7.3) by a complex combination of premotor processing in the brainstem and the action of the oculomotor plant, it can be seen that the “motor error” m(t) − mest(t) is not directly available to the system. We have glossed over this problem for the slightly simpler case of signaled avoidance by pointing to the relative imprecision of the CR, in effect by assuming that the differences between sensory and motor error can be safely ignored, but for other kinds of movement it is not clear this assumption is warranted. In particular, there is a danger for complex plants that, for certain frequencies, the relation between sensory and motor error will change in sign, so producing disastrously unstable learning (cf. results of Badura et al., 2013, discussed previously).

The solution to this problem that is shown in Fig. 7 is to take an efference copy of the eye-movement command as an additional mossy-fiber input to the flocculus. This “recurrent architecture” using sensory error can be shown theoretically to produce stable learning (Porrill and Dean, 2007b), and has been used in simulated and robot VOR adaptation (Dean et al., 2002; Haith and Vijayakumar, 2009; Lenz et al., 2009; Porrill et al., 2004). Informally, it can be understood as an extension of decorrelation learning to the case where the relevant correlation to be removed is that between one’s own motor commands and some undesired sensory outcome: the correlation again is assumed to signify cause, and removing the correlation is equivalent to

179

180

CHAPTER 7 Decorrelation Learning and the Cerebellum

learning accurate movements. The recurrent architecture is not the only solution that has been proposed for the motor-error problem; in particular, Kawato and coworkers have argued strongly for feedback-error-learning, in which sensory error is converted approximately into a motor-error signal that can be used both for online control and adaptive learning (e.g., Gomi and Kawato, 1992). Recent discussion of the comparative merits of the two solutions can be found in Porrill et al. (2013).

Although many studies of floccular function in the VOR are consistent with the circuit shown in Fig. 7, its complexities have made it difficult to firmly identify mechanisms of synaptic plasticity (Boyden et al., 2004; Dean and Porrill, 2011; Porrill et al., 2013). Part of the problem is the factor mentioned previously, that there are both direct excitatory and indirect inhibitory pathways from granule cells to Purkinje cells, and evidence suggests that synapses in both pathways are capable of LTD and LTP. But there is an additional site of synaptic plasticity in the vestibular nuclei that, depending on the precise experimental conditions, can transfer learning from cerebellar cortex to the vestibular nuclei, or adapt the VOR on its own with no apparent learning in cerebellar cortex (Boyden et al., 2004; Ke et al., 2009; Mcelvain et al., 2010; Menzies et al., 2010; Porrill and Dean, 2007a).

This latter complexity arises in part because of the optokinetic reflex (OKR) driven by retinal slip (not shown in Fig. 7), which corresponds to the unconditioned eye blink reflex. Because of 100 ms delays in retinal slip processing, this reflex operates primarily at low frequencies (less than 0.5 Hz; Paige, 1983). However, its existence can complicate experimental attempts to modify the VOR. For example, one way of driving VOR gain down is to rotate the subject on a turntable, in phase with rotation of the visual surround. But if the rotation is at a frequency within the range of the OKR, the role of the VOR is lessened as the OKR itself reduces retinal slip. Moreover, because the OKR is partly mediated by the flocculus itself, the adaptation paradigm described above will ensure a correlation between the output of the flocculus as it drives the OKR and vestibular input. This is the correlation that can drive plasticity in the vestibular nuclei themselves, in the extreme case without plasticity in the flocculus (Boyden et al., 2004; Mcelvain et al., 2010; Menzies et al., 2010; Porrill and Dean, 2007a). Complexities of this kind can prove even more significant when smooth pursuit, also mediated by the flocculus, plays an important role (Ke et al., 2009).

An important experimental question, therefore, is how to devise training conditions that would allow different sites of synaptic plasticity to be investigated selectively. One suggestion (Boyden et al., 2004) is to confine training (at least in primates) to high frequencies such as 5 Hz, where it is known that Purkinje cell simple-spike output does not modulate and therefore does not produce learning in the vestibular nucleus (Raymond and Lisberger, 1998).
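The effect of a fixed processing delay at different stimulus frequencies can be illustrated with a short calculation. The sketch below is an illustration only: it treats the 100 ms retinal slip delay as a pure time delay acting on a sinusoid, which is a simplifying assumption (the real pathway has further dynamics), and the function names are hypothetical. It shows that the phase lag grows in proportion to frequency, and that the correlation between a signal and its delayed copy falls and eventually changes sign as frequency increases. The 0.5 Hz figure quoted above comes from Paige (1983), not from this calculation.

```python
import numpy as np

def phase_lag_degrees(freq_hz, delay_s):
    """Phase lag, in degrees, that a pure time delay imposes on a sinusoid."""
    return 360.0 * freq_hz * delay_s

def delayed_correlation(freq_hz, delay_s, n_cycles=10, samples_per_cycle=1000):
    """Normalized correlation between a sinusoid and its delayed copy.

    Averaged over a whole number of cycles; analytically this equals
    cos(2 * pi * freq_hz * delay_s).
    """
    n = n_cycles * samples_per_cycle
    t = np.arange(n) / (samples_per_cycle * freq_hz)
    x = np.sin(2 * np.pi * freq_hz * t)
    y = np.sin(2 * np.pi * freq_hz * (t - delay_s))
    return float(np.mean(x * y) / np.mean(x * x))

DELAY_S = 0.1  # 100 ms, treated here as a pure delay (an assumption)

for f in (0.1, 0.5, 2.5, 5.0):
    lag = phase_lag_degrees(f, DELAY_S)
    corr = delayed_correlation(f, DELAY_S)
    print(f"{f:4.1f} Hz: lag {lag:6.1f} deg, correlation {corr:+.3f}")
```

Under this pure-delay assumption the delayed signal stays close to in phase with its source at the lowest frequencies and is in antiphase at 5 Hz, which gives some intuition for why a delayed feedback reflex is useful mainly at low frequencies and why the sign of a correlation can depend on frequency.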

5.3 General Motor Control

We have argued that the decorrelation learning rule is ideally suited to the reafference problem, and to sensory noise cancelation in general. Moreover, the basic noise-cancelation circuit can be easily extended to produce avoidance of stimuli rather than simple prediction, which indicates how the decorrelation rule can also be used for motor control. The two examples of motor control considered in some detail were signaled-avoidance learning for eye blink, and gaze stabilization by the VOR. A third example would be saccadic adaptation, in which the sensory stimulus to be avoided rather than predicted is the appearance of a target outside the fovea at the end of a saccade (e.g., Hopp and Fuchs, 2004; Soetedjo et al., 2009).

However, for both eye blink and VOR (and indeed saccades) the candidate neural implementation of the theoretical circuit shows that external wiring of cerebellar microzones can be more complex than apparently required, and that these complexities can hinder identification of the underlying computational principles. In eye blink conditioning, it appears that the circuitry may use both an internal comparator and external comparison to combine sensory and motor functions, in a way not familiar in artificial systems that also use the decorrelation rule. In gaze stabilization, the OKR and VOR cooperate in complex and subtle ways to ensure reduction of retinal slip, exploiting where helpful an extracerebellar site of plasticity that extends the dynamic range of the VOR (Porrill and Dean, 2007a). Although these complexities continue to pose obstacles to the experimental characterization of cerebellar learning rules, they do point to the possibility that detailed understanding of cerebellar circuits will not only throw light on sensorimotor processing in the brain, but may also suggest novel algorithms for such processing in artificial systems, especially in robotics.

6 FUTURE DIRECTIONS

Finally, we briefly consider some rather more general consequences of the decorrelation learning rule for future understanding of cerebellar function, in the context first of coordination, and then of “cognitive” processing.

6.1 Coordination

A major problem for complex organisms and robots is the undesirable impact that action by one part can have on other parts. An example of this has already been mentioned: clumsy scratching of the face could in principle cause damage to the eye. Other examples include possible effects of body movements on balance or blood pressure. The general theme is that a desired action is likely to have unwanted side effects.

Circuits that use the decorrelation learning rule for signaled avoidance appear ideally suited for dealing with the problem of unwanted side effects. For example, in the absence of adjustment by the cerebellum, certain movements will cause undesirable changes in blood pressure. This can be prevented by having a region of the cerebellum that (i) receives as mossy-fiber input efference copies of the relevant command, (ii) receives as climbing-fiber input a signal related to blood pressure, and (iii) has an output connected appropriately to the neural machinery responsible for cardiovascular control (as in, e.g., Figs. 6A and 7). This region of the cerebellum can then use decorrelation learning both to predict when a given command would be followed by changes in blood pressure, and to act so as to avoid those changes. For this particular example of coordination, a circuit has been identified, centered on part of the flocculus (Fig. 4) and with the kind of external connectivity just described, that does in fact appear to avoid the excessive increases in blood pressure that would otherwise be produced by defense reactions (Nisimaru et al., 2013). However, in general the detailed circuits that underlie the multiple forms of coordination required by complex organisms are not yet well understood. It is possible that attempts to use cerebellar-inspired control algorithms to ensure coordination in complex robots may provide clues about the sort of connectivity to look for in particular cases.

Signaled-avoidance circuits may also be helpful for what might be termed temporal coordination, needed for rapid sequences of movements made by the same part of the body, as in speech or playing a musical instrument. The precise movement required at any point in a rapid sequence will be influenced by the position of the fingers, tongue, and so forth at the end of the previous movement. In this case, a copy of the next motor command together with information related to current position and velocity would be used as mossy-fiber input, and the difference between intended and actual sound (for speech or music) as climbing-fiber input. Again, while it appears that decorrelation learning could in principle be used to ensure temporal coordination, detailed descriptions of candidate circuits do not yet appear to be available, and studies in robots may provide useful clues.
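The three-part arrangement described above (efference copy in, side-effect signal as teacher, compensatory output) can be sketched in a few lines. This is a minimal illustration under strong assumptions, not a model of any identified circuit: the command is white noise, the side effect is a hypothetical short linear filter of the command, delayed copies of the command stand in for parallel-fiber signals, and the compensatory output is assumed to act directly and additively on the monitored variable, so the motor-error problem discussed earlier does not arise. All names and parameter values are invented for the example. Note that the residual here is defined as side effect minus compensation, so the update carries a plus sign; this is the same rule as Eq. (A6) with the opposite sign convention for the error.

```python
import numpy as np

def delay_line(x, n_taps):
    """Stack copies of x delayed by 0..n_taps-1 samples (zero-padded at the start)."""
    cols = np.zeros((len(x), n_taps))
    for k in range(n_taps):
        cols[k:, k] = x[:len(x) - k]
    return cols

rng = np.random.default_rng(0)
T, n_taps, beta = 5000, 8, 0.005

command = rng.standard_normal(T)                # efference copy (mossy-fiber input)
kernel = np.array([0.0, 0.6, 0.3, 0.1])         # hypothetical command-to-side-effect dynamics
side_effect = np.convolve(command, kernel)[:T]  # unwanted consequence of the command

p = delay_line(command, n_taps)                 # delayed copies act as parallel-fiber signals
w = np.zeros(n_taps)
residual = np.empty(T)
for t in range(T):
    compensation = p[t] @ w                     # output of the adaptive element
    residual[t] = side_effect[t] - compensation # teaching signal: side effect not yet cancelled
    w += beta * residual[t] * p[t]              # adjust weights until residual and inputs decorrelate

print("learned weights:", np.round(w, 3))
print("mean |residual|, first 100 steps:", np.abs(residual[:100]).mean())
print("mean |residual|, last 100 steps: ", np.abs(residual[-100:]).mean())
```

In this noise-free setting the weights settle on the hypothetical filter and the residual side effect shrinks toward zero; learning stops because nothing correlated with the command remains in the teaching signal.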

6.2 Cognitive Tasks

The signal r(t) in the computational circuits shown in Figs. 3A, 6A, and 7 can be motor, e.g., efference copy, or sensory, e.g., proprioceptive. As far as the circuits themselves are concerned, the key property of r(t) is not whether it is sensory or motor, but how it is correlated with the teaching signal. To the extent that “thoughts” can be represented in the form of a time-varying signal r(t), they too can be processed by circuits using decorrelation learning. Thus in principle it seems computationally feasible for some regions of the cerebellum to participate in cognitive functions (e.g., Ito, 2008, 2012). Moreover, there is extensive anatomical, imaging and clinical evidence suggestive of such participation (for reviews see, e.g., Fatemi et al., 2012; Fuentes and Bastian, 2007; Gowen and Miall, 2007; Nicolson et al., 2001; Ramnani, 2006; Schmahmann, 2010). The problem, however, is that this evidence is not yet detailed enough to allow identification of putative circuits that could underlie cerebellar involvement in specific cognitive functions (see, e.g., Frith et al., 2000; Gowen and Miall, 2007; Glickstein et al., 2011). Typically, the exact regions of cerebellum involved, their inputs and outputs, and above all how the outputs affect climbing-fiber input, remain unknown. Here, we briefly outline, in the absence of these circuit details, two ways in which decorrelation learning could in principle contribute to a cerebellar role in cognitive processing.


First, the reafference-problem circuit of Fig. 3A can be speculatively applied to internal speech. The input r(t) to the circuit becomes preverbal thoughts, which are translated by an unknown dynamic process into internal speech n(t), perhaps for example to utilize the powers of various forms of verbal memory. In the circuit of Fig. 8, the cerebellum receives a copy of these thoughts, and learns to provide an estimate n_est(t) of the resultant inner speech. This estimate is subtracted from the combined signal for inner speech n(t) and external speech s(t), thus producing an estimate s_est(t) of the external signal. Learning proceeds exactly as in Fig. 3A until there is no longer any correlation between r(t) and s_est(t), that is, until n_est(t) = n(t) and the effects of inner speech are exactly canceled (cf. Ackermann et al., 2007; Scott, 2013). It can be conjectured that if this circuit were to fail, then internal speech would become confused with external signals, perhaps leading to illusions of voices in the head (e.g., Frith et al., 2000).

FIGURE 8 Predicting internal speech. This diagram is a relabeling of Fig. 3A. Preverbal thoughts are converted by a complex unknown process into internal speech, where they appear in combination with external speech sources. If these internal speech sensations are not canceled, they could lead to the illusions of voices in the head.

Secondly, it has been reported that “histoanatomic abnormalities of the cerebellum are one of the most consistent neuroanatomic findings in the brains of autistic individuals ... it is likely that abnormalities of the cerebellum contribute significantly to many of the clinical features of the disorder” (Fatemi et al., 2012, p. 779). Although the computational underpinning of such a contribution is quite unclear, one possibility is suggested by a consistent feature of first-hand accounts of the experience of autism, namely disorders in sensory processing, both hypo- and hypersensitivity (e.g., Cesaroni and Garber, 1991). Both hypo- and hypersensitivity would be consistent with a difficulty in setting a threshold level for attended stimuli, which in turn would be a natural consequence of a failure in noise cancelation (e.g., Fig. 3A). In a whisking robot rat (Anderson et al., 2010, 2012), failure to solve the reafference problem means choosing either a low threshold for stimulus salience that produces unwanted “ghost” orients to predictable self-produced stimuli, or a high threshold that ensures significant external events are missed. The two choices lead to hyperactive and quiescent behaviors respectively. After noise cancelation, a single threshold effectively separates the two types of input. As far as we are aware, the possibility that sensory deficits in autism might be mediated by cerebellar–collicular connectivity has not previously been considered; for example, a recent review concentrates solely on cerebro-cerebellar connectivity (Fatemi et al., 2012). If this pathway and mechanism were to be implicated in autism, it might lead to the development of convenient new animal models and experimental techniques based on the whisker noise-cancelation paradigm.
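The threshold argument can be made concrete with a toy simulation. The sketch below is an assumption-laden illustration, not the whisking-robot implementation of Anderson et al.: the self-generated signal is a sinusoid locked to a hypothetical 50-sample movement cycle and is larger than the external events, the events are sparse unit pulses, and the cancelation filter has just two weights on sine and cosine signals derived from the motor command. All names and values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta = 5000, 0.01
phase = 2 * np.pi * np.arange(T) / 50.0                  # hypothetical movement cycle of 50 samples

basis = np.column_stack([np.sin(phase), np.cos(phase)])  # signals derived from the motor command r(t)
n = 2.0 * np.sin(phase + 0.7)                            # self-generated (reafferent) signal
s = np.zeros(T)
event_times = rng.choice(T, size=20, replace=False)
s[event_times] = 1.0                                     # sparse external events
raw = n + s                                              # what the sensor reports

# Online decorrelation learning of the reafferent component
w = np.zeros(2)
for t in range(T):
    s_est = raw[t] - basis[t] @ w                        # signal after subtracting the prediction
    w += beta * s_est * basis[t]                         # change weights while s_est correlates with inputs
cancelled = raw - basis @ w                              # whole record re-evaluated with the final weights

def detection_counts(x, threshold):
    """Return (missed events, false alarms) for the rule |x| > threshold."""
    detected = np.abs(x) > threshold
    is_event = s > 0
    missed = int(np.sum(is_event & ~detected))
    false_alarms = int(np.sum(~is_event & detected))
    return missed, false_alarms

low_raw = detection_counts(raw, 0.5)     # low threshold, no cancelation
high_raw = detection_counts(raw, 2.5)    # high threshold, no cancelation
after = detection_counts(cancelled, 0.5) # low threshold, after cancelation

print("raw, low threshold   (missed, false alarms):", low_raw)
print("raw, high threshold  (missed, false alarms):", high_raw)
print("cancelled, low threshold (missed, false alarms):", after)
```

Without cancelation the low threshold responds to the self-generated signal (the “ghost” detections) and the high threshold misses external events; once the predictable component has been learned and subtracted, one threshold separates events from background in this toy setting.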

APPENDIX DERIVATION OF SUPERVISED-LEARNING RULE

This rule for supervised learning has a number of names, for example covariance, delta, Widrow–Hoff, and least mean squares (LMS). It has been proposed not just for the cerebellum, but for artificial devices that use supervised learning, such as adaptive filters (Fig. 1B). If the desired output of the Purkinje cell (Fig. 1A) is z_d(t) (Fig. 1B), we can define the neuron output error as the difference between actual and desired neuron output

$$e(t) = z(t) - z_d(t) = \sum_i w_i p_i(t) - z_d(t) \tag{A1}$$

The mean square output error over time interval T

$$\langle e^2 \rangle = \frac{1}{T} \int_{t_0}^{t_0+T} e(t)^2 \, dt \tag{A2}$$

provides a well-behaved measure of performance over that time (we use angle brackets to express expected or mean values). Hence, we can define a cost function

$$E(w) = \frac{1}{2} \langle e^2 \rangle \tag{A3}$$

which quantifies the performance of the neuron for any given value of the weight vector w = (w_1, w_2, ..., w_n). The optimal weight estimate (in the sense of least squares, for these data) minimizes this cost function. While there are many direct techniques available to solve such minimization problems, we are looking specifically for a biologically plausible algorithm that can be implemented as a synaptic learning rule. For a learning rate parameter β, which is chosen positive and small enough, weight changes given by the gradient descent learning rule

$$\Delta w_i = -\beta \frac{\partial E}{\partial w_i} \tag{A4}$$

are guaranteed to reduce the cost function (unless the gradient is zero, in which case we are already at the global minimum for the quadratic cost function we have chosen). The gradient of the cost function is

$$\frac{\partial E}{\partial w_i} = \frac{1}{2} \left\langle \frac{\partial e^2}{\partial w_i} \right\rangle = \left\langle e \, \frac{\partial e}{\partial w_i} \right\rangle = \langle e \, p_i \rangle \tag{A5}$$

hence, the gradient descent learning rule can be written as

$$\Delta w_i = -\beta \langle e \, p_i \rangle \tag{A6}$$

Since this change in weight is proportional to the correlation of p_i(t) and e(t), learning stops when parallel-fiber activity has zero correlation with climbing-fiber activity; hence, this process is also called decorrelation learning (Dean et al., 2002).
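The derivation can be checked numerically with a short sketch. The code below applies Eq. (A6) in batch form to synthetic data; the signals, the target weights, the learning rate, and the function name are all hypothetical choices made for illustration, and the time average in Eq. (A2) is approximated by a mean over samples.

```python
import numpy as np

def decorrelation_learning(p, z_d, beta=0.1, n_steps=500):
    """Batch gradient descent on E(w) = <e^2>/2.

    p   : array of shape (T, n), parallel-fiber signals p_i(t)
    z_d : array of shape (T,), desired output z_d(t)
    """
    w = np.zeros(p.shape[1])
    for _ in range(n_steps):
        e = p @ w - z_d                              # output error, Eq. (A1)
        correlation = (e[:, None] * p).mean(axis=0)  # <e p_i>, Eq. (A5)
        w -= beta * correlation                      # weight change, Eq. (A6)
    return w

rng = np.random.default_rng(0)
p = rng.standard_normal((2000, 3))                   # hypothetical parallel-fiber signals
w_true = np.array([0.5, -1.0, 2.0])                  # hypothetical target weights
z_d = p @ w_true                                     # desired output generated from them

w = decorrelation_learning(p, z_d)
e = p @ w - z_d
residual_correlation = (e[:, None] * p).mean(axis=0)

print("learned weights:     ", np.round(w, 4))
print("residual correlation:", residual_correlation)
```

When the weights have converged, the error is uncorrelated with every input signal, which is the stopping condition that gives decorrelation learning its name.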

Acknowledgments

Preparation of this article was supported by grants from the European Union (REALNET, 270434 FP7) and the EPSRC (EP/1032533/1).

References

Ackermann, H., Mathiak, K., Riecker, A., 2007. The contribution of the cerebellum to speech production and speech perception: clinical and functional imaging data. Cerebellum 6, 202–213.
Albus, J.S., 1971. A theory of cerebellar function. Math. Biosci. 10, 25–61.
Anderson, S.R., Pearson, M.J., Pipe, A., Prescott, T., Dean, P., Porrill, J., 2010. Adaptive cancelation of self-generated sensory signals in a whisking robot. IEEE Trans. Robot. 26, 1065–1076.
Anderson, S.R., Porrill, J., Pearson, M.J., Pipe, A., Prescott, T., Dean, P., 2012. An internal model architecture for novelty detection: implications for cerebellar and collicular roles in sensory processing. PLoS ONE 7, e44560.
Andersson, G., Garwicz, M., Hesslow, G., 1988. Evidence for a GABA-mediated cerebellar inhibition of the inferior olive in the cat. Exp. Brain Res. 72, 450–456.
Andreescu, C.E., Prestori, F., Brandalise, F., D’Errico, A., De Jeu, M.T.G., Rossi, P., Botta, L., Kohr, G., Perin, P., D’Angelo, E., De Zeeuw, C.I., 2011. NR2A subunit of the n-methyl d-aspartate receptors are required for potentiation at the mossy fiber to granule cell synapse and vestibulo-cerebellar motor learning. Neuroscience 176, 274–283.
Apps, R., Hawkes, R., 2009. Cerebellar cortical organization: a one-map hypothesis. Nat. Rev. Neurosci. 10, 670–681.
Badura, A., Schonewille, M., Voges, K., Galliano, E., Renier, N., Gao, Z.Y., Witter, L., Hoebeek, F.E., Chedotal, A., De Zeeuw, C.I., 2013. Climbing fiber input shapes reciprocity of Purkinje cell firing. Neuron 78, 700–713.
Bastian, A.J., 2011. Moving, sensing and learning with cerebellar damage. Curr. Opin. Neurobiol. 21, 596–601.
Bell, C., Bodznick, D., Montgomery, J., Bastian, J., 1997. The generation and subtraction of sensory expectations within cerebellum-like structures. Brain Behav. Evol. 50, 17–31.
Bell, C.C., Han, V., Sawtell, N.B., 2008. Cerebellum-like structures and their implications for cerebellar function. Annu. Rev. Neurosci. 31, 1–24.


Belmeguenai, A., Hosy, E., Bengtsson, F., Pedroarena, C.M., Piochon, C., Teuling, E., He, Q., Ohtsuki, G., De Jeu, M.T., Elgersma, Y., De Zeeuw, C.I., Jörntell, H., Hansel, C., 2010. Intrinsic plasticity complements long-term potentiation in parallel fiber input gain control in cerebellar Purkinje cells. J. Neurosci. 30, 13630–13643.
Bengtsson, F., Hesslow, G., 2006. Cerebellar control of the inferior olive. Cerebellum 5, 7–14.
Bengtsson, F., Jirenhed, D.A., Svensson, P., Hesslow, G., 2007. Extinction of conditioned blink responses by cerebello-olivary pathway stimulation. NeuroReport 18, 1479–1482.
Best, A.R., Regehr, W.G., 2009. Inhibitory regulation of electrically coupled neurons in the inferior olive is mediated by asynchronous release of GABA. Neuron 62, 555–565.
Bhanpuri, N.H., Okamura, A.M., Bastian, A.J., 2013. Predictive modeling by the cerebellum improves proprioception. J. Neurosci. 33, 14301–14306.
Blakemore, S.J., Wolpert, D., Frith, C., 2000. Why can’t you tickle yourself? NeuroReport 11, R11–R16.
Blakemore, S.J., Frith, C.D., Wolpert, D.M., 2001. The cerebellum is involved in predicting the sensory consequences of action. NeuroReport 12, 1879–1884.
Bower, J.M., 1997. Is the cerebellum sensory for motor’s sake or motor for sensory’s sake: the view from the whiskers of a rat? Prog. Brain Res. 114, 463–496.
Bower, J.M., Parsons, L.M., 2003. Rethinking the ‘lesser brain’. Sci. Am. 289, 40–47.
Boyden, E.S., Katoh, A., Raymond, J.L., 2004. Cerebellum-dependent learning: the role of multiple plasticity mechanisms. Annu. Rev. Neurosci. 27, 581–609.
Boyden, E.S., Katoh, A., Pyle, J.L., Chatila, T.A., Tsien, R.W., Raymond, J.L., 2006. Selective engagement of plasticity mechanisms for motor memory storage. Neuron 51, 823–834.
Brooks, J.X., Cullen, K.E., 2013. The primate cerebellum selectively encodes unexpected self-motion. Curr. Biol. 23, 947–955.
Cerminara, N.L., Apps, R., Marple-Horvat, D.E., 2009. An internal model of a moving visual target in the lateral cerebellum. J. Physiol. Lond. 587, 429–442.
Cesaroni, L., Garber, M., 1991. Exploring the experience of autism through firsthand accounts. J. Autism Dev. Disord. 21, 303–313.
Chen, F.P., Evinger, C., 2006. Cerebellar modulation of trigeminal reflex blinks: interpositus neurons. J. Neurosci. 26, 10569–10576.
Chettih, S.N., Mcdougle, S.D., Ruffolo, L.I., Medina, J.F., 2011. Adaptive timing of motor output in the mouse: the role of movement oscillations in eyelid conditioning. Front. Integr. Neurosci. 5, 72.
Coesmans, M., Weber, J.T., De Zeeuw, C.I., Hansel, C., 2004. Bidirectional parallel fiber plasticity in the cerebellum under climbing fiber control. Neuron 44, 691–700.
Cullen, K.E., 2004. Sensory signals during active versus passive movement. Curr. Opin. Neurobiol. 14, 698–706.
D’Angelo, E., De Zeeuw, C.I., 2009. Timing and plasticity in the cerebellum: focus on the granular layer. Trends Neurosci. 32, 30–40.
Dean, P., Porrill, J., 2010. The cerebellum as an adaptive filter: a general model? Funct. Neurol. 25, 1–8.
Dean, P., Porrill, J., 2011. Evaluating the adaptive-filter model of the cerebellum. J. Physiol. Lond. 589, 3459–3470.
Dean, P., Porrill, J., Stone, J.V., 2002. Decorrelation control by the cerebellum achieves oculomotor plant compensation in simulated vestibulo-ocular reflex. Proc. Biol. Sci. 269, 1895–1904.


Dean, P., Porrill, J., Ekerot, C.F., Jörntell, H., 2010. The cerebellar microcircuit as an adaptive filter: experimental and computational evidence. Nat. Rev. Neurosci. 11, 30–43.
Dean, P., Anderson, S.R., Porrill, J., Jörntell, H., 2013. Adaptive-filter model of cerebellar zone C3: possible basis for safe limb control. J. Physiol. 591, 5459–5474.
De Zeeuw, C.I., Yeo, C.H., 2005. Time and tide in cerebellar memory formation. Curr. Opin. Neurobiol. 15, 667–674.
Dow, R.S., Moruzzi, G., 1958. The Physiology and Pathology of the Cerebellum. University of Minnesota Press, Minneapolis, MN.
Eccles, J.C., Ito, M., Szentágothai, J., 1967. The Cerebellum as a Neuronal Machine. Springer-Verlag, Berlin.
Ekerot, C.F., Jörntell, H., 2003. Parallel fiber receptive fields: a key to understanding cerebellar operation and learning. Cerebellum 2, 101–109.
Ekerot, C.-F., Jörntell, H., Garwicz, M., 1995. Functional relation between corticonuclear input and movements evoked on microstimulation in cerebellar nucleus interpositus anterior in the cat. Exp. Brain Res. 106, 365–376.
Ekerot, C.F., Garwicz, M., Jörntell, H., 1997. The control of forelimb movements by intermediate cerebellum. Prog. Brain Res. 114, 423–429.
Fatemi, S.H., Aldinger, K.A., Ashwood, P., Bauman, M.L., Blaha, C.D., Blatt, G.J., Chauhan, A., Chauhan, V., Dager, S.R., Dickson, P.E., Estes, A.M., Goldowitz, D., Heck, D.H., Kemper, T.L., King, B.H., Martin, L.A., Millen, K.J., Mittleman, G., Mosconi, M.W., Persico, A.M., Sweeney, J.A., Webb, S.J., Welsh, J.P., 2012. Consensus paper: pathological role of the cerebellum in autism. Cerebellum 11, 777–807.
Frith, C.D., Blakemore, S., Wolpert, D.M., 2000. Explaining the symptoms of schizophrenia: abnormalities in the awareness of action. Brain Res. Rev. 31, 357–363.
Fuentes, C.T., Bastian, A.J., 2007. ‘Motor cognition’—what is it and is the cerebellum involved? Cerebellum 6, 232–236.
Fujita, M., 1982. Adaptive filter model of the cerebellum. Biol. Cybern. 45, 195–206.
Gao, Z.Y., Van Beugen, B.J., De Zeeuw, C.I., 2012. Distributed synergistic plasticity and cerebellar learning. Nat. Rev. Neurosci. 13, 619–635.
Glickstein, M., Voogd, J., 1995. Lodewijk Bolk and the comparative anatomy of the cerebellum. Trends Neurosci. 18, 206–211.
Glickstein, M., Strata, P., Voogd, J., 2009. Cerebellum: history. Neuroscience 162, 549–559.
Glickstein, M., Sultan, F., Voogd, J., 2011. Functional localization in the cerebellum. Cortex 47, 59–80.
Gomi, H., Kawato, M., 1992. Adaptive feedback control models of the vestibulocerebellum and spinocerebellum. Biol. Cybern. 68, 105–114.
Gowen, E., Miall, R.C., 2007. The cerebellum and motor dysfunction in neuropsychiatric disorders. Cerebellum 6, 268–279.
Grossberg, S., Schmajuk, N.A., 1989. Neural dynamics of adaptive timing and temporal discrimination during associative learning. Neural Netw. 2, 79–102.
Haith, A., Vijayakumar, S., 2009. Implications of different classes of sensorimotor disturbance for cerebellar-based motor learning models. Biol. Cybern. 100, 81–95.
Herreros, I., Verschure, P.F.M.J., 2013. Nucleo-olivary inhibition balances the interaction between the reactive and adaptive layers in motor control. Neural Netw. 47, 64–71.
Hesslow, G., 1986. Inhibition of inferior olivary transmission by mesencephalic stimulation in the cat. Neurosci. Lett. 63, 76–80.


Hesslow, G., Ivarsson, M., 1996. Inhibition of the inferior olive during conditioned responses in the decerebrate ferret. Exp. Brain Res. 110, 36–46.
Hesslow, G., Yeo, C.H., 2002. The functional anatomy of skeletal conditioning. In: Moore, J.W. (Ed.), A Neuroscientist’s Guide to Classical Conditioning. Springer, New York, NY.
Hesslow, G., Jirenhed, D.A., Rasmussen, A., Johansson, F., 2013. Classical conditioning of motor responses: what is the learning mechanism? Neural Netw. 47, 81–87.
Hopp, J.J., Fuchs, A.F., 2004. The characteristics and neuronal substrate of saccadic eye movement plasticity. Prog. Neurobiol. 72, 27–53.
Imamizu, H., 2010. Prediction of sensorimotor feedback from the efference copy of motor commands: a review of behavioral and functional neuroimaging studies. Jpn. Psychol. Res. 52, 107–120.
Imamizu, H., Kuroda, T., Miyauchi, S., Yoshioka, T., Kawato, M., 2003. Modular organization of internal models of tools in the human cerebellum. Proc. Natl. Acad. Sci. U. S. A. 100, 5461–5466.
Isope, P., Barbour, B., 2002. Properties of the cerebellar granule cell-Purkinje cell synapse: an anatomical and functional study. FENS Abstr. 081, 1.
Ito, M., 1970. Neurophysiological aspects of the cerebellar motor control system. Int. J. Neurol. (Montevideo) 7, 162–176.
Ito, M., 1984. The Cerebellum and Neural Control. Raven Press, New York, NY.
Ito, M., 1997. Cerebellar microcomplexes. Int. Rev. Neurobiol. 41, 475–487.
Ito, M., 2001. Cerebellar long-term depression: characterization, signal transduction, and functional roles. Physiol. Rev. 81, 1143–1195.
Ito, M., 2008. Control of mental activities by internal models in the cerebellum. Nat. Rev. Neurosci. 9, 304–313.
Ito, M., 2012. The Cerebellum: Brain for an Implicit Self. FT Press, Upper Saddle River, NJ.
Ito, M., Sakurai, M., Tongroach, P., 1982. Climbing fibre induced depression of both mossy fibre responsiveness and glutamate sensitivity of cerebellar Purkinje cells. J. Physiol. Lond. 324, 113–134.
Izawa, J., Criscimagna-Hemminger, S.E., Shadmehr, R., 2012. Cerebellar contributions to reach adaptation and learning sensory consequences of action. J. Neurosci. 32, 4230–4239.
Jacobson, G.A., Rokni, D., Yarom, Y., 2008. A model of the olivo-cerebellar system as a temporal pattern generator. Trends Neurosci. 31, 617–625.
Jirenhed, D.A., Bengtsson, F., Hesslow, G., 2007. Acquisition, extinction, and reacquisition of a cerebellar cortical memory trace. J. Neurosci. 27, 2493–2502.
Jirenhed, D.A., Bengtsson, F., Jörntell, H., 2013. Parallel fiber and climbing fiber responses in rat cerebellar cortical neurons in vivo. Front. Syst. Neurosci. 7, Article 16.
Jörntell, H., Ekerot, C.F., 2002. Reciprocal bidirectional plasticity of parallel fiber receptive fields in cerebellar Purkinje cells and their afferent interneurons. Neuron 34, 797–806.
Jörntell, H., Ekerot, C.F., 2003. Receptive field plasticity profoundly alters the cutaneous parallel fiber synaptic input to cerebellar interneurons in vivo. J. Neurosci. 23, 9620–9631.
Jörntell, H., Ekerot, C.F., 2011. Receptive field remodeling induced by skin stimulation in cerebellar neurons in vivo. Front. Neural Circuits 5, Article 3.
Jörntell, H., Hansel, C., 2006. Synaptic memories upside down: bidirectional plasticity at cerebellar parallel fiber-Purkinje cell synapses. Neuron 52, 227–238.
Jörntell, H., Bengtsson, F., Schonewille, M., De Zeeuw, C.I., 2010. Cerebellar molecular layer interneurons—computational properties and roles in learning. Trends Neurosci. 33, 524–532.


Ke, M.C., Guo, C.C., Raymond, J.L., 2009. Elimination of climbing fiber instructive signals during motor learning. Nat. Neurosci. 12, 1171–1179.
Kim, J.J., Krupa, D.J., Thompson, R.F., 1998. Inhibitory cerebello-olivary projections and blocking effect in classical conditioning. Science 279, 570–572.
Knolle, F., Schroger, E., Baess, P., Kotz, S.A., 2012. The cerebellum generates motor-to-auditory predictions: ERP lesion evidence. J. Cogn. Neurosci. 24, 698–706.
Lenz, A., Anderson, S.R., Pipe, A.G., Melhuish, C., Dean, P., Porrill, J., 2009. Cerebellar inspired adaptive control of a compliant robot actuated by pneumatic artificial muscles. IEEE Trans. Syst. Man Cyber. B Cyber. 39, 1420–1433.
Lepora, N.F., Mavritsaki, E., Porrill, J., Yeo, C.H., Evinger, C., Dean, P., 2007. Evidence from retractor bulbi EMG for linearised motor control of conditioned nictitating membrane responses. J. Neurophysiol. 98, 2074–2088.
Lepora, N., Porrill, J., Yeo, C.H., Dean, P., 2010. Sensory prediction or motor control? Application of Marr-Albus models of cerebellar function to classical conditioning. Front. Comput. Neurosci. 4, 1–16.
Lev-Ram, V., Wong, S.T., Storm, D.R., Tsien, R.Y., 2002. A new form of cerebellar long-term potentiation is postsynaptic and depends on nitric oxide but not cAMP. Proc. Natl. Acad. Sci. U. S. A. 99, 8389–8393.
Lev-Ram, V., Mehta, S.B., Kleinfeld, D., Tsien, R.Y., 2003. Reversing cerebellar long-term depression. Proc. Natl. Acad. Sci. U. S. A. 100, 15989–15993.
Llinás, R.R., Leznik, E., Makarenko, V.I., 2004. The olivo-cerebellar circuit as a universal motor control system. IEEE J. Oceanic Eng. 29, 631–639.
Mackintosh, N.J., 1974. The Psychology of Animal Learning. Academic Press Inc, London.
Marr, D., 1969. A theory of cerebellar cortex. J. Physiol. Lond. 202, 437–470.
Marzban, H., Hawkes, R., 2011. On the architecture of the posterior zone of the cerebellum. Cerebellum 10, 422–434.
Mauk, M.D., Ruiz, B.P., 1992. Learning-dependent timing of Pavlovian eyelid responses—differential conditioning using multiple interstimulus intervals. Behav. Neurosci. 106, 666–681.
Mcelvain, L.E., Bagnall, M.W., Sakatos, A., Du Lac, S., 2010. Bidirectional plasticity gated by hyperpolarization controls the gain of postsynaptic firing responses at central vestibular nerve synapses. Neuron 68, 763–775.
Medina, J.F., 2011. The multiple roles of Purkinje cells in sensori-motor calibration: to predict, teach and command. Curr. Opin. Neurobiol. 21, 616–622.
Medina, J.F., Garcia, K.S., Nores, W.L., Taylor, N.M., Mauk, M.D., 2000a. Timing mechanisms in the cerebellum: testing predictions of a large-scale computer simulation. J. Neurosci. 20, 5516–5525.
Medina, J.F., Nores, W.L., Ohyama, T., Mauk, M.D., 2000b. Mechanisms of cerebellar learning suggested by eyelid conditioning. Curr. Opin. Neurobiol. 10, 717–724.
Medina, J.F., Nores, W.L., Mauk, M.D., 2002. Inhibition of climbing fibres is a signal for the extinction of conditioned eyelid responses. Nature 416, 330–333.
Menzies, J.R.W., Porrill, J., Dutia, M., Dean, P., 2010. Synaptic plasticity in medial vestibular nucleus neurons: comparison with computational requirements of VOR adaptation. PLoS ONE 5, e13182.
Miall, R.C., Christensen, L.O., Cain, O., Stanley, J., 2007. Disruption of state estimation in the human lateral cerebellum. PLoS Biol. 5, 2733–2744.
Montgomery, J.C., Bodznick, D., Yopak, K.E., 2012. The cerebellum and cerebellum-like structures of cartilaginous fishes. Brain Behav. Evol. 80, 152–165.

Moore, J.W., Gormezano, I., 1961. Yoked comparisons of instrumental and classical eyelid conditioning. J. Exp. Psychol. 62, 552–559.
Moore, J.W., Desmond, J.E., Berthier, N.E., 1989. Adaptively timed conditioned responses and the cerebellum: a neural network approach. Biol. Cybern. 62, 17–28.
Mostofi, A., Holtzman, T., Grout, A.S., Yeo, C.H., Edgley, S.A., 2010. Electrophysiological localization of eyeblink-related microzones in rabbit cerebellar cortex. J. Neurosci. 30, 8920–8934.
Nicholson, D.A., Freeman, J.H., 2003. Addition of inhibition in the olivocerebellar system and the ontogeny of a motor memory. Nat. Neurosci. 6, 532–537.
Nicolson, R.I., Fawcett, A.J., Dean, P., 2001. Developmental dyslexia: the cerebellar hypothesis. Trends Neurosci. 24, 508–511.
Nisimaru, N., Mittal, C., Shirai, Y., Sooksawate, T., Anandaraj, P., Hashikawa, T., Nagao, S., Arata, A., Sakurai, T., Yamamoto, M., Ito, M., 2013. Orexin-neuromodulated cerebellar circuit controls redistribution of arterial blood flows for defense behavior in rabbits. Proc. Natl. Acad. Sci. U. S. A. 110, 14124–14131.
Paige, G.D., 1983. Vestibulo-ocular reflex and its interaction with visual following mechanisms in the squirrel monkey. I. Response characteristics in normal animals. J. Neurophysiol. 49, 134–151.
Pellegrini, J.J., Evinger, C., 1997. Role of cerebellum in adaptive modification of reflex blinks. Learn. Mem. 4, 77–87.
Porrill, J., Dean, P., 2007a. Cerebellar motor learning: when is cortical plasticity not enough? PLoS Comput. Biol. 3, 1935–1950.
Porrill, J., Dean, P., 2007b. Recurrent cerebellar loops simplify adaptive control of redundant and nonlinear motor systems. Neural Comput. 19, 170–193.
Porrill, J., Dean, P., 2008. Silent synapses, LTP and the indirect parallel-fibre pathway: computational consequences of optimal noise processing. PLoS Comput. Biol. 4, e1000085.
Porrill, J., Dean, P., Stone, J.V., 2004. Recurrent cerebellar architecture solves the motor error problem. Proc. Biol. Sci. 271, 789–796.
Porrill, J., Dean, P., Anderson, S.R., 2013. Adaptive filters and internal models: multilevel description of cerebellar function. Neural Netw. 47, 134–149.
Prestori, F., Bonardi, C., Mapelli, L., Lombardo, P., Goselink, R., De Stefano, M.E., Gandolfi, D., Mapelli, J., Bertrand, D., Schonewille, M., De Zeeuw, C., D’Angelo, E., 2013. Gating of long-term potentiation by nicotinic acetylcholine receptors at the cerebellum input stage. PLoS ONE 8, e64828.
Ramnani, N., 2006. The primate cortico-cerebellar system: anatomy and function. Nat. Rev. Neurosci. 7, 511–522.
Rasmussen, A., Jirenhed, D.A., Hesslow, G., 2008. Simple and complex spike firing patterns in Purkinje cells during classical conditioning. Cerebellum 7, 563–566.
Rasmussen, A., Jirenhed, D.A., Zucca, R., Johansson, F., Svensson, P., Hesslow, G., 2013. Number of spikes in climbing fibers determines the direction of cerebellar learning. J. Neurosci. 33, 13436–13440.
Raymond, J.L., Lisberger, S.G., 1998. Neural learning rules for the vestibulo-ocular reflex. J. Neurosci. 18, 9112–9129.
Requarth, T., Sawtell, N.B., 2011. Neural mechanisms for filtering self-generated sensory signals in cerebellum-like circuits. Curr. Opin. Neurobiol. 21, 602–608.
Rescorla, R.A., Wagner, A.R., 1972. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and non-reinforcement. In: Black, A.H., Prokasy, W.F. (Eds.), Classical Conditioning II: Current Research and Theory. Appleton-Century-Crofts, New York, NY.
Roberts, P.D., Bell, C.C., 2000. Computational consequences of temporally asymmetric learning rules: II. Sensory image cancellation. J. Comput. Neurosci. 9, 67–83.
Roth, M.J., Synofzik, M., Lindner, A., 2013. The cerebellum optimizes perceptual predictions about external sensory events. Curr. Biol. 23, 930–935.
Roy, J.E., Cullen, K.E., 2004. Dissociating self-generated from passively applied head motion: neural mechanisms in the vestibular nuclei. J. Neurosci. 24, 2102–2111.
Sawtell, N.B., Williams, A., 2008. Transformations of electrosensory encoding associated with an adaptive filter. J. Neurosci. 28, 1598–1612.
Schlerf, J., Ivry, R.B., Diedrichsen, J., 2012. Encoding of sensory prediction errors in the human cerebellum. J. Neurosci. 32, 4913–4922.
Schmahmann, J.D., 2010. The role of the cerebellum in cognition and emotion: personal reflections since 1982 on the dysmetria of thought hypothesis, and its historical evolution from theory to therapy. Neuropsychol. Rev. 20, 236–260.
Schonewille, M., Belmeguenai, A., Koekkoek, S.K., Houtman, S.H., Boele, H.J., Van Beugen, B.J., Gao, Z., Badura, A., Ohtsuki, G., Amerika, W.E., Hosy, E., Hoebeek, F.E., Elgersma, Y., Hansel, C., De Zeeuw, C.I., 2010. Purkinje cell-specific knockout of the protein phosphatase PP2B impairs potentiation and cerebellar motor learning. Neuron 67, 618–628.
Schonewille, M., Gao, Z., Boele, H.J., Vinueza Veloz, M.F., Amerika, W.E., Simek, A.A., De Jeu, M.T., Steinberg, J.P., Takamiya, K., Hoebeek, F.E., Linden, D.J., Huganir, R.L., De Zeeuw, C.I., 2011. Reevaluating the role of LTD in cerebellar motor learning. Neuron 70, 43–50.
Scott, M., 2013. Corollary discharge provides the sensory content of inner speech. Psychol. Sci. 24, 1824–1830.
Sears, L.L., Steinmetz, J.E., 1991. Dorsal accessory olive activity diminishes during acquisition of the rabbit classically conditioned eyelid response. Brain Res. 545, 112–122.
Shmuelof, L., Huang, V.S., Haith, A.M., Delnicki, R.J., Mazzoni, P., Krakauer, J.W., 2012. Overcoming motor “forgetting” through reinforcement of learned actions. J. Neurosci. 32, 14617–14622.
Soetedjo, R., Fuchs, A.F., Kojima, Y., 2009. Subthreshold activation of the superior colliculus drives saccade motor learning. J. Neurosci. 29, 15213–15222.
Sugihara, I., Shinoda, Y., 2004. Molecular, topographic, and functional organization of the cerebellar cortex: a study with combined aldolase C and olivocerebellar labeling. J. Neurosci. 24, 8771–8785.
Thieme, A., Thurling, M., Galuba, J., Burciu, R.G., Goricke, S., Beck, A., Aurich, V., Wondzinski, E., Siebler, M., Gerwig, M., Bracha, V., Timmann, D., 2013. Storage of a naturally acquired conditioned response is impaired in patients with cerebellar degeneration. Brain 136, 2063–2076.
Thompson, R.F., Steinmetz, J.E., 2009. The role of the cerebellum in classical conditioning of discrete behavioral responses. Neuroscience 162, 732–755.
Tseng, Y.W., Diedrichsen, J., Krakauer, J.W., Shadmehr, R., Bastian, A.J., 2007. Sensory prediction errors drive cerebellum-dependent adaptation of reaching. J. Neurophysiol. 98, 54–62.
Voogd, J., 2011. Cerebellar zones: a personal history. Cerebellum 10, 334–350.

Voogd, J., Ruigrok, T.J.H., 2004. The organization of the corticonuclear and olivocerebellar climbing fiber projections to the rat cerebellar vermis: the congruence of projection zones and the zebrin pattern. J. Neurocytol. 33, 5–21.
Wang, S.S., Khiroug, L., Augustine, G.J., 2000. Quantification of spread of cerebellar long-term depression with chemical two-photon uncaging of glutamate. Proc. Natl. Acad. Sci. U. S. A. 97, 8635–8640.
Widrow, B., Stearns, S.D., 1985. Adaptive Signal Processing. Prentice-Hall Inc, Englewood Cliffs, NJ.
Widrow, B., Glover, J.R., McCool, J.M., Kaunitz, J., Williams, C.S., Hearn, R.H., Zeidler, J.R., Dong, E., Goodlin, R.C., 1975. Adaptive noise cancelling: principles and applications. Proc. IEEE 63, 1692–1716.
Wolpert, D.M., Miall, R.C., Kawato, M., 1998. Internal models in the cerebellum. Trends Cogn. Sci. 2, 338–347.
