Training pair-housed Rhesus macaques (Macaca mulatta) using a combination of negative and positive reinforcement.

Behavioural Processes 113 (2015) 51–59


Eva-Marie Wergård a,b,∗, Hans Temrin b, Björn Forkman c, Mats Spångberg a, Hélène Fredlund a, Karolina Westlund a

a Department of Comparative Medicine, Karolinska Institutet, SE-171 82 Stockholm, Sweden
b Department of Zoology, Stockholm University, SE-106 91 Stockholm, Sweden
c Department of Large Animal Science, Faculty of Health and Medical Science, University of Copenhagen, DK-1870 Frederiksberg C, Denmark

Article info

Article history: Received 28 August 2014; received in revised form 8 December 2014; accepted 26 December 2014; available online 30 December 2014

Keywords: Negative reinforcement; Positive reinforcement; Rhesus macaque; Primate; Training; Combined reinforcement; Animal behaviour management

Abstract

When training animals, time is sometimes a limiting factor hampering the use of positive reinforcement training (PRT) exclusively. The aim of this study was to evaluate the effects of a combination of negative and positive reinforcement training (NPRT). Twenty naïve female Rhesus macaques (Macaca mulatta) were trained in 30 sessions with either PRT (n = 8) or NPRT (n = 12) to respond to a signal, move into a selected cage section and accept confinement. In the NPRT-group a signal preceded the presentation of one or several novel, and thus aversive, stimuli. When the correct behaviour was performed, the novel stimulus was removed and treats were given. As the animal learned to perform the correct behaviour, the use of novel stimuli was decreased and finally phased out completely. None of the PRT-trained animals finished the task. Ten out of 12 monkeys in the NPRT-group succeeded in performing the task within the 30 training sessions, a significant difference from the PRT-group (p = 0.0007). A modified approach test showed no significant difference between the groups (p = 0.67) in how they reacted to the trainer. The results from this study suggest that carefully conducted NPRT can be an alternative training method to consider, especially when under a time constraint. © 2015 Published by Elsevier B.V.

1. Introduction

Training methods should ideally be practical and efficient, yielding fast responses without compromising the welfare of the animal being trained. Positive reinforcement training (PRT), the addition of a reward following a desired behaviour (Laule, 2003; Skinner, 1938), is often considered to be better for the welfare of animals compared with other training methods and is therefore the main method used in animal training today (Prescott et al., 2005; Ramirez, 1999) (see Table 1 for more explicit definitions and explanations of training methods). In our facility, we use PRT as our standard training procedure. Nonetheless, we have had limited success in using PRT to obtain full cooperation in capture procedures on newly arrived monkeys within the time available. It is known that training with PRT often initially requires some time investment before becoming efficient

Abbreviations: NPRT, Negative and positive reinforcement training; PRT, Positive reinforcement training. ∗ Corresponding author. Tel.: +46 8 524 85978; fax: +46 8 524 278519. E-mail address: [email protected] (E.-M. Wergård). http://dx.doi.org/10.1016/j.beproc.2014.12.008 0376-6357/© 2015 Published by Elsevier B.V.

(Perlman et al., 2012), and in our situation there is not always enough time available to rely solely on PRT. We thus need to consider alternative training methods to obtain the desired behaviour without compromising the animals’ welfare.

Table 1
Definitions of terminology found in this paper.

Positive reinforcement training (PRT): As the animal responds correctly a desired reward is delivered; the response thus becomes more likely to recur. The animal repeats the behaviour in order to obtain the reward (Laule et al., 2003).

Primary reinforcer: An inherently rewarding stimulus that satisfies biological drives, e.g. food (Egger and Miller, 1962).

Secondary reinforcer: A stimulus that has gained significance to the animal through association with primary reinforcers (Egger and Miller, 1962), e.g. a clicker.

Negative reinforcement training (NRT): By removing an aversive stimulus contingent upon the animal performing a specific behaviour, that specific behaviour is reinforced and the probability of it occurring again will increase. The repetition of behaviour occurs as the animal tries to avoid the aversive stimulus (Ramirez, 1999; Vargas, 2009).

Aversive stimuli: An aversive stimulus is anything the animal moves away from, i.e. wants to avoid or escape from. Aversive stimuli may range from low-intensity to painful stimuli, including conditioned stimuli (Carter and Wheeler, 2005) and, as in this study, novel stimuli to which animals inherently tend to keep a distance (Misslin and Ropartz, 1981).

Combination of negative and positive reinforcement training (NPRT): When combining positive and negative reinforcement, correct behaviour is followed by the removal of the aversive stimulus and the subsequent addition of a reward. Behaviour is reinforced; however, it is unclear whether the behaviour change is driven by negative or positive reinforcement, or a combination.

Approximation step: The progressive steps in training, reinforcing behaviour incrementally one step at a time until the desired behaviour is completed (i.e. shaping, McMillan et al., 2014).

Desensitisation: A process in which the animal’s perception of a certain event is changed to a more neutral one with the help of time and/or experience (Ramirez, 1999).

Counter conditioning (active desensitisation): An active desensitisation technique where the trainer associates the aversive stimulus or event with something the animal desires, thus lessening the impact of the aversive stimulus (Chance, 2009).

Habituation (passive desensitisation): A process in which the animal is repeatedly exposed to a stimulus in order to decrease its response when exposed. No reinforcement is involved in the process (McMillan et al., 2014).

Systematic desensitisation: Gradual exposure to the aversive stimulus, always below response threshold, enabling the animal to gradually get used to the stimulus (Wolpe, 1961). Systematic desensitisation is a type of habituation procedure, often combined with counter conditioning (ref).

Signal (predicting or response eliciting): A sound or any other distinct stimulus that is presented either to inform the animal that something is going to happen (predicting signal; Bassett and Buchanan-Smith, 2007) or to elicit a certain response as a result of a learned association (also known as cue, Ramirez, 1999).

Least reinforcing scenario (LRS): If the animal performs an incorrect behaviour the trainer pauses for 3–5 seconds before continuing the training session, i.e. the least reinforcing scenario has been provided (Ramirez, 1999). This procedure may reduce the likelihood of the unwanted behaviour being repeated.

One such alternative is negative reinforcement training. This involves the removal of an aversive stimulus contingently on the animal displaying the correct behaviour (Vargas, 2009; Table 1). By performing the behaviour again, the animal can avoid aversive stimulation. It is the termination of the negative reinforcer that reinforces the correct behaviour and influences its future recurrence; timing is therefore crucial (Kazdin, 2001), as in all training. Negative reinforcement is often misunderstood by animal trainers (McLean, 2005). In addition, some authors have advised against using negative reinforcement, since it involves exposing the animals to aversive stimuli (Reinhardt, 1992; Laule et al., 2003). This may give the animal a potentially unpleasant experience of the training. However, combining such training with PRT is suggested to reduce the potential aversiveness of the situation (McKinley, 2004; Warren-Smith and McGreevy, 2007). Using combined reinforcement, NPRT, results in both the removal of the aversive stimulus and the subsequent addition of a reward contingent on the correct behaviour. Since an aversive stimulus is followed by the presentation of a desired reward, this procedure can be construed as counter conditioning (Table 1; Yin, 2009), and Chance (2009) purports that this active pairing of an aversive event followed by a rewarding stimulus gradually decreases the ability of the aversive event to adversely affect the animal. Thus, we propose that there is a potential difference between using negative reinforcement solely and a combination of reinforcement, in terms of how the aversive stimulus is perceived by the animal.

Stacey et al. (1999) included negative reinforcement in their normal PRT to successfully train a common bottlenose dolphin (Tursiops truncatus) to be restrained and injected for medical reasons. In this case, they positively reinforced the dolphin as long as it participated in the session, but if it refused, a net was used to guide the dolphin to the selected area where it was once again positively reinforced. Thus, the desired behaviour resulted in both the removal of the aversive stimulus and the addition of a reward.

The choice of aversive stimulus in negative reinforcement is delicate and warrants ethical consideration. The negative reinforcer may range from a light aid (McLean, 2005) to highly intense, painful stimuli (McGreevy and Boakes, 2011). Sometimes it is difficult to foresee whether the addition or removal of a specific stimulus will reinforce a behaviour. From a trainer perspective, the behaviour of the animal will indicate whether or not a stimulus functions as a reinforcer. If the animal increases a targeted behaviour in order to avoid an object, that object is negatively reinforcing the behaviour (Vargas, 2009). “Aversive” denotes something the animal wants to avoid (Ulrich et al., 1964), and it does not have to be frightening or painful (Innes and McBride, 2008). Novel objects are often initially aversive (Misslin and Ropartz, 1981), inducing neophobic reactions (Misslin and Cigrang, 1986), a phenomenon demonstrated in e.g.
rodents, humans and non-human primates (Corey, 1978). This suggests that novel objects could potentially be used as negative reinforcers. Moreover, a signal preceding an aversive event can be used to further decrease potential discomfort, as aversive events become predictable and even avoidable if the animal performs the correct

behaviour (Bassett and Buchanan-Smith, 2007). When using negative reinforcement on horses, McGreevy and Boakes (2011) suggest the use of a signal before the pressure of the bit in the horse’s mouth is increased, thus giving the horse the chance to respond correctly before the negative reinforcer is even applied.

A training regime that involves aversive elements may negatively affect the relationship between the animals and the trainer, which could cause problems for future interactions (McKinley, 2004). Since combined reinforcement training, i.e. NPRT, involves the avoidance/removal of an aversive stimulus, the aversive events could potentially be associated with the trainer. To test if the aversiveness of the training situation becomes associated with the trainer, a modified approach test may be used (for a review of human fear tests see Forkman et al., 2007). If the reaction of the monkeys towards the trainer is not affected, we propose that the level of aversion experienced in the training situation is small enough not to contaminate the overall interaction with the trainer.

As mentioned, in laboratory settings it is sometimes crucial to obtain results within a limited time period, as the animals are predestined for biomedical experiments. We had the possibility to use the monkeys’ quarantine period of three months, a short and valuable opportunity, to prepare them for their participation in the upcoming experiments. The aims of this study were therefore firstly to investigate if NPRT, using novel objects as negative reinforcers, was more efficient than PRT alone when training monkeys to perform a specific behaviour, and secondly to investigate if such training methods affected the response towards the trainer. We did this by comparing two groups of Rhesus macaques (Macaca mulatta), one group being trained solely with PRT and one with NPRT. The central task for the monkeys was to move into a selected section of their cage and accept the gate being closed. For each of the two training groups we evaluated (1) how many individuals performed the behaviour within a given time frame and (2) if the monkeys’ response towards the trainer was affected by the training methods used.


2. Material and methods

This study was completed over 94 consecutive days, from the arrival of the monkeys in December 2010 until the last modified approach test had been completed in February 2011. It was conducted during the monkeys’ quarantine period of three months before they went into a biomedical experiment. The Stockholm North Ethical Committee on Animal Experiments approved the study (no. N217/10).

2.1. Animals and management

In this study 20 four-year-old female Rhesus monkeys (M. mulatta), originating in China, at least F2 generation, were used. Before arrival in Sweden and the start of the study, they had been housed in quarantine in The Netherlands for six weeks. The animals were completely naïve to training and had limited experience in human interaction. All monkeys had their own individual name and number for identification. During our study the monkeys were housed indoors in a biosafety level 3 environment. They were housed in pairs, with access to daylight in cages measuring 2 m² (floor area) and 2 m high (Fig. 1). The grids of the cage enabled the monkeys to have both visual contact with other individuals in the same room and tactile access to their closest neighbours. The ambient temperature was 22 °C (±0.5 °C) and the relative humidity at least 40%. In order to avoid one training method influencing the outcome of the other, the positive reinforcement (PRT) group (n = 8) and the negative and positive reinforcement (NPRT) group (n = 12) were situated in two different rooms. Both rooms were similar in interior, design and environment. In the PRT-room there were 11 monkeys, grouped in four pairs and one trio. As only pair-housed animals were included, the trio did not take part in this study. All monkeys were given pellets (Special Diet Services (NDS), UK) in the morning, fruits and vegetables in the afternoon, and all were


similarly enriched according to the facility’s normal enrichment schedule with e.g. tennis balls, treat puzzles and fir cones. All cages also had bedding material on the floor. Experimental training (see below) was conducted about one hour after the morning feeding sessions. In the afternoon, identical husbandry training sessions were conducted in both groups according to standard facility procedure. During these PRT sessions a clicker was used to mark the correct behaviour and treats were given as reinforcers (Laule et al., 2003). All sessions were conducted by the same person, who had undergone our internal animal trainer education and had an understanding of the theoretical background of training as well as proper training terminology (Table 1). Because the same trainer performed all training sessions, the trainer could build a relationship with the animals in both treatment groups through the establishment of a positive reinforcement history (Minier et al., 2011). The husbandry behaviours, also known as cooperative behaviours (Ramirez, 1999), trained during the 22 afternoon sessions were taking treats from the trainer’s hand, touching a target and stationing in the small protruding den-boxes (Fig. 1) according to our facility’s Standard Operating Procedures. After husbandry training, afternoon fruit and vegetables were distributed. Data are presented on the number of animals that had ever taken treats out of the trainer’s hand, touched a target or stationed for at least 30 s in the den-box. In order to differentiate the husbandry training from the experimental training (see Section 2.2), the trainer performed the experimental training in the morning wearing a yellow scrub and the husbandry training in the afternoon wearing a blue scrub. Other staff working with the monkeys wore white scrubs in order not to interfere with the training study.
During the study period there were nine occasions when there were preparations for a later biomedical experiment, where some or all monkeys were sedated and blood sampled. After such occasions we did not train or test the monkeys for at least 24 h. The trainer did not take part in these experimental procedures.

2.2. Experimental set up

After arrival at the laboratory, the monkeys were given ten days of acclimatisation (Fig. 2). During this period they were given time to habituate to their new environment and the everyday activities in the room, such as cleaning, feeding and the provisioning of enrichment. No systematic human interaction was conducted. Thereafter the trainer performed seven desensitisation sessions (Fig. 2). This was done using a combination of counter conditioning (Chance, 2009) and systematic desensitisation (Wolpe, 1961) (Table 1). During these sessions the trainer offered treats to the monkeys. Some did not approach the trainer, in which case the trainer placed the treats on the cage edge and stepped away the distance required until at least one individual approached and ate the treats. That distance was gradually shortened, increasing the contact between trainer and monkey contingently on the response of the most courageous individual. Desensitisation was primarily a chance for these newly arrived monkeys to habituate to the trainer coming in and offering them treats. After seven sessions, three out of eight monkeys in the PRT-room and four out of twelve in the NPRT-room took treats out of the trainer’s hand. All individuals accepting treats, except one, were dominant in their respective pairs.

Fig. 1. Biosafety level 3 cage housing two animals. During the experimental training, the animals were trained to move into the upper left corner and accept being enclosed through the use of a horizontal floor board and a vertical gate board. Animals entering the balcony box at chest level in the modified approach test received a score of six (see Section 2.3).

Fig. 2. Experimental set-up with the two experimental training regimes explained.

2.2.1. Experimental training

Following desensitisation, the two treatments diverged (Fig. 2), and experimental training consisted of 30 sessions during a period of nine weeks (ranging from one to five sessions a week). The goal of the training was to teach the monkeys to respond to a signal (a keyboard signal for the NPRT-group and a verbal signal for the PRT-group), move into a selected section of the cage and accept being enclosed. In order to achieve this, two boards were needed: one vertical gate board and one horizontal floor board, both separating the upper left cage section from the rest of the cage (see Fig. 1). A variety of fruits, nuts and pasta were given as positive reinforcers in both groups. All experimental training was started one hour after the monkeys had been fed in the morning.
The contrast between reinforcer quality and morning feed probably helped motivate the animals to participate in experimental training, and as suggested by McKinley et al. (2003) the recent feeding may have reduced the potential aggressiveness of the dominant animal. The trainer alternated which treatment group was trained first, i.e. some mornings the trainer started to train the PRT-group and some mornings the NPRT-group.

In the first approximation step, the trainer inserted all the floor boards in the room, and distributed desired treats on the boards. The animals were given the opportunity to voluntarily explore the new feature and discover the treats during 10–15 min. It was noted whether treats were eaten, and the session ended when boards were taken down and leftover treats removed. In the PRT-group, this was repeated until all treats were eaten before moving on to the next approximation step, since mastering this behaviour was a prerequisite for continued training using positive reinforcement. In contrast, in the NPRT-group the trainer proceeded to the next approximation step after eight sessions regardless of whether the monkeys would venture out on the boards and take the treats, since training using combined reinforcement did not require voluntary shifting to the selected section.

A similar approach was taken with the next approximation step, with floor and gate boards mounted. The gate board was inserted half way into the cage. NPRT-animals were given five 10–15 min sessions to explore and retrieve the treats placed upon the floor board before moving on to the next approximation, whereas PRT-pairs only advanced to the next step contingent on one individual retrieving the treats in the first two approximation steps (see Section 2.2.1.1 for further description of the PRT-training).

It should be noted that the first two approximation steps were similar for the two treatment groups, and that the actual stimulus-response training for the two groups started after the two approximation steps involving desensitisation to the boards. The reason why this approach was taken, spending 13 sessions desensitising the animals to the apparatus and trainer rather

than immediately starting the operant training, was to reduce the aversiveness of the training procedure for the NPRT-group. Generally, feed motivation was high and the animals either took all treats or none in the early stages of training.

2.2.1.1. Positive reinforcement training (PRT)

Once PRT was initiated, after the desensitisation to the boards in the first two approximation steps, training was performed using a clicker to mark when the monkey performed the correct behaviour (secondary reinforcer) followed by the delivery of treats (primary reinforcers) (Table 1; Laule et al., 2003). This stimulus-response training began in approximation step 3, named “Responding to signal by moving into cage section”. Since the monkeys were pair housed and naïve to training, only one individual in each group could be trained in earnest initially. Due to the layout of the cage, the monkeys could not be separated and trained individually, so training progressed mainly with the dominant individual. The animals were shaped using successive approximations (Domjan, 2005) to orient towards a station on the floor board and respond to a verbal signal. The station was an exact location on the left part of the floor board in the specific cage section. In the beginning, every time the monkey showed an intention of moving towards the station it was reinforced with a click followed by a treat. Successively the criterion was raised, so the monkey needed to come closer and closer to the targeted location before it could receive a click and a treat. In the beginning the reinforcement was delivered at the gate board, the right part of the selected cage section. As the monkey became more and more familiar with the situation, the trainer reinforced in accordance with the principle of “click for behaviour and feed for position” (personal communication, Bob Bailey, 2009), marking the correct behaviour and delivering the treat at the actual targeted station.
Once the monkey responded to the verbal signal from the trainer and moved into position reliably, the trainer started manipulating the gate board. Shaping the monkey to station and accept manipulation of the gate board involved the trainer touching the board, rattling it gently without moving it, and gradually closing it. If the monkey broke from position and left the location before reaching the current criterion, the trainer removed her hand from the board, performed a Least Reinforcing Stimulus (LRS; Ramirez, 1999) and gave the verbal signal again. The trainer strove to choose criteria that allowed the monkey to remain below the threshold above which it would fail; if the current criterion resulted in the monkey breaking from position, the criterion was lowered so that the animal could succeed. During PRT, raising criteria was contingent on the monkey succeeding and performing correctly on the previous level.

Targets were not used during the training due to the importance of target training during the afternoon sessions for all individuals, including the NPRT-group. All monkeys needed to be taught target training in order to perform other husbandry behaviours where targets could be useful, e.g. to restrict their hand movements, such as during later training of voluntary injection, which is risky if the monkey attempts to grab the syringe. If the PRT-group had been trained with the help of targets twice a day, in both morning and afternoon sessions, this would have interfered with the experimental design and a comparison between the two treatment groups would not have been possible.

2.2.1.2. Combined reinforcement training (NPRT)

In the NPRT-group, a combination of both negative and positive reinforcement was used, without the use of a clicker. We chose three different stimuli as negative reinforcers to prompt the monkeys to move into the selected section of the cage: a yellow bucket, a chain and a glove, all of which were novel objects for the monkeys.
Animals’ reactions to novelty often follow the pattern of initial avoidance, followed by exploration and finally indifference (Corey, 1978). We took advantage of this initial neophobic response (moving away from novelty) by presenting the monkeys with unknown objects to initiate movement into the selected section of the cage. Since the objects were removed contingently on moving into the selected cage section, the monkeys never had the opportunity to explore them. Moving into the selected section of the cage was also positively reinforced with treats. In contrast to the PRT-pairs, both monkeys responded to the novel stimulus and performed the behaviour of moving away from the objects, sometimes simultaneously, sometimes one after the other.

We taped the three novel objects on the end of three different broom stick handles (120 cm long). The trainer stood 40 cm from the centre of the cage facing forwards, and avoiding eye contact. The trial started when the trainer pressed a toy keyboard delivering a signal unique to the pair currently being trained. Thereafter, the trainer waited 3 s, giving the monkeys the opportunity to respond and move into the selected cage section, before starting to expose them to the first novel stimulus (the yellow bucket). During a five-second interval, the bucket was slowly raised, approximately 40 cm from the cage, and 10–20 cm from the right cage wall. From the monkeys’ perspective, the objects appeared from underneath the cage and moved vertically until they stopped outside the top right corner of the cage. They responded to this novel stimulus by moving away from it into the upper left cage section. As soon as the animals moved, the trainer immediately lowered the pole and delivered multiple treats in the upper left cage section. Both boards were then taken away and more treats were offered, this time in both of the upper cage sections, concluding the experimental training session of the day.

If the monkeys did not respond to the presentation of the yellow bucket by moving into the selected cage section, the pole containing the chain was also raised. If there was no reaction to the bucket and chain poles while they were being kept still at the top of the cage, the chain pole was gently shaken once or twice to produce a small rattling sound. If one or both individuals still did not move away from the compound stimulus, the pole containing the glove was raised in a similar manner. If considered necessary, the trainer slowly moved the stimuli towards the cage until touching it (this most escalated scenario occurred one to four times in three out of the six pairs).

The monkeys were given three opportunities to condition to the scenario of momentarily closing the gate board. When working at this step, the trainer would give the keyboard signal, wait for the behaviour of moving into the selected section of the cage, and present the stimuli if needed. When both individuals had moved into the correct section, the stimuli were removed, the gate board closed and immediately opened, and treats delivered as described above.
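The escalation sequence described above can be sketched in code. This is a minimal illustration only, not the trainer's actual protocol software: the step labels, the function name `run_trial` and the observer callback `pair_in_section_after` are hypothetical names introduced here, and the sketch assumes that escalation stops as soon as both monkeys are in the selected section.

```python
from typing import Callable, List

# Escalation order as described in the text; the labels are illustrative.
ESCALATION_STEPS = [
    "raise bucket",
    "raise chain",
    "shake chain once or twice",
    "raise glove",
    "move stimuli slowly towards cage",
]

SIGNAL_WAIT_S = 3  # pause after the keyboard signal before any stimulus is shown


def run_trial(pair_in_section_after: Callable[[int], bool]) -> List[str]:
    """Return the stimuli actually presented in one trial.

    pair_in_section_after(n) is a hypothetical observer callback that reports
    whether both monkeys are in the selected section after n escalation steps
    (n = 0 means after the signal and the 3 s wait only).
    """
    presented: List[str] = []
    if pair_in_section_after(0):
        return presented  # responded to the signal alone; no stimulus needed
    for n, step in enumerate(ESCALATION_STEPS, start=1):
        presented.append(step)
        if pair_in_section_after(n):
            break  # stimuli are removed and treats delivered at this point
    return presented
```

For example, `run_trial(lambda n: n >= 2)` models a pair that moves only once the chain has been raised, so the bucket and the chain are the only stimuli presented.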


Table 2
The number of monkeys in each training group, NPRT vs. PRT, that succeeded at least once during the training period. Success involved responding to a specific signal, entering the selected section of the cage and accepting the gate board being closed (Fisher’s Exact Test, p = 0.0007).

                          NPRT (n = 12)   PRT (n = 8)
Behaviour completed       10              0
Behaviour not completed   2               8

When using negative reinforcement, it is important that the animals respond to the aversive stimulation by performing the desired behaviour (Stacey et al., 1999). In all instances of this training, all animals eventually moved to the desired compartment, but the degree of stimulation was quite variable (Table 3). However, it should be emphasised that each of the three negative reinforcers was only raised as much as needed for the monkeys to respond and move. Sometimes only the hand movement of starting to raise the bucket was necessary to obtain the complete behaviour, i.e. the monkeys moving into the selected cage section.

2.3. Modified approach tests

Modified approach tests were performed in order to evaluate potential changes in the monkeys’ reactions to the trainer. A baseline modified approach test was performed in the afternoon on the second day of desensitisation to the floor board (Fig. 2), before treatments diverged. Two days after the 30th experimental training session, two more modified approach tests were made on consecutive days. The scores in the modified approach test ranged from 1 to 6. At the start of the test, the trainer stood in front of the cage and assessed where the monkeys were located. It was recorded if they were out of sight (score 1), located towards the ceiling in the rear part of the cage (score 2), remaining at a distance lower down (score 3) or moving towards the trainer (higher scores). Next, the monkeys were offered treats from the trainer’s hand and were given 30 s to retrieve them (score 6 if they took the treat from within the protruding balcony box (Fig. 1); score 5 if from outside the box). If neither of the monkeys took the treat, it was placed on the bars and the trainer stood passively waiting in front of the cage, avoiding eye contact, for an additional 30 s. It was recorded if they took the treats (score 4) or not within the selected interval.

2.4. Statistical methods

The statistical package used was IBM SPSS statistics 21.
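The scoring rules of the modified approach test (Section 2.3) can be summarised as a small function. This is a sketch under stated assumptions rather than the scoring sheet used in the study: the function name, the argument names and the position labels are hypothetical, and it assumes that an animal that takes no treat is scored by its starting position (scores 1 to 3), which the text does not state explicitly.

```python
def approach_score(start_position: str,
                   took_from_hand: bool = False,
                   inside_balcony_box: bool = False,
                   took_from_bars: bool = False) -> int:
    """Score one modified approach test on the 1-6 scale.

    start_position is a hypothetical label for where the monkey was when the
    trainer arrived: "out_of_sight", "rear_ceiling" or "distance_lower".
    """
    if took_from_hand:
        # 6 if the treat was taken from within the balcony box, otherwise 5.
        return 6 if inside_balcony_box else 5
    if took_from_bars:
        return 4  # treat taken from the bars during the additional 30 s
    # Assumption: an animal that takes no treat keeps its position score.
    position_scores = {"out_of_sight": 1, "rear_ceiling": 2, "distance_lower": 3}
    return position_scores[start_position]
```

For instance, a monkey that starts high in the rear of the cage and later takes the treat from the bars would receive `approach_score("rear_ceiling", took_from_bars=True)`, i.e. a score of 4.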

Table 3
The number of desensitisation sessions; the number of times that the bucket (B), the bucket and chain (B + C) and all three negative reinforcers (bucket, chain and glove; B + C + G) were used in the NPRT-group; and the number of occasions on which each pair in the NPRT-group completed the behaviour, i.e. responded to the keyboard signal without the use of negative reinforcement and accepted confinement in the selected cage section.

                                        Desensitisation   Negative reinforcement (NR)   Behaviour
                                                          B+C+G     B+C       B         completed
Pair 1                                  20                6         1         5         1
Pair 2                                  20                2         3         6         2
Pair 3                                  20                0         2         7         5
Pair 4                                  20                4         3         3         0
Pair 5                                  20                1         0         7         2
Pair 6                                  20                0         1         8         5
Summary                                 120               13        10        36        15
% of experimental training sessions     62%               7%        5%        19%       8%

NR used in 31% of experimental training sessions.
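The summary and percentage rows of Table 3 can be checked with a few lines of arithmetic. The sketch below assumes that the percentages are taken over the sum of all listed occasions (194); this denominator is inferred from the numbers and is not stated in the table. Under that assumption, the reported 31% for negative reinforcement matches the sum of the rounded column percentages (7 + 5 + 19), while the unrounded ratio 59/194 is closer to 30%.

```python
# Counts per pair (Pair 1 to Pair 6), as read from Table 3.
table3 = {
    "Desensitisation": [20, 20, 20, 20, 20, 20],
    "B+C+G": [6, 2, 0, 4, 1, 0],
    "B+C": [1, 3, 2, 3, 0, 1],
    "B": [5, 6, 7, 3, 7, 8],
    "Behaviour completed": [1, 2, 5, 0, 2, 5],
}

# Column totals (the "Summary" row).
totals = {name: sum(counts) for name, counts in table3.items()}

# Assumed denominator: all occasions listed in the table.
grand_total = sum(totals.values())

# Rounded percentages (the "%" row).
percentages = {name: round(100 * t / grand_total) for name, t in totals.items()}

# Negative reinforcement overall, computed two ways.
nr_columns = ("B+C+G", "B+C", "B")
nr_total = sum(totals[c] for c in nr_columns)
nr_pct_sum_of_rounded = sum(percentages[c] for c in nr_columns)
nr_pct_direct = round(100 * nr_total / grand_total)
```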



2.4.1. Training outcome

To investigate the effectiveness of the two training methods, we performed a Fisher’s Exact Test on the number of monkeys from each training group that had reached the final goal within the training period of 30 training sessions, i.e. responded to a specific signal, entered the selected section of the cage and accepted the gate board being closed. The NPRT-monkeys were always scored according to the least cooperative animal in the pair, to avoid overestimating the performance of the least compliant individual.

2.4.2. Modified approach tests

A Wilcoxon Matched-Pairs Signed-Rank test was performed to analyse whether each individual monkey’s reaction towards the trainer changed between the baseline test during desensitisation and the mean of the two post-training modified approach tests. For each individual, we also calculated the difference in score. A Mann-Whitney U-test was then conducted to compare the relative change in responses between the two groups. We chose nonparametric statistics because the sample sizes were small and we could not assume that variables were normally distributed.
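As an illustration of the training-outcome analysis, the following is a minimal sketch of a two-sided Fisher’s Exact Test written with the Python standard library only. It is not the SPSS procedure used in the study; the function name `fisher_exact_two_sided` is hypothetical, and the sketch assumes the common two-sided definition that sums the probabilities of all tables no more likely than the observed one. The input counts are those reported in the study (10 of 12 NPRT-monkeys and 0 of 8 PRT-monkeys reaching the goal).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x: int) -> int:
        # Number of ways to obtain x in the top-left cell (integer, so
        # comparisons between tables are exact).
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(row1 + row2, col1)


# Rows: NPRT, PRT. Columns: reached the goal, did not reach the goal.
p_training = fisher_exact_two_sided(10, 2, 0, 8)
```

Under these assumptions the sketch gives a p-value that rounds to the reported 0.0007, and the same function applied to the husbandry-training counts (6 of 12 versus 5 of 8, 3 of 12 versus 4 of 8, and 2 of 12 versus 2 of 8) rounds to the reported 0.67, 0.36 and 1.00.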

Fig. 3. Median values (1st and 3rd quartiles) from the modified approach test before the training methods diverged and after the training period for NPRT (n = 12) and PRT (n = 8). The scores ranged from 1 (out of sight) to 6 (retrieving treats from the trainer’s hand).

3. Results

3.1. Training outcome

Significantly more monkeys trained with negative and positive reinforcement combined (NPRT) than monkeys trained only with positive reinforcement (PRT) responded to a specific signal, moved into a selected section of the cage and accepted confinement on at least one training occasion within the 30 training sessions (Fisher’s Exact Test, p = 0.0007; Table 2). Of the occasions where novel stimuli were needed, using only the yellow bucket was sufficient to move the animals 36 times (19%). Negative reinforcers (bucket, chain and glove) were used in a total of 31% of the sessions completed for the NPRT-group. On average each pair was trained for 6 min during each of the 30 training sessions.

Of the monkeys trained using PRT, only two of eight individuals reached approximation step four (stationing on cue, gate manipulated; Fig. 2). The trainer was never able to totally close the gate board and complete the behaviour without the monkeys leaving their station. However, these two individuals reliably went to their station when the verbal signal was given. On average each pair was trained for 7 min during each of the 30 training sessions.

There were no significant differences in the results of the afternoon husbandry training between the two treatment groups. 1) Taking treats from the trainer’s hand: six out of 12 NPRT-monkeys versus five out of eight PRT-monkeys (Fisher’s Exact Test, p = 0.67). 2) Stationing in den box: three out of 12 individuals from the NPRT-group versus four out of eight from the PRT-group (Fisher’s Exact Test, p = 0.36). 3) Touching a target: two out of 12 NPRT-monkeys versus two out of eight PRT-monkeys (Fisher’s Exact Test, p = 1.00).

3.2. Modified approach tests

In the NPRT-group (n = 12), the response of the monkeys towards the trainer did not change significantly from the baseline to the post-experimental period (Wilcoxon Matched-Pairs Signed-Rank test: z = 1.608, p = 0.108). In the PRT-group (n = 8), the monkeys were significantly closer to the trainer after than before the training period (Wilcoxon Matched-Pairs Signed-Rank test: z = 2.041, p = 0.041) (Fig. 3). There was no difference in the relative change in responses between the two groups (Mann-Whitney U-test: U = 42.5, p = 0.67).

4. Discussion

In our study, ten out of twelve monkeys trained with a combination of negative and positive reinforcement training (NPRT) achieved the goal of responding to a specific signal, moving into a selected section of the cage and accepting confinement within the training period. In contrast, none of the monkeys in the group trained solely with positive reinforcement training (PRT) completed the behaviour within the allotted time. Novel and therefore aversive stimuli were used initially in the NPRT-group, but had no negative effect on the monkeys’ responses towards the trainer. Relatively few studies have used a combination of negative reinforcement training and PRT. Christensen (2012) trained horses to approach a feeding container surrounded by novel objects. One group was trained with negative reinforcement training (the pressure of a rope and halter, a gentle whip tap on the shoulder of the animal). This group was compared with a ‘voluntary approach group’. Both groups were positively reinforced when reaching the feeding container. Christensen (2012) concluded that a negatively reinforced approach to novel objects facilitated object habituation and was more efficient than a voluntary approach. Even though the stress response initially increased in the “negative reinforcement” group, the difference between the two groups of horses disappeared after the first test round. In contrast, Hendriksen et al. (2011) found that horses trained with PRT obtained faster training results than those trained with only negative reinforcement. In Hendriksen et al.’s study, horses showing trailer-loading problems were randomly assigned to either negative reinforcement training or PRT. The negative reinforcers consisted of various degrees of pressure while the PRT involved clicker and target training to teach the horses to move into the trailer. One reason for the different results of Hendriksen et al. (2011) versus Christensen’s and our study could be the use of combined reinforcements. 
In Christensen’s and our study, NPRT was used in training, i.e. our training schedule included positive reinforcement as well. An additional difference was that in our study monkeys were naïve to training and newly arrived, while the horses in the study by Hendriksen et al. (2011) were between 7 and 20 years old. We may assume that they had established relationships with humans and had already learned avoidance behaviours, which may confound reinforcement studies.

Other studies have also shown that complementing the use of PRT with negative reinforcement training may bring faster results. As mentioned earlier, this applies to the dolphin in Brookfield Zoo where they needed to restrain the animal quickly for medical reasons (Stacey et al., 1999). In addition, in a study of rhesus macaques, Bliss-Moreau et al. (2013) used NPRT when chair training monkeys, and the use of negative reinforcement became unnecessary over time for most animals in the study. The results of both Christensen (2012) and Bliss-Moreau et al. (2013) indicate that negative reinforcement could be restricted to initial use and then gradually decreased. In our study, ten out of twelve NPRT-monkeys responded to the keyboard signal without the use of aversive stimuli by moving into the selected compartment on at least one occasion during the experimental period. However, only six individuals performed this behaviour on the last experimental training occasion. It seems fair to assume that they were still in the process of learning the association. In contrast, none of the PRT-monkeys completed the behaviour during any of the experimental training sessions. We propose that compliance would have increased over future training sessions in both groups.

Note that in the primate literature, negative reinforcers are described as highly aversive, such as squeeze-back cages being moved, being chased by humans or the sight of a net (Prescott and Buchanan-Smith, 2007). In this study, another, less intrusive, approach was taken, and proved successful. We used novel objects mounted on a stick and presented slowly at some distance (typically about 120 cm) from the animals.

Furthermore, the monkeys in our study were housed in pairs. This should not have affected the training of the monkeys in the NPRT-group, since both monkeys could be trained simultaneously under similar conditions. In the PRT-group, the presence of a dominant female sometimes prevented the subordinate female from taking part in the training. However, even if this could have affected the training of the monkeys, it cannot explain the difference we found between the two groups, since none of the monkeys in the PRT-group, dominant or subordinate, completed the task.
The second aim of our study was to determine if the two training techniques, NPRT and PRT, affected the relation to the trainer in different ways. Even though NPRT was a more effective method, it involved aversive events and could therefore have potentially negative consequences for the relation between the trainer and the monkey. We assessed the relation between the animal and the trainer by measuring the monkeys’ responses to the trainer when the trainer offered treats (Waiblinger et al., 2006). Davis (2002) suggests that humans act as “walking conditioned stimuli” to specific events, pleasant or aversive, for animals. The monkeys’ responses towards the trainer could then be an adequate measure of how the animals perceived the training. For instance, Bloomsmith et al. (1997) showed that chimpanzees (Pan troglodytes) preferred individuals whom they associated with earlier pleasant experiences. In addition, McKinley (2004) showed that if a specific human had been associated with aversive experiences, common marmosets (Callithrix jacchus) showed signs of both fear and aggression in later contacts. Furthermore, Sankey et al. (2010) showed that ponies that had learned to move backwards with negative reinforcement exhibited less contact seeking behaviour towards the trainer than those that had been trained with PRT. However, judging from the description of their PRT procedure, it involved stepping towards the pony and could therefore be construed as NPRT. In this scenario, stepping back was negatively reinforced by increasing distance to the human, and positively reinforced by a food reward. There was thus an element of negative reinforcement in both procedures. Hence, it is unclear whether the difference in contact seeking behaviours after training was due to the food rewards or to the difference in intensity of the aversive stimulus: agitation of a riding stick in front of the head in the NRT technique versus a step forward in the “PRT” procedure.
The modified approach tests showed that there was a significant increase in the contact seeking behaviour towards the trainer in the group trained with PRT only, while increased contact seeking behaviour was only a tendency in the NPRT-group (p = 0.11). Furthermore, there was no significant difference between the relative changes in responses toward the trainer between the two groups of monkeys. This similarity between the two treatment groups could perhaps be explained by the fact that much of the overall training (experimental and husbandry training combined) in the NPRT-group actually consisted of PRT (used in combination reinforcement during experimental training and solely during husbandry training). The monkeys then had many opportunities to associate the trainer with something pleasant. It is also noteworthy that the animals did not show any overt signs of anxiety that we could detect during the actual NPRT. In addition, the baseline response test was conducted after nine sessions where the monkeys had been exposed to counter conditioning and systematic desensitisation. We thus expect that some improvement in relationship had occurred already at the baseline response test for both groups, which may have protected the NPRT-group from fear learning.
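The p-values of the two Wilcoxon tests in Section 3.2, including the tendency in the NPRT-group, can be recovered from the reported z-values. The sketch below assumes that the reported values are two-sided, asymptotic p-values based on the standard normal approximation; this is an assumption about the analysis software, not something the text states. The function name `two_sided_p_from_z` is hypothetical.

```python
from math import erfc, sqrt


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic z.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return erfc(abs(z) / sqrt(2))


# z-values reported for the Wilcoxon Matched-Pairs Signed-Rank tests.
p_nprt = two_sided_p_from_z(1.608)  # NPRT-group
p_prt = two_sided_p_from_z(2.041)   # PRT-group
```

Rounded to three decimals, these agree with the reported p = 0.108 and p = 0.041.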

4.1. Ethical considerations

This experimental study compared monkeys trained systematically using two different training regimes: either pure PRT or combined reinforcement. In many training situations, modern trainers choose to use the least intrusive effective behavior intervention (LIEBI) to teach a behaviour (O’Heare, 2009). If PRT is not effective, other techniques may thus be attempted to reach the training goal. Knowledgeable animal trainers may thus start training a behaviour using PRT, and switch techniques as needed if behavioural goals are not met (McMillan et al., 2014). Care should also be taken to problem-solve, perhaps by reassessing PRT-protocols, before switching to a potentially more aversive technique.

To make the occurrence of the novel objects more predictable, we used a keyboard signal that preceded the negative reinforcer(s). Several studies, for instance Weiss (1971) and Bassett and Buchanan-Smith (2007), stress the importance of predictability of aversive events and recommend the use of a signal preceding them. In the fourth approximation step of the shaping protocol of the NPRT-group, the aversive stimulus was reduced as the monkeys understood the keyboard signal and performed the correct behaviour. Thus, the possibility to control and predict the aversive stimuli, in combination with rewarded correct responses, could have resulted in the success of the NPRT-monkeys in our study. In our study we tried to decrease the aversiveness of the NPRT by:

• Building relationships between trainer and monkey before introducing any aversive stimuli, thus reducing the risk of fear conditioning (e.g. Misslin and Cigrang, 1986; de Passillé et al., 1996; Grandin, 1997).
• Conducting “pure” PRT-sessions outside the NPRT-sessions, further building the relationship between the trainer and the monkey.
• Having the same trainer prepare the boards in the same way and at the same time every day.
• Gradually letting the animals get familiar with the experimental set-up through acclimatisation and desensitisation.
• Using low-intensity aversive stimulation. In contrast to other authors, we did not use electric shocks, squeeze-back cages or loud noises (reviewed in Bassett and Buchanan-Smith, 2007) but simply the presentation of novelty, exploiting the initial neophobic reaction of the animals in combination with positive reinforcement. Our procedure did not result in tangible anxiety.
• Always offering a reward when the monkeys performed the correct behaviour, thereby counter conditioning the aversive elements of the process.
• Sounding a specific signal before the first aversive stimulus was presented and using unique signals for every animal pair.
• Presenting the monkeys with the opportunity to avoid the aversive stimuli by immediately responding to the keyboard signal.

The learning of a specific signal may have facilitated the transition from the initial ‘escape response’ to an ‘avoidance response’. Note that the terms ‘escape’ and ‘avoidance’ are technical (Hineline, 1977) rather than descriptive and that the monkeys moved away from the aversive stimuli without signs of fear or anxiety, as if the stimuli were disturbing rather than frightening. This is in line with McLean’s (2005) suggestion that if negative reinforcement training is used correctly, only a minimum of aversive stimuli is needed to obtain the desired behaviour. Furthermore, combining a signalled low-intensity negative reinforcer with PRT may further facilitate learning and minimize discomfort.

As a further refinement, to lessen the impact of potentially aversive stimuli, we suggest assessing each individual’s response to PRT and whether training progresses in a timely manner (see e.g. McMillan et al., 2014) before using aversive elements in training. Our study suggests that training may be tailored to include combined reinforcement for animals that otherwise would not achieve training objectives using pure PRT. Trainers should flexibly adjust their training decisions to individual animals. It could also be mentioned that the animal technicians described the NPRT-monkeys as more easily restrained and calmer than the PRT-group during the biomedical experiments following our training study.

5. Conclusion

When training animals, PRT should be the standard procedure.
However, under certain conditions when time is a limiting factor, a combination of positive and carefully conducted negative reinforcement could be more efficient without risking the welfare of the animals (e.g. Stacey et al., 1999; Christensen, 2012). In our study the monkeys in the NPRT-group were more successful in achieving the goal and the training did not seem to adversely affect the relationship with the trainer.

Acknowledgements

We thank The Swedish Research Council (521-2010-3820) for funding. Maria Valsjö and Sandra Söderberg and the other animal technicians at Astrid Fagræus Laboratory gave valuable assistance throughout the experiments.

References

Bassett, L., Buchanan-Smith, H.M., 2007. Effects of predictability on the welfare of captive animals. Appl. Anim. Behav. Sci. 102, 223–245, http://dx.doi.org/10.1016/j.applanim.2006.05.029.
Bliss-Moreau, E., Theil, J.H., Moadab, G., 2013. Efficient cooperative restraint training with rhesus macaques. J. Appl. Anim. Welf. Sci. 16, 98–117, http://dx.doi.org/10.1080/10888705.2013.768897.
Bloomsmith, M.A., Lambeth, S.P., Stone, A.M., Laule, G.E., 1997. Comparing two types of human interaction as enrichment for chimpanzees. Am. J. Primatol. 42, 96–104.
Chance, P., 2009. Learning and behavior: Active learning, Edition 6. Wadsworth, Belmont, CA, pp. 96.
Christensen, J.W., 2012. Object habituation in horses: The effect of voluntary versus negatively reinforced approach to frightening stimuli. Equine Vet. J. 45, 298–301, http://dx.doi.org/10.1111/j.2042-3306.2012.00629.x.
Corey, D.T., 1978. The determination of exploration and neophobia. Neurosci. Biobehav. Rev. 2, 235–253, http://dx.doi.org/10.1016/0149-7634(78)90033-7.
Davis, H., 2002. Prediction and preparation: Pavlovian implications of research animals discriminating among humans. ILAR J. 43, 19–26, http://dx.doi.org/10.1093/ilar.43.1.19.

de Passillé, A.M., Rushen, J., Ladewig, J., Petherick, C., 1996. Dairy calves’ discrimination of people based on previous handling. J. Anim. Sci. 74, 969–974.
Domjan, M., 2005. The Essentials of Conditioning and Learning, 3rd edition. Thompson Wadsworth, Belmont, CA.
Egger, M.D., Miller, N.E., 1962. Secondary reinforcement in rats as a function of information value and reliability of the stimulus. Journal of Experimental Psychology 64, 97–104, http://dx.doi.org/10.1037/h0040364.
Forkman, B., Boissy, A., Meuniers-Salaün, M.-C., Canali, E., Jones, R.B., 2007. A critical review of fear tests used on cattle, pigs, sheep, poultry and horses. Phys. Behav. 91, 531–565, http://dx.doi.org/10.1016/j.physbeh.2007.03.016.
Grandin, T., 1997. Assessment of stress during handling and transport. J. Anim. Sci. 75, 249–255.
Hendriksen, P., Elmgreen, K., Ladewig, J., 2011. Trailer-loading of horses: is there a difference between positive and negative reinforcement concerning effectiveness and stress-related signs? J. Vet. Behav.: Clin. Appl. Rev. 6, 261–266, http://dx.doi.org/10.1016/j.jveb.2011.02.007.
Hineline, P.N., 1977. Negative reinforcement and avoidance. In: Honig, W.K., Staddon, J.E.R. (Eds.), Handbook of operant behaviour. Prentice-Hall, Englewood Cliffs, NJ, pp. 364–414.
Innes, L., McBride, S., 2008. Negative versus positive reinforcement: an evaluation of training strategies for rehabilitated horses. Appl. Anim. Behav. Sci. 112, 357–368, http://dx.doi.org/10.1016/j.applanim.2007.08.011.
Kazdin, A.E., 2001. Behavior Modification in Applied Settings, sixth ed. Wadsworth, California, pp. 64–68.
Laule, G.E., Bloomsmith, M.A., Schapiro, S.J., 2003. The use of positive reinforcement training techniques to enhance the care, management, and welfare of primates in the laboratory. J. Appl. Anim. Welf. Sci. 6, 163–173, http://dx.doi.org/10.1207/S15327604JAWS0603_02.
McGreevy, P., Boakes, R., 2011. Carrots and Sticks: Principles of Animal Training. Darlington Press, pp. 70–72.
McKinley, J., 2004. Training In a Laboratory Environment: Methods, Effectiveness and Welfare Implications of Two Species of Primate. Unpublished PhD thesis, University of Stirling, Scotland, UK, pp. 164–205.
McKinley, J., Buchanan-Smith, H.M., Bassett, L., Morris, K., 2003. Training common marmosets (Callithrix jacchus) to cooperate during routine laboratory procedures: ease of training and time investment. J. Appl. Anim. Welf. Sci. 6, 209–220, http://dx.doi.org/10.1207/S15327604JAWS0603_06.
McLean, A.N., 2005. The positive aspects of correct negative reinforcement. Anthrozoos 18, 245–254, http://dx.doi.org/10.2752/089279305785594072.
McMillan, J.L., Perlman, J.E., Galvan, A., Wichmann, T., Bloomsmith, M.A., 2014. Refining the pole-and-collar method of restraint: emphasizing the use of positive training techniques with Rhesus macaques (Macaca mulatta). J. Am. Assoc. Lab. Anim. Sci. 53, 61–68.
Minier, D.E., Tatum, L., Gottlieb, D.H., Cameron, A., Snarr, J., Elliot, R., Cook, A., Elliot, K., Banta, K., Heagerty, A., 2011. Human-directed contra-aggression training using positive reinforcement with single and multiple trainers for indoor-housed rhesus macaques. Appl. Anim. Behav. Sci. 132, 178–186, http://dx.doi.org/10.1016/j.applanim.2011.04.009.
Misslin, R., Cigrang, M., 1986. Does neophobia necessarily imply fear or anxiety? Behav. Proc. 12, 45–50, http://dx.doi.org/10.1016/0376-6357(86)90069-0.
Misslin, R., Ropartz, P., 1981. Responses in mice to a novel object. Behaviour 78, 3–4, http://dx.doi.org/10.1163/156853981X00301.
O’Heare, J., 2009. The least intrusive effective behavior intervention (LIEBI) algorithm and levels of intrusiveness table: A proposed best practices model. J. Appl. Comp. Anim. Behav. 3, 7–25.
Perlman, J.E., Bloomsmith, M.A., Whittaker, M.A., McMillan, J.L., Minier, D.E., McCowan, B., 2012. Implementing positive reinforcement animal training programs at primate laboratories. Appl. Anim. Behav. Sci. 137, 114–126, http://dx.doi.org/10.1016/j.applanim.2011.11.003.
Prescott, M.J., Bowell, V.A., Buchanan-Smith, H.M., 2005. Training laboratory-housed non-human primates, part 2: Resources for developing and implementing training programmes. Anim. Tech. Welf. 4, 133–148.
Prescott, M.J., Buchanan-Smith, H.M., 2007. Training laboratory-housed non-human primates, part 1: a UK survey. Anim. Welf. 16, 21–36.
Ramirez, K., 1999. Animal Training: Successful Animal Management Through Positive Reinforcement. John G. Shedd Aquarium, Chicago, pp. 533–554.
Reinhardt, V., 1992. Improved handling of experimental rhesus monkeys. The Inevitable Bond: Examining Scientist–Animal Interactions, pp. 171–177.
Sankey, C., Richard-Yris, M.-A., Henry, S., Fureix, C., Nassur, F., Hausberger, M., 2010. Reinforcement as a mediator of the perception of humans by horses (Equus caballus). Anim. Cog. 13, 753–764, http://dx.doi.org/10.1007/s10071-010-0326-9.
Skinner, B.F., 1938. The behaviour of organisms: an experimental analysis. Appleton-Century-Crofts, New York, pp. 62.
Stacey, R., Messinger, D., Dye, G., Komar, W., McGee, J., Mika, C., Peak, S., Sustman, J., Sullivan, T., Weiner, J., 1999. Passive restraint training in Tursiops truncatus. In: Ramirez, K. (Ed.), Animal Training: Successful Animal Management Through Positive Reinforcement. John G. Shedd Aquarium, Chicago.
Ulrich, R.E., Holz, W.C., Azrin, N.H., 1964. Stimulus control of avoidance behaviour. J. Exp. Anal. Behav. 7, 129–133, http://dx.doi.org/10.1901/jeab.1964.7-129.
Vargas, S.J., 2009. Behaviour Analysis for Effective Teaching. Routledge, New York.
Waiblinger, S., Boivin, X., Pedersen, V., Tosi, M., Janczak, A.M., Visser, E.K., Bryan, R., 2006. Assessing the human–animal relationship in farmed species: a critical review. Appl. Anim. Behav. Sci. 101, 185–242, http://dx.doi.org/10.1016/j.applanim.2006.02.001.

Warren-Smith, A.K., McGreevy, P.D., 2007. The use of blended positive and negative reinforcement in shaping the halt response of horses (Equus caballus). Anim. Welf. 16, 481–488.
Weiss, J.M., 1971. Effects of coping behavior in different warning signal conditions on stress pathology in rats. J. Comp. Phys. Psych. 77, 1–13.
Wolpe, J., 1961. The systematic desensitization treatment of neurosis. J. Nerv. Ment. Dis. 132, 189–203.
Yin, S., 2009. Low Stress Handling, Restraint and Behavior Modification of Dogs and Cats. CattleDog Publishing, Davis, pp. 116–118.
