One way both humans and non-human animals learn is through consequences. In science, this type of learning is called operant conditioning (also known as instrumental learning). Operant conditioning focuses on using reinforcement or punishment to increase or decrease the frequency of behaviours.
Video: Explaining the basics of positive reinforcement and clicker training in 2 minutes.
In operant conditioning there are four quadrants (two reinforcers and two punishers):
Positive reinforcement (+R) involves the addition of a pleasurable (appetitive) stimulus following a behaviour, making it more likely for this behaviour to occur again. For example: When the horse comes to the gate, his owner gives him a carrot. (+R) As a result, the horse always comes when he sees his owner standing by the gate.
Negative reinforcement (-R) involves the removal (or avoidance) of an aversive stimulus following a behaviour, making it more likely for this behaviour to occur again. In the equestrian world, negative reinforcement is often marketed under the euphemism “pressure/release”. The pressure is the application of the aversive stimulus, and the release is the removal of this aversive. For example: A rider kicks his horse’s flanks to get him moving. When the horse starts moving, the kicking stops (-R). After a few sessions, the rider barely has to move his legs and the horse starts walking.
Positive punishment (+P) involves the addition of an aversive stimulus following a behaviour, making it less likely for this behaviour to occur again. For example: A horse fails to canter when cued by his rider. As a punishment, the rider hits him with a whip (+P). When the rider tells the horse to start cantering again, the horse quickly goes into canter in order to avoid getting hit again.
Negative punishment (-P) involves the removal of something desirable (appetitive) following a behaviour, making it less likely for this behaviour to occur again. For example: When I scratch my horse, he starts biting my clothes (wanting to engage in mutual grooming). When my horse does this, I stop scratching him and step back a little (-P). After a few repetitions, my horse learns not to use his teeth when engaging in mutual grooming with me.
Drawing by Fed up Fred: illustrating the 4 quadrants of operant conditioning.
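For readers who find a decision rule easier to remember than a diagram, the four quadrants can be sketched as a small Python function. This is only an illustrative sketch: the names `Quadrant` and `classify` are made up for this example, and it assumes each case can be reduced to two yes/no questions (was a stimulus added or removed, and does the behaviour become more or less frequent).

```python
from enum import Enum


class Quadrant(Enum):
    """The four quadrants of operant conditioning (labels as used in the text)."""
    POSITIVE_REINFORCEMENT = "+R"
    NEGATIVE_REINFORCEMENT = "-R"
    POSITIVE_PUNISHMENT = "+P"
    NEGATIVE_PUNISHMENT = "-P"


def classify(stimulus_added: bool, behaviour_increases: bool) -> Quadrant:
    """Hypothetical helper: map a consequence onto its quadrant.

    stimulus_added: True if something is added after the behaviour,
        False if something is removed.
    behaviour_increases: True if the behaviour becomes more frequent
        (reinforcement), False if it becomes less frequent (punishment).
    """
    if behaviour_increases:
        if stimulus_added:
            return Quadrant.POSITIVE_REINFORCEMENT
        return Quadrant.NEGATIVE_REINFORCEMENT
    if stimulus_added:
        return Quadrant.POSITIVE_PUNISHMENT
    return Quadrant.NEGATIVE_PUNISHMENT


# The four examples from the text, reduced to the two questions above.
carrot_at_gate = classify(stimulus_added=True, behaviour_increases=True)
kicking_stops = classify(stimulus_added=False, behaviour_increases=True)
whip_after_refusal = classify(stimulus_added=True, behaviour_increases=False)
scratching_stops = classify(stimulus_added=False, behaviour_increases=False)
```

Note that "positive" and "negative" only describe whether something is added or removed; they say nothing about whether the experience is pleasant for the horse.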
Both traditional and natural horsemanship primarily use negative reinforcement (to get desired behaviour) and positive punishment (to remove undesired behaviour). However, a growing number of trainers are starting to incorporate appetitives into their training and ditch the use of aversives.
A trainer or a rider using appetitive-based training will obtain desired behaviours by adding a pleasurable stimulus (positive reinforcement) and will discourage undesired behaviour by removing a pleasurable stimulus (negative punishment). However, unlike aversive-based trainers, appetitive trainers tend to have a greater understanding of learning theory (because the training method was first developed by scientists) and will therefore be cautious with their use of negative punishment, trying more ethical alternatives when applicable: for example, antecedent arrangement, redirection to an appropriate stimulus or teaching an incompatible behaviour.
Drawbacks of positive reinforcement
Unlike negative reinforcement, which has many issues (including inducing fear in the animal, damaging the animal/handler relationship and causing aggression), positive reinforcement has few drawbacks. However, one important issue we see from time to time is attractive-type aggression, which arises because the trainer possesses a desired stimulus such as food. Such issues can be solved by appropriate horse management, a change of reinforcer, improved food delivery, etc.
Benefits of positive reinforcement
Clicker training
Clicker trainers use positive reinforcement with the addition of a bridge signal, which is most often a small noise maker called a clicker. The purpose of the clicker is to improve timing: it precisely marks the moment the horse performs the desired behaviour, so he knows what earned him a reinforcer. For example: A rider is working on improving his horse's canter by rewarding him every time he takes the canter on the correct lead. Using a bridge signal, the rider can mark the moment the horse takes the canter on the correct lead. He can then go back to walk, halt and deliver the reinforcer to his horse.
The use of a bridge signal may not always be necessary (e.g. when the reinforcer can be delivered immediately) or recommended (e.g. when counter-conditioning a horse to an intense fear, you do not want to pair the bridge signal with the feared stimulus).
What positive reinforcement training can look like:
Mini Glossary: