Neuroscientists just upended our understanding of Pavlovian learning
A recent study published in Nature Neuroscience suggests that the brain learns to associate a specific signal with a reward based on the amount of time that passes between rewards, rather than the sheer number of repetitions. This challenges a century-old assumption about conditioning, providing evidence that total learning over a given period depends entirely on timing. These findings could shift our understanding of both animal and human learning.
For over a hundred years, scientists have generally accepted that associative learning operates through trial and error. Associative learning is the process by which a human or animal learns to link a specific signal with a specific outcome, like a dog learning that a bell means dinner is ready. The prevailing thought has been that more practice leads to better learning.
Scientists previously developed a mathematical model suggesting that animals learn by looking backward in time to identify the causes of meaningful effects. In this framework, the brain does not try to predict the future effects of a cue, but rather works backward from a reward to figure out what predicted it. While testing this idea, scientists noticed that animals learned proportionally faster when the time between rewards was extended.
“We had realized soon after publishing this paper that this model predicts that animals will learn cue-reward associations proportionally faster when the trials are spaced out, which should mean that over a fixed duration, total learning is independent of the number of experienced cue-reward pairings,” said study author Vijay Mohan K. Namboodiri, an associate professor at UC San Francisco.
This observation prompted researchers to test whether a strict mathematical rule governs the rate of learning. They aimed to determine if learning speeds up proportionally in relation to the time elapsed between cue and reward experiences. They designed a series of experiments to measure both physical behavior and brain chemistry in real time.
“We set out to test whether there is a rule governing learning rate control and whether learning rate scales proportionally with the time between cue-reward experiences,” Namboodiri explained.
The researchers conducted their study using 101 adult male and female mice. They classically conditioned thirsty mice by playing a brief auditory tone followed by the delivery of sugar-sweetened water. The mice were physically held in a fixed position, ensuring the testing conditions were controlled and uniform across all subjects.
As the mice learned the association, they would begin licking the water spout as soon as they heard the tone, anticipating the sugar water. To measure the underlying brain activity, the researchers used a technique called fiber photometry. They injected a special fluorescent sensor into the nucleus accumbens core, a brain region heavily involved in processing rewards.
This sensor lit up when the brain released dopamine, a chemical messenger strongly linked to pleasure, motivation, and learning. This allowed the scientists to monitor exactly when the brain processed the tone and the reward. The researchers divided the mice into different groups based on how much time passed between the trials. Some mice experienced the tone and reward every 60 seconds, while others waited 600 seconds between pairings.
The mice that waited 600 seconds learned the association in about one-tenth the number of trials compared to the mice on the 60-second schedule. This indicates a proportional relationship where the rate of learning per trial increases as the time between rewards increases. As a result, both groups of mice learned the association in the exact same amount of total conditioning time, despite one group experiencing far fewer total tone and reward pairings.
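The arithmetic behind this result can be sketched with a toy model. The snippet below is our illustration, not the authors' actual model: the constant `k`, the criterion, and the saturating update rule are all hypothetical. It simply assumes that learning per trial is proportional to the inter-trial interval and counts trials to a fixed criterion for the two schedules used in the study.

```python
# Toy sketch of the proportional-scaling rule reported in the study.
# Assumption (ours, for illustration): learning per trial is proportional
# to the inter-trial interval (ITI), i.e. alpha = k * ITI.

def trials_to_criterion(iti_seconds, k=0.0001, criterion=0.5):
    """Count trials until associative strength reaches a criterion."""
    alpha = k * iti_seconds            # learning per trial scales with ITI
    strength, trials = 0.0, 0
    while strength < criterion:
        strength += alpha * (1.0 - strength)   # simple saturating update
        trials += 1
    return trials

for iti in (60, 600):
    n = trials_to_criterion(iti)
    print(f"ITI {iti:>3}s: {n:>4} trials, ~{n * iti / 60:.0f} min total")
```

Under this assumption, the 600-second group needs roughly one-tenth as many trials, yet both groups require about the same total conditioning time, matching the pattern the researchers observed.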
“The main finding of the study, that learning rate (how much is learned from each experience) scales proportionally with the time between rewards was very surprising,” Namboodiri told PsyPost. “While it was a prediction made by our retrospective learning model mentioned above, we expected our initial experiments to falsify that prediction and necessitate an update to the model.”
The dopamine measurements provided evidence matching the behavioral observations. In the mice with longer gaps between rewards, the brain required proportionally fewer experiences before it started releasing dopamine in response to the tone alone. The dopamine response actually emerged a few trials before the mice started physically licking the spout in anticipation.
“Trial by trial, we tracked how dopamine responses to cues evolved during learning under the same timing manipulations we used behaviorally,” Namboodiri said. “We found that dopamine signals followed the same learning rule: the rate and magnitude of changes in dopamine cue responses depended on the average time between rewards, not on the raw number of cue–reward pairings. This parallel between behavior and dopamine activity shows that the brain’s reward system implements a time‑based learning rule, revealing a simple biological underpinning for how animals learn from rewards.”
To ensure their results were not caused by other factors, the scientists ran several control experiments. They tested whether the mice simply learned faster because they received fewer rewards per day, which might make the sugar water seem more novel.
The researchers also tested whether spending more time in the testing chamber without hearing tones played a role. Even when controlling for these variables, the proportional scaling rule remained consistent. The time between rewards consistently dictated the speed of learning per trial.
The scientists then tested aversive learning by pairing a tone with a mild foot shock in freely moving mice. They observed the same proportional scaling rule in this scenario. Mice with longer times between shocks learned to freeze in response to the tone in proportionally fewer trials.
In another variation, researchers tested partial reinforcement. They played the tone every 60 seconds but only gave the sugar water 10 percent to 50 percent of the time. Because the actual rewards were spaced further apart in time, the mice learned the underlying dopamine association in far fewer rewarded trials than mice receiving rewards every single time.
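The same time-based logic accounts for this partial-reinforcement result: if tones arrive every 60 seconds but only a fraction p of them are rewarded, rewards are on average 60/p seconds apart, so each rewarded trial should teach proportionally more. A minimal sketch (our illustration; the learning constant and update rule are hypothetical, not taken from the paper):

```python
# Sketch of why partial reinforcement speeds per-reward learning under a
# time-based rule. Assumption (ours): learning per *rewarded* trial is
# proportional to the average time between rewards.

TONE_INTERVAL = 60.0   # seconds between tones

def rewarded_trials_to_criterion(p_reward, k=0.0001, criterion=0.5):
    """Rewarded trials needed when only a fraction p_reward of tones pay off."""
    mean_time_between_rewards = TONE_INTERVAL / p_reward
    alpha = k * mean_time_between_rewards
    strength, rewarded = 0.0, 0
    while strength < criterion:
        strength += alpha * (1.0 - strength)
        rewarded += 1
    return rewarded

for p in (1.0, 0.5, 0.1):
    print(f"p={p:.1f}: {rewarded_trials_to_criterion(p)} rewarded trials")
```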
Traditional theories of learning assume the brain calculates a prediction error on a moment-by-moment basis. A prediction error is the difference between the reward an animal expects and the reward it actually receives. The researchers compared these older models against their newer framework that calculates associations by looking backward in time only when a reward is received.
When running computer simulations of these different theories, the traditional models failed to match the behavior of the mice. The traditional models could not explain why learning rates scaled proportionally with the time between rewards. The newer backward-looking model naturally predicted this exact proportional scaling, providing strong theoretical support for the experimental findings.
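The contrast between the two model classes can be illustrated in a few lines. In a conventional trial-based update, the learning rate is a fixed constant, so the number of trials needed is the same regardless of spacing; in a time-based rule (a toy stand-in for the retrospective model, not the authors' implementation, with hypothetical parameter values) the per-trial rate grows with the interval between rewards:

```python
# Toy contrast: trial-based vs time-based learning rates.
# Hypothetical parameter values; not the authors' actual model.

import math

def trials_to_learn(alpha, criterion=0.5):
    # With update s += alpha * (1 - s), strength after n trials is
    # 1 - (1 - alpha)**n; solve for the n that reaches the criterion.
    return math.ceil(math.log(1 - criterion) / math.log(1 - alpha))

for iti in (60, 600):
    trial_based = trials_to_learn(alpha=0.01)          # fixed per-trial rate
    time_based = trials_to_learn(alpha=0.0001 * iti)   # rate scales with ITI
    print(f"ITI {iti:>3}s  trial-based: {trial_based:>3} trials   "
          f"time-based: {time_based:>3} trials")
```

Only the time-based variant reproduces the observed behavior: trials-to-learn shrink in proportion as the interval grows, while the trial-based variant predicts no effect of spacing at all.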
“A key takeaway from our study is that what really drives reward‑based learning is how much time passes between rewards, not how many cue–reward pairings an animal experiences,” Namboodiri summarized. “In simple terms, we found that when rewards are spaced farther apart in time, each individual reward leads to proportionally greater learning. Thus, if rewards occur ten times farther apart, each reward leads to roughly ten times more learning.”
“As a result, when you look over a fixed amount of time, the total amount of learning ends up the same despite vastly different number of cue-reward experiences (over a 20-fold range). This previously unknown learning rule suggests that the total number of experiences is not the key determinant of learning, which challenges some longstanding assumptions in neuroscience and reinforcement learning. The field had known that spreading pairings out in time speeds up learning per pairing, but it was still assumed that the final level of learning depended on the total number of pairings. Our experiments showed that, instead, total learning is determined by time, not count.”
Readers might easily confuse these findings with the well-known spacing effect. The spacing effect is a broad educational concept suggesting that taking breaks between study sessions yields better learning than cramming. The new research points to something much more specific than a general benefit of taking breaks.
“We would like to highlight that our results are not simply a restatement of the spacing effect or its biological underpinnings, but instead that we have identified a previously unknown rule of learning,” Namboodiri told PsyPost. “The spacing effect can be summarized in broad terms as ‘spacing out experiences = better learning,’ which implies that when experiences are closer together in time, there are diminishing returns on their contribution to learning.”
“However our findings that learning rate scales proportionally with the time between rewards (rewards, specifically) require a fundamental shift from the above perspective because it necessitates (as we show) that over a fixed amount of time, number of cue-reward experiences has NO impact on overall learning.”
One potential limitation is that the researchers tested this specific rule primarily in simple conditioning setups using mice. They also noted that the proportional scaling rule tends to break down at extreme intervals, such as when mice waited an entire hour between rewards.
Future research will explore where exactly in the brain this time duration is calculated. Scientists also plan to investigate whether this rule applies to drug rewards, which could offer insights into addiction and habit formation. Because nicotine patches deliver a constant stream of nicotine, for example, they might disrupt the brain’s association between the act of smoking and the reward, blunting the urge to smoke.
Applying these timing principles to artificial intelligence systems might also help machines learn much faster from fewer pieces of data. Current systems learn slowly because they make tiny refinements after billions of interactions. A model borrowing from these new biological findings could potentially accelerate artificial learning.
The study, “Duration between rewards controls the rate of behavioral and dopaminergic learning,” was authored by Dennis A. Burke, Annie Taylor, Huijeong Jeong, SeulAh Lee, Leo Zsembik, Brenda Wu, Joseph R. Floeder, Gautam A. Naik, Ritchie Chen & Vijay Mohan K Namboodiri.