Predictive brains

Whatever next? Predictive brains, situated agents, and the future of cognitive science (Andy Clark 2013, Behavioral and Brain Sciences) is an interesting paper on the computational architecture of the brain. It’s arguing that a large part of the brain is made up of hierarchical systems, where each system uses an internal model of the lower system in an attempt to predict the next outputs of the lower system. Whenever a higher system mispredicts a lower system’s next output, it will adjust itself in an attempt to make better predictions in the future.

So, suppose that we see something, and this visual data is processed by a low-level system (call it system L). A higher-level system (call it system H) attempts to predict what L’s output will be and sends its prediction down to L. L sends back a prediction error, indicating the extent to which H’s prediction matches L’s actual activity and processing of the visual stimulus. H will then adjust its own model based on the prediction error. By gradually building up a more accurate model of the various regularities behind L’s behavior, H is also building up a model of the world that causes L’s activity. At the same time, systems H+, H++ and so on that are situated “above” H build up still more sophisticated models.

So the higher-level systems have some kind of model of what kind of activity to expect from the lower-level systems. Of course, different situations elicit different kinds of activity: one example given in the paper is that of an animal “that frequently moves between a watery environment and dry land, or between a desert landscape and a verdant oasis”. The kinds of visual data that you would expect in those two situations differs, so the predictive systems should adapt their predictions based on the situation.

And apparently, that is what happens – when salamanders and rabbits are put to varying environments, half of their retinal ganglion cells rapidly adjust their predictions to keep up with the changing image predictions. Presumably, if the change of scene was unanticipated, the higher-level systems making predictions of the ganglion cells will then quickly get an error signal indicating that the ganglion cells are now behaving differently from what was expected based on how they acted just a moment ago; this should also cause them to adjust their predictions, and data about the scene change gets propagated up through the hierarchy.

This process involves the development of “novelty filters”, which learn to recognize and ignore the features of the input that most commonly occur together within some given environment. Thus, things that are “familiar” (based on previous experience) and behave in expected ways aren’t paid attention to.

So far we’ve discussed a low-level system sending the higher-level an error signal when the predictions of the higher-level system do not match the activity of the lower-level system. But the predictions sent by the higher-level system also serve a function, by acting as Bayesian priors for the lower-level systems.

Essentially, high up in the hierarchy we have high-level models of how the world works, and what might happen next based on those models. The highest-level system, call it H+++, makes a prediction of what the next activity of H++ is going to be like, and the prediction signal biases the activity of H++ in that direction. Now the activity of H++ involves making a prediction of H+, so this also causes H++ to bias the activity of H+ in some direction, and so on. When the predictions of the high-level models are accurate, this ends up minimizing the amount of error signals sent up, as the high-level systems adjust the expectations of the lower-level systems to become more accurate.

Let’s take a concrete example (this one’s not from the paper but rather one that I made up, so any mistakes are my own). Suppose that I am about to take a shower, and turn on the water. Somewhere in my brain there is a high-level world model which says that turning on the shower faucet will lead to water pouring out, and because I’m standing right below it, the model also predicts that the water will soon be falling on my body. This prediction is expressed in terms of the expected neural activity of some (set of) lower-level system(s). So the prediction is sent down to the lower systems, each of which has its own model of what it means for water to fall on my body, and each of which send that prediction down to yet more lower-level systems.

Eventually we reach some pretty low-level system, like one predicting the activity of the pressure- and temperature-sensing cells on my skin. Currently there isn’t yet water falling down on me, and this system is a pretty simple one, so it is currently predicting that the pressure- and temperature-sensing cells will continue to have roughly the same activity as they do now. But that’s about to change, and if the system did continue predicting “no change”, then it would end up being mistaken. Fortunately, the prediction originating from the high-level world-model has now propagated all the way down, and it ends up biasing the activity of this low-level system, so that the low-level system now predicts that the sensors on my skin are about to register a rush of warm water. Because this is exactly what happens, the low-level system generates no error signal to be sent up: everything happened as expected, and the overall system acted to minimize the overall prediction error.

If the prediction from the world-model would have been mistaken – if the water had been cut, or I accidentally turned on cold water when I was expecting warm water – then the biased prediction would have been mistaken, and an error signal would have been propagated upwards, possibly causing an adjustment to the overall world-model.

This ties into a number of interesting theories that I’ve read about, such as the one about conscious attention as an “error handler”: as long as things follow their familiar routines, no error signals come up, and we may become absent-minded, just carrying out familiar habits and routines. It is when something unexpected happens, or something of where we don’t have a strong prediction of what’s going to happen next, that we are jolted out of our thoughts and forced to pay attention to our surroundings.

This would also help explain why meditation is so notoriously hard: it involves paying attention to a single unchanging stimuli whose behavior is easy to predict, and our brains are hardwired to filter any unchanging stimuli whose behavior is easy to predict out of our consciousness. Interestingly, extended meditation seems to bring some of the lower-level predictions into conscious awareness. And what I said about predicting short-term sensory stimuli ties nicely into the things I discussed back in anticipation and meditation. Savants also seem to have access to lower-level sensory data. Another connection is the theory of autism as weakened priors for sensory data, i.e. as a worsened ability for the higher-level systems to either predict the activity of the lower-level ones, or to bias their activity as a consequence.

The paper has a particularly elegant explanation of how this model would explain binocular rivalry, a situation where a test subject is shown one image (for example, a house) to their left eye and another (for example, a face) to their right eye. Instead of seeing two images at once, people report seeing one at a time, with the two images alternating. Sometimes elements of unseen image are perceived as “breaking through” into the seen one, after which the perceived image flips.

The proposed explanation is that there are two high-level hypotheses of what the person might be seeing: either a house or a face. Suppose that the “face” hypothesis ends up dominating the high-level system, which then sends its prediction down the hierarchy, suppressing activity that would support the “house” interpretation. This decreases the error signal from the systems which support the “face” interpretation. But even as the error signal from those systems decreases, the error signal from the systems which are seeing the “house” increases, as their activity does not match the “face” prediction. That error signal is sent to the high-level system, decreasing its certainty in the “face” prediction until it flips its best guess prediction to be one of a house… propagating that prediction down, which eliminates the error signal from the systems making the “house” prediction but starts driving up the error from the systems making the “face” prediction, and soon the cycle repeats again. No single hypothesis of the world-state can account for all the existing sensory data, so the system ends up alternating between two conflicting hypotheses.

One particularly fascinating aspect of the whole “hierarchical error minimization” theory as presented so far is that it can also cover not only perception, but also action! As hypothesized in the theory, when we decide to do something, we are creating a prediction of ourselves doing something. The fact that we are actually not yet doing anything causes an error signal, which in turn ends up modifying the activity of our various motor systems so as to cause the predicted behavior.

As strange as it sounds, when your own behaviour is involved, your predictions not only precede sensation, they determine sensation. Thinking of going to the next pattern in a sequence causes a cascading prediction of what you should experience next. As the cascading prediction unfolds, it generates the motor commands necessary to fulfill the prediction. Thinking, predicting, and doing are all part of the same unfolding of sequences moving down the cortical hierarchy.

Everything that I’ve written here so far only covers approximately the first six pages of the paper: there are 18 more pages of it, as well as plenty of additional commentaries. I haven’t yet had the time to read the rest, so I recommend checking out the paper itself if this seemed interesting to you.

4 comments

  1. Some connections:

    As I understand it, the recent success of “deep learning” neural networks is that the earlier layers of the neural network can be used for automatic feature selection in machine learning. Essentially, these layers find regularity in the input, which is then passed to later layers and boosts their effectiveness.

    You might be interested in this paper, if you haven’t seen it: Bayesian Surprise Attracts Human Attention: “We find that subjects are strongly attracted towards surprising locations, with 72% of all human gaze shifts directed towards locations more surprising than the average.”

    There’s also a bit of Hawkin’s On Intelligence where he suggests a sort of emerging perspective in sensory neuroscience that only surprising information is propagated to higher levels of sensory processing, which seems like it would dovetail nicely with this model.

    Regarding meditation (specifically vipassana), my theory of my own practice is that it’s a process of bring automatic mental processes under conscious control, resulting in more conscious access to lower levels of sensory processing. (Note, for instance, that meditators perform better on the Stroop task, but that might just be an executive attention thing.) This seems (at least weakly) supported by larger brain regions associated with sensory processing among meditators than controls, such as in this study.

    Oh, and the whole idea of brains as prediction engines is very reminiscent of Schmidhuber’s formal theory of creativity.

  2. Oooh, I love this summary Kaj. Very succinct, and using excellent concrete examples to pull the whole concept together. It hits all the points on how it’s related to other interesting ideas, leaving me burning with curiosity to find out more. Nicely done.

Leave a Reply