laurence hunt has been pointing out that decisions aren't computed in the brain by one region or circuit doing a particular computation and passing the finished result on to another region or circuit that then does a different computation.
instead, millisecond by millisecond, decision-related signals appear almost simultaneously all over cortex:
another example is how downstream circuits can be biased by an upstream calculation before that calculation is finished:
these things make sense intuitively because neurons have a lot of connections to each other. laurence also points out that one of the canonical computations in the brain is lateral inhibition. he outlines a big picture of parallel competition across the brain - with different areas emphasizing competition in different feature spaces:
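to make the lateral-inhibition picture concrete, here's a minimal sketch of one such competition: each unit is driven by its own input and suppressed in proportion to everyone else's activity, and the strongest drive wins. all the parameter values are made up for illustration, not fit to anything.

```python
# toy lateral-inhibition competition: each unit gets its own input drive
# and is suppressed in proportion to the total activity of the other units.
# the gains, step size, and step count are illustrative assumptions.

def compete(drives, inhibition=0.3, steps=50, dt=0.1):
    """iterate leaky dynamics with mutual inhibition; return final activities."""
    acts = [0.0] * len(drives)
    for _ in range(steps):
        total = sum(acts)
        new = []
        for a, d in zip(acts, drives):
            others = total - a                         # inhibition from the rest
            da = -a + max(0.0, d - inhibition * others)  # rectified leaky update
            new.append(a + dt * da)
        acts = new
    return acts

acts = compete([1.0, 0.9, 0.2])
winner = acts.index(max(acts))  # the strongest drive wins the competition
```

the same loop, run with different drives in different "areas", is roughly the picture of many feature competitions happening in parallel.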
and this matches up with "mixed selectivity". for example, if you look at the cells that project to midbrain dopamine neurons, you might think these cells would contain cleanly separated representations of the different signals you need to calculate RPE, like reward, expectation, etc. but instead, many of the neurons are correlated with several of these signals at once:
it's a signal smoothie.
stefano fusi and others observe that keeping this high dimensionality is important in order to respond to task demands:
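here's a toy version of why the smoothie still works: even when every neuron responds to a blend of task variables, a downstream linear readout can recover a clean quantity like RPE = reward − expectation. the mixing weights below are invented for illustration, not measured from anywhere.

```python
# toy mixed-selectivity population: each "neuron" responds to a blend of
# two task variables (reward r and expectation e) rather than to one cleanly.
# the (reward, expectation) loadings are made-up numbers.

mix = [(0.8, 0.3), (0.2, -0.5), (-0.4, 0.7), (0.6, 0.6)]

def population(r, e):
    """mixed responses: every neuron sees both variables."""
    return [wr * r + we * e for wr, we in mix]

# solve a 2x2 system so a readout over neurons 0 and 1 computes r - e:
#   w0*wr0 + w1*wr1 = 1   (net reward loading)
#   w0*we0 + w1*we1 = -1  (net expectation loading)
(wr0, we0), (wr1, we1) = mix[0], mix[1]
det = wr0 * we1 - wr1 * we0   # assumes det != 0 for these loadings
w0 = (we1 + wr1) / det
w1 = (-wr0 - we0) / det

def read_rpe(resp):
    return w0 * resp[0] + w1 * resp[1]

rpe = read_rpe(population(r=1.0, e=0.4))  # recovers r - e = 0.6
```

the readout weights exist precisely because the mixed representation keeps both dimensions around, which is fusi's point: high dimensionality means many different task-relevant quantities stay linearly decodable.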
i like how this is analogous to life. keeping lots of potential around is probably what life is. i wonder how this connects to mate lengyel's idea that moment-to-moment variability in neural signals encodes uncertainty in the beliefs they represent.
i also like this because it fits exactly with the idea of continuously metabolizing the world. compared to "decision making", i think this is a fundamentally different way of looking at what the brain is doing. for one thing, you generate rather than select actions. i suppose this is good in the real world, where the action dimensionality is very high and candidate actions are rarely proposed to you. for another thing, it means that in a "stable state", you can keep emitting actions from your current abstract goal representation. maybe it unfolds into different things depending on the current inputs and internal state...
but now we get to the really interesting part. let's start with laurence's model of lots of brain areas processing feature competitions in parallel, with simultaneous influences on one another. what i've been thinking is that this links together planning and imagination in a concrete way.
the key to this, in my current (probably infantile) thinking, is that PFC networks may be very good at holding on to their representations (both because individual neurons have longer time constants, and because of network-level properties). if you think of PFC as being the top of a sensory hierarchy, you could look at it as encoding a very abstract state of the world. or if you think of it as being the top of a motor hierarchy, you could view it as encoding a very abstract action plan. these two things are the same thing.
now, let's say that in the feature competitions, you include some bits of forward model -- i.e., how your potential actions will change the world -- and some modelling of the dynamics of the world itself (let's lump these both under the name "forward model" for now). as the overall state of the brain evolves forwards (under its lateral inhibition dynamics), your current abstract action plan will push the modelled future state of the world toward whatever the predicted consequences are. these predicted consequences are processed through the sensory/state/value hierarchy and compared against current goals. if they match current goals, fine, you continue unfolding your current action plan. if they don't match, then this is where the hyper-stability of PFC comes in. PFC, as the top of the sensory hierarchy, should be pushed toward believing the predicted consequences. but because it's pathologically stable, it doesn't get its own state pushed around. instead, it pushes on the action side of the hierarchy to make the abstract action representations a little bit more like something that produces consequences that match its beliefs.
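a minimal sketch of that story, with everything collapsed to one scalar each: the goal is clamped (hyper-stable), a linear forward model predicts the consequence of the current action, and the mismatch pushes on the action representation rather than on the goal. the linear forward model and all gains are assumptions for illustration.

```python
# toy "hyper-stable goal" dynamics: goal is clamped, mismatch between the
# goal and the forward model's prediction nudges the action, never the goal.
# forward model and gains are made-up illustrative assumptions.

FORWARD_GAIN = 2.0   # toy forward model: predicted outcome = gain * action

def settle(goal, action=0.0, eta=0.1, steps=200):
    trace = []
    for _ in range(steps):
        predicted = FORWARD_GAIN * action      # unfold the forward model
        error = goal - predicted               # compare against the clamped goal
        action += eta * FORWARD_GAIN * error   # push the action side only
        trace.append(predicted)
    return action, trace

action, trace = settle(goal=1.0)
# the action relaxes until its predicted consequence matches the goal
```

the key property the sketch shows: the goal variable never moves, yet the system ends up in a state whose predicted consequences satisfy it, which is the "PFC pushes on the action hierarchy" story in miniature.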
importantly, when the current goals are not working (maybe signalled by something like low tonic dopamine), the PFC can let go of its hyper-stability and reorganize around new goals/abstract action plans. e.g.:
spiritually speaking, this can feel difficult. the more of the long-term goal patterns are released, the more the self is being let go of, which is what we're afraid of.
(here's a very ignorable side note. it wouldn't have to be PFC alone that "insists" on goals. this insistence could be distributed over the whole brain too. the PFC idea is just a clean way to visualize the story.)
this process is an energy minimization in activation space (to satisfy as many of the constraints imposed by weights as possible). but it's nice that you never have to compute the energy or its gradient. i have no idea how learning would work in this kind of system but i'm sure people must have worked on it. something hebbian and simple?
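the classic existence proof that this works is a hopfield-style network: hebbian weights store a pattern, and simple asynchronous local updates descend the energy without the energy or its gradient ever being computed anywhere. the energy function below is only there so we can check the claim from outside.

```python
# minimal hopfield-style sketch: hebbian weights store one pattern, and
# asynchronous unit-by-unit updates descend the energy without computing it.

pattern = [1, -1, 1, -1, 1, -1]
n = len(pattern)

# hebbian and simple: strengthen weights between co-active units
W = [[(pattern[i] * pattern[j]) if i != j else 0 for j in range(n)]
     for i in range(n)]

def energy(state):
    """only for checking from outside; the network never computes this."""
    return -0.5 * sum(W[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def relax(state, sweeps=5):
    state = list(state)
    for _ in range(sweeps):
        for i in range(n):                       # purely local update rule
            h = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if h >= 0 else -1
    return state

noisy = [1, 1, 1, -1, 1, -1]   # the stored pattern with one unit flipped
recalled = relax(noisy)         # settles back into the stored pattern
```

each unit only ever looks at its own weighted input, yet every flip it makes lowers the global energy, which is exactly the "constraint satisfaction without computing the energy" property.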
so what i've just described is a way of doing planning, right? but you never need any part of the brain doing something like tree search over the whole abstract state space. instead, little bits of forward model nested in various areas can unfold in different kinds of feature spaces, to different lengths of time, but these computations are being influenced by constraints from ongoing computations in other areas.
this could explain pavlovian pruning:
because if you start simulating bad consequences in one part of the feature space, this suppresses the search. although maybe this kind of explanation is overkill.
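the pruning idea can be made concrete with a toy rollout over an invented decision tree: a branch whose first simulated consequence is aversive gets suppressed before it is unfolded further, even if something good lies beyond it. tree, values, and threshold are all made up.

```python
# toy pavlovian pruning during rollout: branches whose first simulated
# consequence is aversive are cut before being unfolded further.
# the tree, values, and threshold are illustrative assumptions.

tree = {
    "start": [("safe", 0.1), ("risky", -0.8)],  # (child, immediate value)
    "safe":  [("safe_good", 0.5)],
    "risky": [("jackpot", 2.0)],                # best long-run branch
}

def rollout_value(node, pruning_threshold=-0.5):
    children = tree.get(node, [])
    if not children:
        return 0.0
    best = float("-inf")
    for child, value in children:
        if value < pruning_threshold:
            continue  # aversive first step: the branch is never evaluated
        best = max(best, value + rollout_value(child, pruning_threshold))
    return best if best != float("-inf") else 0.0

pruned = rollout_value("start")              # never sees the jackpot
full = rollout_value("start", float("-inf")) # exhaustive search finds it
```

the pruned rollout settles for the mediocre safe branch, which is the behavioral signature pavlovian pruning is meant to explain.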
i suppose this fits with replay because the dynamics of the brain have learned the statistics of the dynamics of the real world. it makes sense that little bits of the brain are perpetually playing out the little snippets of dynamics that they know about. obviously it's much more complicated with learning and stuff. this is just some stray thoughts.
where does dopamine come in? some more speculative thoughts. if you get a reward prediction error, this means the overall action plan/expectations didn't account for everything, so maybe you need to update your action plan. (or, i could see this going the other way, that a positive RPE means you should stabilize your abstract action plan, and engage more action along it.) this loosely matches with seamans/yang or cohen/braver/brown ideas that dopamine modulates PFC representational stability. rui costa's work seems consistent with this being at some level more abstract than just simple actions, e.g.:
what about "episodic RL"?
sam and nathaniel's model remembers discrete sequences, but maybe the whole system is on a continuum from non-parametric to parametric. when you experience a sequence/trajectory, this gets written into the weights of your RNNs. if it's written in really strongly, it becomes a path that future activity can almost deterministically follow (if it comes near the starting state). if it's written in weakly, it just influences the future dynamics a bit.
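here's a toy version of that continuum, with transitions stored as a lookup of blended next-state weights (an assumption standing in for RNN weights): a trajectory written in with high strength becomes a path that replay follows deterministically, while a weak write only nudges the pre-existing dynamics.

```python
# toy episodic-to-parametric continuum: a trajectory is blended into
# transition weights with some strength. strong writes create a path replay
# follows; weak writes only bias the dynamics. the state space and update
# rule are illustrative assumptions, not sam and nathaniel's actual model.

def imprint(weights, trajectory, strength):
    """blend an experienced trajectory into the transition weights."""
    for s, s_next in zip(trajectory, trajectory[1:]):
        old = weights.get(s, {})
        new = {k: (1 - strength) * v for k, v in old.items()}
        new[s_next] = new.get(s_next, 0.0) + strength
        weights[s] = new
    return weights

def replay(weights, start, steps):
    """follow the strongest transition out of each state."""
    path, s = [start], start
    for _ in range(steps):
        if s not in weights:
            break
        s = max(weights[s], key=weights[s].get)
        path.append(s)
    return path

prior = {"a": {"x": 1.0}, "b": {"y": 1.0}}            # pre-existing dynamics
strong = imprint(dict(prior), ["a", "b", "c"], strength=0.9)
weak = imprint(dict(prior), ["a", "b", "c"], strength=0.1)
```

after the strong write, replay from "a" reproduces the experienced sequence; after the weak write, the old dynamics still dominate, and the experience only shifted the weights a bit.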
what about striatum? forward models ("what will happen if i do this?") and dynamics models are probably relatively easy to learn: you just match what you observe. i suppose inverse models are harder to learn ("what action should i do?"). but the striatum and habits maybe cache little pieces of inverse model. (i haven't thought through how this fully fits in yet.) it is a nice idea that the continuum between "model-based" and "model-free" has to do with which parts of forward-model-energy-minimization you replace with bits of inverse model.
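one way to picture the caching story: model-based control searches the forward model for an action that reaches the goal, and the answer gets cached as a state-to-action habit, so the next query in that state skips the search entirely. everything here is a made-up toy, not a claim about actual striatal coding.

```python
# toy cached-inverse-model sketch: search the forward model once, cache the
# answer as a habit, and answer from the cache thereafter. toy dynamics and
# action set are illustrative assumptions.

ACTIONS = [-1, 0, 1]
forward_calls = 0

def forward_model(state, action):
    """what will happen if i do this? (toy dynamics: state + action)"""
    global forward_calls
    forward_calls += 1
    return state + action

cache = {}  # habit-like inverse model: state -> cached action

def choose(state, goal):
    if state in cache:                       # "model-free": cached answer
        return cache[state]
    for a in ACTIONS:                        # "model-based": forward search
        if forward_model(state, a) == goal:
            cache[state] = a                 # cache this inverse-model piece
            return a
    return 0

first = choose(2, 3)                 # pays the cost of forward search
calls_after_first = forward_calls
second = choose(2, 3)                # hits the cache, no forward-model calls
```

the continuum is then just how much of the forward-model search has been replaced by cache hits, which is the model-based/model-free spectrum from the paragraph above.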
there are some ML systems that need to be fed a "goal state", like this: https://arxiv.org/abs/1609.05143
these kinds of systems should mesh well with the metaphors i've been thinking about here.
this is getting so schizophrenic, let me close by reflecting on how the building blocks of nervous systems are things like central pattern generators. it's oscillations modulating oscillations.