so, let's say that in the brain you have an abstract representation of your action-plan, which predicts its consequences through forward models, and which continuously contrasts those predicted consequences against goals. the system continuously tries to minimize the energy of its activations, like one big multiple-constraint-satisfaction problem, with all the comparisons/prediction errors happening in parallel.
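here's a minimal sketch of that picture, with a made-up linear forward model and gradient descent standing in for the continuous settling - toy assumptions throughout, not a claim about the brain:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical pieces: a plan vector, a linear forward model that predicts
# the plan's consequences, and a goal expressed in consequence-space.
F = rng.normal(size=(3, 5))        # forward model: plan -> predicted consequences
goal = np.array([1.0, -0.5, 0.3])  # desired consequences
plan = np.zeros(5)                 # abstract action-plan representation

def energy(plan):
    """summed squared prediction errors: predicted consequences vs. goals."""
    err = F @ plan - goal
    return 0.5 * err @ err

# continuous settling, approximated as gradient descent on the energy;
# every component of the error acts as one constraint, all in parallel.
lr = 0.05
for _ in range(500):
    err = F @ plan - goal
    plan -= lr * (F.T @ err)       # gradient of the energy w.r.t. the plan

print("final energy:", energy(plan))
```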
how does exploration work in this system? as i understand karl friston's model, expected free energy carries a term for epistemic value. or, in dan russo's information-directed sampling, you deliberately trade off immediate regret against improving your knowledge of which actions lead to reward.
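for reference, the rough shapes of those two ideas, in my own schematic notation (don't hold me to the exact signs and conventions):

```latex
% expected free energy of a policy \pi, friston-style: minimizing G
% rewards both goal-seeking (pragmatic) and information-seeking (epistemic).
G(\pi) = -\underbrace{\mathbb{E}_{q(o,s\mid\pi)}\!\left[\ln q(s\mid o,\pi) - \ln q(s\mid\pi)\right]}_{\text{epistemic value (expected information gain)}}
         -\underbrace{\mathbb{E}_{q(o\mid\pi)}\!\left[\ln p(o\mid C)\right]}_{\text{pragmatic value (preferences $C$)}}

% information-directed sampling (russo & van roy), schematically: choose the
% action that best trades expected regret \Delta_t(a) for information gain g_t(a).
a_t = \arg\min_a \frac{\Delta_t(a)^2}{g_t(a)}
```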
how could i fit this into my intuitive framework?
generally, i think exploration is the drive to metabolize the outside world. why is this a property of living systems? because life is fundamentally dynamic: the energy from that metabolism is what keeps the system moving forward.
so what is exploration in my continuous action-generation system? why isn't it just taking whatever actions make the predictions look most like the goals? i guess one possible answer is simply that the system is minimizing prediction errors everywhere. on the surface, that seems to have a problem: you should avoid exploring, because exploring generates surprise (shades of the darkroom problem).
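to make that surface problem concrete (schematic, my notation):

```latex
% pure surprise minimization over actions:
a^{\ast} = \arg\min_{a} \; \mathbb{E}_{q(o \mid a)}\!\left[-\ln p(o)\right]
% novel observations have low p(o), i.e. high surprise, so this objective,
% taken at face value, steers away from anything unexplored.
```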
but i had an idea about this - maybe exploration is only ever relative to something that has *already* caused a prediction error. for example, say there's a door in your environment that you've never seen. before you see it, you obviously can't choose to explore behind it. when you do see it, *that* is what generates the prediction errors - especially because you already know some stuff about doors, so your generative models fill in lots of uncertainty about what's behind this one. exploration is then just minimizing that uncertainty.
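here's a toy version of the door story as expected information gain - all numbers, names, and the action set are hypothetical:

```python
import numpy as np

def entropy(p):
    """shannon entropy (nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_info_gain(prior, likelihood):
    """expected entropy reduction over the latent after observing o,
    where likelihood[s, o] = p(o | s)."""
    prior = np.asarray(prior, dtype=float)
    p_o = prior @ likelihood                  # marginal p(o)
    post = likelihood * prior[:, None] / p_o  # p(s | o), one column per o
    exp_post_h = sum(p_o[o] * entropy(post[:, o]) for o in range(len(p_o)))
    return entropy(prior) - exp_post_h

# the latent only exists once the door is seen: what's behind it,
# filled in by what you know about doors (room? closet? stairs? outside?).
prior = np.array([0.4, 0.3, 0.2, 0.1])

# "open the door" gives a nearly noiseless look at the latent...
open_door = np.full((4, 4), 0.01) + 0.96 * np.eye(4)
# ...while "walk away" yields an observation carrying no information about it.
walk_away = np.full((4, 4), 0.25)

print("open door:", expected_info_gain(prior, open_door))  # most of H(prior)
print("walk away:", expected_info_gain(prior, walk_away))  # ~ 0
```

before the door shows up, there is no `prior` to be uncertain about - which is the whole point.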
and finally, we avoid the darkroom problem because of the dopamine stuff, hyper-stability in PFC, and so on.