would some of the trajectories have to be very long? like they give the example of college acceptance's value being conditional on whether you graduate...
how do you decide what to store in a trace? would a trajectory hop from homework to college application to graduation?
episodic trajectories might be one way of looking at "conflicting beliefs". you could easily have two episodes that start from different starting points and end up as logically incompatible beliefs. like, "everyone should earn a fair wage, therefore we should set a high minimum wage", and "a higher minimum wage forces employers to cut full-time employees".
how separable is the state generalization problem from the episodic idea? (although there's the good point about advantages of calculating similarity at decision time.)
they don't talk much (or maybe i missed it) about content-based lookup for starting trajectories. is it basically "for free" that you start trajectories near where you are now (or near where you're simulating yourself to be)? is it like rats where the trajectories are constantly going forward from your current position, maybe in theta rhythm?
i was thinking about a smooth continuum between "episodic" and "statistical/model-based". what if we picture the episodic trajectories as being stored in an RNN? when you experience a new trajectory, you could update the weights of the RNN so that whenever the RNN comes near the starting state of that trajectory, it will probably play out the exact sequence. but you could also update the weights in a different way (roughly: with a smaller update), so that the network is influenced by the statistics of the sequence but doesn't deterministically play out that particular sequence.
this view is also kind of nice because episodes aren't separate from your statistical learning. some episodes from early in life might even form a big part of your statistical beliefs. especially as the episodes are replayed more and more, over time, from hippocampus out into cortex.
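a toy version of the big-update vs. small-update idea, just to make it concrete. this is not an RNN: it swaps in a tabular next-state model as a stand-in, and every name here (`update`, `rollout`, `alpha`, the commute states) is made up for illustration. the only point is that one update rule with a step size `alpha` covers both ends: `alpha = 1.0` makes the model replay the new episode exactly, a small `alpha` only nudges the statistics.

```python
import copy

def update(model, trajectory, alpha):
    """Move each visited state's next-state distribution toward the observed step.

    model: dict mapping state -> {next_state: probability}
    alpha: step size in (0, 1]; 1.0 overwrites, small values only nudge.
    """
    for s, s_next in zip(trajectory, trajectory[1:]):
        row = model.setdefault(s, {})
        if not row:
            # first time we leave this state: nothing to average with
            row[s_next] = 1.0
            continue
        for k in row:
            row[k] *= (1.0 - alpha)                  # shrink the old beliefs
        row[s_next] = row.get(s_next, 0.0) + alpha   # add mass to the observed step
    return model

def rollout(model, start, steps):
    """Greedy playback: always follow the most probable next state."""
    path = [start]
    for _ in range(steps):
        row = model.get(path[-1])
        if not row:
            break
        path.append(max(row, key=row.get))
    return path

# background "statistics": the usual commute, experienced several times
base = {}
for _ in range(5):
    update(base, ["home", "bus", "work"], alpha=0.2)

novel = ["home", "bike", "park"]  # one new episode

episodic = update(copy.deepcopy(base), novel, alpha=1.0)     # big update
statistical = update(copy.deepcopy(base), novel, alpha=0.1)  # small update

print(rollout(episodic, "home", 2))     # ['home', 'bike', 'park']
print(rollout(statistical, "home", 2))  # ['home', 'bus', 'work']
print(statistical["home"])              # 'bike' now has a little probability mass
```

with the big update, starting at "home" replays the new episode verbatim. with the small update, the habitual route still wins, but the new episode has left a trace in the transition statistics. values of `alpha` in between give the continuum.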
is the sampling of trajectories influenced by current goals?
like, the current goal representations in PFC could be continually exerting pressure on the sampling dynamics to produce outcomes that look like the goal representations?
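one way the "goal pressure" question could be made concrete, purely as a sketch. assumptions (none of this is from the paper): each stored trajectory has an outcome vector, the goal is a vector in the same space, and the pressure is a softmax over negative squared distance to the goal with a made-up strength parameter `beta`. `beta = 0` means goals have no influence on sampling.

```python
import math
import random

def goal_biased_weights(outcomes, goal, beta):
    """Sampling probabilities: softmax of -beta * squared distance to the goal."""
    scores = [
        -beta * sum((o - g) ** 2 for o, g in zip(outcome, goal))
        for outcome in outcomes
    ]
    top = max(scores)                             # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_trajectory(trajectories, outcomes, goal, beta, rng=random):
    """Draw one stored trajectory, biased toward goal-like outcomes."""
    weights = goal_biased_weights(outcomes, goal, beta)
    return rng.choices(trajectories, weights=weights, k=1)[0]

# hypothetical stored episodes and the outcome each one led to
trajectories = ["stay home", "go to cafe", "go to gym"]
outcomes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
goal = (1.0, 0.0)  # current goal representation, same space as outcomes

print(goal_biased_weights(outcomes, goal, beta=0.0))  # uniform: goal ignored
print(goal_biased_weights(outcomes, goal, beta=5.0))  # mass concentrates on "go to cafe"
print(sample_trajectory(trajectories, outcomes, goal, beta=5.0))
```

turning `beta` up smoothly shifts sampling from "whatever is stored" toward "whatever ended up looking like the goal", which is roughly the continual-pressure picture.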