World Models Are Great: JEPA Meets Neural ODE
I've been looking into V-JEPA 2 (Meta, 2025) and related predictive representation approaches recently, and it got me wondering whether we are slowly circling back toward something more continuous-time and dynamical-systems flavored.
A lot of current world-model work focuses on predictive latent representations:
- predict abstract structure
- avoid pixel reconstruction
- learn semantics through prediction
Which makes perfect sense.
But at the same time, many of these systems still evolve internally through fundamentally discrete transitions, schematically z_{t+1} = f(z_t), one fixed-size step per update.
Coming from the Neural ODE / continuous dynamics side of things, I keep wondering:
What happens if the latent world model itself is treated as a continuous dynamical system?
Not just "predict the next embedding," but learn a latent flow, schematically dz/dt = f(z, t), that can be integrated forward to any point in time.
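To make the contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `latent_field` is a hand-written damped rotation standing in for a learned network, and `discrete_step`, `rk4_step`, `flow`, and `exact_solution` are hypothetical helper names, not part of V-JEPA 2 or any Neural ODE library. The point is only that a discrete model advances by its baked-in step, while a flow can be queried at any horizon.

```python
import math

def latent_field(z):
    """Stand-in for a learned vector field f(z): a damped rotation in a 2-D latent space."""
    x, y = z
    return (-0.1 * x - y, x - 0.1 * y)

def discrete_step(z):
    """Residual-style discrete transition z_{t+1} = z_t + f(z_t): one fixed jump per call."""
    return tuple(zi + fi for zi, fi in zip(z, latent_field(z)))

def rk4_step(f, z, dt):
    """One classical Runge-Kutta step of size dt for dz/dt = f(z)."""
    def shifted(k, scale):
        return tuple(zi + scale * ki for zi, ki in zip(z, k))
    k1 = f(z)
    k2 = f(shifted(k1, 0.5 * dt))
    k3 = f(shifted(k2, 0.5 * dt))
    k4 = f(shifted(k3, dt))
    return tuple(
        zi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    )

def flow(f, z0, horizon, n_steps=100):
    """Integrate the latent state from z0 forward by an arbitrary time horizon."""
    z, dt = z0, horizon / n_steps
    for _ in range(n_steps):
        z = rk4_step(f, z, dt)
    return z

def exact_solution(t):
    """Closed-form trajectory of this toy field from z0 = (1, 0), used only as a check."""
    decay = math.exp(-0.1 * t)
    return (decay * math.cos(t), decay * math.sin(t))

z0 = (1.0, 0.0)
z_jump = discrete_step(z0)                       # defined only at the next tick
z_early = flow(latent_field, z0, 0.37)           # the flow can be queried at any time
z_one = flow(latent_field, z0, 1.0)
z_composed = flow(latent_field, z_early, 0.63)   # 0.37 then 0.63 should match 1.0
```

Flowing for 0.37 and then 0.63 time units lands on the same state as flowing for 1.0 up to integration error, which is the composition property a discrete step function does not give you for free.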
This starts to blur boundaries between:
- predictive representation learning
- continuous-time latent dynamics
- neural differential equations
- control theory
- and eventually differentiable simulation
One interesting aspect is that JEPA-style objectives and Neural ODE-style dynamics are not competitors at all. They solve different problems:
- JEPA learns what matters
- Neural ODEs learn how it evolves
Combining the two feels surprisingly natural: semantic latent spaces with continuous trajectories instead of discrete jumps.
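A rough sketch of what that combination could look like, forward pass only and in plain Python. All names here (`linear`, `latent_field`, `integrate`, `jepa_ode_loss`, `ema_update`) are hypothetical, the encoders are toy linear maps, and the squared-error loss is chosen for simplicity. The assumed structure is the JEPA-style one: an online encoder, a slowly updated target encoder (an exponential moving average of the online one, as in I-JEPA-style training), and a predictor that works in latent space. The only change is that the predictor integrates an ODE over the elapsed time instead of applying a fixed one-step map.

```python
def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def latent_field(z):
    """Hypothetical latent dynamics dz/dt = f(z); a learned network would go here."""
    return [-0.1 * z[0] - z[1], z[0] - 0.1 * z[1]]

def integrate(z, horizon, n_steps=20):
    """Explicit Euler integration of the latent ODE over `horizon` time units."""
    dt = horizon / n_steps
    for _ in range(n_steps):
        z = [zi + dt * fi for zi, fi in zip(z, latent_field(z))]
    return z

def jepa_ode_loss(online_w, target_w, x_context, x_target, elapsed):
    """Latent-space prediction loss: no pixels are reconstructed anywhere."""
    z_context = linear(online_w, x_context)   # encode the context observation
    z_pred = integrate(z_context, elapsed)    # flow the latent forward by the elapsed time
    z_target = linear(target_w, x_target)     # target embedding (no gradient in a real setup)
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_pred)

def ema_update(target_w, online_w, decay=0.99):
    """Move the target encoder weights slowly toward the online encoder weights."""
    return [
        [decay * t + (1.0 - decay) * o for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_w, online_w)
    ]

W_online = [[1.0, 0.5, 0.0], [0.0, 1.0, -0.5]]
W_target = [row[:] for row in W_online]
x_now, x_later = [0.2, -0.1, 0.4], [0.1, 0.3, 0.2]

# The time gap between the two observations is just an argument, so it can be irregular.
loss = jepa_ode_loss(W_online, W_target, x_now, x_later, elapsed=0.37)
W_target = ema_update(W_target, W_online)
```

In a real setup the encoders and `latent_field` would be neural networks trained with automatic differentiation, and the integrator would come from an ODE solver library. None of that is shown here.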
It also raises interesting questions:
- Should world models conserve structure?
- Should latent flows obey geometric constraints?
- Do continuous latent trajectories produce more stable long-horizon predictions?
- Is "time" even best represented discretely in learned simulators?
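One concrete reading of the first three questions, offered as an assumption about what "conserving structure" could mean rather than a claim about any existing world model: if the latent dynamics were Hamiltonian, a structure-preserving (symplectic) update would keep energy bounded over long rollouts, while a generic update would drift. The sketch below uses a unit harmonic oscillator as the toy latent system, and all function names are hypothetical.

```python
def energy(q, p):
    """Energy of a unit harmonic oscillator, used here as a toy conserved quantity."""
    return 0.5 * (q * q + p * p)

def explicit_euler(q, p, dt):
    """Generic update: both coordinates use the old state. Not structure-preserving."""
    return q + dt * p, p - dt * q

def symplectic_euler(q, p, dt):
    """Structure-preserving update: momentum first, then position with the new momentum."""
    p_new = p - dt * q
    q_new = q + dt * p_new
    return q_new, p_new

def rollout(step, q, p, dt, n_steps):
    """Apply the same update rule n_steps times and return the final state."""
    for _ in range(n_steps):
        q, p = step(q, p, dt)
    return q, p

dt, n_steps = 0.1, 1000
e0 = energy(1.0, 0.0)
e_euler = energy(*rollout(explicit_euler, 1.0, 0.0, dt, n_steps))
e_symplectic = energy(*rollout(symplectic_euler, 1.0, 0.0, dt, n_steps))
print(f"initial {e0:.3f}, explicit Euler {e_euler:.3e}, symplectic Euler {e_symplectic:.3f}")
```

In this toy system explicit Euler multiplies the energy by (1 + dt^2) at every step, so the long rollout blows up, while the symplectic update keeps the energy within a few percent of its starting value. Whether learned latent spaces actually benefit from constraints like this is the open question.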
Feels like there is still a lot of unexplored territory between self-supervised predictive learning and continuous dynamical systems.