World Models Are Great: JEPA Meets Neural ODE

I was recently looking into V-JEPA 2 (Meta, 2025) and related predictive representation approaches, and it got me wondering whether we are slowly circling back toward something more continuous-time and dynamical-systems flavored.

A lot of current world-model work focuses on predictive latent representations:

predict abstract structure
avoid pixel reconstruction
learn semantics through prediction

Which makes perfect sense.

But at the same time, many of these systems still evolve internally through fundamentally discrete transitions:

$z_t \to z_{t+1}$

Coming from the Neural ODE / continuous dynamics side of things, I keep wondering:

What happens if the latent world model itself is treated as a continuous dynamical system?

Not just "predict the next embedding," but learn a latent flow:

$\frac{dz}{dt} = f(z, t)$
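
To make the contrast concrete, here is a minimal sketch in plain Python. The vector field `f` is a hand-written 2-D rotation standing in for a learned network, and `rk4_step`, `integrate`, and `discrete_step` are hypothetical helper names for illustration, not any library's API. The point is only that with a latent flow you can query the state at any time, while a residual-style discrete transition amounts to a single explicit Euler step of size 1.

```python
import math

def f(z, t):
    """Toy latent vector field dz/dt = f(z, t): a pure rotation in 2-D.
    Assumption: stands in for a learned network, for illustration only."""
    x, y = z
    return [-y, x]

def rk4_step(f, z, t, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(z, t)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], t + 0.5 * h)
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], t + 0.5 * h)
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)], t + h)
    return [zi + (h / 6.0) * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def integrate(f, z0, t0, t1, n_steps=100):
    """Integrate from t0 to t1 with n_steps RK4 steps and return z(t1)."""
    h = (t1 - t0) / n_steps
    z, t = list(z0), t0
    for _ in range(n_steps):
        z = rk4_step(f, z, t, h)
        t += h
    return z

def discrete_step(z):
    """Discrete transition z_t -> z_{t+1}, written as one Euler step with h = 1."""
    dz = f(z, 0.0)
    return [zi + di for zi, di in zip(z, dz)]

z0 = [1.0, 0.0]
z_cont = integrate(f, z0, 0.0, 0.37)  # an arbitrary, off-grid query time
z_one = integrate(f, z0, 0.0, 1.0)    # continuous flow evaluated at t = 1
z_disc = discrete_step(z0)            # single discrete jump to "t = 1"
```

For this toy field the exact solution is $z(t) = (\cos t, \sin t)$, so the integrated trajectory stays on the unit circle, while the single discrete jump leaves it. A learned discrete predictor would of course not be a naive Euler step of a known field. The sketch only shows what changes when time becomes an input rather than an index.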

This starts to blur boundaries between:

predictive representation learning
continuous-time latent dynamics
neural differential equations
control theory
and eventually differentiable simulation

One interesting aspect is that JEPA-style objectives and Neural ODE-style dynamics are not competitors at all. They solve different problems:

JEPA learns what matters
Neural ODEs learn how it evolves

Combining the two feels surprisingly natural: semantic latent spaces with continuous trajectories instead of discrete jumps.
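
One way this combination could be written down, as a sketch rather than an established recipe: let $E_\theta$ be a context encoder, $\bar{E}$ a target encoder whose output is treated as a constant (stop-gradient, written $\mathrm{sg}$), and $f_\phi$ the latent vector field. All of these symbols are illustrative notation, not taken from the V-JEPA 2 paper.

$\hat{z}(t_1) = E_\theta(x_{t_0}) + \int_{t_0}^{t_1} f_\phi(z(t), t)\, dt$

$\mathcal{L} = \left\| \hat{z}(t_1) - \mathrm{sg}\big(\bar{E}(x_{t_1})\big) \right\|$

Here $z(t)$ is the solution of $\frac{dz}{dt} = f_\phi(z, t)$ with $z(t_0) = E_\theta(x_{t_0})$. The loss lives entirely in latent space, so nothing is reconstructed in pixels, and $t_1$ can be any time at which a target observation exists.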

It also raises interesting questions:

Should world models conserve structure?
Should latent flows obey geometric constraints?
Do continuous latent trajectories produce more stable long-horizon predictions?
Is "time" even best represented discretely in learned simulators?
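
As a toy illustration of why the structure-preservation question might matter, consider a harmonic oscillator with a known conserved energy, rolled out with two integrators. This is a hand-picked classical example with hypothetical function names, not a learned world model, so it only hints at the kind of long-horizon behavior these questions are about.

```python
def energy(q, p):
    """Energy of a unit harmonic oscillator: H = (q^2 + p^2) / 2."""
    return 0.5 * (q * q + p * p)

def explicit_euler(q, p, h):
    """Plain Euler step for dq/dt = p, dp/dt = -q (not structure-preserving)."""
    return q + h * p, p - h * q

def symplectic_euler(q, p, h):
    """Symplectic Euler: update p first, then use the new p to update q."""
    p_new = p - h * q
    q_new = q + h * p_new
    return q_new, p_new

def rollout(step, q, p, h, n):
    """Apply the given step function n times and return the final state."""
    for _ in range(n):
        q, p = step(q, p, h)
    return q, p

h, n = 0.1, 1000
e0 = energy(1.0, 0.0)
e_euler = energy(*rollout(explicit_euler, 1.0, 0.0, h, n))
e_symp = energy(*rollout(symplectic_euler, 1.0, 0.0, h, n))
```

With these settings the explicit Euler energy grows by a factor of $(1 + h^2)$ per step and blows up over the rollout, while the symplectic variant keeps the energy within a few percent of its initial value. Whether a learned latent flow benefits from similar constraints is exactly the open question.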

Feels like there is still a lot of unexplored territory between self-supervised predictive learning and continuous dynamical systems.

Related work from my side