Where do continuous latent dynamics actually pay off?
Blog 01 argued the idea is interesting. This one is the honesty pass: where do continuous latent dynamics buy you something concrete that a discrete RSSM / transformer cannot?
If we can't answer that crisply, the whole thing is aesthetic.
Below: the regimes where I think the case is actually defensible, plus the ones where it isn't.
1. Irregular and asynchronous observations
This is the cleanest case. Real-world data is almost never on a uniform grid:
- ICU vitals: heart rate every 5 s, labs every 6 h, imaging once.
- Multi-sensor robots: camera at 30 Hz, LiDAR at 10 Hz, IMU at 200 Hz, GPS at 1 Hz.
- Event cameras: timestamps arrive when something changes, not on a clock.
- Financial telemetry: trades are inherently event-driven.
Discrete-step models either (a) resample to a common grid (loses information / forces imputation) or (b) tokenize timestamps as auxiliary features (works, but timestamps become "yet another input" rather than the axis everything lives on).
A latent ODE dz/dt = f(z, t) lets you query the state at any time, including times when no observation was made. The Latent ODE (Rubanova 2019) and ODE-RNN line of work was built specifically around this. For a multi-sensor robot world model this is borderline necessary, not optional.
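A minimal sketch of what "query the state at any time" means in practice. The vector field here is a hand-written 2-D rotation standing in for a learned f(z, t), and `rk4_step`, `query_states`, and `max_dt` are names I made up for illustration; a real Latent ODE would also update z from each observation, which this sketch leaves out.

```python
import math

def rk4_step(f, z, t, dt):
    """One classical Runge-Kutta step for dz/dt = f(z, t); z is a list of floats."""
    k1 = f(z, t)
    k2 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)], t + 0.5 * dt)
    k3 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)], t + 0.5 * dt)
    k4 = f([zi + dt * ki for zi, ki in zip(z, k3)], t + dt)
    return [zi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def query_states(f, z0, t0, query_times, max_dt=0.01):
    """Integrate from (t0, z0) and return the state at each (sorted) query time."""
    states, z, t = [], list(z0), t0
    for tq in query_times:
        # Split the gap to the next query into equal sub-steps no longer than max_dt.
        n = max(1, math.ceil((tq - t) / max_dt))
        dt = (tq - t) / n
        for _ in range(n):
            z = rk4_step(f, z, t, dt)
            t += dt
        t = tq  # snap to the query time to avoid accumulated rounding
        states.append(list(z))
    return states

def f_latent(z, t):
    """Stand-in vector field: with z(0) = (1, 0) the solution is (cos t, -sin t)."""
    return [z[1], -z[0]]

# Irregular timestamps, as if they came from sensors running on different clocks.
query_times = [0.013, 0.2, 0.2004, 1.7, 3.1]
states = query_states(f_latent, [1.0, 0.0], 0.0, query_times)
```

The timestamps never have to share a grid: the solver just lands on each one in turn, which is the property the resampling and timestamp-as-feature workarounds are trying to approximate.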
2. Known physical structure
When you know the system is Hamiltonian, Lagrangian, symplectic, or port-Hamiltonian, you can bake it in:
- Hamiltonian NN (Greydanus 2019): parameterize H(q, p), derive dynamics from the symplectic form; energy is conserved by construction.
- Lagrangian NN (Cranmer 2020): parameterize L(q, q̇), get Euler-Lagrange equations for free.
- Symplectic integrators preserve phase-space volume over long rollouts.
- Equivariant ODEs preserve symmetry groups (SO(3) for rotations, SE(3) for rigid bodies).
A discrete latent transformer can learn energy conservation approximately, but it will drift. For long-horizon physical rollouts (orbital mechanics, robotic manipulation, fluid sim), the structure-preserving option is dominant.
This matters most when rollouts are long relative to training horizon — exactly the regime where Dreamer-style discrete models start to hallucinate.
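A toy illustration of the drift claim, not the Hamiltonian NN training setup: a hand-written H(q, p) for a unit harmonic oscillator stands in for a learned network, and all function names are mine. It compares a structure-blind explicit Euler step with a symplectic leapfrog step over a long rollout.

```python
def hamiltonian(q, p):
    """Total energy of a unit-mass, unit-stiffness oscillator (stand-in for a learned H)."""
    return 0.5 * p * p + 0.5 * q * q

def dH_dq(q):
    return q

def dH_dp(p):
    return p

def euler_step(q, p, dt):
    """Explicit Euler on Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq."""
    return q + dt * dH_dp(p), p - dt * dH_dq(q)

def leapfrog_step(q, p, dt):
    """Kick-drift-kick leapfrog, a second-order symplectic integrator."""
    p_half = p - 0.5 * dt * dH_dq(q)
    q_new = q + dt * dH_dp(p_half)
    p_new = p_half - 0.5 * dt * dH_dq(q_new)
    return q_new, p_new

def max_energy_error(step, n_steps=10_000, dt=0.05):
    """Largest |H - H0| seen along a rollout from (q, p) = (1, 0)."""
    q, p = 1.0, 0.0
    e0 = hamiltonian(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        worst = max(worst, abs(hamiltonian(q, p) - e0))
    return worst

euler_error = max_energy_error(euler_step)        # grows without bound
leapfrog_error = max_energy_error(leapfrog_step)  # stays bounded, on the order of dt**2
```

Leapfrog does not conserve H exactly either, but its energy error oscillates within a fixed band instead of compounding, which is the behavior you want when rollouts are much longer than anything seen in training.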
3. Memory efficiency: the adjoint method
The argument every Neural ODE paper makes, but it's real:
- Discrete RNN/transformer: backprop through T steps stores T intermediate activations, so O(T) memory.
- Adjoint method (Chen 2018): backprop through the ODE solver by integrating a second ODE backwards, so O(1) memory in trajectory length.
For long-horizon rollouts (thousands of steps in robotics, climate, biology), this is the difference between "fits on the GPU" and "doesn't."
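Caveat: the adjoint trades memory for extra compute and numerical accuracy. The backward pass re-solves for z(t) in reverse, and that reconstruction need not retrace the forward trajectory; naive adjoints are unstable for stiff systems. Modern variants (interpolated adjoint, ANODE, seminorm-based adjoint) mitigate this, but it's not free.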
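A scalar sketch of the adjoint recipe, under assumptions I'm adding for illustration: dynamics f(z, θ) = -θz, loss L = z(T)², hand-written partial derivatives instead of autodiff, and a fixed-step RK4 solver. The forward pass keeps only z(T); the backward pass integrates the state, the adjoint a(t) = dL/dz(t), and the gradient accumulator together from T back to 0.

```python
import math

def f(z, theta):
    return -theta * z

def df_dz(z, theta):
    return -theta

def df_dtheta(z, theta):
    return -z

def rk4(rhs, y, dt, n_steps):
    """Fixed-step RK4 for dy/dt = rhs(y); y is a tuple, dt may be negative."""
    for _ in range(n_steps):
        k1 = rhs(y)
        k2 = rhs(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k1)))
        k3 = rhs(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k2)))
        k4 = rhs(tuple(yi + dt * ki for yi, ki in zip(y, k3)))
        y = tuple(yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
                  for yi, a, b, c, d in zip(y, k1, k2, k3, k4))
    return y

def loss_and_grad(z0, theta, T, n_steps=200):
    dt = T / n_steps
    # Forward pass: no trajectory is stored, only the final state.
    (zT,) = rk4(lambda y: (f(y[0], theta),), (z0,), dt, n_steps)
    loss = zT ** 2

    def augmented(y):
        z, a, g = y
        return (f(z, theta),                # re-solve the state
                -a * df_dz(z, theta),       # adjoint ODE: da/dt = -a * df/dz
                -a * df_dtheta(z, theta))   # dg/dt = -a * df/dtheta

    # Backward pass from t = T to t = 0 with a(T) = dL/dz(T) and g(T) = 0.
    _, _, grad = rk4(augmented, (zT, 2.0 * zT, 0.0), -dt, n_steps)
    return loss, grad

z0, theta, T = 1.5, 0.7, 2.0
loss, grad = loss_and_grad(z0, theta, T)
# Closed form for this linear case: z(T) = z0 * exp(-theta * T).
exact_grad = -2.0 * T * z0 ** 2 * math.exp(-2.0 * theta * T)
```

Note that the reverse-time state equation here grows like exp(θt) going backwards. It is harmless for these values, but it is the same mechanism that makes naive adjoints misbehave when θ is large, i.e. when the system is stiff.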
4. Elegance — and actual leverage — for control
Continuous dynamics open up a century of optimal-control machinery:
- Pontryagin's Maximum Principle for trajectory optimization.
- Hamilton–Jacobi–Bellman equations for value functions in continuous state/time.
- Model Predictive Control that natively handles continuous-time horizons and constraints.
- Differentiable physics simulators speak the same language — backprop through contact, friction, fluids.
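For reference, the first two items written out for a controlled latent ODE. The control input u, running cost ℓ, terminal cost Φ, costate λ, and value function V are notation I'm introducing here (the dz/dt = f(z, t) used elsewhere in this post has no explicit control input), and the minimization sign convention is an assumption.

```latex
\begin{aligned}
&\text{Problem:} && \min_{u(\cdot)} \; \Phi\big(z(T)\big) + \int_0^T \ell(z, u)\,dt
   \quad \text{s.t.} \quad \dot z = f(z, u, t) \\[4pt]
&\text{PMP:} && H(z, u, \lambda, t) = \ell(z, u) + \lambda^\top f(z, u, t) \\
& && \dot z = \frac{\partial H}{\partial \lambda}, \qquad
     \dot \lambda = -\frac{\partial H}{\partial z}, \qquad
     \lambda(T) = \frac{\partial \Phi}{\partial z}\big(z(T)\big), \qquad
     u^*(t) = \arg\min_u H(z, u, \lambda, t) \\[4pt]
&\text{HJB:} && -\frac{\partial V}{\partial t}(z, t)
     = \min_u \Big[ \ell(z, u) + \nabla_z V(z, t)^\top f(z, u, t) \Big], \qquad
     V(z, T) = \Phi(z)
\end{aligned}
```

With ℓ = 0 the costate equation reduces to the adjoint ODE that Neural ODE training integrates backwards, so the gradient machinery and the control machinery are the same object seen from two sides.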
You can get planning out of a discrete latent model (CEM, MPPI in Dreamer), but you're discretizing a continuous problem to use discrete tools and then re-continuizing for the controller. Native continuous latent dynamics plus a continuous controller means one fewer impedance mismatch.
This is the case I think is most under-appreciated — and where the World Labs / Genie / Cosmos crowd hasn't really pushed yet.
5. Multi-timescale / stiff dynamics
Most interesting systems have widely separated timescales: fast electrical dynamics + slow mechanical, fast chemistry + slow diffusion, fast micro-policy + slow macro-strategy. Adaptive ODE solvers (Dormand–Prince, implicit methods for stiff regimes) automatically vary step size to track whatever timescale is currently dominant.
A fixed-step discrete model is stuck either spending compute on the slow-and-boring or undersampling the fast-and-critical.
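A small sketch of step-size adaptation, using step doubling with RK4 rather than Dormand-Prince to keep it short. The test problem is my own construction: a scalar ODE whose exact solution is a fast transient on top of a slow oscillation. It is deliberately non-stiff, so it shows the accuracy-driven part of the story only; stiff problems additionally need the implicit methods mentioned above.

```python
import math

def s(t):
    """Reference solution: fast transient exp(-30 t) plus slow oscillation."""
    return math.exp(-30.0 * t) + math.sin(0.5 * t)

def ds(t):
    return -30.0 * math.exp(-30.0 * t) + 0.5 * math.cos(0.5 * t)

def f(z, t):
    """Relaxes toward s(t); with z(0) = s(0) the exact solution is z(t) = s(t)."""
    return -(z - s(t)) + ds(t)

def rk4_step(z, t, dt):
    k1 = f(z, t)
    k2 = f(z + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(z + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(z + dt * k3, t + dt)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate_adaptive(z, t, t_end, tol=1e-8, dt=0.1):
    """Step-doubling error control: compare one full step with two half steps."""
    accepted = []  # (start time, step size) of every accepted step
    while t_end - t > 1e-12:
        dt = min(dt, t_end - t)
        full = rk4_step(z, t, dt)
        half = rk4_step(z, t, 0.5 * dt)
        two = rk4_step(half, t + 0.5 * dt, 0.5 * dt)
        err = abs(two - full) / 15.0  # Richardson estimate for a 4th-order method
        if err <= tol:
            accepted.append((t, dt))
            z, t = two, t + dt
        # Grow or shrink the next step, limited to [0.2x, 2x] per update.
        factor = 2.0 if err == 0.0 else 0.9 * (tol / err) ** 0.2
        dt *= min(2.0, max(0.2, factor))
    return z, accepted

z_end, accepted = integrate_adaptive(s(0.0), 0.0, 10.0)
first_dt = accepted[0][1]                    # while the fast transient is alive
largest_dt = max(dt for _, dt in accepted)   # once only the slow mode remains
```

The first accepted steps are small because the transient dominates the error estimate; once it has decayed, the controller stretches the step to whatever the slow oscillation tolerates. A fixed-step model has to pick one of those two step sizes for the whole rollout.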
6. Querying state at arbitrary time — beyond rollout
Once dynamics are continuous, you can ask:
- "What was the state halfway between frame 17 and 18?" — video interpolation / super-resolution at any FPS.
- "When was the joint at maximum velocity?" — peak-finding without per-step search.
- "Integrate the cost function exactly between two events" — clean reward shaping.
Discrete models can only answer questions defined on their grid. Continuous models answer continuous questions.
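One way to sketch the third question: append a running-cost accumulator to the state so the solver integrates the cost alongside the dynamics, between two event times that sit on no grid. The rotation dynamics, the quadratic cost, and the event times are all placeholders I picked so the answer has a closed form; the result is accurate to solver tolerance rather than literally exact.

```python
import math

def dynamics(z):
    """Stand-in for a learned latent vector field: a rotation in 2-D."""
    return [z[1], -z[0]]

def running_cost(z):
    """Hypothetical instantaneous cost: squared first latent coordinate."""
    return z[0] ** 2

def augmented(y):
    """Augmented state y = [z0, z1, c] with dc/dt = running_cost(z)."""
    z = y[:2]
    return dynamics(z) + [running_cost(z)]

def rk4(y, t_start, t_stop, n_steps=400):
    """Fixed-step RK4 on the augmented system over [t_start, t_stop]."""
    dt = (t_stop - t_start) / n_steps
    for _ in range(n_steps):
        k1 = augmented(y)
        k2 = augmented([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = augmented([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = augmented([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6.0 * (p + 2 * q + 2 * r + w)
             for a, p, q, r, w in zip(y, k1, k2, k3, k4)]
    return y

def cost_between(z0, t_a, t_b):
    """Accumulated cost over [t_a, t_b] for a trajectory starting at z(0) = z0."""
    y = rk4(list(z0) + [0.0], 0.0, t_a)  # roll forward to the first event
    y[2] = 0.0                           # start accumulating at t_a
    return rk4(y, t_a, t_b)[2]

t_a, t_b = 0.37, 2.113                   # event times, not on any frame grid
cost = cost_between([1.0, 0.0], t_a, t_b)
# Closed form for this toy case: z0(t) = cos t, so the cost is the integral of cos^2.
exact = (t_b - t_a) / 2 + (math.sin(2 * t_b) - math.sin(2 * t_a)) / 4
```

The same augmentation trick works for any integral of the state, and the accumulator comes with the solver's error control for free if you swap in an adaptive method.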
7. Latent SDEs for principled uncertainty
dz = f(z,t) dt + g(z,t) dW is the same machinery as score-based diffusion, but used as a world model, not a generator. The diffusion term gives calibrated aleatoric uncertainty that propagates correctly forward in time — discrete ensembles fake this by sampling.
SDE-based world models (Latent SDE, neural SDE) are a small but growing literature; if you care about risk-sensitive control, this matters.
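A minimal Euler-Maruyama sketch of how the diffusion term propagates uncertainty forward. The drift and diffusion are a hand-picked Ornstein-Uhlenbeck process (f = -θz, g = σ) rather than learned networks, chosen because its mean and variance at time T are known in closed form; the parameter values, seed, and sample count are arbitrary.

```python
import math
import random

def euler_maruyama(z0, drift, diffusion, T, n_steps, rng):
    """Simulate one path of dz = drift(z, t) dt + diffusion(z, t) dW; return z(T)."""
    dt = T / n_steps
    z, t = z0, 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        z += drift(z, t) * dt + diffusion(z, t) * dW
        t += dt
    return z

THETA, SIGMA = 1.0, 0.5  # illustrative values, not fitted to anything

def drift(z, t):
    return -THETA * z

def diffusion(z, t):
    return SIGMA

rng = random.Random(0)
z0, T = 1.0, 1.0
samples = [euler_maruyama(z0, drift, diffusion, T, 100, rng) for _ in range(4000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)

# Closed-form moments of the Ornstein-Uhlenbeck process started at z0.
exact_mean = z0 * math.exp(-THETA * T)
exact_var = SIGMA ** 2 / (2 * THETA) * (1 - math.exp(-2 * THETA * T))
```

The spread of `samples` is the model's predictive uncertainty at time T, and it is set by g and the horizon rather than by how many ensemble members you happened to train. Whether that uncertainty is well calibrated in a learned model still depends on how f and g were fit.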
And — to keep it honest — where ODEs don't help
- Discrete event-driven systems: game state transitions, dialog acts, symbolic reasoning. There is no dz/dt because nothing is "between" the discrete states.
- Tokenized regimes: LLMs, VQ-VAE world models like IRIS or Genie (Bruce 2024). The latent is a sequence of discrete codes; smoothing them with an ODE doesn't help.
- Short-horizon high-frequency rollouts at uniform rate (Atari, most current video models): the discrete machinery is fine, and the implementation overhead of ODE solvers is real engineering tax.
- When you don't actually care about intermediate states: if the only thing you ever ask the model is "next frame at +30 ms", continuous internal dynamics buy you nothing.
The honest summary is: continuous-time pays off when the target task is continuous (irregular sampling, physical conservation, control synthesis, multi-timescale), not just because continuous-time is conceptually elegant.
So where does this leave the JEPA + Neural ODE pitch from blog 01?
The strongest version of the pitch is not "replace all world models with continuous latents." It's:
For robotic / physical / multi-sensor world models — exactly the kind World Labs, Wayve, and Toyota Research care about — the data is irregular, the dynamics are structured, the horizons are long, and the downstream task is control. Every single one of those favors continuous latent dynamics over discrete-step RSSMs.
JEPA gives you the right space. Neural ODEs give you the right evolution in that space. The two are complementary; the combined object is a defensible model class, not a vanity project.