Where do continuous latent dynamics actually pay off?

Blog 01 argued the idea is interesting. This one is the honesty pass: where do continuous latent dynamics buy you something concrete that a discrete RSSM / transformer cannot?

If we can't answer that crisply, the whole thing is aesthetic.

Below: the regimes where I think the case is actually defensible, plus the ones where it isn't.

1. Irregular and asynchronous observations

This is the cleanest case. Real-world data is almost never on a uniform grid:

ICU vitals: heart rate every 5 s, labs every 6 h, imaging once.
Multi-sensor robots: camera at 30 Hz, LiDAR at 10 Hz, IMU at 200 Hz, GPS at 1 Hz.
Event cameras: timestamps arrive when something changes, not on a clock.
Financial telemetry: trades are inherently event-driven.

Discrete-step models either (a) resample to a common grid (loses information / forces imputation) or (b) tokenize timestamps as auxiliary features (works, but timestamps become "yet another input" rather than the axis everything lives on).

A latent ODE dz/dt = f(z, t) lets you query the state at any time, including times no observation was made. The Latent ODE (Rubanova 2019) and ODE-RNN line of work was specifically built around this. For a multi-sensor robot world model this is borderline necessary, not optional.
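To make "query the state at any time" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `f` is a hand-written damped rotation standing in for a learned vector field, `latent_at` is a hypothetical helper, and the sensor clocks are made up. The only point is that the integrator steps across whatever gaps the timestamps leave, with no resampling to a common grid.

```python
import math

def f(z, t):
    """Toy stand-in for a learned latent vector field: a damped rotation."""
    x, y = z
    return (-0.1 * x - y, x - 0.1 * y)

def rk4_step(f, z, t, h):
    """One classical Runge-Kutta step of size h."""
    k1 = f(z, t)
    k2 = f(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k1)), t + 0.5 * h)
    k3 = f(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k2)), t + 0.5 * h)
    k4 = f(tuple(zi + h * ki for zi, ki in zip(z, k3)), t + h)
    return tuple(
        zi + h / 6.0 * (a + 2 * b + 2 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    )

def latent_at(f, z0, t0, query_times, max_step=0.01):
    """Return the latent state at each query time (sorted, arbitrarily spaced)."""
    z, t, out = z0, t0, []
    for tq in query_times:
        n = max(1, math.ceil((tq - t) / max_step))  # sub-steps for this gap
        h = (tq - t) / n
        for _ in range(n):
            z = rk4_step(f, z, t, h)
            t += h
        t = tq  # snap to the query time to avoid float drift
        out.append(z)
    return out

# Hypothetical clocks: a 30 Hz camera, a 1 Hz GPS fix, and one ad hoc query.
camera = [k / 30.0 for k in range(1, 31)]
gps = [1.0]
times = sorted(set(camera + gps + [0.4567]))
states = latent_at(f, (1.0, 0.0), 0.0, times)
```

In a real Latent ODE or ODE-RNN the state would also be updated by an encoder at each observation; this sketch only shows the "evolve between observations" half.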

2. Known physical structure

When you know the system is Hamiltonian, Lagrangian, symplectic, or port-Hamiltonian, you can bake it in:

Hamiltonian NN (Greydanus 2019): parameterize H(q, p), derive dynamics from the symplectic form — energy is conserved by construction.
Lagrangian NN (Cranmer 2020): parameterize L(q, q̇), get Euler-Lagrange equations for free.
Symplectic integrators preserve the symplectic form (and with it phase-space volume), so energy error typically stays bounded over long rollouts instead of accumulating.
Equivariant ODEs preserve symmetry groups (SO(3) for rotations, SE(3) for rigid bodies).

A discrete latent transformer can learn energy conservation approximately, but nothing in the architecture enforces it, so small per-step errors compound and the rollout drifts. For long-horizon physical rollouts (orbital mechanics, robotic manipulation, fluid sim), the structure-preserving option is the one whose errors stay controlled.

This matters most when rollouts are long relative to training horizon — exactly the regime where Dreamer-style discrete models start to hallucinate.
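The drift argument is easy to see on the simplest Hamiltonian system there is. The sketch below is a toy, not a learned model: a unit harmonic oscillator with known dynamics, stepped once with explicit Euler (no structure) and once with leapfrog (symplectic). The function names and the step size are my own choices for illustration.

```python
def energy(q, p):
    """Hamiltonian of a unit-mass, unit-stiffness oscillator."""
    return 0.5 * (p * p + q * q)

def explicit_euler(q, p, h):
    """Plain first-order step; not symplectic."""
    return q + h * p, p - h * q

def leapfrog(q, p, h):
    """Half kick, full drift, half kick; a symplectic second-order step."""
    p = p - 0.5 * h * q
    q = q + h * p
    p = p - 0.5 * h * q
    return q, p

def max_energy_error(step, n_steps=10_000, h=0.05):
    """Largest deviation from the initial energy seen during the rollout."""
    q, p = 1.0, 0.0
    e0, worst = energy(q, p), 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst

euler_err = max_energy_error(explicit_euler)
leapfrog_err = max_energy_error(leapfrog)
```

For this system explicit Euler multiplies the energy by (1 + h^2) on every step, so the error grows without bound, while leapfrog's error oscillates within a band of order h^2. A learned one-step model has no such closed form, but the failure mode is the same kind of compounding.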

3. Memory efficiency: the adjoint method

The argument every Neural ODE paper makes, but it's real:

Discrete RNN/transformer: backprop through T steps stores T intermediate activations — O(T) memory.
Adjoint method (Chen 2018): backprop through the ODE solver by integrating a second ODE backwards — O(1) memory in trajectory length.

For long-horizon rollouts (thousands of steps in robotics, climate, biology), this is the difference between "fits on the GPU" and "doesn't."

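Caveat: the adjoint trades memory for compute and numerical accuracy. The backward pass re-solves the state ODE in reverse instead of reading stored activations, so the reconstructed trajectory can drift from the forward one, and naive adjoints are unstable for stiff systems. Modern variants (interpolated adjoint, ANODE, seminorm-based adjoint) mitigate this but it's not free.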
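Here is the adjoint idea on a problem small enough to check by hand. This is a sketch under loud assumptions: the "network" is the scalar field f(z, theta) = theta * z, the loss is 0.5 * z(T)^2, and the solver is a fixed-step RK4 I wrote inline, not any library's adjoint implementation. The forward pass keeps only z(T); the backward pass integrates the state, the adjoint a = dL/dz, and the parameter gradient together in reverse time.

```python
import math

def rk4(f, s, h, n):
    """Integrate ds/dt = f(s) for n RK4 steps of size h (h < 0 runs backward)."""
    for _ in range(n):
        k1 = f(s)
        k2 = f([si + 0.5 * h * ki for si, ki in zip(s, k1)])
        k3 = f([si + 0.5 * h * ki for si, ki in zip(s, k2)])
        k4 = f([si + h * ki for si, ki in zip(s, k3)])
        s = [si + h / 6.0 * (a + 2 * b + 2 * c + d)
             for si, a, b, c, d in zip(s, k1, k2, k3, k4)]
    return s

def loss_and_grad(theta, z0=1.0, T=1.0, n=200):
    """L = 0.5 * z(T)^2 for dz/dt = theta * z, with dL/dtheta via the adjoint."""
    h = T / n
    # Forward pass: only the final state is kept, not the trajectory.
    (zT,) = rk4(lambda s: [theta * s[0]], [z0], h, n)
    loss = 0.5 * zT * zT

    # Backward pass on the augmented state [z, a, g]:
    #   dz/dt = theta * z                 (state, re-solved in reverse)
    #   da/dt = -a * df/dz     = -a * theta
    #   dg/dt = -a * df/dtheta = -a * z,  g(T) = 0, so g(0) = dL/dtheta
    def aug(s):
        z, a, g = s
        return [theta * z, -a * theta, -a * z]

    _, _, grad = rk4(aug, [zT, zT, 0.0], -h, n)  # a(T) = dL/dz(T) = z(T)
    return loss, grad

theta = -0.7
loss, grad = loss_and_grad(theta)

# Two independent checks: central finite differences, and the closed form
# dL/dtheta = T * z(T)^2 = exp(2 * theta) for z0 = 1, T = 1.
eps = 1e-5
fd = (loss_and_grad(theta + eps)[0] - loss_and_grad(theta - eps)[0]) / (2 * eps)
exact = math.exp(2 * theta)
```

Memory here is three floats regardless of `n`, which is the O(1) claim in miniature. The reverse re-solve of `z` is also where the accuracy caveat comes from: for a strongly contracting forward system the backward solve is expanding, and errors are amplified instead of damped.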

4. Elegance — and actual leverage — for control

Continuous dynamics open up a century of optimal-control machinery:

Pontryagin's Maximum Principle for trajectory optimization.
Hamilton–Jacobi–Bellman equations for value functions in continuous state/time.
Model Predictive Control that natively handles continuous-time horizons and constraints.
Differentiable physics simulators speak the same language — backprop through contact, friction, fluids.

You can get planning out of a discrete-time latent model (CEM in PlaNet, MPPI in TD-MPC), but you're discretizing a continuous problem to use discrete tools and then converting back to continuous time for the controller. Pairing native continuous latent dynamics with a continuous controller removes one impedance mismatch.

This is the case I think is most under-appreciated — and where the World Labs / Genie / Cosmos crowd hasn't really pushed yet.
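For a feel of what "the same language" means, take the one case where HJB has a closed form. With linear dynamics dx/dt = a*x + b*u and cost equal to the integral of q*x^2 + r*u^2, the value function is V(x, t) = P(t) * x^2 and HJB collapses to a Riccati ODE for P, which is just another ODE to hand to a solver. The sketch below is a scalar toy with made-up plant parameters; a learned latent model would be nonlinear and would need linearization or a numerical HJB/PMP method instead.

```python
import math

def riccati_rhs(P, a, b, q, r):
    """dP/dtau for scalar LQR, where tau = T - t is time-to-go."""
    return 2.0 * a * P - (b * b / r) * P * P + q

def solve_riccati(a, b, q, r, horizon=10.0, n=2000):
    """RK4-integrate the Riccati ODE backward from terminal cost P(T) = 0."""
    h, P = horizon / n, 0.0
    for _ in range(n):
        k1 = riccati_rhs(P, a, b, q, r)
        k2 = riccati_rhs(P + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(P + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(P + h * k3, a, b, q, r)
        P += h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return P

# Hypothetical open-loop-unstable plant (a > 0) with unit cost weights.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
P = solve_riccati(a, b, q, r)
gain = b * P / r                 # optimal feedback law: u = -gain * x
closed_loop_pole = a - b * gain  # negative means the controlled system is stable

# Positive root of the algebraic Riccati equation, for comparison.
P_steady = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

Far from the terminal time, `P` settles to the steady-state root and the feedback gain becomes constant, which is the familiar infinite-horizon LQR controller.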

5. Multi-timescale / stiff dynamics

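Most interesting systems have widely separated timescales: fast electrical dynamics + slow mechanical, fast chemistry + slow diffusion, fast micro-policy + slow macro-strategy. Adaptive ODE solvers (Dormand–Prince for non-stiff problems, implicit methods for stiff regimes) automatically vary step size to track whatever timescale is currently dominant. The distinction matters: on a genuinely stiff system an explicit adaptive solver is pinned to tiny steps by stability, not accuracy, so the savings only materialize with an implicit method.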

A fixed-step discrete model is stuck either spending compute on the slow-and-boring or undersampling the fast-and-critical.
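A small sketch of what step-size adaptation looks like in practice. I am deliberately not implementing Dormand–Prince here; this is the much simpler embedded Heun/Euler pair with a textbook step controller, and the test problem (a fast forced transient followed by slow decay) is invented for illustration. The tolerance, the controller constants, and the function name are all assumptions.

```python
import math

def f(z, t):
    """Slow decay plus a forcing term that dies out within ~0.1 time units."""
    return -z + 50.0 * math.exp(-50.0 * t)

def adaptive_heun(f, z, t, t_end, tol=1e-5, h=1e-3):
    """Heun (2nd order) steps with an Euler-based local error estimate."""
    steps = []
    while t < t_end:
        h = min(h, t_end - t)
        k1 = f(z, t)
        k2 = f(z + h * k1, t + h)
        z_new = z + 0.5 * h * (k1 + k2)   # Heun update
        err = abs(0.5 * h * (k2 - k1))    # |Heun - Euler|, the error estimate
        if err <= tol:                    # accept the step
            t, z = t + h, z_new
            steps.append(h)
        # Shrink after a rejection, grow after an easy acceptance.
        h *= min(5.0, max(0.2, 0.9 * math.sqrt(tol / max(err, 1e-16))))
    return z, steps

z_end, steps = adaptive_heun(f, 0.0, 0.0, 3.0)

# Closed-form solution of this linear toy problem, for comparison.
z_exact = (50.0 / 49.0) * (math.exp(-3.0) - math.exp(-150.0))
```

The accepted steps start tiny while the forcing term is active and grow by a couple of orders of magnitude once only the slow decay is left. A fixed-step model would have to run the whole horizon at the smallest of those step sizes to get the same accuracy.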

6. Querying state at arbitrary time — beyond rollout

Once dynamics are continuous, you can ask:

"What was the state halfway between frame 17 and 18?": video interpolation / super-resolution at any FPS.
"When was the joint at maximum velocity?": peak-finding by root-solving on the vector field, without a per-step search.
"Integrate the cost function between two events, to solver tolerance": clean reward shaping with no grid-induced quantization.

Discrete models can only answer questions defined on their grid. Continuous models answer continuous questions.
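The peak-velocity question can be sketched as a root-finding problem: velocity is extremal where acceleration, read directly off the vector field, crosses zero. The toy below uses a hand-written oscillator (dq/dt = v, dv/dt = -q) in place of a learned model, re-integrates from t = 0 on every query for simplicity, and picks the bisection bracket by eye. All of those are assumptions of the sketch; a real system would use the solver's dense output or event handling.

```python
import math

def state_at(t, n_per_unit=200):
    """State (q, v) at time t for dq/dt = v, dv/dt = -q, from (1, 0) at t = 0."""
    n = max(1, math.ceil(t * n_per_unit))
    h = t / n
    q, v = 1.0, 0.0
    for _ in range(n):  # fixed-step RK4 written out for the 2D system
        k1q, k1v = v, -q
        k2q, k2v = v + 0.5 * h * k1v, -(q + 0.5 * h * k1q)
        k3q, k3v = v + 0.5 * h * k2v, -(q + 0.5 * h * k2q)
        k4q, k4v = v + h * k3v, -(q + h * k3q)
        q += h / 6.0 * (k1q + 2 * k2q + 2 * k3q + k4q)
        v += h / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return q, v

def acceleration(t):
    """dv/dt evaluated from the vector field at the state reached at time t."""
    q, _ = state_at(t)
    return -q

def bisect(g, lo, hi, tol=1e-8):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Acceleration goes from positive to negative inside this bracket,
# so the root is a velocity maximum.
t_peak = bisect(acceleration, 4.0, 5.0)
_, v_peak = state_at(t_peak)
```

The answer is not restricted to any sampling grid: the peak time is resolved to the bisection tolerance, which a frame-indexed model could only approximate to the nearest frame.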

7. Latent SDEs for principled uncertainty

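dz = f(z,t) dt + g(z,t) dW is the same mathematical object that score-based diffusion is built on, but used as a world model, not a generator. The diffusion term g puts aleatoric noise inside the dynamics, so uncertainty grows and contracts with the drift as it propagates forward in time, instead of being approximated after the fact by an ensemble of deterministic models. Two caveats: the uncertainty is only as calibrated as the learned g, and in practice you still estimate it by sampling trajectories.

Latent SDE and neural SDE world models are a small but growing literature (and they are typically trained variationally, not by score matching, despite the shared machinery); if you care about risk-sensitive control, this matters.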
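A minimal sketch of propagating uncertainty through an SDE, using Euler-Maruyama on a toy Ornstein-Uhlenbeck process. The drift and diffusion here are hand-written stand-ins for learned f and g, and the constants, path count, and seed are arbitrary choices of mine. The OU process is chosen only because its mean and variance have closed forms to compare against.

```python
import math
import random

def euler_maruyama(drift, diffusion, z0, t_end, n_steps, rng):
    """Simulate one path of dz = drift(z, t) dt + diffusion(z, t) dW."""
    dt = t_end / n_steps
    z, t = z0, 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        z += drift(z, t) * dt + diffusion(z, t) * dW
        t += dt
    return z

THETA, SIGMA = 1.0, 0.5  # toy mean-reversion rate and noise scale

def drift(z, t):
    return -THETA * z

def diffusion(z, t):
    return SIGMA

rng = random.Random(0)
T, N_PATHS = 1.0, 4000
finals = [euler_maruyama(drift, diffusion, 1.0, T, 100, rng)
          for _ in range(N_PATHS)]

mean = sum(finals) / N_PATHS
var = sum((z - mean) ** 2 for z in finals) / (N_PATHS - 1)

# Closed-form OU moments at time T, available only because the toy is linear.
mean_exact = math.exp(-THETA * T)
var_exact = SIGMA ** 2 / (2 * THETA) * (1 - math.exp(-2 * THETA * T))
```

The predictive variance at time T comes out of the dynamics themselves: mean reversion pulls it toward sigma^2 / (2 * theta) instead of letting it grow without bound. For risk-sensitive control, the same sampled paths can feed a tail-risk estimate such as CVaR directly.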

And — to keep it honest — where ODEs don't help

Discrete event-driven systems: game state transitions, dialog acts, symbolic reasoning. There is no dz/dt because nothing is "between" the discrete states.
Tokenized regimes: LLMs, VQ-VAE world models like IRIS or Genie (Bruce 2024). The latent is a sequence of discrete codes; smoothing them with an ODE doesn't help.
Short-horizon high-frequency rollouts at uniform rate (Atari, most current video models): the discrete machinery is fine, and the implementation overhead of ODE solvers is a real engineering tax.
When you don't actually care about intermediate states: if the only thing you ever ask the model is "next frame at +30 ms", continuous internal dynamics buy you nothing.

The honest summary is: continuous-time pays off when the target task is continuous (irregular sampling, physical conservation, control synthesis, multi-timescale), not just because continuous-time is conceptually elegant.

So where does this leave the JEPA + Neural ODE pitch from blog 01?

The strongest version of the pitch is not "replace all world models with continuous latents." It's:

For robotic / physical / multi-sensor world models — exactly the kind World Labs, Wayve, and Toyota Research care about — the data is irregular, the dynamics are structured, the horizons are long, and the downstream task is control. Every single one of those favors continuous latent dynamics over discrete-step RSSMs.

JEPA gives you the right space. Neural ODEs give you the right evolution in that space. The two are complementary; the combined object is a defensible model class, not a vanity project.