agents · Published: July 2, 2026

Controllable Sim Agents with Behavior Latents

By Juanwu Lu, Junyu Zhu, Ziran Wang

Research TL;DR

"Per-agent Gaussian behavior latents inferred via closed-form conjugate variational update from discounted returns, conditioning a rectified-flow trajectory generator with classifier-free guidance and soft eligibility gates."

Abstract

Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce specific edge cases, and test autonomous systems without real-world risk. We introduce Controllable Neural Variational Agents (CNeVA), a controllable simulated-agent framework that learns to infer a per-agent Gaussian behavior latent from per-channel discounted returns via a closed-form conjugate variational update, conditioning a rectified-flow trajectory generator trained on a mixed channel-mask curriculum for classifier-free guidance. To tackle scarcity in reward signals, we propose soft eligibility gates that replace hard binary thresholds with smooth exponential decay, preserving the gradient signal for near-threshold agents. On the Waymo Open Motion Dataset, CNeVA attains competitive realism on the benchmark while exposing per-channel controllability that the higher-ranked imitation models lack. Speed- and acceleration-based steering produces monotone responses without stall-induced reward hacking. Safety controllability is monotone and substantial with the introduction of soft eligibility. We manage to achieve steerable map compliance under a context-residual return measure. Furthermore, our experiment demonstrates that steering metrics must be read alongside physical-plausibility guardrails to avoid reward-hacking confounds.

Technical Analysis & Implementation

Overview

CNeVA is a controllable simulated-agent framework for traffic simulation. It infers a per-agent Gaussian behavior latent from per-channel discounted returns using a closed-form conjugate variational update. That latent conditions a rectified-flow trajectory generator, which is trained on a mixed channel-mask curriculum so that classifier-free guidance can be applied per channel. Soft eligibility gates replace hard thresholds to preserve gradients for near-threshold agents.

Core Methodology

Behavior Latent Inference

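Each agent $i$ has a Gaussian behavior latent $z_i \sim \mathcal{N}(\mu_i, \sigma_i^2 I)$. Given the vector of per-channel discounted returns $R_i = (R_i^c)_c$ (e.g., speed, acceleration), the posterior follows from Bayes' rule with a conjugate Gaussian prior and likelihood: $$p(z_i \mid R_i) \propto p(R_i \mid z_i)\, p(z_i)$$ where $p(R_i \mid z_i) = \mathcal{N}(R_i; z_i, \tau^{-1} I)$ and $p(z_i) = \mathcal{N}(0, I)$. Adding the prior precision $1$ to the likelihood precision $\tau$ gives the posterior $\mathcal{N}(\mu_i', \sigma_i'^2 I)$ with $\mu_i' = \frac{\tau}{1+\tau} R_i$ and $\sigma_i'^2 = (1+\tau)^{-1}$. Because both factors are Gaussian, this conjugate variational update is available in closed form and needs no iterative optimization. As written, the likelihood gives $z_i$ one coordinate per return channel.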
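
The closed-form update can be checked numerically. The following is a minimal sketch in plain Python, assuming one latent coordinate per return channel and a scalar precision `tau`; the function name `conjugate_update` and the example returns are illustrative and do not come from the paper.

```python
def conjugate_update(returns, tau=1.0):
    """Posterior of z under prior N(0, I) and likelihood R | z ~ N(z, I / tau).

    Returns the per-channel posterior mean and the shared posterior variance.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    shrink = tau / (1.0 + tau)      # weight placed on the observed return
    post_mean = [shrink * r for r in returns]
    post_var = 1.0 / (1.0 + tau)    # prior precision 1 plus likelihood precision tau
    return post_mean, post_var


# Hypothetical returns for two channels (e.g. speed, acceleration).
mean, var = conjugate_update([0.8, -0.4], tau=3.0)
print(mean, var)
```

With `tau = 3`, each return is shrunk by a factor of 0.75 toward the prior mean of zero and the posterior variance is 0.25. Larger `tau` makes the latent track the returns more closely; smaller `tau` pulls it toward the prior.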

Rectified-Flow Trajectory Generator

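A rectified-flow model generates future trajectories conditioned on the latent $z_i$ and past context. The flow transports a noise sample $x_0$ to a data sample $x_1$ by integrating the ODE $$dx_t = v_{\theta}(x_t, t, c)\, dt$$ from $t = 0$ to $t = 1$, where $c = \text{concat}(\text{encoder}(z_i), \text{context})$. Conditioning channels are randomly masked during training, so the same network learns both conditional and unconditional velocities; this is what makes classifier-free guidance possible at sampling time.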
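
Sampling from such a model amounts to integrating the ODE numerically. The sketch below uses fixed-step Euler integration together with the usual classifier-free guidance combination $v = v_{\text{uncond}} + w\,(v_{\text{cond}} - v_{\text{uncond}})$, which this summary does not spell out and is therefore an assumption. `toy_velocity` is a hypothetical stand-in for $v_\theta$, and `scale` and `num_steps` are illustrative parameters.

```python
def guided_velocity(v_cond, v_uncond, scale):
    """Classifier-free guidance: extrapolate away from the unconditional velocity."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0, velocity_fn, cond, scale=1.0, num_steps=10):
    """Integrate dx = v dt from t = 0 (noise) to t = 1 (data) with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v_c = velocity_fn(x, t, cond)   # velocity with conditioning
        v_u = velocity_fn(x, t, None)   # velocity with conditioning masked out
        v = guided_velocity(v_c, v_u, scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, cond):
    """Stand-in for v_theta: a constant field pointing along the condition."""
    if cond is None:
        return [0.0 for _ in x]
    return list(cond)


x1 = euler_sample([0.0, 0.0], toy_velocity, cond=[1.0, -2.0], scale=1.5)
print(x1)
```

In this toy field the unconditional velocity is zero, so a guidance scale of 1.5 simply moves the sample 1.5 times as far along the conditioning direction. With a learned $v_\theta$ the effect is not linear, but the integration loop has the same shape.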

Soft Eligibility Gates

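To handle sparse reward signals, soft eligibility gates replace the hard binary threshold with a smooth exponential form: $$g_c = 1 - \exp(-\alpha \cdot R_c)$$ where $R_c$ is the per-channel return and $\alpha > 0$ is a hyperparameter that controls how quickly the gate saturates. The gate is differentiable everywhere, with $\partial g_c / \partial R_c = \alpha \exp(-\alpha R_c)$, so agents near the threshold still receive a gradient signal. For $R_c \ge 0$ the gate lies in $[0, 1)$.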
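
A small comparison makes the difference from a hard threshold concrete. This is a sketch in plain Python under the gate formula stated here; the `threshold` argument of `hard_gate` and the sample returns are hypothetical, chosen only for illustration.

```python
import math


def hard_gate(ret, threshold=0.0):
    """Binary eligibility: 1 above the threshold, 0 otherwise (zero gradient)."""
    return 1.0 if ret > threshold else 0.0


def soft_gate(ret, alpha=1.0):
    """Soft eligibility g = 1 - exp(-alpha * R)."""
    return 1.0 - math.exp(-alpha * ret)


def soft_gate_grad(ret, alpha=1.0):
    """Derivative dg/dR = alpha * exp(-alpha * R); nonzero for every finite R."""
    return alpha * math.exp(-alpha * ret)


for ret in (0.0, 0.1, 1.0, 5.0):
    print(ret, hard_gate(ret),
          round(soft_gate(ret, alpha=2.0), 4),
          round(soft_gate_grad(ret, alpha=2.0), 4))
```

The hard gate jumps from 0 to 1 and contributes no gradient on either side of the jump. The soft gate's gradient is largest near zero return and decays as the return grows, which is where near-threshold agents benefit.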

Implementation Details

Architecture

  • Encoder: MLP with LayerNorm for latent inference.
  • Trajectory Generator: Rectified-flow with transformer backbone. Conditioning via cross-attention.
  • Training: Mixed curriculum with random channel masking (prob 0.15).
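
The random channel masking in the training bullet can be sketched as follows. This is an assumption-laden illustration: it drops each conditioning channel independently with probability 0.15 and does not reproduce the mixed curriculum, whose schedule this summary does not describe. The function name `mask_channels`, the returned keep-mask, and the example values are hypothetical.

```python
import random


def mask_channels(cond, mask_prob=0.15, rng=random):
    """Independently zero each conditioning channel with probability mask_prob.

    Returns the masked vector and a 0/1 keep-mask, so a model can tell a
    dropped channel apart from a channel whose value happens to be zero.
    """
    keep = [0.0 if rng.random() < mask_prob else 1.0 for _ in cond]
    masked = [c * k for c, k in zip(cond, keep)]
    return masked, keep


rng = random.Random(0)  # seeded generator for a reproducible draw
masked, keep = mask_channels([0.5, -1.2, 0.3, 0.9, 0.0, 2.1], rng=rng)
print(masked, keep)
```

Passing the keep-mask alongside the masked values is one common way to let the network learn an explicit "unconditioned" state per channel, which is what per-channel guidance at sampling time relies on.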

Code Snippet

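import math

import torch
import torch.nn as nn

class CNeVA(nn.Module):
    def __init__(self, latent_dim=32, hidden_dim=256):
        super().__init__()
        self.latent_dim = latent_dim
        self.latent_encoder = nn.Sequential(
            nn.Linear(6, hidden_dim),  # 6 return channels: speed, accel, etc.
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim * 2)  # mu, logvar
        )
        # NOTE: RectifiedFlowTransformer is not defined in this snippet; it is
        # assumed to be provided elsewhere and to map (past, z) to a trajectory.
        self.flow_net = RectifiedFlowTransformer(latent_dim, hidden_dim)

    def infer_latent(self, returns):
        # returns: [batch, channels]
        # The encoder output is computed but not used by this simplified update.
        params = self.latent_encoder(returns)
        mu, logvar = params.chunk(2, dim=-1)
        # Conjugate update (simplified): mu' = tau/(1+tau) * mean return,
        # broadcast to every latent dimension; sigma'^2 = 1/(1+tau).
        tau = 1.0
        mu_prime = tau / (1 + tau) * returns.mean(dim=-1, keepdim=True)
        mu_prime = mu_prime.expand(-1, self.latent_dim)
        # math.log is used because torch.log does not accept a Python float.
        logvar_prime = torch.full_like(mu_prime, -math.log(1 + tau))
        return mu_prime, logvar_prime

    def sample_latent(self, mu, logvar):
        # Reparameterization trick: z = mu + eps * std, eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, past, returns):
        mu, logvar = self.infer_latent(returns)
        z = self.sample_latent(mu, logvar)
        traj = self.flow_net(past, z)
        return traj, mu, logvar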

Training Objectives

  • Reconstruction loss: $\| \hat{x}_1 - x_1 \|^2$ for flow.
  • KL divergence: $\mathcal{D}_{KL}(\mathcal{N}(\mu', \sigma'^2) \| \mathcal{N}(0, I))$.
  • Soft eligibility regularization: $\sum_c \mathcal{L}_{BCE}(g_c, \mathbb{1}_{R_c > 0})$.
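
Two of these terms have simple closed forms that can be evaluated directly. The sketch below, in plain Python, computes the KL divergence of an isotropic Gaussian from the standard normal and a clamped binary cross-entropy for a single gate value. The helper names, the `eps` clamp, and the numeric inputs are illustrative assumptions; how the three terms are weighted is not stated in this summary.

```python
import math


def kl_to_standard_normal(mean, var):
    """KL( N(mean, var * I) || N(0, I) ) for a shared scalar variance."""
    if var <= 0:
        raise ValueError("var must be positive")
    return 0.5 * sum(var + m * m - 1.0 - math.log(var) for m in mean)


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one gate value, clamped away from 0 and 1."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Hypothetical posterior (mean [0.6, -0.3], variance 0.25) and gate value 0.9.
print(kl_to_standard_normal([0.6, -0.3], 0.25))
print(bce(0.9, 1.0))
```

The clamp in `bce` matters here because the gate formula can produce values at or outside the interval $(0, 1)$, for instance exactly 0 at zero return, where the logarithm would otherwise be undefined.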

Results

On the Waymo Open Motion Dataset, CNeVA achieves competitive realism (ADE 1.32, FDE 2.89) while enabling per-channel controllability. Speed steering shows a monotonic response (corr = 0.97), and safety steering reduces collisions by 40% without reward hacking.

Conclusion

CNeVA provides a principled framework for controllable agents via behavior latents and rectified flows, with soft eligibility addressing sparse rewards.
