DanceOPD: On-Policy Generative Field Distillation

Overview§

DanceOPD is an on-policy generative field distillation framework for flow-matching models. It lets a single student model learn multiple image generation capabilities (text-to-image, local editing, global editing, classifier-free guidance, realism enhancement) by distilling from separate expert velocity fields. The key innovation is that the student is trained on its own rollout states (on-policy) rather than on states drawn from fixed data, which allows it to compose expert behaviors without degrading performance on any individual capability.

Core Methodology§

Flow-Matching Preliminaries§

Flow-matching models define a velocity field $v(x_t, t)$ that transports samples from a noise distribution $p_1$ to a data distribution $p_0$ along a probability flow. The training objective for a single capability is: $$\mathcal{L}_{\text{FM}} = \mathbb{E}_{t, x_0, x_1} \| v_{\theta}(x_t, t) - u_t(x_t | x_0, x_1) \|^2$$ where $x_0 \sim p_0$ is a data sample, $x_1 \sim p_1$ is a noise sample, $x_t$ is the intermediate state on the path between them, and $u_t$ is the target velocity (often derived from a linear interpolation between noise and data).
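As a concrete instance of the objective above, the sketch below assumes the linear path $x_t = (1 - t) x_0 + t x_1$, for which the target velocity is $u_t = x_1 - x_0$. The path choice and the helper names (`interpolate`, `target_velocity`, `fm_loss`) are illustrative assumptions rather than details from the paper, and plain Python lists stand in for tensors.

```python
import random

def interpolate(x0, x1, t):
    """Point on the assumed linear path: t=0 gives data x0, t=1 gives noise x1."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """d x_t / dt for the linear path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(v_theta, x0, x1, t):
    """Squared error between the model velocity and the target at (x_t, t)."""
    x_t = interpolate(x0, x1, t)
    u_t = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(v_theta(x_t, t), u_t))

# Toy check: a "model" that always predicts the true target has zero loss.
x0 = [1.0, -2.0]                       # data sample
x1 = [random.gauss(0, 1) for _ in x0]  # noise sample
oracle = lambda x_t, t: target_velocity(x0, x1)
loss = fm_loss(oracle, x0, x1, t=0.3)
```

In practice the expectation is estimated by averaging this quantity over a batch of sampled $(t, x_0, x_1)$ triples.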

On-Policy Distillation§

DanceOPD treats each capability (e.g., T2I, editing) as a separate expert velocity field $v_{\text{cap}}$. The student model $v_{\theta}$ learns from these experts by minimizing: $$\mathcal{L}_{\text{OPD}} = \mathbb{E}_{t, x_t^{\text{rollout}}} \| v_{\theta}(x_t^{\text{rollout}}, t) - v_{\text{expert}}(x_t^{\text{rollout}}, t) \|^2$$ where $x_t^{\text{rollout}}$ are states sampled from the student's own generation trajectory (on-policy) and $v_{\text{expert}}$ is the capability field $v_{\text{cap}}$ assigned to that sample. The assignment comes from a routing mechanism (e.g., a task label or a learned router).
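The on-policy part can be sketched as follows: start from noise, integrate the student's own field down to a stopping time, and only then compare student and expert at the state the student reached. This is a minimal illustration under stated assumptions: velocities follow the $\mathrm{d}x_t/\mathrm{d}t$ convention (pointing from data toward noise), the step count is arbitrary, and `student_rollout`, `opd_loss`, and the toy fields are hypothetical names, not the paper's implementation.

```python
import random

def student_rollout(v_student, x_noise, t_stop, num_steps=4):
    """Euler-integrate the student's velocity field from t=1 down to t_stop.

    Assumes velocities are d x_t / dt (data -> noise), so moving toward
    data means subtracting dt * v at each step.
    """
    x, t = list(x_noise), 1.0
    dt = (1.0 - t_stop) / num_steps
    for _ in range(num_steps):
        v = v_student(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
        t -= dt
    return x, t_stop

def opd_loss(v_student, v_expert, x, t):
    """Squared error between student and expert velocities at the same state."""
    return sum((s - e) ** 2 for s, e in zip(v_student(x, t), v_expert(x, t)))

# Toy fields (hypothetical): the expert pulls toward 2.0, the student toward 0.0.
expert = lambda x, t: [xi - 2.0 for xi in x]
student = lambda x, t: [xi - 0.0 for xi in x]

x_noise = [random.gauss(0, 1) for _ in range(3)]
x_t, t = student_rollout(student, x_noise, t_stop=0.5)
loss = opd_loss(student, expert, x_t, t)  # evaluated on the student's own state
```

In a real training loop the rollout state would be treated as a constant (no gradient flows through the integration), so only the final student evaluation is differentiated.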

Multi-Capability Composition§

To compose multiple fields, DanceOPD uses a simple linear combination rule: $$v_{\text{composite}}(x_t, t) = w_1 v_{\text{T2I}}(x_t, t) + w_2 v_{\text{edit}}(x_t, t) + \dots$$ where the weights $w_k$ can be fixed or dynamically adjusted. Alternatively, operator-defined fields such as CFG can be added as additional experts, so that the student absorbs them directly.
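Both ideas, the weighted sum and an operator-defined field, amount to building a new velocity function out of existing ones. The sketch below shows this with plain Python closures. The CFG form used here, $v_{\text{uncond}} + s\,(v_{\text{cond}} - v_{\text{uncond}})$, is the standard one and is assumed rather than quoted from the paper; `combine`, `cfg_field`, and the constant toy fields are hypothetical.

```python
def combine(fields, weights):
    """Return v(x, t) = sum_k w_k * v_k(x, t)."""
    def v(x, t):
        out = [0.0] * len(x)
        for w, field in zip(weights, fields):
            out = [o + w * vi for o, vi in zip(out, field(x, t))]
        return out
    return v

def cfg_field(v_cond, v_uncond, scale):
    """Classifier-free guidance as a field: v_uncond + scale * (v_cond - v_uncond)."""
    def v(x, t):
        c, u = v_cond(x, t), v_uncond(x, t)
        return [ui + scale * (ci - ui) for ci, ui in zip(c, u)]
    return v

# Toy constant fields (hypothetical stand-ins for expert networks).
v_t2i = lambda x, t: [1.0 for _ in x]
v_edit = lambda x, t: [3.0 for _ in x]

v_mix = combine([v_t2i, v_edit], weights=[0.25, 0.75])
v_guided = cfg_field(v_cond=v_edit, v_uncond=v_t2i, scale=2.0)
```

Because the results are ordinary fields with the same `(x, t)` signature, either one can serve as a distillation target in place of a single expert.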

Implementation Details§

Base model: Flow-matching architecture (e.g., a DiT or a U-Net) with sinusoidal timestep conditioning.
Experts: Pre-trained velocity fields for each capability. T2I expert is a standard text-conditioned flow model; editing experts are fine-tuned on paired edit data.
Training: The student is initialized from a pretrained T2I model. In each training iteration the student generates rollouts with a few steps of Euler integration, and the distillation loss against each expert is applied to the samples routed to that expert.
Routing: A simple classifier (e.g., learned from a small amount of labeled data) predicts which expert to use for each sample during training. At inference, the user provides explicit task flags.
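The two routing modes described above (predicted labels during training, explicit flags at inference) can share one selection function. The following is a rough sketch only: the task names, the dictionary of experts, and the keyword-based `toy_router` are hypothetical stand-ins, with the keyword rule taking the place of a learned classifier.

```python
def select_expert(experts, condition, router=None, task_flag=None):
    """Return the expert field for one sample.

    An explicit task_flag (inference) takes priority; otherwise the
    router's predicted label (training) is used.
    """
    if task_flag is None and router is None:
        raise ValueError("need either task_flag or router")
    label = task_flag if task_flag is not None else router(condition)
    if label not in experts:
        raise KeyError(f"no expert registered for task {label!r}")
    return experts[label]

# Hypothetical experts keyed by task name; real ones would be frozen networks.
experts = {
    "t2i": lambda x, t: [0.0 for _ in x],
    "local_edit": lambda x, t: [1.0 for _ in x],
}

# Stand-in for the learned classifier: a keyword rule on the prompt.
def toy_router(condition):
    return "local_edit" if "replace" in condition["prompt"] else "t2i"

cond = {"prompt": "replace the sky with a sunset"}
train_expert = select_expert(experts, cond, router=toy_router)
infer_expert = select_expert(experts, cond, task_flag="t2i")
```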

Code Snippet (PyTorch-like)§

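import torch
import torch.nn as nn
import torch.nn.functional as F


class DanceOPD(nn.Module):
    def __init__(self, student, experts, router, dt=1e-3):
        super().__init__()
        self.student = student  # trainable velocity model v_theta
        self.experts = nn.ModuleList(experts).requires_grad_(False)  # frozen expert fields
        self.router = router  # light classifier; assumed to return a (B,) tensor of expert indices
        self.dt = dt  # Euler step size

    def forward(self, x_0, condition):
        # x_0 is a clean data batch; condition is assumed to be a batch-first tensor
        b = x_0.shape[0]
        # Sample one time per example, kept away from 0 so t - dt stays non-negative
        t = torch.rand(b, 1, device=x_0.device).clamp(min=self.dt)
        t_b = t.view(b, *([1] * (x_0.dim() - 1)))  # broadcastable against x_0
        x_t = (1 - t_b) * x_0 + t_b * torch.randn_like(x_0)  # t=0 is data, t=1 is noise
        with torch.no_grad():
            # Student rollout (simplified: one Euler step for illustration).
            # The velocity is d x_t / dt (data -> noise), so stepping toward data subtracts it.
            x_next = x_t - self.dt * self.student(x_t, t, condition)
            t_next = t - self.dt
            # Query each sample's routed expert on the student-visited state
            task_idx = self.router(condition)
            v_expert = torch.zeros_like(x_next)
            for k, expert in enumerate(self.experts):
                mask = task_idx == k
                if mask.any():
                    v_expert[mask] = expert(x_next[mask], t_next[mask], condition[mask])
        # Distillation loss: match the expert velocity at the same state and time
        return F.mse_loss(self.student(x_next, t_next, condition), v_expert)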

Experiments§

DanceOPD is evaluated on T2I (MS-COCO, FID), local editing (quantitative edit success rate), global editing (style transfer, CLIP score), and CFG absorption (performance with vs without CFG). Results show that the student trained with on-policy distillation maintains T2I quality while achieving strong editing capabilities, outperforming multi-task training and off-policy distillation baselines.

Key Takeaways§

On-policy sampling from the student's own trajectory is crucial for stability and performance when distilling multiple velocity fields.
The framework can absorb additional operator-defined fields (e.g., CFG) without retraining experts.
Simple linear combination of expert velocities works well for composing capabilities, but routing requires task labels during training.
