DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand
By Dihong Huang, Zhenyu Wei, Zhuxiu Xu, Yunchao Yao, Sikai Li, Mingyu Ding
"Proposes DexCompose, a role-aware residual composition framework that reuses pretrained dexterous policies via finger-level action ownership and dual residuals, achieving 77.4% composite success on 16 multi-task manipulation benchmarks."
Abstract
Dexterous manipulation policies can solve individual skills, but composing them to perform multiple tasks with a single hand remains challenging. Adding a new task on top of an existing manipulation skill often imposes conflicting demands on overlapping fingers and contact modes, causing destructive interference between preserving an existing manipulation outcome and executing a new one. We propose DexCompose, a role-aware residual composition framework that reuses pretrained dexterous policies for multi-task manipulation through explicit finger-level action ownership. Given two pretrained full-hand policies, DexCompose first collects successful post-task states from the first skill and performs release tests over candidate finger masks to identify which fingers are necessary for maintaining the established skill state. It then trains two asymmetric residual modules: a bounded residual stabilizer for task preservation, and a context-aware residual that adapts the frozen downstream policy only within the action subspace assigned to the new task. We evaluate the framework on 16 composite dexterous manipulation tasks spanning four object-retention skills and four downstream interactions. DexCompose achieves a 77.4% average composite success rate, demonstrating that structural action ownership with dual residuals offers a promising direction for composing dexterous skills beyond conventional policy chaining.
Technical Analysis & Implementation
DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation
Core Idea
DexCompose addresses the challenge of composing two pretrained full-hand dexterous manipulation policies for multi-task execution with a single hand. The key insight is that overlapping finger usage between tasks causes destructive interference. The framework therefore assigns action ownership at the finger level and introduces two asymmetric residual modules: a stabilizer that preserves the first skill's outcome, and an adaptor that adjusts the second skill only within the action subspace assigned to the new task.
Methodology
Given two pretrained policies $\pi_1$ (e.g., object retention) and $\pi_2$ (e.g., interaction), the goal is to execute $\pi_2$ after $\pi_1$ while preserving the outcome that $\pi_1$ established. DexCompose consists of three phases:
1. Finger Mask Release Test: Roll out $\pi_1$ to collect successful final states $s^*$. For each candidate finger mask $m \in \{0,1\}^{D_{\text{finger}}}$ (where $D_{\text{finger}}$ is the total number of finger DoF), keep the fingers with $m_i = 1$, release the rest by zeroing their actions, and check whether the object state $s_{\text{obj}}$ is preserved (e.g., the object is still grasped). Record the sparsest mask that passes the test: $m^* = \arg\min_{m} \|m\|_0$ s.t. $\text{release}(s^*, m)$ preserves the object state. The nonzero entries of $m^*$ identify the fingers necessary for the first skill.
2. Residual Stabilizer: Train a small network $f_\theta$ that outputs a bounded residual $\delta a_1 = \tanh(f_\theta(s, z))$, where $z$ is a context embedding. The residual is added to $\pi_1$'s action only on the necessary fingers (mask $m^*$). The stabilizer is trained with reinforcement learning (RL) to minimize the state deviation $\|s - s^*\|$ while the hand executes the second skill.
3. Context-Aware Residual: The action space of the downstream policy $\pi_2$ is projected onto the available fingers (the complement of $m^*$). A second residual module $g_\phi$ learns to output $\delta a_2$ only in that subspace. The composite action is $$a = \underbrace{(\pi_1(s) + \delta a_1) \odot m^*}_{\text{preservation}} + \underbrace{(\pi_2(s) + \delta a_2) \odot (1 - m^*)}_{\text{new task}}$$ where $\odot$ is element-wise multiplication. Both residuals are trained jointly via RL with a composite reward $r = r_1 + r_2$, where $r_1$ penalizes object state deviation and $r_2$ rewards downstream task success.
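The release test in phase 1 can be sketched as a search over masks in order of increasing size. The following is a minimal illustration, not the authors' implementation: it assumes one mask entry per finger (rather than per DoF), exhaustive enumeration of candidates, and a hypothetical callback `holds_after_release` that stands in for a simulator rollout from the collected states $s^*$.

```python
from itertools import combinations
from typing import Callable, Optional, Tuple

Mask = Tuple[int, ...]  # 1 = finger kept for skill 1, 0 = finger released


def minimal_retention_mask(
    num_fingers: int,
    holds_after_release: Callable[[Mask], bool],
) -> Optional[Mask]:
    """Return a smallest mask (fewest kept fingers) that passes the release test.

    `holds_after_release` is a hypothetical stand-in for the simulator check:
    it should return True when the object state is preserved after every
    finger with mask value 0 has been released.
    """
    # Try masks with 0 kept fingers, then 1, then 2, ... so the first hit
    # minimizes the L0 norm of the mask.
    for k in range(num_fingers + 1):
        for kept in combinations(range(num_fingers), k):
            mask = tuple(1 if i in kept else 0 for i in range(num_fingers))
            if holds_after_release(mask):
                return mask
    return None  # no candidate preserves the object state


def toy_release_test(mask: Mask) -> bool:
    """Toy rule (assumption): the grasp holds if the thumb (index 0) and at
    least one of the index/middle fingers (indices 1, 2) stay engaged."""
    return mask[0] == 1 and (mask[1] == 1 or mask[2] == 1)


if __name__ == "__main__":
    print(minimal_retention_mask(5, toy_release_test))
```

Enumerating by increasing size guarantees the returned mask is minimal, at the cost of up to $2^n$ release tests for $n$ fingers, which stays small for a five-fingered hand.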
Implementation Details
- Policies are based on Proximal Policy Optimization (PPO) with a Gaussian action distribution.
- Stabilizer and adaptor use 2-layer MLPs (64 hidden units) with tanh activations.
- The bounded residual uses $\text{tanh}$ to keep outputs within $[-1, 1]$.
- Training uses the Isaac Gym simulator with a Shadow Hand.
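The effect of the tanh bound listed above can be illustrated without any learning machinery. This sketch shows that, however large the raw network output becomes, the corrected action never moves further than the bound from the pretrained policy's action. The `scale` parameter is an assumption added for illustration; with `scale=1.0` it matches the $[-1, 1]$ range stated above.

```python
import math
from typing import List, Sequence


def bounded_residual(raw: Sequence[float], scale: float = 1.0) -> List[float]:
    """Squash unbounded outputs into [-scale, scale] with tanh.

    `scale` is a hypothetical knob; scale=1.0 reproduces the [-1, 1] bound.
    """
    return [scale * math.tanh(x) for x in raw]


def apply_residual(
    base_action: Sequence[float], raw: Sequence[float], scale: float = 1.0
) -> List[float]:
    """Add the bounded residual to the frozen policy's base action."""
    deltas = bounded_residual(raw, scale)
    return [b + d for b, d in zip(base_action, deltas)]


if __name__ == "__main__":
    base = [0.2, -0.5, 0.0]
    raw = [0.1, 50.0, -1000.0]  # even extreme raw outputs stay bounded
    corrected = apply_residual(base, raw)
    print([round(c - b, 4) for c, b in zip(corrected, base)])
```

Bounding the residual in this way keeps the pretrained policy as the dominant term, so the learned correction can refine the action but cannot override it entirely.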
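Code Snippet (PyTorch-style)
```python
import torch
import torch.nn as nn


class DexComposeResidual(nn.Module):
    def __init__(self, pi1, pi2, state_dim, action_dim, finger_mask, ctx_dim=64):
        super().__init__()
        # Pretrained full-hand policies, assumed here to be passed in as
        # frozen callables mapping state -> action.
        self.pi1, self.pi2 = pi1, pi2
        # Bounded residual stabilizer for the preserved (first) skill.
        self.stabilizer = nn.Sequential(
            nn.Linear(state_dim + ctx_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )
        # Context-aware residual for the downstream (second) skill.
        self.adaptor = nn.Sequential(
            nn.Linear(state_dim + ctx_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )
        # Binary mask m*: 1 = DoF owned by skill 1, 0 = DoF free for skill 2.
        self.register_buffer('mask', finger_mask.float())

    def forward(self, state, z1, z2):
        # z1, z2 are context embeddings (e.g., from task encoders)
        delta1 = self.stabilizer(torch.cat([state, z1], dim=-1))
        delta2 = self.adaptor(torch.cat([state, z2], dim=-1))
        # Base actions come from the frozen policies; no gradients flow into them.
        with torch.no_grad():
            base1, base2 = self.pi1(state), self.pi2(state)
        # Composite action: each DoF is driven by exactly one skill.
        a1 = (base1 + delta1) * self.mask
        a2 = (base2 + delta2) * (1 - self.mask)
        return a1 + a2
```
Results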
Evaluated on 16 composite tasks (4 object-retention skills × 4 downstream interactions), DexCompose reaches an average success rate of 77.4%, outperforming naive chaining (19.2%) and fine-tuning baselines (39.1%). Ablations confirm that both the finger mask and the dual residuals are crucial.
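Key Equations
Composite action: $$a = (\pi_1(s) + \delta a_1) \odot m^* + (\pi_2(s) + \delta a_2) \odot (1 - m^*)$$
Finger mask optimization: $$m^* = \arg\min_{m \in \mathcal{M}} \|m\|_0 \quad \text{s.t.} \quad \|\Phi(s_{\text{release}}) - \Phi(s^*)\| < \epsilon$$ where $\Phi$ is an object-state encoder, $s_{\text{release}}$ is the state reached after releasing the fingers outside $m$, and $\mathcal{M}$ is the set of candidate masks.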
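The ownership property of the composite action can be checked numerically. The sketch below uses plain Python lists and made-up action values (all numbers are illustrative assumptions, not data from the paper) to show that DoFs owned by the first skill are unaffected by anything the downstream policy or its residual outputs.

```python
from typing import List, Sequence


def compose_action(
    base1: Sequence[float],   # pi_1(s)
    delta1: Sequence[float],  # stabilizer residual
    base2: Sequence[float],   # pi_2(s)
    delta2: Sequence[float],  # context-aware residual
    mask: Sequence[int],      # m*: 1 = owned by skill 1, 0 = owned by skill 2
) -> List[float]:
    """Element-wise composite: (pi1 + d1) * m + (pi2 + d2) * (1 - m)."""
    return [
        (p1 + d1) * m + (p2 + d2) * (1 - m)
        for p1, d1, p2, d2, m in zip(base1, delta1, base2, delta2, mask)
    ]


if __name__ == "__main__":
    mask = [1, 1, 0, 0]  # first two DoFs hold the object (illustrative)
    base1, delta1 = [0.5, -0.2, 0.1, 0.0], [0.05, 0.0, 0.3, 0.3]
    base2, delta2 = [0.9, 0.9, -0.4, 0.6], [0.2, 0.2, 0.1, -0.1]

    a = compose_action(base1, delta1, base2, delta2, mask)
    # Perturb only the downstream branch; owned DoFs should not move.
    a_perturbed = compose_action(base1, delta1, [-1.0] * 4, [1.0] * 4, mask)
    print(a[:2] == a_perturbed[:2])
```

Because the two masks are complementary, each DoF receives a command from exactly one branch, which is what rules out the destructive interference described earlier.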
Conclusion
DexCompose provides a principled way to compose dexterous policies by assigning explicit finger-level ownership, enabling effective multi-task manipulation with a single hand.