Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training

Abstract

Reinforcement learning (RL) has become a central component of post-training large language models (LLMs), yet little is understood about how RL adaptation is distributed across transformer layers. Existing approaches typically update all model parameters uniformly, implicitly assuming that every layer contributes similarly to the gains obtained during RL post-training. In this work, we challenge this assumption through a systematic layer-wise study of RL training. Surprisingly, we find that training a single transformer layer can recover most of the gains achieved by full-parameter RL training, and in some cases even surpass it. To quantify this phenomenon, we introduce the quantity layer contribution, which measures the fraction of full RL improvement recovered by training a layer in isolation. Across seven models spanning two model families (Qwen3, Qwen2.5), three RL algorithms (GRPO, GiGPO, Dr. GRPO), and multiple task domains including mathematical reasoning, code generation, and agentic decision-making, we observe a remarkably stable pattern: RL gains are highly concentrated in a small subset of, and in many cases even a single, transformer layers. More strikingly, the same structural pattern consistently emerges: high-contribution layers concentrate in the middle of the transformer stack, while layers near the input and output ends contribute substantially less. The resulting layer rankings remain strongly correlated across datasets, tasks, model families, and RL algorithms.

Technical Analysis & Implementation

Technical Breakdown

Core Methodology

The paper investigates how reinforcement learning (RL) post-training affects different transformer layers in large language models (LLMs). The authors propose a metric called layer contribution to quantify the fraction of full-parameter RL improvement recovered by training only a single layer in isolation. Formally, for a given layer $\ell$, the contribution is:

$$ C_\ell = \frac{\text{Score}_{\text{RL-only}(\ell)} - \text{Score}_{\text{base}}}{\text{Score}_{\text{full-RL}} - \text{Score}_{\text{base}}} $$

where $\text{Score}_{\text{base}}$ is the score of the base model before RL, $\text{Score}_{\text{full-RL}}$ is the score after full-parameter RL, and $\text{Score}_{\text{RL-only}(\ell)}$ is the score after RL training on layer $\ell$ alone with all other layers frozen. A value of $C_\ell = 1$ means the single-layer run matches full-parameter RL, and a value above 1 means it surpasses it.
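As a quick numerical check of the formula, the following minimal sketch computes the contribution from three scores. The function name `layer_contribution` and all score values are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def layer_contribution(base: float, single_layer: float, full_rl: float) -> float:
    """Fraction of the full-RL gain recovered by training one layer alone."""
    full_gain = full_rl - base
    if full_gain == 0:
        # The ratio is undefined when full-parameter RL does not move the score.
        raise ValueError("full-parameter RL produced no gain; contribution is undefined")
    return (single_layer - base) / full_gain


# Hypothetical scores, for illustration only (not results from the paper).
base, full_rl = 40.0, 60.0
print(layer_contribution(base, 56.0, full_rl))  # 0.8: recovers 80% of the full gain
print(layer_contribution(base, 62.0, full_rl))  # 1.1: surpasses full-parameter RL
```

Note that the metric is a ratio of gains, so it becomes noisy when the full-parameter gain in the denominator is small.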

Experimental Setup

The study spans seven models from two families (Qwen3, Qwen2.5), three RL algorithms (GRPO, GiGPO, Dr. GRPO), and tasks in mathematical reasoning, code generation, and agentic decision-making. For each model and task, the authors run a layer-wise sweep: for each layer in turn, they freeze all other parameters and run the standard RL algorithm (e.g., GRPO), updating only that layer's parameters. The training budget (number of RL steps) matches that of the full-parameter baseline.
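The sweep described above can be sketched as a loop over layer indices. This is an illustrative outline rather than the authors' code: `sweep_layers` and `train_and_eval` are hypothetical names, and the real training run is replaced by a stand-in function with made-up scores.

```python
from typing import Callable, Dict


def sweep_layers(
    num_layers: int,
    train_and_eval: Callable[[int], float],
    base_score: float,
    full_rl_score: float,
) -> Dict[int, float]:
    """Train each layer in isolation and return its layer contribution."""
    full_gain = full_rl_score - base_score
    contributions: Dict[int, float] = {}
    for layer_idx in range(num_layers):
        # train_and_eval is assumed to start from the base checkpoint, freeze
        # every layer except layer_idx, run RL for the shared step budget,
        # and return the evaluation score.
        score = train_and_eval(layer_idx)
        contributions[layer_idx] = (score - base_score) / full_gain
    return contributions


def fake_train_and_eval(layer_idx: int) -> float:
    """Stand-in for a real run: a made-up score profile that peaks at layer 4."""
    return 40.0 + max(0.0, 18.0 - 4.0 * abs(layer_idx - 4))


result = sweep_layers(8, fake_train_and_eval, base_score=40.0, full_rl_score=60.0)
best = max(result, key=result.get)
print(best, result[best])
```

Each iteration is an independent training run from the same starting checkpoint, so the sweep parallelizes trivially across layers, at a total cost of roughly one full RL run per layer.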

Key Findings

1. Concentration of gains: In most cases, training a single layer recovers over 80% of the full-parameter improvement, and sometimes even surpasses it.
2. Location of high-contribution layers: The high-contribution layers consistently appear in the middle of the transformer stack (e.g., layers 20-30 out of 72 for Qwen3-32B), while input and output layers contribute minimally.
3. Robustness: The layer ranking (sorted by contribution) remains highly correlated across different datasets, tasks, and RL algorithms (Spearman correlation >0.9).
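To make the robustness claim concrete, the sketch below computes a Spearman rank correlation between two per-layer contribution profiles using only the standard library. The helper names and the two five-layer profiles are hypothetical and do not come from the paper; in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical contribution profiles for the same five layers on two tasks.
math_task = [0.1, 0.4, 0.9, 0.7, 0.2]
code_task = [0.0, 0.5, 0.8, 0.9, 0.1]
rho = spearman(math_task, code_task)
print(round(rho, 3))  # 0.9
```

Because the statistic depends only on ranks, it measures whether the same layers come out on top across settings, regardless of how large the contributions are in absolute terms.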

Implementation Details

The training procedure modifies the standard RL loop to update only the selected layer's parameters. Below is a simplified PyTorch-style code snippet for performing RL training on a single transformer layer:

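import torch
import torch.nn as nn


class SingleLayerRLTrainer:
    """Runs RL updates on one transformer layer while all other parameters stay frozen."""

    def __init__(self, model: nn.Module, layer_idx: int, loss_fn, lr: float = 1e-5):
        self.model = model
        self.layer_idx = layer_idx
        # loss_fn(model, batch) must return a scalar loss tensor, e.g. a GRPO
        # objective supplied by the caller; it is not defined in this snippet.
        self.loss_fn = loss_fn
        # Freeze all parameters
        for param in model.parameters():
            param.requires_grad = False
        # Unfreeze the selected layer. The attribute path depends on the model
        # implementation; `model.transformer.layers` is assumed here.
        layer_params = list(model.transformer.layers[layer_idx].parameters())
        for param in layer_params:
            param.requires_grad = True
        # Only the selected layer's parameters are handed to the optimizer
        self.optimizer = torch.optim.AdamW(layer_params, lr=lr)

    def train_step(self, batch) -> float:
        # Clear stale gradients, then take one optimizer step on the RL loss
        self.optimizer.zero_grad(set_to_none=True)
        loss = self.loss_fn(self.model, batch)
        loss.backward()
        self.optimizer.step()
        return loss.item()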

Deeper Analysis

The authors offer an intuition for this pattern: middle layers act as a "bottleneck" for behaviors learned through RL, while early layers capture general linguistic features and later layers specialize in shaping the final output distribution. If this holds, much of the compute in full-parameter RL is spent updating layers that contribute little, and targeted layer training could be more compute-efficient.

Implications

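This work has practical implications for RL post-training: if the most impactful layer(s) can be identified, training only those layers reduces gradient and optimizer-state memory while, per the reported results, recovering most or all of the full-parameter gain. The forward pass still runs through the whole model, so the savings come mainly from the backward pass and the optimizer. The work also opens questions about layer-specific learning dynamics in LLMs.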
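A back-of-envelope estimate shows where the memory savings come from. The sketch below counts only gradients and AdamW's two moment buffers, each of which holds one value per trainable parameter. The parameter counts are hypothetical round numbers, not figures for any specific model, and the estimate ignores weights and activations, which are needed in both settings.

```python
def adamw_training_bytes(trainable_params: int, bytes_per_value: int = 4) -> int:
    """Bytes for gradients plus AdamW's two moment buffers (first and second moment)."""
    grads = trainable_params * bytes_per_value
    moments = 2 * trainable_params * bytes_per_value
    return grads + moments


# Hypothetical sizes, for illustration only.
total_params = 8_000_000_000      # all parameters trainable (full-parameter RL)
per_layer_params = 200_000_000    # one transformer layer trainable

full = adamw_training_bytes(total_params)
single = adamw_training_bytes(per_layer_params)
print(f"full: {full / 2**30:.1f} GiB, single layer: {single / 2**30:.1f} GiB")
print(f"ratio: {full / single:.0f}x")
```

Compute savings are smaller than this ratio suggests: the backward pass must still propagate through every layer above the trained one to reach it, even though no weight gradients are stored for those frozen layers.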
