Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning
By Xuehui Wang, Xuankun Yang, Wei Shen
"Proposes entropy-aware dense pruning (EADP) that filters textual noise via entropy and uses submodular maximization for non-redundant token selection, improving VLM efficiency under tight budgets."
Abstract
Visual token pruning is a crucial strategy for accelerating VLMs by compressing redundant image patches, yet existing methods often fail to preserve critical cues under dense instructions and fine-grained queries. In this paper, we investigate this failure and identify two underlying bottlenecks: the widespread dispersion of textual noise that corrupts dense cross-modal scoring, and the feature fragmentation inherent to standard token selection. To address these issues, we propose Entropy-Aware Dense Pruning (EADP), a framework that reformulates pruning as a structured compression problem. EADP first leverages statistical entropy to quantify and filter out textual noise, yielding a robust, fine-grained instruction relevance score. Subsequently, instead of naive Top-K selection, EADP casts token selection as a submodular maximization problem with a spatial prior, explicitly ensuring a holistic and non-redundant visual representation. Extensive experiments demonstrate that EADP improves the accuracy-efficiency trade-off of VLMs, robustly preserving fine-grained visual cues under strict token budgets while achieving SoTA performance on challenging multimodal benchmarks.
Technical Analysis & Implementation
Technical Breakdown
Problem & Motivation
Vision-Language Models (VLMs) process many visual tokens from images, leading to high computational cost. Prior token pruning methods use cross-modal attention scores to select top-K tokens, but these scores are corrupted by textual noise (e.g., irrelevant words) and selected tokens often exhibit redundancy due to feature fragmentation. This paper addresses these issues with Entropy-Aware Dense Pruning (EADP).
Core Methodology
1. Textual Noise Filtering via Entropy

Given a text instruction $T$ with $N$ tokens, compute the entropy of each text token's attention distribution over the initial visual tokens $V$:

$$ H(t_i) = -\sum_{j} p_{ij} \log p_{ij}, \quad p_{ij} = \frac{\exp(q_i \cdot k_j)}{\sum_{j'} \exp(q_i \cdot k_{j'})} $$

Here $q_i$ is the query of text token $t_i$ and $k_j$ is the key of visual token $v_j$. Text tokens with high entropy (near-uniform attention over the image) are considered noisy and filtered out. The retained tokens form a clean instruction set $T'$.
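The entropy filter in step 1 can be illustrated with a small, dependency-free sketch. The function names, the toy logits, and the fixed threshold of 1.0 nats are assumptions made for illustration; they are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(probs):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def filter_noisy_text_tokens(logits_per_text_token, entropy_thresh):
    """Return indices of text tokens whose attention entropy is below the threshold."""
    keep = []
    for i, logits in enumerate(logits_per_text_token):
        if attention_entropy(softmax(logits)) < entropy_thresh:
            keep.append(i)
    return keep


# Toy logits q_i . k_j for 3 text tokens over 4 visual tokens (made-up values).
logits = [
    [5.0, 0.0, 0.0, 0.0],  # peaked: attends mostly to patch 0
    [0.0, 0.0, 0.0, 0.0],  # uniform: entropy equals log(4), the maximum
    [0.0, 4.0, 0.0, 0.0],  # peaked: attends mostly to patch 1
]
kept = filter_noisy_text_tokens(logits, entropy_thresh=1.0)
print(kept)
```

In this toy case the uniform row plays the role of a noisy word: it spreads attention evenly, reaches the maximum entropy, and is dropped, while the two peaked rows are kept.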
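2. Dense Cross-Modal Scoring

Recompute the relevance score $s_j$ of each visual token $v_j$ as the average attention it receives from the clean text tokens:

$$ s_j = \frac{1}{|T'|} \sum_{t_i \in T'} a_{ij}, \quad a_{ij} = \text{softmax}_j(q_i \cdot k_j / \tau) $$

where the softmax is taken over the visual tokens $j$ and $\tau$ acts as a softmax temperature. Because high-entropy text tokens no longer contribute to the average, these scores are more robust to textual noise.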
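A minimal sketch of the averaging in step 2, assuming the attention matrix is already available as a list of rows (one per text token). The helper name `dense_relevance_scores` and the toy matrix are hypothetical.

```python
def dense_relevance_scores(attn, clean_idx):
    """Average the attention rows of the retained text tokens.

    attn[i][j] is the attention of text token i on visual token j;
    clean_idx lists the text tokens kept after entropy filtering.
    """
    if not clean_idx:
        raise ValueError("no clean text tokens to average over")
    n_visual = len(attn[0])
    return [
        sum(attn[i][j] for i in clean_idx) / len(clean_idx)
        for j in range(n_visual)
    ]


# Toy attention matrix: rows 0 and 2 are focused, row 1 is uniform noise.
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.70, 0.10, 0.10],
]
scores_clean = dense_relevance_scores(attn, [0, 2])    # noisy row excluded
scores_all = dense_relevance_scores(attn, [0, 1, 2])   # noisy row included
print(scores_clean, scores_all)
```

Excluding the uniform row widens the gap between the attended patches (0 and 1) and the background patches (2 and 3), which is the effect the clean score is meant to capture.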
3. Structured Token Selection via Submodular Maximization

Instead of Top-K selection, EADP casts selection as:

$$ \max_{S \subseteq V, |S|=K} F(S) = \underbrace{\sum_{v_j \in S} s_j}_{\text{relevance}} - \lambda \underbrace{\sum_{v_j \in S} \sum_{v_k \in S, k < j} \phi(d_{jk})}_{\text{diversity penalty}} $$

where $d_{jk}$ is the Euclidean distance between the spatial coordinates of $v_j$ and $v_k$, and $\phi(\cdot) \ge 0$ is a decreasing function (e.g., a Gaussian kernel), so the closest pairs are penalized most. Adding a token to a larger set can only incur more penalty, so marginal gains diminish as $S$ grows, which is the property that makes $F$ submodular. The cardinality-constrained problem is solved with a greedy algorithm that iteratively adds the token with the highest marginal gain:

$$ v^* = \arg\max_{v_j \in V \setminus S} \left( s_j - \lambda \sum_{v_k \in S} \phi(d_{jk}) \right) $$
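To see how structured selection can differ from plain Top-K, the following toy sketch compares the two on four tokens. It assumes the diversity term acts as a pairwise penalty between a candidate and the already-selected tokens, with a Gaussian kernel of bandwidth `sigma`; the scores, coordinates, `lam`, and `sigma` values are made up for illustration.

```python
import math


def gaussian(dist, sigma=1.0):
    """Gaussian kernel: close to 1 for nearby tokens, close to 0 for distant ones."""
    return math.exp(-dist * dist / (2.0 * sigma * sigma))


def greedy_select(scores, coords, budget, lam, sigma=1.0):
    """Greedily pick tokens by relevance minus a proximity penalty (assumed form)."""
    selected = []
    remaining = set(range(len(scores)))
    for _ in range(min(budget, len(scores))):
        def gain(j):
            penalty = sum(
                gaussian(math.dist(coords[j], coords[k]), sigma) for k in selected
            )
            return scores[j] - lam * penalty

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


scores = [0.9, 0.85, 0.5, 0.4]             # tokens 0 and 1 are both highly relevant...
coords = [(0, 0), (0, 1), (5, 5), (9, 0)]  # ...but they are adjacent patches

top_k = sorted(range(len(scores)), key=lambda j: -scores[j])[:2]
greedy = greedy_select(scores, coords, budget=2, lam=1.0)
print(top_k, greedy)
```

Here Top-K keeps the two adjacent patches, while the penalized greedy pass keeps token 0 and then prefers the distant token 2 over its neighbor. Setting `lam=0.0` removes the penalty and recovers the Top-K choice.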
Implementation Details
- Architecture: Inserted after a few transformer layers in LLaVA-like VLMs.
- Entropy threshold: Remove tokens with $H(t_i) > \tau_H$ (e.g., 90th percentile).
- Diversity weight $\lambda$ tuned per benchmark.
- Code snippet (PyTorch-like):
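```python
import torch


def eadp_prune(attn_scores, text_feats, visual_coords, budget,
               entropy_thresh, lambda_reg, sigma=1.0):
    """Greedy entropy-aware token selection (illustrative sketch).

    attn_scores:   (L_text, L_visual), each row a distribution over visual tokens
    text_feats:    unused here; kept so the signature matches the snippet above
    visual_coords: (L_visual, 2) patch coordinates
    sigma:         Gaussian kernel bandwidth (an assumed default)
    """
    # Entropy of each text token's attention over the visual tokens.
    text_entropy = -torch.sum(attn_scores * torch.log(attn_scores + 1e-8), dim=1)
    clean_idx = text_entropy < entropy_thresh
    if not clean_idx.any():
        # Fall back to all text tokens rather than averaging an empty set.
        clean_idx = torch.ones_like(clean_idx)
    clean_scores = attn_scores[clean_idx].mean(dim=0)  # (L_visual,)

    coords = visual_coords.float()
    selected = []
    candidates = set(range(len(clean_scores)))
    for _ in range(min(budget, len(clean_scores))):
        best_token, best_gain = None, -float("inf")
        for j in candidates:
            # Proximity of j to already-selected tokens (Gaussian kernel).
            redundancy = 0.0
            for k in selected:
                dist = torch.norm(coords[j] - coords[k])
                redundancy += torch.exp(-dist ** 2 / (2 * sigma ** 2)).item()
            # Relevance minus the diversity penalty, so picks spread out spatially.
            gain = clean_scores[j].item() - lambda_reg * redundancy
            if gain > best_gain:
                best_gain, best_token = gain, j
        selected.append(best_token)
        candidates.remove(best_token)
    return selected
```

Results & Significance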
EADP achieves state-of-the-art accuracy-efficiency trade-offs on VQA and captioning benchmarks (e.g., COCO, VizWiz, TextVQA) under 20-50% token budgets. It consistently outperforms top-K and prior adaptive pruning methods, especially under dense instructions (e.g., scene text queries) and fine-grained tasks.