arrow_backBack to research feed
otherPublished: June 26, 2026

Which Nash Equilibrium? Solver-Dependent Selection on Zero-Sum Nash Polytopes

By Luis Leal

Research TL;DR

"Investigates solver-dependent selection within Nash polytopes of zero-sum games; regularized methods yield maximum-entropy equilibria while regret methods drift to lower-entropy faces."

Abstract

Many two-player zero-sum games admit not a unique Nash equilibrium but a convex set of them: a polytope of profiles that all share the minimax value V* yet prescribe different behaviour. Standard solvers each converge to some equilibrium and are treated as interchangeable. We ask whether they instead select different members of the Nash set, systematically as a function of the algorithm rather than the seed. Using a tabular, exactly solvable testbed of six games with analytically known Nash sets -- including a two-dimensional Nash polytope and Kuhn poker -- we find that (i) selection is determined by the algorithm, not the seed, but families differ only on asymmetric Nash sets; (ii) regularized last-iterate methods (R-NaD, magnetic mirror descent) select the maximum-entropy member, the information projection of their uniform reference onto the Nash set -- exactly on the 2-D polytope and at 99.7% of maximum entropy in Kuhn -- while regret-averaging methods (CFR, CFR+, fictitious play) drift to a lower-entropy face; we confirm this on a randomized 180-game ensemble, where R-NaD attains the maximum-entropy member in 100% of converged games while CFR+ sits strictly below it in 94% (paired Wilcoxon p < 10^-27); (iii) the selected member has downstream consequences against sub-optimal opponents that scale with sequential/hidden-information structure but stay bounded -- in Kuhn the max-entropy member is a strictly better hedge, whereas on the matrix games the members differ without either dominating. We also report two negative results correcting common intuitions: removing CFR's positive-orthant (max(R,0)) projection does not eliminate boundary drift; and R-NaD's selection is anchor-following, not initialization-independent. We state the maximum-entropy / I-projection characterization as a strongly data-supported conjecture, checked throughout against analytic ground truth.

Technical Analysis & Implementation

Technical Synopsis§

This paper systematically studies the phenomenon that different Nash equilibrium solvers converge to different members of the convex set of equilibria (the Nash polytope) in zero-sum games. The authors construct a tabular testbed with analytically known Nash sets, including a 2D polytope and Kuhn poker, and compare algorithms from two families: regularized last-iterate methods (R-NaD, magnetic mirror descent) and regret-averaging methods (CFR, CFR+, fictitious play).

Core Methodology§

The Nash polytope for a zero-sum game is defined as: $$\mathcal{N} = \{ (\mathbf{x}, \mathbf{y}) \mid \mathbf{x} \in \Delta^m, \mathbf{y} \in \Delta^n, \mathbf{x}^\top A \mathbf{y} = V^ \}$$ where $V^$ is the game value. The key insight is that regularized methods with a uniform reference distribution $\mathbf{p} = (1/m, \dots, 1/m)$ converge to the maximum-entropy equilibrium: $$\mathbf{x}^ = \arg\max_{\mathbf{x} \in \mathcal{N}} H(\mathbf{x}), \quad \mathbf{y}^ = \arg\max_{\mathbf{y} \in \mathcal{N}} H(\mathbf{y})$$ which is equivalent to the information projection (I-projection) of the uniform reference onto the Nash set: $$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathcal{N}} D_{KL}(\mathbf{x} \| \mathbf{p})$$

Algorithms Studied§

  • R-NaD: Regularized Nash Dynamics. Updates using softmax with temperature decay, adding an entropy bonus to the payoff.
  • Magnetic Mirror Descent (MMD): Mirror descent with a strongly convex regularizer that penalizes divergence from a reference.
  • CFR/CFR+: Counterfactual Regret Minimization. Accumulates regret and uses regret-matching to select actions.
  • Fictitious Play (FP): Best response to empirical average of opponent's actions.

Key Findings§

1. Algorithm-dependent selection: Within each family, seed randomness does not cause variation; the deterministic part of the algorithm dictates the selected equilibrium. Regularized methods converge to the maximum-entropy member (verified analytically on 2D polytope, >99.7% on Kuhn). Regret methods converge to lower-entropy faces (CFR+ attains max entropy in only ~6% of 180-game ensemble).

2. Consequences for robustness: The maximum-entropy equilibrium is a stricter hedge against suboptimal opponents, especially in games with hidden information (e.g., Kuhn poker). In matrix games, differences exist but no strict dominance.

3. Negative results: Removing CFR's $\max(R, 0)$ projection does not eliminate drift away from the center of the polytope. R-NaD's selection is anchor-following (depends on reference distribution), not initialization-independent.

Implementation Details§

The authors provide an exact tabular solver for small games (e.g., 3x3 matrix, Kuhn poker with 6 card states). The Nash polytope is computed via linear programming (vertex enumeration). The codebase (in Python) allows analytic verification of solver outputs against the known equilibrium set.

import numpy as np
from scipy.optimize import linprog

def nash_polytope(payoff_matrix):
    # compute V* via linear programming
    m, n = payoff_matrix.shape
    c = np.zeros(m + 1)
    c[-1] = -1
    A_ub = np.hstack([payoff_matrix, -np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.ones((1, m + 1))
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method='highs')
    V_star = -res.fun
    # then compute full polytope via vertex enumeration (omitted for brevity)
    return V_star

def rnad_update(strategy, payoff, temperature, reference):
    # single step of regularized Nash dynamics
    logits = payoff @ strategy
    strategy_new = np.exp((logits + temperature * np.log(reference))) 
    strategy_new /= np.sum(strategy_new)
    return strategy_new

Open Conjecture§

The maximum-entropy / I-projection characterization is stated as a strongly supported conjecture, with analytic verification on small polytopes and empirical backing on 180 games.