Second-Order KKT Guarantees for Bregman ADMM in Nonconvex and Non-Lipschitz Optimization

Technical Breakdown§

Core Problem and Assumptions§

The paper addresses linearly constrained nonconvex optimization where the objective lacks a global Lipschitz gradient. Instead, it assumes two-sided relative smoothness with respect to a Bregman kernel $D_h$: there exist constants $ ho_1, \rho_2 > 0$ such that for all $x, y$ in an invariant open domain,

$$ \rho_1 D_h(y, x) \leq f(y) - f(x) - \langle \nabla f(x), y - x \rangle \leq \rho_2 D_h(y, x). $$

This generalizes the standard smoothness and covers polynomial objectives like those in tensor factorization.

Bregman ADMM Update§

The algorithm solves $\min_{x,y} f(x) + g(y)$ subject to $Ax + By = c$ via:

\begin{align} x^{k+1} &= \operatorname{argmin}_x \left\{ f(x) + \langle \lambda^k, Ax \rangle + \frac{\beta}{2} \|Ax + By^k - c\|^2 \right\}, \\ y^{k+1} &= \operatorname{argmin}_y \left\{ g(y) + \langle \lambda^k, By \rangle + \frac{\beta}{2} \|Ax^{k+1} + By - c\|^2 + D_h(y, y^k) \right\}, \\ \lambda^{k+1} &= \lambda^k + \beta (Ax^{k+1} + By^{k+1} - c). \end{align}

The Bregman proximal term $D_h(y, y^k)$ replaces the standard quadratic penalty, allowing adaptation to the problem's geometry.

Key Theoretical Result§

By constructing a smooth primal-dual fixed-point map $T$, the authors show that KKT points satisfy $T(z) = z$. Using a spectral analysis with a determinant reduction and Bregman-specific symmetrization, they prove that strict saddle points (where the Hessian has at least one negative eigenvalue) are unstable fixed points. Consequently, from random initialization, iterates avoid strict saddles with probability 1, and converge to second-order stationary KKT points.

Distributed Star Consensus Extension§

For a star graph with $N$ agents, the problem is

$$ \min_{x_0, \{x_i\}} \sum_{i=1}^N f_i(x_i) \quad \text{s.t.} \quad x_i = x_0, $$

where $x_0$ is the central variable. The multi-block Bregman ADMM exploits the null space of the constraint matrix to apply a null space cancellation, reducing the spectral analysis to a single block.

Implementation Example§

Below is a PyTorch-like snippet for distributed matrix factorization using Bregman ADMM with the kernel $h(x) = \frac{1}{2}\|x\|^2$ (Euclidean case, simplifying to standard ADMM) or a more general kernel like Shannon entropy.

import torch

def bregman_admm(f, g, A, B, c, beta, h, T=1000):
    # Initialize primal and dual variables
    x = torch.randn(A.shape[1])
    y = torch.randn(B.shape[1])
    lam = torch.zeros(c.shape[0])
    
    for t in range(T):
        # x-update: unconstrained quadratic if f is simple
        x = torch.linalg.solve(A.T @ A, A.T @ (c - B @ y - lam/beta))
        # y-update: Bregman proximal step
        # grad_y = ... (requires solving a proximal operator)
        # y = proximal_{g, h}(y_old, ...)
        # λ-update
        lam = lam + beta * (A @ x + B @ y - c)
    return x, y, lam

For general Bregman kernels, the $y$-update involves solving

$$ y^{k+1} = \operatorname{argmin}_y \left\{ g(y) + \frac{\beta}{2}\|Ax^{k+1} + By - c + \frac{\lambda^k}{\beta}\|^2 + D_h(y, y^k) \right\}. $$

Conclusion§

The paper provides a rigorous second-order analysis for a class of nonconvex, non-Lipschitz problems, with empirical validation on matrix and tensor factorization. The techniques (determinant reduction, symmetrization, null space cancellation) may extend to other Bregman-based algorithms.

Abstract

Technical Analysis & Implementation

Technical Breakdown§

Core Problem and Assumptions§

Bregman ADMM Update§

Key Theoretical Result§

Distributed Star Consensus Extension§

Implementation Example§

Conclusion§

Related Research

Which Nash Equilibrium? Solver-Dependent Selection on Zero-Sum Nash Polytopes

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand

Surprises in Proper Positive-Only Learning