arrow_backBack to research feed
otherPublished: June 26, 2026

Second-Order KKT Guarantees for Bregman ADMM in Nonconvex and Non-Lipschitz Optimization

By Shuang Li, Zhihui Zhu, Qiuwei Li

Research TL;DR

"Proves second-order KKT guarantees for Bregman ADMM under two-sided relative smoothness, enabling convergence to strict saddle points from random initialization for non-Lipschitz problems."

Abstract

We analyze Bregman ADMM for nonconvex linearly constrained problems under two-sided relative smoothness, a condition that replaces the standard Lipschitz gradient assumption with a Hessian comparison relative to a Bregman kernel. This setting covers polynomial objectives arising in matrix and tensor models for which a global Lipschitz-gradient constant need not exist. We show that on an invariant open state-space domain, one iteration of Bregman ADMM defines a smooth primal--dual fixed-point map whose strict-saddle KKT points are unstable fixed points; consequently, from random initialization the iterates converge to a strict saddle with probability zero. Combined with existing first-order convergence results, this yields almost-sure second-order stationarity of limiting KKT points. We extend the analysis to a multi-block star consensus formulation for distributed optimization. The technical novelty lies in a determinant reduction with a Bregman-specific symmetrization and scaling step in the two block spectral argument, together with a null space cancellation exploiting the star graph structure in the consensus case. Numerical experiments on distributed matrix factorization illustrate the theory, and a symmetric tensor factorization example demonstrates the broader Bregman proximal splitting idea beyond the separable consensus setting.

Technical Analysis & Implementation

Technical Breakdown§

Core Problem and Assumptions§

The paper addresses linearly constrained nonconvex optimization where the objective lacks a global Lipschitz gradient. Instead, it assumes two-sided relative smoothness with respect to a Bregman kernel $D_h$: there exist constants $ ho_1, \rho_2 > 0$ such that for all $x, y$ in an invariant open domain,

$$ \rho_1 D_h(y, x) \leq f(y) - f(x) - \langle \nabla f(x), y - x \rangle \leq \rho_2 D_h(y, x). $$

This generalizes the standard smoothness and covers polynomial objectives like those in tensor factorization.

Bregman ADMM Update§

The algorithm solves $\min_{x,y} f(x) + g(y)$ subject to $Ax + By = c$ via:

\begin{align} x^{k+1} &= \operatorname{argmin}_x \left\{ f(x) + \langle \lambda^k, Ax \rangle + \frac{\beta}{2} \|Ax + By^k - c\|^2 \right\}, \\ y^{k+1} &= \operatorname{argmin}_y \left\{ g(y) + \langle \lambda^k, By \rangle + \frac{\beta}{2} \|Ax^{k+1} + By - c\|^2 + D_h(y, y^k) \right\}, \\ \lambda^{k+1} &= \lambda^k + \beta (Ax^{k+1} + By^{k+1} - c). \end{align}

The Bregman proximal term $D_h(y, y^k)$ replaces the standard quadratic penalty, allowing adaptation to the problem's geometry.

Key Theoretical Result§

By constructing a smooth primal-dual fixed-point map $T$, the authors show that KKT points satisfy $T(z) = z$. Using a spectral analysis with a determinant reduction and Bregman-specific symmetrization, they prove that strict saddle points (where the Hessian has at least one negative eigenvalue) are unstable fixed points. Consequently, from random initialization, iterates avoid strict saddles with probability 1, and converge to second-order stationary KKT points.

Distributed Star Consensus Extension§

For a star graph with $N$ agents, the problem is

$$ \min_{x_0, \{x_i\}} \sum_{i=1}^N f_i(x_i) \quad \text{s.t.} \quad x_i = x_0, $$

where $x_0$ is the central variable. The multi-block Bregman ADMM exploits the null space of the constraint matrix to apply a null space cancellation, reducing the spectral analysis to a single block.

Implementation Example§

Below is a PyTorch-like snippet for distributed matrix factorization using Bregman ADMM with the kernel $h(x) = \frac{1}{2}\|x\|^2$ (Euclidean case, simplifying to standard ADMM) or a more general kernel like Shannon entropy.

import torch

def bregman_admm(f, g, A, B, c, beta, h, T=1000):
    # Initialize primal and dual variables
    x = torch.randn(A.shape[1])
    y = torch.randn(B.shape[1])
    lam = torch.zeros(c.shape[0])
    
    for t in range(T):
        # x-update: unconstrained quadratic if f is simple
        x = torch.linalg.solve(A.T @ A, A.T @ (c - B @ y - lam/beta))
        # y-update: Bregman proximal step
        # grad_y = ... (requires solving a proximal operator)
        # y = proximal_{g, h}(y_old, ...)
        # λ-update
        lam = lam + beta * (A @ x + B @ y - c)
    return x, y, lam

For general Bregman kernels, the $y$-update involves solving

$$ y^{k+1} = \operatorname{argmin}_y \left\{ g(y) + \frac{\beta}{2}\|Ax^{k+1} + By - c + \frac{\lambda^k}{\beta}\|^2 + D_h(y, y^k) \right\}. $$

Conclusion§

The paper provides a rigorous second-order analysis for a class of nonconvex, non-Lipschitz problems, with empirical validation on matrix and tensor factorization. The techniques (determinant reduction, symmetrization, null space cancellation) may extend to other Bregman-based algorithms.