G-RRM: Guiding Symbolic Solvers with Recurrent Reasoning Models

Abstract

In this work, we focus on SE-RRMs, a symbol-equivariant instantiation of RRMs that exhibits improved extrapolation to larger problem sizes. We propose a neuro-symbolic approach, ``Guiding with Recurrent Reasoning Models'' (G-RRM), which integrates SE-RRMs with symbolic solvers for constraint satisfaction problems. SE-RRMs act as neural solvers that generate full solution proposals and guide classical symbolic solvers, such as backtracking or SAT-based methods like Glucose 4.1 and CaDiCaL 3.0.0, that produce globally correct solutions. Centrally, we investigate when neural guidance with G-RRM improves the search efficiency of symbolic solvers. % Our experiments show that the efficacy of G-RRM depends on two conditions: first, the problem instances must have an expansive combinatorial search space to expose potential gains, and second, the solver architecture must be capable of dynamically overwriting its branching choices to recover when neural hints are imperfect. When these conditions hold, guidance drives median conflict counts to zero and yields significant wall-clock speedups: on $9\times9$ Sudoku, where the SE-RRM correctly solves $91.1\%$ of instances, backtracking accelerates by $33.3\times$ and Glucose 4.1 by $1.70\times$ (median, $p<0.001$), with Glucose 4.1 retaining a $1.17\times$ speedup on perfect-hint $25\times25$ grids. In contrast, CaDiCaL 3.0.0, whose runtime is overhead-dominated and which always respects the injected branching hints rather than overwriting them, shows no significant speedup (median $1.02\times$, n.s.) and even a small significant mean slowdown ($0.90\times$) on $9\times9$. These results delineate the regimes in which neural guidance translates into practical speedups.

Technical Analysis & Implementation

Overview§

G-RRM (Guiding with Recurrent Reasoning Models) is a neuro-symbolic framework that uses a Symbol-Equivariant Recurrent Reasoning Model (SE-RRM) to generate full solution proposals for constraint satisfaction problems (CSPs). These proposals guide classical symbolic solvers (backtracking, Glucose 4.1, CaDiCaL 3.0.0) by providing branching order hints. The key insight is that neural guidance improves search efficiency only when the problem has a large combinatorial search space (e.g., large Sudoku grids) and when the solver can dynamically overwrite imperfect hints.

Methodology§

SE-RRMs are a form of neural solver that updates node features via message passing in a factor graph, equivariant to permutations of variables and constraints. Given a CSP with variables $x_i$ and constraints $C_j$, SE-RRM computes assignments $\hat{x}_i$ by iterating: $$ h_i^{(t+1)} = f_{\theta}\left( h_i^{(t)}, \sum_{j} \phi_{\theta}(h_i^{(t)}, g_{\theta}(C_j)) \right) $$ where $h_i$ is the hidden state for variable $i$, and $f_\theta, \phi_\theta, g_\theta$ are learnable networks. The final predictions are obtained via a readout: $\hat{x}_i = \text{softmax}(W h_i^{(T)})$.

In G-RRM, the SE-RRM's output is converted into branching hints for the solver. For backtracking, hints are literal orderings. For SAT solvers (Glucose, CaDiCaL), hints are variable polarities (phase). The symbolic solver then runs with these hints, but may overwrite them if necessary. The augmented solver's search terminates upon finding a valid solution or exhausting the search space.

Key Results§

On $9\times9$ Sudoku (SE-RRM solves 91.1% perfectly), backtracking sees a $33.3\times$ wall-clock speedup, Glucose 4.1 a $1.70\times$ speedup ($p<0.001$), but CaDiCaL shows negligible speedup (median $1.02\times$, n.s.) due to overhead and overwrite-respecting behavior.
On $25\times25$ Sudoku with perfect hints, Glucose still gains $1.17\times$, confirming the importance of large search spaces.
Conditions for speedup: (1) problem must have an expansive search space, (2) solver must be able to overwrite incorrect hints to recover from imperfect neural predictions.

Code Snippet (Simplified G-RRM Guidance Loop)§

def grrm_solve(model, solver, problem):
    # model: SE-RRM, solver: Backtracking or SAT solver
    # Generate neural proposal
    proposal = model(problem)  # shape: (n_vars, domain_size)
    hints = extract_hints(proposal)  # e.g. argmax for each variable
    solver.set_hints(hints)
    
    # Run symbolic solver with neural guidance
    solution = solver.solve(problem)
    return solution

The SE-RRM parameter updates follow a supervised loss on complete assignments: $$\mathcal{L} = \frac{1}{N} \sum_{i=1}^N \text{CE}(\hat{x}_i, x_i^)$$ where $x_i^$ is the ground-truth assignment for training instances.

Abstract

Technical Analysis & Implementation

Overview§

Methodology§

Key Results§

Code Snippet (Simplified G-RRM Guidance Loop)§

Related Research

Audio-Based Understanding of Audiobook Narration Appeal

Language-Critique Imitation Learning from Suboptimal Demonstrations

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand