alignment · Published: July 1, 2026

Theoria: Rewrite-Acceptability Verification over Informal Reasoning States

By Ben Slivinski, Michael Saldivar

Research TL;DR

"Theoria verifies AI reasoning by decomposing it into auditable state transitions with explicit justifications, enforcing completeness of change to surface hidden errors, achieving 91.4% precision on HLE-Verified Gold."

Abstract

When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited after the fact and are subject to the same coherence issues as any LLM. We present Theoria, a verification architecture that closes this gap. A candidate solution is rewritten into a sequence of typed state transitions, each licensed by an explicit justification, whether that be a citation, computation, or problem-given fact, and every transition is independently auditable. The foundational invariant is completeness of change: every difference between consecutive proof states must be accounted for, so hidden premises surface as unlicensed mutations rather than passing silently. On HLE-Verified Gold (185 text-only expert problems), Theoria certifies 105 at 91.4% strict precision (Wilson 95% CI [84.5%, 95.4%]). Every certification produces a human-readable proof trace in which each step can be independently challenged. Holistic LLM judges achieve comparable precision at matched coverage but fail on different problems (Jaccard 0.14–0.36), making the approaches complementary. On 95 adversarial poisoned proofs across 15 domains, structured judges catch 94.7% versus 83.2% for holistic judging (p = 0.0017). The overall 11.5 pp gap concentrates in hidden premises (90.6% vs. 62.5%, a 28 pp difference) and fabricated citations (100% vs. 90%), the error classes where the formal analysis predicts an advantage; performance is identical on arithmetic and theorem-misapplication errors, where no advantage is predicted. On GPQA Diamond (n = 65), certified precision is 97.1% (Wilson CI [85.1%, 99.5%]).

Technical Analysis & Implementation


Core Methodology

Theoria formalizes reasoning verification by transforming a candidate solution into a sequence of typed states $S_0, S_1, \dots, S_n$, where each state is a set of claims. Transitions $S_i \rightarrow S_{i+1}$ must be licensed by an explicit justification $J$ of one of three types:

  • citation (e.g., from a known source),
  • computation (e.g., arithmetic),
  • problem-given fact (from the problem statement).
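The three justification types and the state structure can be pictured as a small data model. The sketch below is illustrative only: the class names, the `text` field, and the use of plain strings for claims are assumptions made here, not the paper's schema.

```python
from dataclasses import dataclass
from enum import Enum


class JustificationKind(Enum):
    CITATION = "citation"            # appeal to a known source
    COMPUTATION = "computation"      # e.g., arithmetic
    PROBLEM_GIVEN = "problem_given"  # fact taken from the problem statement


@dataclass(frozen=True)
class Justification:
    kind: JustificationKind
    text: str  # hypothetical field: the source, the calculation, or the quoted fact


@dataclass(frozen=True)
class State:
    claims: frozenset  # a state is a set of claims (strings here, by assumption)


# A one-step example: the second state adds a single claim to the first.
s0 = State(frozenset({"a = 3", "b = 4"}))
s1 = State(s0.claims | {"a + b = 7"})
j0 = Justification(JustificationKind.COMPUTATION, "3 + 4 = 7")
```

Frozen dataclasses and `frozenset` keep states immutable, so a transition is always a comparison between two fixed snapshots rather than an in-place edit.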

The central invariant is completeness of change: every claim that appears in $S_{i+1}$ but not in $S_i$ must be justified by $J$, and every claim removed must also be accounted for. Hidden premises surface as unlicensed mutations—changes that lack a justification.

Verification Pipeline

  1. Rewriting: An LLM (e.g., GPT-4o) takes the original solution and produces a sequence of states and justifications. This step is guided by a prompt that enforces the transition structure.
  2. Verification: A structured LLM judge checks each transition independently. For each $i$, the judge classifies the transition as accept or reject based on whether the justification licenses the observed changes. The judge is also an LLM but operates under a strict template that forces it to inspect each claim and justification.

Mathematical Formulation

Let $\Delta_i = S_{i+1} \setminus S_i$ (added claims) and $\nabla_i = S_i \setminus S_{i+1}$ (removed claims). A transition is valid iff there exists a justification $J$ such that:

  • $\forall c \in \Delta_i: \text{justifies}(J, c)$
  • $\forall c \in \nabla_i: \text{explains}(J, c)$ (e.g., superseded by a more specific claim)

The verifier outputs a binary decision for each step, and the entire proof is accepted iff all steps pass.
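
As a small worked instance of these definitions (the claims are illustrative and not taken from the paper), consider a step that moves from $x^2 = 4$ to $x = 2$:

\[
S_i = \{x^2 = 4\}, \qquad S_{i+1} = \{x^2 = 4,\; x = 2\}, \qquad \Delta_i = \{x = 2\}, \qquad \nabla_i = \emptyset
\]

Suppose $J$ is a computation, "take square roots". Reading $\text{justifies}(J, c)$ as relative to the claims already in $S_i$ (an assumption about how the predicate is applied), $J$ licenses only $x = \pm 2$, so $\text{justifies}(J,\, x = 2)$ fails and the step is rejected. The missing premise $x > 0$ is exactly the kind of hidden assumption the invariant is meant to expose. If $x > 0$ is instead present in $S_i$ as a problem-given fact, the same $\Delta_i$ is licensed and the transition is valid.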

Code Illustration (Conceptual)

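def verify_sequence(states, justifications, justifies, explains):
    """Return (True, None) if every step passes, else (False, i) for the first failing step i.

    `justifies` and `explains` are caller-supplied predicates taking (J, claim).
    """
    for i in range(len(states) - 1):
        # Claims introduced and claims dropped by transition i.
        added = states[i + 1].claims - states[i].claims
        removed = states[i].claims - states[i + 1].claims
        J = justifications[i]
        # Completeness of change: every added and removed claim must be accounted for.
        if not (all(justifies(J, c) for c in added) and
                all(explains(J, c) for c in removed)):
            return False, i
    return True, None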

The actual implementation uses LLM calls to evaluate justifies and explains for each claim.
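
To make the control flow concrete without any LLM calls, the sketch below restates the loop with the two predicates passed in as arguments and runs it on a toy three-state sequence. The stub predicates, the `licenses` and `retires` fields, and the example claims are all hypothetical stand-ins; in the system described here those decisions are made by an LLM judge.

```python
from types import SimpleNamespace


def verify_sequence(states, justifications, justifies, explains):
    # Same loop as the conceptual listing, repeated so this block runs on its own.
    for i in range(len(states) - 1):
        added = states[i + 1].claims - states[i].claims
        removed = states[i].claims - states[i + 1].claims
        J = justifications[i]
        if not (all(justifies(J, c) for c in added)
                and all(explains(J, c) for c in removed)):
            return False, i
    return True, None


# Hypothetical stubs: each justification simply lists what it licenses or retires.
def justifies(J, claim):
    return claim in J["licenses"]


def explains(J, claim):
    return claim in J["retires"]


given = {"speed = 60 km/h", "time = 2 h"}
states = [
    SimpleNamespace(claims=set(given)),
    SimpleNamespace(claims=given | {"distance = 120 km"}),
    SimpleNamespace(claims=given | {"distance = 120 km", "fuel used = 6 L"}),
]
justifications = [
    # Step 0: 60 * 2 = 120 follows from the given facts.
    {"kind": "computation", "licenses": {"distance = 120 km"}, "retires": set()},
    # Step 1: the fuel figure needs a consumption rate that was never stated,
    # so nothing in this justification licenses the new claim.
    {"kind": "computation", "licenses": set(), "retires": set()},
]

result = verify_sequence(states, justifications, justifies, explains)
print(result)  # (False, 1)
```

The second transition is rejected because its added claim has no license, which is how a hidden premise shows up as an unlicensed mutation. The returned index also tells a reviewer which step to challenge.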

Key Results

  • On HLE-Verified Gold (185 problems): 105 certified with 91.4% strict precision (Wilson 95% CI [84.5%, 95.4%]).
  • Adversarial poisoned proofs: 94.7% detection vs 83.2% for holistic judges (p=0.0017). The 11.5 pp gap is concentrated in hidden premises (90.6% vs 62.5%) and fabricated citations (100% vs 90%).
  • On GPQA Diamond (n=65): 97.1% certified precision (Wilson CI [85.1%, 99.5%]).
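
The Wilson intervals quoted above can be recomputed with the standard Wilson score formula. In the sketch below, the counts 96 of 105 and 33 of 34 are assumptions: the text gives only percentages, and these are the integer counts that reproduce the reported precision and interval in each case.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed counts (not stated in the text): 96/105 is 91.4%, 33/34 is 97.1%.
hle_low, hle_high = wilson_interval(96, 105)
gpqa_low, gpqa_high = wilson_interval(33, 34)
print(f"HLE-Verified Gold: [{hle_low:.1%}, {hle_high:.1%}]")
print(f"GPQA Diamond:      [{gpqa_low:.1%}, {gpqa_high:.1%}]")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and remains reasonable for proportions close to 1, which matters with samples this small.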

The paper also shows that Theoria and holistic judges fail on largely different problems (Jaccard 0.14–0.36), suggesting that the two approaches are complementary.
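
For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union, so values near 0 mean the judges rarely fail on the same problems. A minimal sketch follows; the problem IDs are made up for illustration, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


# Hypothetical failure sets for two judges (IDs invented for illustration).
structured_failures = {"q03", "q11", "q27", "q40"}
holistic_failures = {"q11", "q15", "q32"}
overlap = jaccard(structured_failures, holistic_failures)  # 1 shared of 6 total
```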

Implications

Theoria provides an auditable, high-precision verification layer for LLM outputs, particularly useful in high-stakes settings. Its structured approach catches errors that holistic judges miss, especially hidden premises and fabricated citations.
