Theoria: Rewrite-Acceptability Verification over Informal Reasoning States
By Ben Slivinski, Michael Saldivar
"Theoria verifies AI reasoning by decomposing it into auditable state transitions with explicit justifications, enforcing completeness of change to surface hidden errors, achieving 91.4% precision on HLE-Verified Gold."
Abstract
When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited after the fact and are subject to the same coherence issues as any LLM. We present Theoria, a verification architecture that closes this gap. A candidate solution is rewritten into a sequence of typed state transitions, each licensed by an explicit justification, whether that be a citation, computation, or problem-given fact, and every transition is independently auditable. The foundational invariant is completeness of change: every difference between consecutive proof states must be accounted for, so hidden premises surface as unlicensed mutations rather than passing silently. On HLE-Verified Gold (185 text-only expert problems), Theoria certifies 105 at 91.4% strict precision (Wilson 95% CI [84.5%, 95.4%]). Every certification produces a human-readable proof trace in which each step can be independently challenged. Holistic LLM judges achieve comparable precision at matched coverage but fail on different problems (Jaccard 0.14-0.36), making the approaches complementary. On 95 adversarial poisoned proofs across 15 domains, structured judges catch 94.7% versus 83.2% for holistic judging (p = 0.0017). The overall 11.5 pp gap concentrates in hidden premises (90.6% vs. 62.5%, a 28 pp difference) and fabricated citations (100% vs. 90%), the error classes where the formal analysis predicts an advantage; performance is identical on arithmetic and theorem-misapplication errors, where no advantage is predicted. On GPQA Diamond (n = 65), certified precision is 97.1% (Wilson CI [85.1%, 99.5%]).
Technical Analysis & Implementation
Core Methodology
Theoria formalizes reasoning verification by transforming a candidate solution into a sequence of typed states $S_0, S_1, \dots, S_n$, where each state is a set of claims. Transitions $S_i \rightarrow S_{i+1}$ must be licensed by an explicit justification $J$ of one of three types:
- citation (e.g., from a known source),
- computation (e.g., arithmetic),
- problem-given fact (from the problem statement).
The central invariant is completeness of change: every claim that appears in $S_{i+1}$ but not in $S_i$ must be justified by $J$, and every claim removed must also be accounted for. Hidden premises surface as unlicensed mutations—changes that lack a justification.
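To make the invariant concrete, here is a minimal sketch of how states, justifications, and unlicensed mutations could be represented. The class names, the `licenses` field, and the example claims are assumptions made for illustration; the source does not describe its data model, and in the real system the decision about what a justification covers is made by an LLM judge rather than by set membership.

```python
from dataclasses import dataclass
from enum import Enum


class JustificationKind(Enum):
    # The three justification types named in the text.
    CITATION = "citation"
    COMPUTATION = "computation"
    PROBLEM_GIVEN = "problem_given"


@dataclass(frozen=True)
class Justification:
    kind: JustificationKind
    text: str
    # Hypothetical field: the claims this justification is taken to cover.
    licenses: frozenset


@dataclass(frozen=True)
class State:
    claims: frozenset


def unlicensed_mutations(before: State, after: State, j: Justification) -> frozenset:
    """Return every added or removed claim that the justification does not cover."""
    changed = before.claims ^ after.claims  # symmetric difference: adds and removals
    return changed - j.licenses


s0 = State(frozenset({"x + 2 = 5"}))
s1 = State(frozenset({"x + 2 = 5", "x = 3", "x is prime"}))
j = Justification(
    kind=JustificationKind.COMPUTATION,
    text="subtract 2 from both sides",
    licenses=frozenset({"x = 3"}),
)

hidden = unlicensed_mutations(s0, s1, j)
print(hidden)
```

In this toy transition the claim "x is prime" enters the state without being covered by the computation, so it is reported as an unlicensed mutation instead of passing silently.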
Verification Pipeline
1. Rewriting: An LLM (e.g., GPT-4o) takes the original solution and produces a sequence of states and justifications. This step is guided by a prompt that enforces the transition structure.
2. Verification: A structured LLM judge checks each transition independently. For each $i$, the judge classifies the transition as accept or reject based on whether the justification licenses the observed changes. The judge is also an LLM but operates under a strict template that forces it to inspect each claim and justification.
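The two stages can be sketched as a small driver that takes the rewriter and the per-transition judge as plain callables. The function name `certify` and both toy stand-ins are hypothetical; in the described system each callable would wrap an LLM call, and nothing here reflects the actual prompts or templates.

```python
from typing import Callable, List, Tuple


def certify(
    solution: str,
    rewrite: Callable[[str], Tuple[list, list]],
    judge_transition: Callable[[frozenset, frozenset, str], bool],
) -> Tuple[bool, List[bool]]:
    """Rewrite a solution into states, judge each transition on its own,
    and certify only if every transition is accepted."""
    states, justifications = rewrite(solution)
    if len(justifications) != len(states) - 1:
        raise ValueError("need exactly one justification per transition")
    verdicts = [
        judge_transition(states[i], states[i + 1], justifications[i])
        for i in range(len(states) - 1)
    ]
    # The per-step verdicts are kept so that a rejected step can be located.
    return all(verdicts), verdicts


def toy_rewrite(solution: str):
    # Stand-in for the LLM rewriter: returns fixed states and justifications.
    states = [
        frozenset({"x + 2 = 5"}),
        frozenset({"x + 2 = 5", "x = 3"}),
        frozenset({"x + 2 = 5", "x = 3", "answer is 3"}),
    ]
    justifications = ["computation: subtract 2 from both sides", ""]
    return states, justifications


def toy_judge(before: frozenset, after: frozenset, justification: str) -> bool:
    # Stand-in for the LLM judge: accept only if nothing changed
    # or some non-empty justification was supplied.
    return before == after or bool(justification.strip())


certified, verdicts = certify("x = 3, so the answer is 3", toy_rewrite, toy_judge)
print(certified, verdicts)
```

Passing the two stages in as callables keeps the control flow testable without any model access, and the list of verdicts plays the role of the step-by-step trace.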
Mathematical Formulation
Let $\Delta_i = S_{i+1} \setminus S_i$ (added claims) and $\nabla_i = S_i \setminus S_{i+1}$ (removed claims). A transition is valid iff there exists a justification $J$ such that:
- $\forall c \in \Delta_i: \text{justifies}(J, c)$
- $\forall c \in \nabla_i: \text{explains}(J, c)$ (e.g., superseded by a more specific claim)
The verifier outputs a binary decision for each step, and the entire proof is accepted iff all steps pass.
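As a small worked instance of these definitions (the claims $a$, $b$, $c$ are placeholders, not taken from the paper):

$$
S_i = \{a, b\}, \quad S_{i+1} = \{a, c\} \;\Longrightarrow\; \Delta_i = \{c\}, \quad \nabla_i = \{b\}
$$

The transition is valid only if a single justification $J$ satisfies both $\text{justifies}(J, c)$ and $\text{explains}(J, b)$. Writing $\text{valid}(i)$ as shorthand for that per-step condition (a name introduced here for convenience), the acceptance rule reads:

$$
\text{accept}(S_0, \dots, S_n) \iff \bigwedge_{i=0}^{n-1} \text{valid}(i)
$$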
Code Illustration (Conceptual)
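```python
def verify_sequence(states, justifications, justifies, explains):
    """Return (True, None) if every transition is licensed,
    otherwise (False, index of the first failing transition).

    justifies(J, c) and explains(J, c) are caller-supplied predicates.
    """
    for i in range(len(states) - 1):
        added = states[i + 1].claims - states[i].claims    # claims introduced by step i
        removed = states[i].claims - states[i + 1].claims  # claims dropped by step i
        J = justifications[i]
        # Completeness of change: every addition and every removal must be covered.
        if not (all(justifies(J, c) for c in added)
                and all(explains(J, c) for c in removed)):
            return False, i
    return True, None
```

The actual implementation uses LLM calls to evaluate justifies and explains for each claim.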
Key Results
- On HLE-Verified Gold (185 problems): 105 certified with 91.4% strict precision (Wilson 95% CI [84.5%, 95.4%]).
- Adversarial poisoned proofs: 94.7% detection vs 83.2% for holistic judges (p=0.0017). The 11.5 pp gap is concentrated in hidden premises (90.6% vs 62.5%) and fabricated citations (100% vs 90%).
- On GPQA Diamond (n=65): 97.1% certified precision (Wilson CI [85.1%, 99.5%]).
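The Wilson intervals quoted in these results follow from the standard Wilson score formula. The sketch below assumes 96 correct out of 105 certified problems for HLE-Verified Gold; that count is not stated in the source and is simply the integer consistent with 91.4% of 105. It also assumes the usual z = 1.96 for a 95% interval.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed count: 96 of 105 (inferred from the reported 91.4%, not given in the source).
low, high = wilson_interval(96, 105)
print(f"{low:.1%} to {high:.1%}")
```

Under that assumed count the function returns roughly 84.5% to 95.4%, in line with the interval reported for HLE-Verified Gold.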
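The paper also shows that Theoria and holistic judges fail on largely different problems (Jaccard 0.14–0.36), which suggests the two approaches are complementary. The overlap is low but not zero, so the failure sets are not strictly disjoint.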
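For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below assumes the index is taken over the sets of problems each approach gets wrong, which the source does not spell out; the problem IDs are invented for illustration and are not data from the paper.

```python
def jaccard(a, b) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical failure sets (made-up problem IDs).
structured_failures = {"p03", "p11", "p27", "p40"}
holistic_failures = {"p11", "p18", "p32", "p45", "p51"}

overlap = jaccard(structured_failures, holistic_failures)
print(round(overlap, 3))
```

A value near 0 means the two verifiers rarely fail on the same problem, which is the situation in which combining them is most useful.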
Implications
Theoria provides an auditable, high-precision verification layer for LLM outputs, particularly useful in high-stakes settings. Its structured approach catches errors that holistic judges miss, especially those involving omissions or fabricated justifications.