Published: June 24, 2026

When Certainty Is an Artifact: Keyword Lexicon Blindness and the (Mis)Measurement of Rhetorical Stance

By Bo Chen

Research TL;DR

"Keyword lexicons can invert rhetorical stance measurement; LLM semantic classification reveals that negativity couples with hedging, not certainty."

Abstract

Can a statistically significant, large-effect-size finding in computational social science be entirely an artifact of the measurement instrument? We present a case where the answer appears to be yes. Analyzing 85 interviews across four public intellectuals (2016-2026), we find a robust negative-affect/emphatic-certainty lexical co-occurrence pattern under keyword-based scoring ($r = 0.72$-$0.93$, $p < 0.01$ for all four speakers). Replacing keyword counting with LLM-based zero-shot semantic classification on the complete diarized corpus (32,625 sentences) dramatically reduces this correlation: Dalio's $r = 0.851$ drops to $r = 0.206$, with two speakers showing negative $r(\text{neg}, \text{emphatic})$ and one showing null. In contrast, the LLM reveals a strong negative-hedging coupling across speakers, with Rogoff's $r(\text{neg}, \text{hedged}) = 0.875$ ($p = 0.001$) and Zeihan's $r(\text{neg}, \text{hedged}) = 0.722$ ($p = 0.008$), consistent with the conventional expectation that pessimistic discourse attracts hedging, not certainty. Sentence-level error analysis traces this discrepancy to three structural failure modes in keyword lexicons (syntactic blindness, polysemy blindness, and categorical absence), illustrated through cases where keyword counting inverts semantic meaning (e.g., "never absolutely totally confident" scored as high-certainty). We argue that keyword lexicons measure a universal lexical co-occurrence tendency (negative discourse naturally attracts emphatic vocabulary) that is orthogonal to, and can systematically invert, rhetorical stance. Treating keyword counts as measurements of epistemic certainty is a category error: a finding that appears to be about a speaker's psychology may be entirely about the counting of words.

Technical Analysis & Implementation

Summary

This paper exposes a fundamental flaw in using keyword lexicons to measure rhetorical stance such as certainty. In a case study of four public intellectuals, the author shows that keyword-based scoring yields a strong correlation between negative affect and emphatic certainty ($r = 0.72$-$0.93$). Replacing keyword counting with zero-shot LLM-based semantic classification of the same diarized corpus (32,625 sentences) dramatically reduces this correlation and instead reveals a robust coupling between negative discourse and hedging (e.g., Rogoff $r = 0.875$, Zeihan $r = 0.722$). The discrepancy is traced to three structural failure modes of keyword lexicons: syntactic blindness, polysemy blindness, and categorical absence.

Core Methodology

A corpus of 85 interviews (2016-2026) from four public intellectuals (Dalio, Rogoff, Quadir, Zeihan) was fully transcribed and diarized into 32,625 sentences. Two measurement approaches are compared:

1. Keyword lexicon scoring: Predefined lists of negative-affect words (e.g., "crisis," "risk") and emphatic-certainty words (e.g., "absolutely," "certainly") are counted. Per-speaker correlation between normalized counts yields high positive $r$.

2. LLM-based zero-shot semantic classification: An LLM (e.g., GPT-4) is prompted to classify each sentence as negative, emphatic, hedged, or neutral, with instructions to take syntax and context into account. Sentence-level labels are aggregated into category proportions, and correlations between categories are then computed separately for each speaker, as with the keyword scores.
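The second approach can be sketched as a simple labeling loop. This is a minimal sketch under stated assumptions: `build_prompt`, `classify_sentences`, `label_proportions`, and `toy_classifier` are hypothetical names, the prompt wording is invented, and one label per sentence is assumed. The model is passed in as a plain callable so the sketch is not tied to any particular API; the toy stand-in exists only so the example runs offline.

```python
from collections import Counter
from typing import Callable, Dict, List

LABELS = ("negative", "emphatic", "hedged", "neutral")

def build_prompt(sentence: str) -> str:
    """Assemble a zero-shot prompt (hypothetical wording, not the paper's)."""
    return (
        "Classify the sentence into exactly one of: "
        + ", ".join(LABELS)
        + ". Consider syntax, negation, and context.\n"
        + f"Sentence: {sentence}\nLabel:"
    )

def classify_sentences(sentences: List[str], llm: Callable[[str], str]) -> List[str]:
    """Label each sentence; fall back to 'neutral' when the reply is not a known label."""
    labels = []
    for sentence in sentences:
        reply = llm(build_prompt(sentence)).strip().lower()
        labels.append(reply if reply in LABELS else "neutral")
    return labels

def label_proportions(labels: List[str]) -> Dict[str, float]:
    """Share of sentences assigned to each category."""
    counts = Counter(labels)
    total = len(labels) or 1  # avoid division by zero on empty input
    return {label: counts[label] / total for label in LABELS}

def toy_classifier(prompt: str) -> str:
    """Offline stand-in for a real model call (assumption: any str -> str callable works)."""
    sentence = prompt.split("Sentence: ", 1)[1].rsplit("\nLabel:", 1)[0].lower()
    if "maybe" in sentence or "probably" in sentence:
        return "hedged"
    if "crisis" in sentence:
        return "negative"
    return "neutral"

sentences = ["Maybe rates will fall.", "This is a crisis.", "We met on Tuesday."]
labels = classify_sentences(sentences, toy_classifier)
props = label_proportions(labels)
print(labels)
print(props)
```

Keeping the model behind a callable makes it straightforward to swap in a real classifier and to test the aggregation logic separately from the model.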

Failure Modes Identified

  • Syntactic blindness: Keyword lexicons ignore negation (e.g., "never absolutely totally confident" scores high-certainty).
  • Polysemy blindness: Words like "certainly" can be sarcastic or ambiguous.
  • Categorical absence: Key stance markers like "maybe" or "probably" (hedging) are not in typical certainty lexicons.
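The three failure modes can be reproduced with a toy keyword counter. This is an illustrative sketch only: the mini-lexicons `EMPHATIC` and `NEGATIVE` and the function `keyword_counts` are hypothetical, and the example sentences are invented apart from the quoted phrase "never absolutely totally confident".

```python
import re

# Hypothetical mini-lexicons for illustration; not the lexicons used in the paper.
EMPHATIC = {"absolutely", "certainly", "totally", "definitely"}
NEGATIVE = {"crisis", "risk", "collapse"}

def keyword_counts(sentence: str) -> dict:
    """Count lexicon hits with no regard for syntax or context."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {
        "emphatic": sum(token in EMPHATIC for token in tokens),
        "negative": sum(token in NEGATIVE for token in tokens),
    }

examples = {
    # Negation is ignored, so an expression of doubt scores as strongly emphatic.
    "syntactic blindness": "I am never absolutely totally confident about that.",
    # A sarcastic use of "certainly" is counted the same as a sincere one.
    "polysemy blindness": "Well, that forecast certainly aged well.",
    # Hedges are simply not in the lexicon, so the sentence scores zero.
    "categorical absence": "Maybe, but it will probably take longer.",
}

for mode, text in examples.items():
    print(mode, keyword_counts(text))
```

In each case the counter returns a number, but the number does not track the stance a reader would assign to the sentence.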

Mathematical Formulation

Let $x_{i,s}$ be the count of negative words and $y_{i,s}$ the count of emphatic-certainty words for speaker $s$ in sentence $i$, with $\bar{x}_s$ and $\bar{y}_s$ their means over that speaker's sentences. The keyword-based score for speaker $s$ is the Pearson correlation between the two counts: $$r_s = \frac{\sum_{i} (x_{i,s} - \bar{x}_s)(y_{i,s} - \bar{y}_s)}{\sqrt{\sum_{i} (x_{i,s} - \bar{x}_s)^2 \sum_{i} (y_{i,s} - \bar{y}_s)^2}}$$
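The formula can be checked term by term in code. The sketch below is an assumption-labeled illustration: `pearson_r` is a hypothetical helper written directly from the definition, and the counts are made up for a single hypothetical speaker.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from the definition of r_s."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # x_{i,s} - x_bar_s
    dy = y - y.mean()  # y_{i,s} - y_bar_s
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    if denom == 0:
        return float("nan")  # undefined when either variable is constant
    return float((dx * dy).sum() / denom)

# Made-up counts for one hypothetical speaker (not data from the paper)
neg_counts = [0, 2, 1, 3, 0, 1]
emp_counts = [0, 1, 1, 2, 0, 0]

r = pearson_r(neg_counts, emp_counts)
print(round(r, 3))
```

The result agrees with `np.corrcoef(neg_counts, emp_counts)[0, 1]`, which computes the same quantity.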

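LLM-based classification yields per-sentence probabilities $p_{i,s}^{\text{neg}}, p_{i,s}^{\text{emp}}, p_{i,s}^{\text{hed}}$. Averaging over the $N_{u,s}$ sentences in an aggregation unit $u$ of speaker $s$ gives the proportions $P_{u,s}^{\text{neg}} = \frac{1}{N_{u,s}}\sum_{i \in u} p_{i,s}^{\text{neg}}$, and similarly for emphatic and hedged. The unit $u$ (for example, one interview) is an assumption here, since the aggregation level is not stated. Each speaker's correlations, such as $r_s(\text{neg}, \text{hed})$, are then computed across that speaker's units, which is what makes the per-speaker values under Key Results possible.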
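The aggregation step amounts to a grouped mean followed by a correlation. The sketch below assumes sentences carry a group id; the grouping variable, the helper name `group_means`, and all numbers are hypothetical and chosen only to make the arithmetic visible.

```python
import numpy as np

def group_means(probs, groups):
    """Average per-sentence probabilities within each group.

    probs:  array-like of shape (n_sentences, n_categories)
    groups: array-like of shape (n_sentences,) holding a group id per sentence
    """
    probs = np.asarray(probs, dtype=float)
    groups = np.asarray(groups)
    ids = np.unique(groups)  # sorted unique group ids
    means = np.vstack([probs[groups == g].mean(axis=0) for g in ids])
    return ids, means

# Made-up probabilities; columns are [neg, emp, hed]
probs = [
    [0.9, 0.1, 0.7], [0.7, 0.2, 0.5],  # group "a"
    [0.2, 0.6, 0.1], [0.1, 0.5, 0.2],  # group "b"
    [0.5, 0.3, 0.4], [0.6, 0.2, 0.6],  # group "c"
]
groups = ["a", "a", "b", "b", "c", "c"]

ids, P = group_means(probs, groups)
r_neg_emp = float(np.corrcoef(P[:, 0], P[:, 1])[0, 1])
r_neg_hed = float(np.corrcoef(P[:, 0], P[:, 2])[0, 1])
print(ids)
print(P.round(3))
print(round(r_neg_emp, 3), round(r_neg_hed, 3))
```

With only a handful of groups, any such correlation is unstable, so the number of groups behind each reported $r$ matters for interpretation.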

Code Snippet (Simulated)

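import numpy as np

# Simulated values for illustration only; these are not data from the paper.
# Each row is one simulated observation (4 in total).
# Columns: [negative-word rate, emphatic-word rate], i.e. normalized keyword counts
keyword_counts = np.array([[0.12, 0.08], [0.15, 0.11], [0.09, 0.06], [0.20, 0.16]])
# Columns: [neg_prob, emp_prob, hed_prob], i.e. LLM category proportions
llm_probs = np.array([[0.10, 0.05, 0.04], [0.14, 0.07, 0.06], [0.08, 0.03, 0.02], [0.18, 0.09, 0.10]])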

# Keyword correlation
corr_keyword = np.corrcoef(keyword_counts[:,0], keyword_counts[:,1])[0,1]
print(f"Keyword r: {corr_keyword:.3f}")

# LLM-based correlations
corr_neg_emp = np.corrcoef(llm_probs[:,0], llm_probs[:,1])[0,1]
corr_neg_hed = np.corrcoef(llm_probs[:,0], llm_probs[:,2])[0,1]
print(f"LLM neg-emp r: {corr_neg_emp:.3f}")
print(f"LLM neg-hed r: {corr_neg_hed:.3f}")

Key Results

  • Keyword-based $r(\text{neg},\text{emp})$: Dalio 0.851, Rogoff 0.721, Quadir 0.931, Zeihan 0.798 (all $p<0.01$)
  • LLM-based $r(\text{neg},\text{emp})$: Dalio 0.206, Rogoff -0.218, Quadir -0.152, Zeihan 0.047 (not significant)
  • LLM-based $r(\text{neg},\text{hed})$: Rogoff 0.875 ($p=0.001$), Zeihan 0.722 ($p=0.008$), others positive but weaker.
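The $p$-values attached to these correlations depend on the number of paired observations $n$ behind each $r$, which is not reported in this summary. Under the usual null hypothesis of zero correlation, the standard test statistic for a Pearson coefficient is: $$t = r\sqrt{\frac{n-2}{1-r^2}}, \qquad t \sim t_{n-2}$$

As a purely illustrative calculation with a hypothetical $n$ (not a value taken from the paper), $r = 0.875$ with $n = 10$ would give $t = 0.875\sqrt{8 / 0.234375} \approx 5.11$ on 8 degrees of freedom. The same $r$ with a smaller $n$ would yield a weaker result, which is why the sample size per speaker is needed to interpret the significance levels.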

Implications

The paper argues that keyword lexicons measure a general lexical co-occurrence tendency (negative discourse attracts emphatic vocabulary) rather than epistemic certainty, and that this tendency can systematically invert the measured stance. Treating keyword counts as measurements of certainty is therefore a category error. LLM-based semantic classification is recommended as a more valid measurement instrument.