When Certainty Is an Artifact: Keyword Lexicon Blindness and the (Mis)Measurement of Rhetorical Stance

Summary

This paper exposes a fundamental flaw in using keyword lexicons to measure rhetorical stance (e.g., certainty). Through a case study of four public intellectuals, the authors demonstrate that keyword-based scoring yields a strong correlation between negative affect and emphatic certainty ($r = 0.72$ to $0.93$). However, replacing keyword counting with zero-shot LLM-based semantic classification on the same transcribed interviews (32,625 sentences) nearly eliminates this correlation and instead reveals a robust coupling between negative discourse and hedging (e.g., Rogoff $r=0.875$, Zeihan $r=0.722$). The discrepancy is traced to three structural failure modes: syntactic blindness, polysemy blindness, and categorical absence.

Core Methodology

A corpus of 85 interviews (2016-2026) from four public intellectuals (Dalio, Rogoff, Quadir, Zeihan) was fully transcribed and diarized into 32,625 sentences. Two measurement approaches are compared:

1. Keyword lexicon scoring: Predefined lists of negative-affect words (e.g., "crisis," "risk") and emphatic-certainty words (e.g., "absolutely," "certainly") are counted. Per-speaker correlation between normalized counts yields high positive $r$.

2. LLM-based zero-shot semantic classification: An LLM (e.g., GPT-4) is prompted to classify each sentence into one of four categories: negative, emphatic, hedged, or neutral. The model is instructed to consider syntax and context. Per-sentence classifications are averaged into category proportions for each speaker, and these proportions are then correlated across speakers.
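The keyword approach in step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two word sets contain only the examples quoted above, and the tokenizer and the `keyword_counts` helper are assumptions made for this sketch.

```python
import re

# Hypothetical mini-lexicons built from the examples quoted in the text;
# the paper's full word lists are not reproduced here.
NEGATIVE = {"crisis", "risk"}
EMPHATIC = {"absolutely", "certainly"}

def keyword_counts(sentence):
    """Return (negative_count, emphatic_count) for one sentence."""
    # Simple lowercase word tokenizer (an assumption of this sketch).
    tokens = re.findall(r"[a-z']+", sentence.lower())
    neg = sum(tok in NEGATIVE for tok in tokens)
    emp = sum(tok in EMPHATIC for tok in tokens)
    return neg, emp

# Made-up sentences, for illustration only.
sentences = [
    "This crisis is absolutely a risk to growth.",
    "Growth will certainly continue.",
    "Maybe rates fall next year.",
]
counts = [keyword_counts(s) for s in sentences]
print(counts)  # [(2, 1), (0, 1), (0, 0)]
```

Counting is purely lexical: the scorer never looks at word order or context, which is the property the failure modes below turn on.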

Failure Modes Identified

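Syntactic blindness: Keyword lexicons ignore negation and scope (e.g., "never absolutely totally confident" still scores as high-certainty, because the emphatic words are counted and "never" is not).
Polysemy blindness: Words like "certainly" can be sarcastic or ambiguous, yet a lexicon counts every occurrence as emphatic certainty.
Categorical absence: Key stance markers like "maybe" or "probably" (hedging) are not in typical certainty lexicons, so hedged sentences contribute nothing to the score.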
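A toy scorer makes the three failure modes concrete. The lexicon, the example sentences, and the `certainty_score` helper below are hypothetical and chosen only to reproduce the behaviour described above; they are not taken from the paper.

```python
import re

# Hypothetical certainty lexicon, for illustration only.
CERTAINTY = {"absolutely", "totally", "certainly", "confident"}

def certainty_score(sentence):
    """Count lexicon hits, ignoring syntax and context entirely."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(tok in CERTAINTY for tok in tokens)

examples = {
    # Negated certainty still scores high: "never" is invisible to the lexicon.
    "syntactic": "I am never absolutely totally confident about forecasts.",
    # A sarcastic "certainly" is counted as sincere emphasis.
    "polysemy": "Well, that certainly went well.",
    # Hedges are not in the lexicon, so the sentence scores zero.
    "absence": "Maybe, probably, it could go either way.",
}
scores = {name: certainty_score(text) for name, text in examples.items()}
print(scores)  # {'syntactic': 3, 'polysemy': 1, 'absence': 0}
```

The negated sentence receives the highest certainty score of the three, while the explicitly hedged sentence is indistinguishable from a neutral one.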

Mathematical Formulation

Let $x_{i,s}$ be the count of negative words and $y_{i,s}$ the count of emphatic-certainty words for speaker $s$ in sentence $i$, with $\bar{x}_s$ and $\bar{y}_s$ their means over that speaker's sentences. The keyword-based correlation for speaker $s$ is the Pearson coefficient: $$r_s = \frac{\sum_{i} (x_{i,s} - \bar{x}_s)(y_{i,s} - \bar{y}_s)}{\sqrt{\sum_{i} (x_{i,s} - \bar{x}_s)^2 \sum_{i} (y_{i,s} - \bar{y}_s)^2}}$$
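The formula for $r_s$ can be checked term by term against NumPy's built-in estimator. The sketch below assumes made-up per-sentence counts for a single speaker; `pearson_r` is a hypothetical helper name.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation, written to mirror the formula for r_s."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # x_{i,s} - mean of x for the speaker
    dy = y - y.mean()  # y_{i,s} - mean of y for the speaker
    return float((dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum()))

# Made-up per-sentence counts for one speaker (not the paper's data).
x = [2, 0, 1, 3, 0, 1]  # negative-word counts
y = [1, 0, 1, 2, 0, 0]  # emphatic-certainty counts

r_manual = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(f"manual: {r_manual:.3f}, numpy: {r_numpy:.3f}")
```

Both values agree up to floating-point error, since `np.corrcoef` computes the same normalized covariance.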

LLM-based classification yields per-sentence probabilities $p_{i,s}^{\text{neg}}, p_{i,s}^{\text{emp}}, p_{i,s}^{\text{hed}}$. The speaker-level proportions are $P_s^{\text{neg}} = \frac{1}{N_s}\sum_i p_{i,s}^{\text{neg}}$, where $N_s$ is the number of sentences for speaker $s$, and similarly for emphatic and hedged. Correlations are computed across speakers.
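The aggregation from per-sentence probabilities to speaker-level proportions is a per-speaker mean. A minimal sketch, assuming hypothetical probability rows (the speaker labels, values, and the `speaker_proportions` helper are illustrative, and no LLM call is made here):

```python
# Hypothetical per-sentence rows: (speaker, p_neg, p_emp, p_hed).
rows = [
    ("speaker_a", 0.9, 0.1, 0.6),
    ("speaker_a", 0.1, 0.3, 0.2),
    ("speaker_b", 0.5, 0.5, 0.0),
    ("speaker_b", 0.7, 0.1, 0.4),
    ("speaker_b", 0.0, 0.0, 0.2),
]

def speaker_proportions(rows):
    """Average each probability column over a speaker's N_s sentences."""
    sums, n = {}, {}
    for speaker, *probs in rows:
        acc = sums.setdefault(speaker, [0.0] * len(probs))
        for k, p in enumerate(probs):
            acc[k] += p
        n[speaker] = n.get(speaker, 0) + 1
    return {s: tuple(v / n[s] for v in acc) for s, acc in sums.items()}

proportions = speaker_proportions(rows)
for speaker, (p_neg, p_emp, p_hed) in proportions.items():
    print(f"{speaker}: neg={p_neg:.2f} emp={p_emp:.2f} hed={p_hed:.2f}")
```

Each resulting tuple corresponds to $(P_s^{\text{neg}}, P_s^{\text{emp}}, P_s^{\text{hed}})$ for one speaker; these are the quantities that are subsequently correlated.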

Code Snippet (Simulated)

import numpy as np

# Simulated values for 4 speakers (one row per speaker); not the paper's data.
# Columns: [neg_rate, emp_rate], i.e. normalized keyword counts
keyword_counts = np.array([[0.12, 0.08], [0.15, 0.11], [0.09, 0.06], [0.20, 0.16]])
# Columns: [neg_prob, emp_prob, hed_prob], i.e. mean LLM class probabilities
llm_probs = np.array([[0.10, 0.05, 0.04], [0.14, 0.07, 0.06], [0.08, 0.03, 0.02], [0.18, 0.09, 0.10]])

# Keyword correlation
corr_keyword = np.corrcoef(keyword_counts[:,0], keyword_counts[:,1])[0,1]
print(f"Keyword r: {corr_keyword:.3f}")

# LLM-based correlations
corr_neg_emp = np.corrcoef(llm_probs[:,0], llm_probs[:,1])[0,1]
corr_neg_hed = np.corrcoef(llm_probs[:,0], llm_probs[:,2])[0,1]
print(f"LLM neg-emp r: {corr_neg_emp:.3f}")
print(f"LLM neg-hed r: {corr_neg_hed:.3f}")

Key Results

Keyword-based $r(\text{neg},\text{emp})$: Dalio 0.851, Rogoff 0.721, Quadir 0.931, Zeihan 0.798 (all $p<0.01$)
LLM-based $r(\text{neg},\text{emp})$: Dalio 0.206, Rogoff -0.218, Quadir -0.152, Zeihan 0.047 (not significant)
LLM-based $r(\text{neg},\text{hed})$: Rogoff 0.875 ($p=0.001$), Zeihan 0.722 ($p=0.008$), others positive but weaker.

Implications

The paper argues that keyword lexicons inadvertently measure a universal linguistic co-occurrence (negative discourse inherently uses more emphatic vocabulary), not epistemic certainty. This is a category error. LLM-based semantic classification is recommended as a more valid measurement instrument.
