llm · Published: July 1, 2026

Measuring the Gap Between Human and LLM Research Ideas

By Ziyu Chen, Yilun Zhao, Arman Cohan

Research TL;DR

"Proposes a two-axis research-taste taxonomy to profile ideas, revealing that LLMs over-concentrate on bridge opportunities and synthesis methods compared to humans."

Abstract

LLMs are increasingly used to brainstorm research ideas, but existing evaluations mostly judge individual ideas by novelty, feasibility, or expert preference. We instead ask: how far are current LLM-generated ideas from human researchers? To characterize this gap, we build a large-scale evaluation framework for ideation from high-quality human research papers. For each paper, we reverse-engineer a small set of closely related prior works that likely inspired its core idea. LLMs are then prompted to generate a new idea from the set of paper titles and summaries. We introduce a two-axis research-taste taxonomy to profile each idea by its opportunity pattern and research paradigm, and use it to quantify the divergence between human and LLM ideas. Across idea sets generated by different LLMs, we observe a consistent distributional gap: LLM ideas are disproportionately concentrated around bridge-like opportunities and synthesis methods, whereas the human paper reference distribution spreads more broadly across ways of framing gaps and constructing contributions. This result suggests that strong LLMs can produce a range of reasonable ideas, but that range remains narrower than, and systematically shifted relative to, human research taste.

Technical Analysis & Implementation

Overview

This paper quantifies the distributional gap between human-written research ideas and LLM-generated ideas. The authors reverse-engineer the prior works that likely inspired each human paper, then prompt LLMs to generate new ideas from those same prior works. They introduce a two-axis taxonomy (opportunity pattern × research paradigm) to profile each idea and compare the resulting human and LLM distributions.

Methodology

Reverse-engineering prior works

For each human paper $P$, a small set of prior works $\mathcal{C}_P$ (typically 2) that likely inspired $P$ is identified via citation analysis and human annotation. LLMs are given the titles and summaries of $\mathcal{C}_P$ and asked to propose a new idea.
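The prompting step above can be sketched as follows. This is a minimal illustration, not the authors' prompt: the `PriorWork` record, the `build_ideation_prompt` helper, and the instruction wording are all assumptions made for the example, and the titles and summaries are placeholders.

```python
from dataclasses import dataclass


@dataclass
class PriorWork:
    """One inspiring paper from the set C_P (hypothetical record type)."""
    title: str
    summary: str


def build_ideation_prompt(prior_works: list[PriorWork]) -> str:
    """Format the titles and summaries of the prior works into one prompt."""
    if not prior_works:
        raise ValueError("at least one prior work is required")
    blocks = [
        f"Paper {i}\nTitle: {work.title}\nSummary: {work.summary}"
        for i, work in enumerate(prior_works, start=1)
    ]
    # Assumed instruction text; the source does not give the exact wording.
    instruction = (
        "Based only on the papers above, propose one new research idea. "
        "State the gap it addresses and the method it would use."
    )
    return "\n\n".join(blocks + [instruction])


# Placeholder inputs, not taken from the paper's dataset.
example = [
    PriorWork("Placeholder title A", "Placeholder summary of paper A."),
    PriorWork("Placeholder title B", "Placeholder summary of paper B."),
]
prompt = build_ideation_prompt(example)
print(prompt)
```

The resulting string would be sent to whichever LLM is being evaluated; the model call itself is left out because the source does not specify an API.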

Two-axis taxonomy

Each idea is classified along:

  • Opportunity pattern: how the gap is framed (e.g., bridge between fields, fill a hole, identify a new direction)
  • Research paradigm: how the contribution is constructed (e.g., synthesis, analysis, empirical study)

Aggregating these labels over a set of ideas yields a 2D distribution over (opportunity pattern, research paradigm) pairs. The authors compute the Wasserstein distance between the human and LLM distributions.
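The aggregation of per-idea labels into a distribution could look like the sketch below. The category names are taken from the examples listed above and are assumptions about the label set, not the paper's full taxonomy; `joint_distribution` and the toy labels are hypothetical.

```python
import numpy as np

# Assumed label sets, based only on the examples named in the text.
OPPORTUNITIES = ["bridge", "hole-filling", "new-direction"]
PARADIGMS = ["synthesis", "analysis", "empirical"]


def joint_distribution(labels):
    """Turn (opportunity, paradigm) label pairs into a normalized 2D histogram."""
    counts = np.zeros((len(OPPORTUNITIES), len(PARADIGMS)))
    for opportunity, paradigm in labels:
        row = OPPORTUNITIES.index(opportunity)
        col = PARADIGMS.index(paradigm)
        counts[row, col] += 1
    total = counts.sum()
    if total == 0:
        raise ValueError("no labels given")
    return counts / total


# Toy labels for four ideas (illustrative, not data from the paper).
toy_labels = [
    ("bridge", "synthesis"),
    ("bridge", "synthesis"),
    ("hole-filling", "analysis"),
    ("new-direction", "empirical"),
]
dist = joint_distribution(toy_labels)
print(dist)
print("opportunity marginal:", dist.sum(axis=1))
print("paradigm marginal:", dist.sum(axis=0))
```

Rows index opportunity patterns and columns index research paradigms, so summing along either axis recovers the one-axis profile.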

Key Findings

  • LLM ideas are disproportionately concentrated on "bridge" opportunities and "synthesis" paradigms.
  • Human ideas are spread more evenly across categories such as "hole-filling" and "analysis".
  • The gap persists across various LLMs (GPT-4, Claude, Gemini) and prompt variations.
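One way to make "concentrated" versus "spread evenly" concrete is to compare the Shannon entropy of the two category distributions and look at per-category share differences. This is an illustrative measure chosen for the sketch, not one the source attributes to the authors, and the two vectors are example values rather than results from the paper.

```python
import numpy as np


def shannon_entropy(dist):
    """Entropy in bits; higher means mass is spread over more categories."""
    dist = np.asarray(dist, dtype=float)
    nonzero = dist[dist > 0]  # treat 0 * log(0) as 0
    return float(-(nonzero * np.log2(nonzero)).sum())


# Example category shares (illustrative values only).
human = np.array([0.2, 0.3, 0.3, 0.2])
llm = np.array([0.1, 0.5, 0.2, 0.2])

# Positive entries mark categories the LLM uses more often than humans do.
share_gap = llm - human
most_overrepresented = int(np.argmax(share_gap))

print(f"human entropy: {shannon_entropy(human):.3f} bits")
print(f"LLM entropy:   {shannon_entropy(llm):.3f} bits")
print(f"most over-represented category index: {most_overrepresented}")
```

A lower entropy for the LLM vector indicates a narrower profile, while `share_gap` shows which categories account for the shift.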

Code Snippet (Idea Classification)

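import numpy as np
from scipy.stats import wasserstein_distance

def profile_idea(idea_text, classifier):
    """Return (opportunity, paradigm) one-hot vectors for one idea."""
    # classifier returns one-hot vectors over 4 opportunity patterns and 4 paradigms
    opp, para = classifier(idea_text)
    return opp, para

# Example category frequencies (illustrative values, not results from the paper)
opp_dist_human = np.array([0.2, 0.3, 0.3, 0.2])
opp_dist_llm = np.array([0.1, 0.5, 0.2, 0.2])

# wasserstein_distance takes support points as values and probabilities as weights;
# passing the probability vectors as values would treat them as raw samples.
# Integer positions impose an ordering on the categories, which is an assumption.
categories = np.arange(len(opp_dist_human))
w_dist = wasserstein_distance(categories, categories,
                              u_weights=opp_dist_human, v_weights=opp_dist_llm)
print(f"Wasserstein distance: {w_dist:.3f}")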

Equations

Let $X_H$ and $X_L$ be random variables representing the taxonomy category of human and LLM ideas. The gap is measured by the Wasserstein distance: $$W_1(X_H, X_L) = \inf_{\gamma \in \Gamma(X_H, X_L)} \mathbb{E}_{(x,y)\sim\gamma}[d(x,y)]$$ where $\Gamma(X_H, X_L)$ is the set of joint distributions whose marginals are the distributions of $X_H$ and $X_L$, and $d$ is Euclidean distance on the 2D taxonomy grid.
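For a small discrete grid, the infimum over couplings is a linear program and can be solved directly. The sketch below is one possible way to evaluate $W_1$ with a Euclidean ground distance using `scipy.optimize.linprog`; it assumes each category is placed at integer grid coordinates, which is a modeling choice for the example rather than something the source specifies. The helper name `wasserstein_2d` and the toy inputs are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_2d(p, q):
    """W1 between two distributions on the same grid, Euclidean ground distance."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must share the same grid shape")
    p = p / p.sum()
    q = q / q.sum()

    # Coordinates of every grid cell, in row-major order (assumed embedding).
    coords = np.array(
        [(i, j) for i in range(p.shape[0]) for j in range(p.shape[1])],
        dtype=float,
    )
    n = len(coords)
    # cost[a, b] = Euclidean distance d between cell a and cell b.
    cost = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # The plan gamma is flattened row-major: gamma[a, b] sits at index a * n + b.
    a_eq = np.zeros((2 * n, n * n))
    for a in range(n):
        a_eq[a, a * n:(a + 1) * n] = 1.0  # mass leaving cell a must equal p[a]
        a_eq[n + a, a::n] = 1.0           # mass arriving at cell a must equal q[a]
    b_eq = np.concatenate([p.ravel(), q.ravel()])

    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return float(result.fun)


# Toy check on a 2x2 grid: all mass moves from one corner to the opposite corner.
p_corner = np.array([[1.0, 0.0], [0.0, 0.0]])
q_corner = np.array([[0.0, 0.0], [0.0, 1.0]])
print(f"W1 = {wasserstein_2d(p_corner, q_corner):.4f}")
```

The number of plan variables grows with the square of the number of grid cells, so this direct formulation is only practical for small taxonomies.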
