Published: June 25, 2026

LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

By Serhii Hamotskyi, Akash Kumar Gautam, Christian Hänig

Research TL;DR

"Applies LLMs to extract eligibility criteria from noisy, bilingual prospectuses, achieving 91% precision via a generative pipeline with LLM-as-judge evaluation."

Abstract

Verifying the eligibility of securities as collateral is a key responsibility of the German Central Bank. However, manually verifying these assets against legal and financial criteria within lengthy, semi-structured, and often bilingual prospectuses is a resource-intensive task. While previous efforts utilized traditional Named Entity Recognition (NER) for information extraction, these methods can struggle with OCR noise, linguistic variance, and rigid span-based constraints, and the need for manually annotated training data for each relevant annotation type. In this paper, we present the first case study applying Large Language Models (LLMs) to the eligibility examination process, shifting the paradigm toward a generative Information Extraction pipeline. Our approach decomposes the task into extraction, normalization, and interpretation, allowing for greater flexibility in handling noisy text and interleaved German-English content. We further introduce a value-based evaluation methodology using LLM-as-a-judge, which offers a more semantic assessment than location-based metrics. Our results demonstrate that LLM-based systems achieve high precision (up to 91%) in document-level eligibility, exhibiting a conservative operating profile that minimizes false acceptance.

Technical Analysis & Implementation

Technical Synopsis

This paper presents a generative information extraction (IE) pipeline using Large Language Models (LLMs) to determine whether securities are eligible as collateral at the German Central Bank. The pipeline decomposes the task into three stages: extraction, normalization, and interpretation.

Methodology

Given a prospectus document $D$, the pipeline first extracts the raw values for a set of predefined criteria (e.g., maturity, currency, issuer type). Unlike traditional NER, which tags fixed-label spans in the text, the LLM is prompted to generate structured output (e.g., JSON) containing the extracted values. Let $\text{LLM}$ be a pretrained autoregressive model (e.g., GPT-4). The extraction step is: $$ E = \text{LLM}(\text{prompt}_{extract}, D) $$ where $E$ is a set of key-value pairs.
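
To make the key-value set $E$ concrete, here is a minimal sketch of turning a raw model reply into a dictionary. The field names come from the examples in the text; the function name `parse_extraction` and the fallback behaviour (missing fields become `None`) are assumptions for illustration, not details taken from the paper.

```python
import json

# Field names taken from the examples in the text; the list is illustrative.
FIELDS = ("maturity_date", "currency", "issuer_type")

def parse_extraction(reply: str, fields=FIELDS) -> dict:
    """Turn a raw model reply into the key-value set E.

    Unknown keys are dropped and missing fields become None, so that
    later stages always see the same keys.
    """
    empty = {name: None for name in fields}
    # Models sometimes wrap JSON in extra prose; keep only the outermost
    # {...} region before parsing.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return empty
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    return {name: data.get(name) for name in fields}
```

Returning a fixed set of keys means a malformed reply degrades into "nothing extracted" rather than an exception, which seems a reasonable default for noisy OCR input.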

Next, normalization converts extracted values into a canonical form. For example, German and English date formats are unified: $$ N = f_{\text{norm}}(E) $$ where $f_{\text{norm}}$ uses rule-based heuristics and an LLM for ambiguous cases.
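
The rule-based part of $f_{\text{norm}}$ could look roughly like the sketch below for dates. The month table, the supported patterns, and the name `normalize_date` are assumptions made for illustration; the paper's actual heuristics are not given in this summary. Inputs the rules cannot decide return `None`, which is where an LLM fallback would take over.

```python
import re
from datetime import date
from typing import Optional

# Month names in both languages; this table is illustrative.
MONTHS = {
    "january": 1, "januar": 1, "february": 2, "februar": 2,
    "march": 3, "märz": 3, "april": 4, "may": 5, "mai": 5,
    "june": 6, "juni": 6, "july": 7, "juli": 7, "august": 8,
    "september": 9, "october": 10, "oktober": 10,
    "november": 11, "december": 12, "dezember": 12,
}

def _iso(year: int, month: int, day: int) -> Optional[str]:
    """Build an ISO date string, or None for impossible dates."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None

def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if the rules cannot decide."""
    text = raw.strip().lower()
    # ISO form: 2030-12-31
    m = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", text)
    if m:
        y, mo, d = map(int, m.groups())
        return _iso(y, mo, d)
    # German numeric form: 31.12.2030
    m = re.fullmatch(r"(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})", text)
    if m:
        d, mo, y = map(int, m.groups())
        return _iso(y, mo, d)
    # Day first with month name: "31. Dezember 2030", "31 December 2030"
    m = re.fullmatch(r"(\d{1,2})\.?\s+([a-zä]+)\s+(\d{4})", text)
    if m and m.group(2) in MONTHS:
        return _iso(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))
    # Month first: "December 31, 2030"
    m = re.fullmatch(r"([a-zä]+)\s+(\d{1,2}),?\s+(\d{4})", text)
    if m and m.group(1) in MONTHS:
        return _iso(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2)))
    # Anything else (e.g. 03/04/2030) is ambiguous: defer to the LLM.
    return None
```

Slash-separated dates are deliberately left undecided here, because day-first and month-first readings cannot be told apart without context.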

Finally, interpretation checks each normalized value against the eligibility rules $\mathcal{R}$: $$ \text{eligible} = \text{LLM}(\text{prompt}_{interpret}, N, \mathcal{R}) $$ The model outputs a boolean decision per criterion and an overall document-level eligibility flag.
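
The shape of the interpretation output (one boolean per criterion plus a document-level flag) can be sketched as follows. Note the substitutions: plain Python predicates stand in for the LLM call, the two rules are invented placeholders rather than the bank's actual criteria, and combining criteria with a logical AND is an assumption about how the overall flag is derived.

```python
from datetime import date

# Placeholder rules, NOT the actual eligibility criteria: each maps a
# criterion name to a predicate over the normalized value.
RULES = {
    "currency": lambda v: v in {"EUR"},
    "maturity_date": lambda v: date.fromisoformat(v) > date(2026, 1, 1),
}

def interpret(normalized: dict, rules=RULES) -> dict:
    """Return per-criterion booleans plus a document-level flag."""
    per_criterion = {}
    for name, predicate in rules.items():
        value = normalized.get(name)
        # A missing value counts as a failed check, which keeps the
        # decision conservative (no acceptance without evidence).
        per_criterion[name] = value is not None and bool(predicate(value))
    return {
        "criteria": per_criterion,
        # Assumed aggregation: every criterion must pass.
        "eligible": all(per_criterion.values()),
    }
```

Treating missing values as failures is one way to obtain the kind of conservative behaviour described in the abstract, where false acceptance is the costlier error.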

Evaluation with LLM-as-a-Judge

Instead of span-level metrics, the authors propose a value-based evaluation where an LLM judges whether the extracted information matches the ground truth semantically. For each criterion, the judge assigns a score $s \in \{0,1\}$, and precision/recall are computed at the document level.
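
A minimal sketch of this evaluation, under stated assumptions: `ask` is a hypothetical callable that sends a prompt to the judge model and returns its text reply, the YES/NO prompt wording is invented, and document-level precision and recall are computed over predicted versus gold eligibility flags, which is one plausible reading of the description above.

```python
from typing import Callable, Sequence, Tuple

def judge_value(extracted: str, truth: str, ask: Callable[[str], str]) -> int:
    """Score s in {0, 1}: does the extracted value mean the same as the truth?"""
    prompt = (
        "Do these two values express the same information? "
        "Answer YES or NO.\n"
        f"Extracted: {extracted}\nGround truth: {truth}"
    )
    return 1 if ask(prompt).strip().upper().startswith("YES") else 0

def precision_recall(
    predicted: Sequence[bool], gold: Sequence[bool]
) -> Tuple[float, float]:
    """Document-level precision and recall over eligibility flags."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because the judge compares values rather than character offsets, "31.12.2030" and "December 31, 2030" can both count as correct, which a span-based metric would not allow.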

Implementation Details

  • Model: GPT-4 (via API) with few-shot prompting.
  • Prompts: Designed to handle OCR noise and interleaved German-English text. Example extract prompt:
      Extract the following fields from the given prospectus text. Return a JSON object.
      Fields: [maturity_date, currency, issuer_name, ...]
      Text: {document_text}
  • Pipeline: Sequential calls to LLM; each step uses a dedicated prompt.
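
The sequential structure described in the list above can be sketched as a loop over stage-specific prompts. The prompt templates and the `llm` callable (prompt in, text reply out) are hypothetical stand-ins, so this shows only the control flow, not the authors' prompts.

```python
from typing import Callable, Dict

# Hypothetical prompt templates, one per stage.
PROMPTS = {
    "extract": "Extract the eligibility fields as JSON.\nText: {input}",
    "normalize": "Rewrite these values in canonical form as JSON.\nValues: {input}",
    "interpret": "Check these values against the rules and answer as JSON.\nValues: {input}",
}

def run_pipeline(document_text: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run the three stages in order, feeding each reply to the next."""
    outputs = {}
    current = document_text
    for stage in ("extract", "normalize", "interpret"):
        # Each stage gets its own prompt; only the previous reply is passed on.
        current = llm(PROMPTS[stage].format(input=current))
        outputs[stage] = current
    return outputs
```

Passing the model in as a callable keeps the sketch testable with a stub and independent of any particular API client.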

Code Snippet

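from openai import OpenAI

# Client interface of the openai package, version 1.0 or later.
# The API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI()

def extract_fields(text):
    """Ask the model for three fields and return its raw reply (a JSON string)."""
    prompt = f"""Extract maturity_date, currency, and issuer_type from the text.
Return JSON.
Text: {text}
"""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # keep the output as repeatable as possible
    )
    return response.choices[0].message.content

# Then normalize and interpret similarly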

Results

The system achieved up to 91% precision on document-level eligibility decisions, with a conservative operating profile that keeps false acceptances (false positives) low. The LLM-as-a-judge evaluation correlated well with human judgments.

Key Contributions

  • First case study applying LLMs to the eligibility examination process.
  • Generative IE approach handles OCR noise and interleaved German-English text more flexibly than span-based NER.
  • Value-based evaluation provides semantic assessment beyond span accuracy.

This work suggests that LLMs can automate much of a complex regulatory compliance task, reducing manual effort while maintaining high precision.