LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

Technical Synopsis

This paper presents a generative information extraction (IE) pipeline using Large Language Models (LLMs) to determine whether securities are eligible as collateral at the German Central Bank. The pipeline decomposes the task into three stages: extraction, normalization, and interpretation.

Methodology

Given a prospectus document $D$, the pipeline first extracts raw text spans corresponding to predefined criteria (e.g., maturity, currency, issuer type). Unlike traditional NER, which tags spans with a fixed label set, the LLM is prompted to generate structured output (e.g., JSON) containing the extracted values. Let $\text{LLM}$ be a pretrained autoregressive model (e.g., GPT-4). The extraction step is: $$ E = \text{LLM}(\text{prompt}_{\text{extract}}, D) $$ where $E$ is a set of key-value pairs mapping each criterion to its extracted value.
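Because the extraction step asks for free-form JSON rather than labeled spans, the raw model reply has to be turned into the key-value set $E$. A minimal sketch, assuming the model may wrap its answer in a Markdown code fence and using a hypothetical three-field schema (`EXPECTED_FIELDS` and `parse_extraction` are illustrative names, not from the paper):

```python
import json
import re

# Hypothetical field list; the paper's full set of criteria is not reproduced here.
EXPECTED_FIELDS = ("maturity_date", "currency", "issuer_type")

# Matches an optional Markdown code fence (```json ... ```) around the payload.
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_extraction(raw: str) -> dict:
    """Turn the raw LLM reply into the key-value set E.

    Fields the model did not return are kept as None so later stages can
    tell "not found" apart from "found but wrong".
    """
    text = raw.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Keep only the expected keys, in a fixed order.
    return {field: data.get(field) for field in EXPECTED_FIELDS}
```

Filling absent fields with `None` keeps "not stated in the prospectus" distinct from a wrong value, which matters later when the judge and the eligibility rules look at each criterion separately.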

Next, normalization converts the extracted values into a canonical form; for example, German and English date formats are mapped to a single representation: $$ N = f_{\text{norm}}(E) $$ where $f_{\text{norm}}$ combines rule-based heuristics with an LLM call for cases the rules cannot resolve.
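The rule-based half of $f_{\text{norm}}$ for dates might look like the following. This is a sketch under stated assumptions: ISO 8601 (`YYYY-MM-DD`) is chosen here as the canonical form, the month table is illustrative rather than exhaustive, and `normalize_date` returning `None` stands in for "hand this case to the LLM"; none of these details come from the paper.

```python
import re
from datetime import date
from typing import Optional

# Illustrative German/English month names (lowercase).
MONTHS = {
    "januar": 1, "january": 1, "februar": 2, "february": 2,
    "märz": 3, "maerz": 3, "march": 3, "april": 4, "mai": 5, "may": 5,
    "juni": 6, "june": 6, "juli": 7, "july": 7, "august": 8,
    "september": 9, "oktober": 10, "october": 10, "november": 11,
    "dezember": 12, "december": 12,
}


def normalize_date(value: str) -> Optional[str]:
    """Map a German or English date string to ISO 8601 (YYYY-MM-DD).

    Returns None when no rule applies or the input is ambiguous
    (e.g. 03/04/2030), signalling that the LLM fallback should decide.
    """
    s = value.strip()
    if m := re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", s):            # 2030-03-15
        y, mo, d = int(m[1]), int(m[2]), int(m[3])
    elif m := re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", s):     # 15.03.2030
        d, mo, y = int(m[1]), int(m[2]), int(m[3])
    elif m := re.fullmatch(r"(\d{1,2})\.?\s+(\w+)\s+(\d{4})", s):    # 15. März 2030 / 15 March 2030
        d, mo, y = int(m[1]), MONTHS.get(m[2].lower()), int(m[3])
    elif m := re.fullmatch(r"(\w+)\s+(\d{1,2}),\s*(\d{4})", s):      # March 15, 2030
        mo, d, y = MONTHS.get(m[1].lower()), int(m[2]), int(m[3])
    else:
        return None
    if mo is None:  # unknown month name
        return None
    try:
        return date(y, mo, d).isoformat()
    except ValueError:  # impossible calendar date, e.g. 31.02.2030
        return None
```

Slash dates such as `03/04/2030` are deliberately left unresolved, since day-first and month-first readings are both plausible in a bilingual corpus; that is the kind of case an LLM fallback would be asked to settle, ideally with surrounding context from the prospectus.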

Finally, interpretation checks each normalized value against the eligibility rules $\mathcal{R}$: $$ \text{eligible} = \text{LLM}(\text{prompt}_{\text{interpret}}, N, \mathcal{R}) $$ The model outputs a boolean decision per criterion and an overall document-level eligibility flag.
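The shape of the interpretation output (one boolean per criterion plus an overall flag) can be illustrated without a model call. In this sketch the LLM is swapped for deterministic predicates, and the rules in `RULES` and the `REFERENCE_DATE` are hypothetical placeholders, not the actual Bundesbank eligibility criteria. The overall flag is assumed to be the conjunction of all criteria, with missing values counted as failures; the paper does not state how the document-level flag is derived.

```python
from datetime import date
from typing import Callable, Dict, Optional

Rule = Callable[[Optional[str]], bool]

# Hypothetical evaluation date used by the illustrative maturity rule.
REFERENCE_DATE = date(2025, 1, 1)

# Hypothetical, simplified rule set R: each rule maps a normalized value to a bool.
RULES: Dict[str, Rule] = {
    "currency": lambda v: v == "EUR",
    "maturity_date": lambda v: v is not None and date.fromisoformat(v) > REFERENCE_DATE,
    "issuer_type": lambda v: v in {"credit_institution", "public_sector"},
}


def interpret(normalized: Dict[str, Optional[str]], rules: Dict[str, Rule]) -> dict:
    """Return a per-criterion decision plus an overall document-level flag.

    A criterion missing from `normalized` is passed to its rule as None,
    which every rule above treats as a failure (a conservative default).
    """
    per_criterion = {name: rule(normalized.get(name)) for name, rule in rules.items()}
    return {"criteria": per_criterion, "eligible": all(per_criterion.values())}
```

Encoding the aggregation as `all(...)` makes the conservative reading explicit: a single failed or unknown criterion makes the whole document ineligible.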

Evaluation with LLM-as-a-Judge

Instead of span-level metrics, the authors propose a value-based evaluation where an LLM judges whether the extracted information matches the ground truth semantically. For each criterion, the judge assigns a score $s \in \{0,1\}$, and precision/recall are computed at the document level.
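One way to read "precision/recall computed at the document level" is sketched below: the judge's binary scores $s$ count as matches, precision divides matches by the number of fields the system filled, recall divides by the number of fields present in the ground truth, and per-document values are averaged across the corpus. The macro-averaging and the 0.0 convention for empty denominators are assumptions, and the judge's scores are taken as given input rather than produced here.

```python
from typing import Dict, List, Optional, Tuple

Values = Dict[str, Optional[str]]


def document_pr(pred: Values, gold: Values, scores: Dict[str, int]) -> Tuple[float, float]:
    """Precision/recall for one document from binary judge scores s in {0, 1}.

    scores[field] is the judge's verdict that pred[field] semantically matches gold[field].
    """
    fields = set(pred) | set(gold)
    n_pred = sum(pred.get(f) is not None for f in fields)  # fields the system filled
    n_gold = sum(gold.get(f) is not None for f in fields)  # fields in the ground truth
    n_match = sum(
        scores.get(f, 0)
        for f in fields
        if pred.get(f) is not None and gold.get(f) is not None
    )
    # Assumed convention: an empty denominator yields 0.0.
    precision = n_match / n_pred if n_pred else 0.0
    recall = n_match / n_gold if n_gold else 0.0
    return precision, recall


def macro_pr(docs: List[Tuple[Values, Values, Dict[str, int]]]) -> Tuple[float, float]:
    """Average the per-document precision and recall over the corpus."""
    results = [document_pr(p, g, s) for p, g, s in docs]
    if not results:
        return 0.0, 0.0
    n = len(results)
    return sum(p for p, _ in results) / n, sum(r for _, r in results) / n
```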

Implementation Details

Model: GPT-4 (via API) with few-shot prompting.
Prompts: Designed to handle OCR noise and interleaved German-English text. Example extraction prompt:

Extract the following fields from the given prospectus text. Return a JSON object.
Fields: [maturity_date, currency, issuer_name, ...]
Text: {document_text}

Pipeline: Sequential calls to LLM; each step uses a dedicated prompt.
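Few-shot prompting with a chat API amounts to prepending worked examples as alternating user/assistant turns. A minimal sketch that reuses the wording of the example extraction prompt; the demonstration pair in `FEW_SHOT` is hypothetical (the German sentence reads "The notes are denominated in euro and mature on 15 March 2030"), as is the helper name `build_extract_messages`.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical demonstration pair: (prospectus snippet, expected JSON answer).
FEW_SHOT: Sequence[Tuple[str, str]] = (
    (
        "Die Schuldverschreibungen lauten auf Euro und werden am 15. März 2030 fällig.",
        '{"maturity_date": "15. März 2030", "currency": "Euro", "issuer_name": null}',
    ),
)


def build_extract_messages(
    document_text: str,
    fields: Sequence[str],
    examples: Sequence[Tuple[str, str]] = FEW_SHOT,
) -> List[Dict[str, str]]:
    """Assemble a chat-style few-shot prompt for the extraction step."""
    instruction = (
        "Extract the following fields from the given prospectus text. Return a JSON object.\n"
        f"Fields: [{', '.join(fields)}]\n"
    )
    messages: List[Dict[str, str]] = []
    # Each demonstration becomes a user turn followed by the ideal assistant reply.
    for text, answer in examples:
        messages.append({"role": "user", "content": f"{instruction}Text: {text}"})
        messages.append({"role": "assistant", "content": answer})
    # The actual document goes last, in the same format as the demonstrations.
    messages.append({"role": "user", "content": f"{instruction}Text: {document_text}"})
    return messages
```

The returned list can be passed as the `messages` argument of a chat-completion call. The demonstration answer keeps values as they appear in the text (`"Euro"`, `"15. März 2030"`), leaving canonicalization to the normalization step.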

Code Snippet

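import json

from openai import OpenAI  # openai>=1.0 client; the legacy openai.ChatCompletion API was removed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_fields(text):
    """Extraction step: ask the model for maturity_date, currency, issuer_type as JSON."""
    prompt = f"""Extract maturity_date, currency, and issuer_type from the text.
Return only a JSON object.
Text: {text}
"""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # minimize sampling variance for extraction
    )
    # The reply is a string; parse it into a dict of key-value pairs (E).
    # json.loads raises ValueError if the model returns anything but valid JSON.
    return json.loads(response.choices[0].message.content)


# Normalization and interpretation follow the same pattern,
# each with its own dedicated prompt.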
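The sequential structure of the pipeline, three calls with a dedicated prompt each, can be sketched with the model call injected as a plain function, so the chaining logic is testable without network access. The prompt texts in `PROMPTS` and the names `run_pipeline` and `ChatFn` are illustrative assumptions, not the paper's prompts or code.

```python
import json
from typing import Callable, Dict, List

# Any function that takes chat messages and returns the reply text.
ChatFn = Callable[[List[Dict[str, str]]], str]

# Hypothetical stage prompts; the paper's exact wording is not reproduced here.
PROMPTS = {
    "extract": "Extract maturity_date, currency, and issuer_type from the text. Return JSON.",
    "normalize": "Rewrite these values in canonical form (ISO dates, ISO 4217 currency codes). Return JSON.",
    "interpret": "Check each value against the eligibility rules. Return JSON with one boolean "
                 "per criterion and an overall 'eligible' flag.",
}


def run_pipeline(document: str, rules: str, llm: ChatFn) -> dict:
    """Chain extraction -> normalization -> interpretation, one prompt per stage."""

    def call(stage: str, payload: str) -> dict:
        messages = [{"role": "user", "content": f"{PROMPTS[stage]}\n\n{payload}"}]
        return json.loads(llm(messages))

    extracted = call("extract", f"Text: {document}")                              # E
    normalized = call("normalize", json.dumps(extracted, ensure_ascii=False))     # N
    values = json.dumps(normalized, ensure_ascii=False)
    return call("interpret", f"Rules: {rules}\nValues: {values}")                 # decisions
```

With the official client, `llm` could be `lambda m: client.chat.completions.create(model="gpt-4", messages=m, temperature=0.0).choices[0].message.content`. Keeping the call behind a plain function also makes it easy to route unambiguous values through rule-based code instead of the model.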

Results

The system achieved 91% precision on document-level eligibility decisions, with a conservative bias (low false positive rate). The LLM-as-a-judge evaluation correlated well with human judgments.
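A conservative bias means the system labels a document eligible only when it is fairly sure, which pushes false positives down. The two quantities behind that statement can be computed from predicted and true document-level flags as below; this is a generic sketch of the standard definitions, and it does not reproduce the paper's data.

```python
from typing import Sequence, Tuple


def precision_and_fpr(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Document-level precision and false positive rate for eligibility flags.

    precision = TP / (TP + FP)   share of 'eligible' calls that were correct
    FPR       = FP / (FP + TN)   share of ineligible documents wrongly accepted
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, fpr
```

The usual price of such a bias is lower recall: more eligible securities are flagged as ineligible and would presumably need manual review.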

Key Contributions

First application of LLMs to this domain.
Generative IE approach overcomes OCR noise and bilingual (German-English) challenges.
Value-based evaluation provides semantic assessment beyond span accuracy.

This work demonstrates that LLMs can effectively automate complex regulatory compliance tasks, reducing manual effort while maintaining high accuracy.
