Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline

Technical Summary

This paper presents a fully open-weight pipeline for multilingual joint entity-relation extraction (ERE) from large unstructured news corpora, building signed and temporal knowledge graphs (KGs). The pipeline consists of three main components: (1) span-based named-entity recognition (NER), (2) a three-stage entity linking cascade to Wikidata, and (3) an ontology-constrained mixture-of-experts (MoE) model with guided decoding for relation extraction.

Span-based NER

A pretrained multilingual language model (e.g., XLM-R) is fine-tuned with a span classification head. For each input token sequence $X = [x_1, ..., x_n]$, all possible spans $s_{i,j}$ (contiguous subsequences) are enumerated. Each span is represented as the concatenation of its start/end token embeddings and a width embedding, fed into a classifier to predict entity type (PER, ORG, GPE, etc.) or "non-entity". The loss is a cross-entropy over spans.
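The span enumeration and representation step can be illustrated with a minimal sketch. The `max_width` cap, the helper names, and the toy two-dimensional vectors are assumptions for illustration; the source does not state a width limit, and setting `max_width` to the sequence length enumerates every span. Real embeddings would come from the fine-tuned encoder.

```python
from typing import List, Tuple


def enumerate_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """Return all (start, end) token spans, end inclusive, up to max_width tokens."""
    return [
        (i, j)
        for i in range(n_tokens)
        for j in range(i, min(i + max_width, n_tokens))
    ]


def span_representation(token_vecs, width_vecs, span):
    """Concatenate the start-token, end-token and width embeddings of a span."""
    i, j = span
    # width_vecs[0] is the embedding for width 1, so index by (j - i).
    return token_vecs[i] + token_vecs[j] + width_vecs[j - i]


tokens = ["Angela", "Merkel", "visited", "Warsaw"]
# Toy 2-dimensional vectors standing in for encoder outputs (assumption).
token_vecs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
# One toy width embedding per span width 1..3 (assumption).
width_vecs = [[1.0], [2.0], [3.0]]

spans = enumerate_spans(len(tokens), max_width=3)
rep = span_representation(token_vecs, width_vecs, (0, 1))  # span "Angela Merkel"
```

Each representation built this way would then be passed to the span classifier, which assigns an entity type or "non-entity".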

Three-Stage Entity Linking

Mentions are linked to Wikidata Q-IDs via:
1. Candidate Generation: Fuzzy string matching against a precomputed index of Wikidata labels/aliases.
2. Contextual Disambiguation: A bi-encoder (Sentence-BERT) scores candidate entities against the mention's left/right context window.
3. Coreference Resolution: Within-document and cross-document clustering using agglomerative clustering with a learned pairwise similarity threshold.
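The first two stages can be sketched as follows. This is an illustrative toy, not the paper's implementation: the alias index, the placeholder IDs (`Q_PERSON_1` and so on), the two-dimensional vectors, and the 0.8 cutoff are all made up, and `difflib` stands in for whatever fuzzy matcher the pipeline actually uses. In the real system the vectors would come from the bi-encoder.

```python
import difflib
import math

# Hypothetical miniature alias index: lowercased surface form -> candidate IDs.
ALIAS_INDEX = {
    "sebastian kurz": ["Q_PERSON_1"],
    "kurz": ["Q_PERSON_1", "Q_SURNAME_1"],
    "övp": ["Q_PARTY_1"],
}

# Toy entity vectors standing in for bi-encoder entity embeddings (assumption).
ENTITY_VECS = {
    "Q_PERSON_1": [0.9, 0.1],
    "Q_SURNAME_1": [0.1, 0.9],
    "Q_PARTY_1": [0.7, 0.7],
}


def generate_candidates(mention, cutoff=0.8):
    """Stage 1: fuzzy-match the mention against alias keys and collect candidate IDs."""
    keys = difflib.get_close_matches(mention.lower(), list(ALIAS_INDEX), n=5, cutoff=cutoff)
    candidates = []
    for key in keys:
        for qid in ALIAS_INDEX[key]:
            if qid not in candidates:
                candidates.append(qid)
    return candidates


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def disambiguate(context_vec, candidates):
    """Stage 2: pick the candidate whose vector is closest to the context vector."""
    if not candidates:
        return None
    return max(candidates, key=lambda qid: cosine(context_vec, ENTITY_VECS[qid]))


candidates = generate_candidates("Kurtz")       # misspelled mention
best = disambiguate([0.8, 0.2], candidates)     # toy context embedding
```

Keeping candidate generation cheap and recall-oriented, then letting the context scorer choose, is the usual reason for splitting linking into these two steps.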

This stage outputs the set of Wikidata Q-IDs mentioned in each document.
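The coreference stage (threshold-based agglomerative clustering) can be sketched as below. Two assumptions are made loudly here: single linkage is used, which the source does not specify, and a character-level `difflib` ratio stands in for the learned pairwise similarity. Under single linkage, cutting the dendrogram at a similarity threshold is equivalent to taking connected components of the graph of pairs at or above that threshold, which is what the union-find loop computes.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for the learned pairwise similarity model (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_mentions(mentions, threshold):
    """Single-linkage clustering: merge any two mentions with similarity >= threshold."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if similarity(mentions[i], mentions[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(mention)
    return list(clusters.values())


mentions = ["Donald Tusk", "D. Tusk", "Tusk", "Jarosław Kaczyński", "Kaczyński"]
clusters = cluster_mentions(mentions, threshold=0.6)
```

In this toy run "Tusk" joins "Donald Tusk" only through the intermediate "D. Tusk", which shows the chaining behaviour of single linkage; a complete-linkage or average-linkage variant would behave more conservatively.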

Ontology-Constrained MoE for Relation Extraction

Relations are extracted using a decoder-only MoE transformer (e.g., Mixtral 8x7B) with guided decoding constrained by a domain ontology. The ontology defines relation types (e.g., "member_of", "conflict", "supports") with direction and sign (positive/negative). The model takes as input the concatenation of two entity IDs and the context text, and generates a relation token via constrained beam search. In the MoE layers, a gating network $G(x) = \text{softmax}(\text{TopK}(W_g x))$ routes each token to the top-$k$ of $N$ experts, where $\text{TopK}$ sets all but the $k$ largest logits to $-\infty$ and each expert $E_i$ is a feed-forward network (FFN). The layer output is:

$$y = \sum_{i=1}^N G(x)_i E_i(x)$$
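A small numeric sketch of sparse top-$k$ gating may make the sum concrete. It assumes the common formulation in which the softmax is taken over the $k$ largest gate logits and all other experts receive weight zero; the scalar "experts" and the logits are toy stand-ins for FFNs and for $W_g x$.

```python
import math


def softmax(zs):
    """Numerically stable softmax over a list of floats."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    """Sparse gate: softmax over the k largest logits, zero weight elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    gate = [0.0] * len(logits)
    for i, w in zip(top, weights):
        gate[i] = w
    return gate


def moe_output(x, gate_logits, experts, k=2):
    """y = sum_i G(x)_i * E_i(x), evaluating only experts with non-zero gate weight."""
    gate = top_k_gate(gate_logits, k)
    return sum(g * expert(x) for g, expert in zip(gate, experts) if g > 0.0)


# Toy scalar experts standing in for feed-forward networks (assumption).
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_logits = [2.0, 2.0, -1.0, 0.5]  # toy values of W_g x (assumption)

gate = top_k_gate(gate_logits, k=2)
y = moe_output(3.0, gate_logits, experts, k=2)
```

Here the two selected experts tie on their logits, so each gets weight 0.5 and the output is the average of their outputs; the other two experts are never evaluated, which is where the compute savings of MoE come from.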

To enforce ontology constraints, the decoding step masks invalid tokens (e.g., disallowing "supports" between two organizations if the ontology does not define it). Relations are extracted as tuples $(h, r, t, s, \tau)$ where $h$ and $t$ are the Wikidata IDs of the head and tail entities, $r$ is the relation type, $s \in \{-1, +1\}$ the sign, and $\tau$ the timestamp.
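One way to represent the signed, timestamped output and the type-signature check is sketched below. The ontology entries, their type signatures, the class and function names, and the placeholder entity IDs are illustrative assumptions; the paper's actual ontology is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical ontology: relation -> (head type, tail type, sign).
ONTOLOGY = {
    "member_of": ("PER", "ORG", +1),
    "supports": ("PER", "PER", +1),
    "conflict": ("PER", "PER", -1),
}


@dataclass(frozen=True)
class SignedRelation:
    head: str        # ID of the head entity
    relation: str    # relation type from the ontology
    tail: str        # ID of the tail entity
    sign: int        # +1 or -1, taken from the ontology
    timestamp: date  # date attached to the extracted relation


def allowed_relations(head_type, tail_type):
    """Relation types whose signature matches the given entity types."""
    return sorted(
        rel for rel, (h, t, _) in ONTOLOGY.items() if (h, t) == (head_type, tail_type)
    )


def make_relation(head, head_type, relation, tail, tail_type, timestamp):
    """Build a signed relation, rejecting type pairs the ontology does not define."""
    if relation not in allowed_relations(head_type, tail_type):
        raise ValueError(f"{relation!r} is not defined for ({head_type}, {tail_type})")
    return SignedRelation(head, relation, tail, ONTOLOGY[relation][2], timestamp)


# Placeholder entity IDs, not real Wikidata Q-IDs.
edge = make_relation("Q_A", "PER", "conflict", "Q_B", "PER", date(2019, 5, 18))
```

The same `allowed_relations` lookup is what a decoding mask would consult: any relation label outside the returned list is excluded before generation starts.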

Implementation Details

NER: Fine-tuned XLM-R Large on a multilingual dataset of political news (hand-annotated).
Entity Linking: Precomputed index of ~10M Wikidata entities; bi-encoder trained on Wikipedia hyperlinks.
Relation Extraction: Mixtral 8x7B with retrieval-augmented generation (RAG) for context; constrained decoding via a custom grammar.
Pipeline Scaling: Document-level parallelism with distributed Redis-backed mention queues.

Example Code Snippet

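# Sketch of relation extraction with guided decoding (illustrative, not the authors' released code)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face checkpoint ID; the source only names "Mixtral 8x7B".
MODEL_ID = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" requires the accelerate package and spreads the weights over available devices.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="auto")

# Ontology constraints: allowed relations and their (head type, tail type) signatures
ontology_constraints = {
    "conflict": ("POLITICIAN", "POLITICIAN"),
    "supports": ("POLITICIAN", "POLICY"),
    # ...
}

def get_valid_relation_ids(constraints, head_type, tail_type):
    """Token-ID sequences of the relation labels whose signature matches the entity types."""
    return [
        tokenizer.encode(rel, add_special_tokens=False)
        for rel, signature in constraints.items()
        if signature == (head_type, tail_type)
    ]

def guided_generate(context, entity_h, entity_t, max_tokens=5):
    # entity_h and entity_t are assumed to expose .name and .type attributes
    input_text = f"Context: {context}\nRelation between {entity_h.name} and {entity_t.name}:"
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]

    valid_seqs = get_valid_relation_ids(ontology_constraints, entity_h.type, entity_t.type)

    def allowed_tokens(batch_id, input_ids):
        # Allow only tokens that continue one of the valid relation labels
        generated = input_ids[prompt_len:].tolist()
        step = len(generated)
        next_ids = {seq[step] for seq in valid_seqs if seq[:step] == generated and len(seq) > step}
        # Once a label is complete (or no relation is valid), only end-of-sequence is allowed
        return list(next_ids) or [tokenizer.eos_token_id]

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        prefix_allowed_tokens_fn=allowed_tokens,
        do_sample=False,
    )
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()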

Evaluation

On a gold-standard set of 3,491 relations, the pipeline achieves 68.2% strict and 93.7% lenient textual correctness. Two large-scale case studies (Austrian party lifecycle and Polish patronage networks) demonstrate validity against historical records.
