Published: June 25, 2026

Autoregressive Boltzmann Generators

By Danyal Rehman, Charlie B. Tan, Yoshua Bengio, Avishek Joey Bose, Alexander Tong

Research TL;DR

"Replaces normalizing flows with autoregressive models in Boltzmann Generators, improving expressivity and scalability for molecular equilibrium sampling."

Abstract

Efficient sampling of molecular systems at thermodynamic equilibrium is a hallmark challenge in statistical physics. This challenge has driven the development of Boltzmann Generators (BGs), which allow rapid generation of uncorrelated equilibrium samples by combining a generative model with exact likelihoods and an importance sampling correction. However, modern BGs predominantly rely on normalizing flows (NFs), which either suffer from limited expressivity due to strict invertibility constraints (discrete time) or computationally expensive likelihoods (continuous time). In this paper, we propose Autoregressive Boltzmann Generators (ArBG) -- a novel autoregressive modelling framework -- that overcomes these limitations by departing from the flow-based BG paradigm. ArBG circumvents the topological constraints of flows and enables sequential inference-time interventions, while offering enhanced scalability by leveraging architectures effective in Large Language Models. We empirically demonstrate that ArBG leads to significant improvements over flow-based models across all benchmarks, but particularly in larger peptide systems such as the 10-residue Chignolin. Furthermore, we introduce Robin, a 132 million parameter transferable model trained with the ArBG framework which improves over the previous state-of-the-art, reducing the zero-shot energy error, E-W$_2$, on 8-residue systems by over 60$\%$. The code can be found at the following link: https://github.com/danyalrehman/autobg.

Technical Analysis & Implementation

Autoregressive Boltzmann Generators (ArBG)

Core Idea

Boltzmann Generators (BGs) aim to generate independent equilibrium samples from a target Boltzmann distribution $p(x) \propto e^{-U(x)/kT}$, where $U$ is the potential energy, $k$ the Boltzmann constant, and $T$ the temperature. They do so by training a generative model $q_\theta(x)$ with exact likelihoods and correcting its samples toward $p$ via importance sampling. Prior BGs rely on normalizing flows (NFs), which are either constrained by strict invertibility (discrete-time NFs) or require computationally expensive likelihood evaluations (continuous-time NFs). ArBG departs from flows by using an autoregressive model that generates coordinates sequentially and yields tractable likelihoods without invertibility constraints.
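The importance sampling correction can be illustrated on a toy problem. The sketch below is not the paper's pipeline: it assumes a one-dimensional harmonic potential $U(x) = x^2/2$ with $kT = 1$ and a hand-picked Gaussian proposal standing in for $q_\theta$. All function names are hypothetical. It computes self-normalized weights $w \propto e^{-U(x)/kT} / q(x)$ and the Kish effective sample size.

```python
import math
import random

def log_target_unnorm(x, kT=1.0):
    """Unnormalized log Boltzmann density -U(x)/kT for the toy potential U(x) = x^2 / 2."""
    return -(0.5 * x * x) / kT

def log_proposal(x, sigma=1.5):
    """Exact log-density of the proposal q: a zero-mean Gaussian (stand-in for q_theta)."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def normalized_weights(log_w):
    """Self-normalized importance weights, computed stably by subtracting the max log-weight."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def effective_sample_size(weights):
    """Kish effective sample size, 1 / sum(w_i^2), for normalized weights."""
    return 1.0 / sum(wi * wi for wi in weights)

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.5) for _ in range(20000)]
log_w = [log_target_unnorm(x) - log_proposal(x) for x in samples]
weights = normalized_weights(log_w)

# Reweighted estimate of E_p[x^2]; the exact value for this toy target is kT = 1.
second_moment = sum(w * x * x for w, x in zip(weights, samples))
ess = effective_sample_size(weights)
```

Only the unnormalized target density is needed, since the unknown partition function cancels when the weights are normalized. A low effective sample size relative to the number of draws signals that the proposal overlaps poorly with the target.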

Methodology

ArBG factorizes the joint distribution of molecular coordinates $x = (x_1, \dots, x_N)$ (where each $x_i$ may represent a group of atoms) as: $$ q_\theta(x) = \prod_{i=1}^N q_\theta(x_i \mid x_{<i}). $$ Each conditional is modeled as a mixture of Gaussians or a more flexible distribution (e.g., a real-valued non-volume preserving (Real NVP) block). The model is trained by minimizing the forward Kullback-Leibler divergence $\mathrm{KL}(p \,\|\, q_\theta)$, which reduces to maximum likelihood when samples from the target are available; otherwise it can be trained by self-training with importance sampling. The key advantage is that, by the chain rule of probability, any joint density admits this factorization, so the model's expressivity is limited only by its conditionals, which need not be invertible maps.
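The factorization can be made concrete with a toy model. The following is a minimal sketch, assuming scalar $x_i$ and a hypothetical single-Gaussian conditional whose mean depends only on the previous element. In ArBG the conditional would be produced by a neural network. The sketch shows the two operations the factorization makes cheap: exact log-likelihood evaluation and ancestral sampling.

```python
import math
import random

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a univariate Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def conditional_params(prefix, coupling=0.8, sigma=0.5):
    """Hypothetical conditional q(x_i | x_<i): a Gaussian whose mean tracks the previous value."""
    mu = coupling * prefix[-1] if prefix else 0.0
    return mu, sigma

def log_prob(x):
    """log q(x) = sum_i log q(x_i | x_<i), accumulated left to right."""
    total = 0.0
    for i, xi in enumerate(x):
        mu, sigma = conditional_params(x[:i])
        total += gaussian_logpdf(xi, mu, sigma)
    return total

def sample(n, rng):
    """Ancestral sampling: draw x_1, then x_2 given x_1, and so on."""
    x = []
    for _ in range(n):
        mu, sigma = conditional_params(x)
        x.append(rng.gauss(mu, sigma))
    return x

rng = random.Random(0)
x = sample(5, rng)
lp = log_prob(x)
```

No Jacobian determinant appears anywhere: the likelihood is just a sum of conditional log-densities, which is why the conditionals are free to take any normalized form.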

Inference allows sequential interventions: one can condition on partial observations (e.g., fixing a dihedral angle) and sample the remaining degrees of freedom, which is not straightforward with flows.
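A sequential intervention can be sketched with a toy autoregressive sampler. This is an illustration under assumptions, not the paper's procedure: the conditional is a hypothetical Gaussian that depends on the previous element, and `sample_with_interventions` is an invented helper name. Clamped indices are set to user-supplied values and every other element is drawn from its conditional given the prefix so far.

```python
import random

def conditional_params(prefix, coupling=0.8, sigma=0.5):
    """Hypothetical conditional q(x_i | x_<i): mean tracks the previous value."""
    mu = coupling * prefix[-1] if prefix else 0.0
    return mu, sigma

def sample_with_interventions(n, fixed, rng):
    """Sample x_0..x_{n-1} left to right, clamping any index listed in `fixed`."""
    x = []
    for i in range(n):
        if i in fixed:
            x.append(fixed[i])          # intervention: overwrite instead of sampling
        else:
            mu, sigma = conditional_params(x)
            x.append(rng.gauss(mu, sigma))
    return x

rng = random.Random(0)
draws = [sample_with_interventions(4, {0: 2.0}, rng) for _ in range(5000)]

# With x_0 clamped to 2.0, x_1 is drawn from N(0.8 * 2.0, 0.5^2), so its mean is near 1.6.
mean_x1 = sum(d[1] for d in draws) / len(draws)
```

Clamping yields exact conditional samples only when the fixed elements precede the sampled ones in the generation order. Fixing a later element does not update the distribution of earlier ones, so the ordering of degrees of freedom matters for this kind of intervention.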

Training

ArBG is trained on a dataset of conformations or through a self-consistent procedure. The loss is the negative log-likelihood averaged over a batch of $B$ conformations $x^{(1)}, \dots, x^{(B)}$: $$ \mathcal{L} = -\frac{1}{B} \sum_{j=1}^B \sum_{i=1}^N \log q_\theta(x_i^{(j)} \mid x_{<i}^{(j)}). $$ The architecture uses Transformer-style layers for scalability, with masked (causal) attention enforcing the autoregressive ordering. The authors introduce Robin, a 132M-parameter transferable model trained on a diverse set of peptide systems.
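The link between this loss and a KL objective can be written out in one step. The derivation below assumes the forward direction $\mathrm{KL}(p \,\|\, q_\theta)$ and a batch $x^{(1)}, \dots, x^{(B)}$ drawn from the target $p$. The entropy term of $p$ does not depend on $\theta$, so it is absorbed into a constant.

```latex
\begin{aligned}
\mathrm{KL}(p \,\|\, q_\theta)
  &= \mathbb{E}_{x \sim p}\big[\log p(x)\big] - \mathbb{E}_{x \sim p}\big[\log q_\theta(x)\big] \\
  &\approx \text{const} - \frac{1}{B} \sum_{j=1}^{B} \sum_{i=1}^{N}
     \log q_\theta\big(x_i^{(j)} \mid x_{<i}^{(j)}\big)
   = \text{const} + \mathcal{L}, \qquad x^{(j)} \sim p .
\end{aligned}
```

Minimizing $\mathcal{L}$ over $\theta$ is therefore a Monte Carlo approximation of minimizing the forward KL divergence.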

Code Snippet: Model Forward Pass

import math

import torch
import torch.nn as nn

class AutoregressiveBoltzmannGenerator(nn.Module):
    def __init__(self, d_input, d_model, n_layers, n_heads):
        super().__init__()
        self.embed = nn.Linear(d_input, d_model)
        # Learned start token, so the first element is predicted from an empty context
        self.start = nn.Parameter(torch.zeros(1, 1, d_model))
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers
        )
        self.output_mu = nn.Linear(d_model, d_input)
        self.output_log_sigma = nn.Linear(d_model, d_input)

    def forward(self, x):
        # x: (batch, seq_len, d_input)
        batch, seq_len, _ = x.shape
        # Causal mask: position i may attend only to positions <= i
        mask = torch.triu(
            torch.full((seq_len, seq_len), float('-inf'), device=x.device), diagonal=1
        )
        # Shift inputs right by one so the output at position i depends only on x_{<i}
        h = self.embed(x)
        h = torch.cat([self.start.expand(batch, -1, -1), h[:, :-1]], dim=1)
        h = self.transformer(h, mask=mask)
        mu = self.output_mu(h)
        log_sigma = self.output_log_sigma(h)
        # Conditional log-density of a diagonal Gaussian
        log_prob = -0.5 * (((x - mu) / log_sigma.exp()) ** 2 + 2 * log_sigma + math.log(2 * math.pi))
        return log_prob.sum(dim=-1).sum(dim=-1)  # sum over dimensions and residues

Results

ArBG significantly outperforms flow-based BGs on all benchmarks, with the largest gains on bigger peptide systems such as the 10-residue Chignolin. Robin reduces the zero-shot energy Wasserstein-2 error (E-W$_2$) on 8-residue systems by over 60% relative to the previous state of the art.
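For intuition about the energy Wasserstein-2 metric, here is a minimal sketch of a Wasserstein-2 distance between two one-dimensional empirical distributions of energies. It assumes equal sample sizes and unweighted samples. The exact evaluation protocol used in the paper (units, reweighting, sample counts) is not described here, and the energy values below are made up for illustration.

```python
import math

def wasserstein2_1d(a, b):
    """W2 between two equal-size 1D empirical distributions via sorted (quantile) matching."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    sa, sb = sorted(a), sorted(b)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(sa, sb)) / len(sa))

# Hypothetical energies of reference and generated conformations.
reference_energies = [-3.0, -2.5, -2.0, -1.0]
generated_energies = [-1.5, -2.5, -0.5, -2.0]

e_w2 = wasserstein2_1d(reference_energies, generated_energies)
```

In one dimension the optimal transport plan simply pairs sorted samples, so no linear program is needed. In this example the generated energies are the reference energies shifted up by 0.5, so the distance is 0.5.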