efficiency

Published: June 17, 2021

LoRA: Low-rank adaptation of large language models

By Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

Research TL;DR

"Introduces Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. Dramatically reduces the number of trainable parameters while maintaining downstream performance."

Abstract

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. LoRA performs on par with or better than full fine-tuning while reducing GPU memory requirements.
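
The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single linear layer whose frozen weight W0 is augmented by a trainable product B·A of rank r, and it uses plain Python lists so that it runs without any dependencies. The helper names (matvec, lora_forward, trainable_params), the scale parameter, and the toy dimensions are hypothetical and chosen only for illustration.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(w0, a, b, x, scale=1.0):
    """Frozen path W0 x plus the low-rank update scale * B (A x).

    Only a and b would be trained; w0 stays fixed. `scale` is an
    illustrative knob, not something stated in the text above.
    """
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + scale * u for h, u in zip(base, update)]

def trainable_params(d_out, d_in, rank):
    """Parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_in + d_out)

# Toy dimensions (assumed for illustration only).
d_in, d_out, rank = 8, 6, 2
rng = random.Random(0)
w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
b = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero start
x = [rng.gauss(0, 1) for _ in range(d_in)]

# With B = 0 the adapted layer reproduces the frozen layer exactly.
assert lora_forward(w0, a, b, x) == matvec(w0, x)

# Parameter count for a hypothetical 4096 x 4096 weight at rank 8.
full = 4096 * 4096
low_rank = trainable_params(4096, 4096, 8)
print(full, low_rank, full // low_rank)  # 16777216 65536 256
```

Starting B at zero means the adapted layer initially matches the pre-trained one, so training begins from the frozen model's behavior. The count at the end shows where the savings come from: the two factors hold r·(d_in + d_out) values instead of d_in·d_out.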
