efficiency
Published: June 17, 2021
LoRA: Low-Rank Adaptation of Large Language Models
By Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Research TL;DR
"Introduces Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. Dramatically reduces the number of trainable parameters while maintaining downstream performance."
Abstract
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. LoRA performs on par with or better than full fine-tuning while reducing GPU memory requirements.
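To make the idea in the abstract concrete, here is a minimal pure-Python sketch of one frozen weight matrix with a trainable low-rank update. It is an illustration under stated assumptions, not the authors' implementation: the function names (`init_lora`, `lora_forward`, `trainable_params`), the zero initialization of `B`, the small Gaussian initialization of `A`, and the example dimensions are all choices made here for demonstration. The abstract itself only states that the pre-trained weights are frozen and that rank decomposition matrices are trained.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def init_lora(d_out, d_in, rank, seed=0):
    """Create the two low-rank factors (hypothetical helper).

    Assumption: A gets small random values and B starts at zero, so the
    product B @ A is zero and the adapted layer initially matches the
    frozen layer exactly.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # rank x d_in
    B = [[0.0] * rank for _ in range(d_out)]                                # d_out x rank
    return A, B


def lora_forward(W, A, B, x):
    """Compute W x + B (A x). W is frozen; only A and B would be trained."""
    base = matvec(W, x)                # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    return [b + u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning count, low-rank count) for one weight matrix."""
    full = d_out * d_in
    low_rank = rank * (d_in + d_out)
    return full, low_rank
```

As a usage example, for a single 4096 x 4096 weight matrix with rank 8 (illustrative sizes, not figures from the source), `trainable_params(4096, 4096, 8)` returns `(16777216, 65536)`, so the low-rank factors hold 256 times fewer trainable values than the full matrix. Because the update is computed as `B (A x)`, the full `d_out x d_in` update matrix is never materialized during training.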