LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
Authors: Yifan Yang, Jiajun Zhou, Ngai Wong, Zheng Zhang
Affiliations: University of California, Santa Barbara; The University of Hong Kong
Code: GitHub Repository
Paper: 2024.NAACL-Long.174
Abstract
LoRETTA is a parameter-efficient fine-tuning (PEFT) framework for large language models (LLMs) that leverages tensor-train (TT) decomposition to drastically reduce trainable parameters. It introduces two variants:
LoRETTAₐdₚ: Uses tensorized adapters for lightweight fine-tuning.
LoRETTAᵣₑₚ: Reparameterizes weights via tensor factors for ultra-low parameter updates.
Key results:
Achieves up to 100× fewer trainable parameters than LoRA/Adapters on LLaMA-2 models.
Matches or outperforms full fine-tuning and existing PEFT methods across GLUE, SuperGLUE, and generation tasks.
Demonstrates anti-overfitting capabilities and enhanced multi-task learning efficiency.
Method
Tensor-Train (TT) Decomposition
Reshapes each weight matrix into a higher-order tensor, which is then decomposed into a chain of small tensor factors (TT cores).
Reduces the parameter count from M×N to ∑ᵢ rᵢ₋₁kᵢrᵢ, where the kᵢ are the reshaped tensor dimensions and the rᵢ are the TT ranks (with r₀ = r_d = 1).
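The parameter count above is easy to verify numerically. A minimal sketch, using a hypothetical 768×768 weight matrix reshaped into a 6-way tensor (the dimensions and ranks below are illustrative choices, not the paper's configuration):

```python
def tt_param_count(dims, ranks):
    """Number of parameters in a TT decomposition whose i-th core has
    shape (r_{i-1}, k_i, r_i). Boundary ranks must both be 1."""
    assert len(ranks) == len(dims) + 1 and ranks[0] == ranks[-1] == 1
    return sum(r0 * k * r1 for r0, k, r1 in zip(ranks[:-1], dims, ranks[1:]))

# Hypothetical example: reshape a 768x768 matrix into a tensor of
# shape 8x12x8x8x12x8 (the product equals 768*768 = 589,824).
dims = [8, 12, 8, 8, 12, 8]
ranks = [1, 4, 4, 4, 4, 4, 1]   # all internal TT ranks set to 4

dense_params = 768 * 768                  # 589,824
tt_params = tt_param_count(dims, ranks)   # 704
```

With these settings the TT representation stores 704 values instead of 589,824, which is the source of the "ultra-low-parameter" budget; the trade-off is that larger TT ranks are needed to represent less compressible weights.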
LoRETTA Variants
LoRETTAₐdₚ
Injects tensorized adapters after attention and feed-forward layers.
Compresses parameters via TT layers (e.g., 1.2K vs. 98K parameters for Adapters).
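To make the adapter variant concrete, here is a minimal NumPy sketch of a bottleneck adapter whose down- and up-projections are stored as TT cores rather than dense matrices. All dimensions, ranks, and the ReLU bottleneck are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_tt_cores(dims, ranks, scale=0.02):
    """Random TT cores of shape (r_{i-1}, k_i, r_i) for each dimension."""
    return [rng.normal(0.0, scale, (r0, k, r1))
            for r0, k, r1 in zip(ranks[:-1], dims, ranks[1:])]

def tt_contract(cores):
    """Contract the TT cores back into the full (flattened) tensor."""
    t = cores[0]
    for c in cores[1:]:
        t = np.tensordot(t, c, axes=([-1], [0]))
    return t.reshape(-1)

def tt_adapter(h, down_cores, up_cores, d_model=768, d_bottleneck=64):
    """Residual bottleneck adapter; only the TT cores are trainable."""
    W_down = tt_contract(down_cores).reshape(d_model, d_bottleneck)
    W_up = tt_contract(up_cores).reshape(d_bottleneck, d_model)
    return h + np.maximum(h @ W_down, 0.0) @ W_up   # residual + ReLU

# Hypothetical sizes: 768*64 = 49,152 = 8*12*8*8*8 per projection.
dims = [8, 12, 8, 8, 8]
ranks = [1, 4, 4, 4, 4, 1]
down = make_tt_cores(dims, ranks)
up = make_tt_cores(dims, ranks)
out = tt_adapter(rng.normal(size=(2, 768)), down, up)
```

Each projection here costs 512 TT parameters (32 + 192 + 128 + 128 + 32) instead of 49,152 dense ones, which mirrors the order-of-magnitude compression the paper reports for its tensorized adapters.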
LoRETTAᵣₑₚ
Reparameterizes weight updates using TT factors (e.g., 1K vs. 12K parameters for LoRA).
Initializes the tensor factors with a noise-reduction scheme to avoid optimization difficulties.