Representation fine-tuning (ReFT): A Powerful Parameter-Efficient Way to Fine-tune Language Models (2024)

Parameter-efficient finetuning (PEFT) methods offer an efficient and cheaper alternative to full fine-tuning by updating only a small fraction of weights, using less memory and training faster. Current state-of-the-art PEFTs such as LoRA[1] and DoRA[2] modify model weights but not representations. However, much of the model-interpretability literature suggests that editing representations may be a more powerful alternative to weight updates.

In the paper[3], researchers propose the Representation Finetuning (ReFT) approach, which operates on a frozen base model and learns task-specific interventions on hidden representations. The approach is proposed as a drop-in replacement for weight-based PEFTs.

As part of the ReFT family, the paper[3] also proposes a highly efficient instance of ReFT called Low-rank Linear Subspace ReFT (LoReFT). LoReFT learns interventions that are 10×–50× more parameter-efficient than prior state-of-the-art PEFTs.

The diagram below shows parameter count vs. performance for LoReFT and other PEFTs across four benchmarks when applied to LLaMA, Llama-2, and RoBERTa models. Despite training far fewer parameters than existing PEFTs, LoReFT achieves competitive or even state-of-the-art performance on all tasks.

[Figure: parameter count vs. performance of LoReFT and other PEFTs across four benchmarks]

The assumptions about models to which ReFT can be applied can be summarized as follows:

  • Assume the target model is a Transformer-based language model that produces contextualised representations of token sequences.
  • Given a sequence of n input tokens, the model first embeds these into a list of representations h(0).
  • Then, m layers successively compute the j-th list of hidden representations h(j) as a function of the previous list h(j−1).
  • Finally, the language model uses the final hidden representations h(m) to produce its predictions.
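
The steps above can be sketched in a few lines of toy numpy (random weights stand in for the embedding, the per-layer updates, and the output head; this is purely illustrative, not any real architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 4, 8, 3                 # tokens, hidden size, layers
vocab = 5                         # toy output vocabulary

h = rng.standard_normal((n, d))   # h(0): embedded input tokens
hidden = [h]
for j in range(1, m + 1):         # h(j) computed from h(j-1), layer by layer
    h = np.tanh(h @ rng.standard_normal((d, d)) / np.sqrt(d))
    hidden.append(h)

logits = h @ rng.standard_normal((d, vocab))  # predictions from the final h(m)
assert logits.shape == (n, vocab)             # one distribution per position
```

ReFT will later intervene on the intermediate lists h(j) rather than on any of the weight matrices.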

The motivation for ReFT stems from work on intervention-based model interpretability, which provides the foundational idea of modifying representations rather than weights, and can be understood from the following theories/papers:

i) Linear representation hypothesis

Before looking at interventions, the more foundational theory to understand is the linear representation hypothesis, which claims that concepts are encoded in linear subspaces of representations in neural networks.

ii) Interchange interventions

To understand interventions further, consider the framework of causal abstraction [5], which proposes interchange interventions that causally establish the role of neural network components in implementing particular behaviours. Interchange interventions work as follows:

  • Align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model.
  • Train the neural model to match the counterfactual behavior of the causal model on a base input when the aligned representations in both models are set to the values they would take for a source input.
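
A minimal numpy sketch of an interchange intervention on a toy two-layer network (illustrative only; all weights are random and the "model" is not trained):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def forward(x, patch=None):
    """Toy two-layer net; `patch` overwrites the hidden representation."""
    h = np.tanh(x @ W1)
    if patch is not None:          # the interchange intervention
        h = patch
    return h @ W2

base, source = rng.standard_normal(d), rng.standard_normal(d)

# Splice the hidden representation computed on the *source* input
# into the forward pass on the *base* input.
h_source = np.tanh(source @ W1)
counterfactual = forward(base, patch=h_source)

# Here the hidden layer fully mediates the output, so the intervened
# run on the base input matches the source run exactly.
assert np.allclose(counterfactual, forward(source))
```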

iii) Distributed interchange intervention

Further, to test whether a concept is encoded in a linear subspace of a representation, as claimed by the linear representation hypothesis, one can use a distributed interchange intervention[5], which:

  • Proposes Distributed Alignment Search (DAS), which finds the alignment between the high-level and low-level models using gradient descent rather than a brute-force search.
  • Allows individual neurons to play multiple distinct roles by analyzing representations in non-standard bases (distributed representations).
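
The intervention itself can be sketched in numpy: a matrix R with orthonormal rows spans the subspace (learned by DAS in practice; random here for illustration), and only the subspace coordinates of the base representation are replaced by those of the source:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2   # hidden size, subspace rank

# Orthonormal rows of R span the r-dimensional subspace.
R = np.linalg.qr(rng.standard_normal((d, r)))[0].T   # shape (r, d)

h_base, h_source = rng.standard_normal(d), rng.standard_normal(d)

# Distributed interchange intervention: swap only the subspace
# coordinates of the base representation for those of the source.
h_new = h_base + R.T @ (R @ h_source - R @ h_base)

# Inside the subspace the result matches the source representation;
# the orthogonal complement of the base representation is untouched.
assert np.allclose(R @ h_new, R @ h_source)
assert np.allclose(h_new - R.T @ (R @ h_new), h_base - R.T @ (R @ h_base))
```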

To conclude, the authors of ReFT use the distributed interchange intervention operation to build a new parameter-efficient method for adapting language models to downstream tasks.

Distributed interchange intervention (DII) suggests a way to control model generations via interventions. The guiding intuition, then, is to learn interventions that lead the model to accurately predict the task labels.

Low-rank Linear Subspace ReFT (LoReFT) is defined as:

Φ_LoReFT(h) = h + Rᵀ(Wh + b − Rh)

where R ∈ ℝ^{r×d} is a low-rank projection matrix with orthonormal rows, and W ∈ ℝ^{r×d} and b ∈ ℝ^r parameterise a learned linear projection.

In the equation, Rs = Wh + b is the learned projected source. Intuitively, LoReFT edits the representation in the r-dimensional subspace spanned by the rows of R to take on the values obtained from the linear projection Wh + b.
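
A minimal numpy sketch of this operator (illustrative only; random parameters stand in for the learned R, W, and b):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4                       # hidden size, subspace rank

# R: orthonormal rows spanning the edited subspace (learned in LoReFT).
R = np.linalg.qr(rng.standard_normal((d, r)))[0].T
W, b = rng.standard_normal((r, d)), rng.standard_normal(r)

def loreft(h):
    """Phi(h) = h + R^T (W h + b - R h): edit only the r-dim subspace."""
    return h + R.T @ (W @ h + b - R @ h)

h = rng.standard_normal(d)
h_new = loreft(h)

# Within the subspace the representation takes the projected-source
# value W h + b; outside the subspace, h is unchanged.
assert np.allclose(R @ h_new, W @ h + b)
assert np.allclose(h_new - R.T @ (R @ h_new), h - R.T @ (R @ h))
```

Only R, W, and b are trained, which is where the method's parameter efficiency comes from.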

For generation tasks, the ReFT paper uses the language-modelling training objective, minimising the cross-entropy loss with teacher forcing over all output positions.

For single-label classification tasks, the ReFT paper adds a classification head Hθ(⋅) with parameters θ that takes the final-layer representation at the first token (CLS) as input and outputs a distribution over classes, thereby minimising the cross-entropy loss of the target class y given input x.
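
As a toy illustration of this objective (random weights stand in for the learned head; this is not the paper's implementation), the head is just a linear map plus softmax over the CLS representation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_classes = 8, 3

h_cls = rng.standard_normal(d)    # final-layer rep at the first (CLS) token

# Hypothetical classification head H_theta: linear map plus softmax.
W_head = rng.standard_normal((num_classes, d))
logits = W_head @ h_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()              # distribution over classes

y = 1                             # target class for this input
loss = -np.log(probs[y])          # cross-entropy of the target class
assert np.isclose(probs.sum(), 1.0) and loss > 0
```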

The ReFT model architecture defines a general notion of intervention, which means the modification of hidden representations during the model's forward pass.

i) Intervention

An intervention I is a tuple ⟨Φ, P, L⟩ consisting of three components:

a) Intervention function Φ, with learned parameters ϕ.

b) Set of input positions P that the intervention is applied to.

c) Layer L at which the intervention is applied.

ii) ReFT method

  • A ReFT method can be defined as a set of f interventions I = {I1, …, If}.
  • For any two interventions Ij, Ik ∈ I that operate on the same layer (Lj = Lk), the intervention positions must be disjoint, i.e. Pj ∩ Pk = ∅.
  • The parameters (ϕ1, …, ϕf) of all of the intervention functions are independent.
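
These constraints can be illustrated with a toy sketch (identity layers stand in for the frozen model; note that the two interventions sharing layer 1 touch disjoint position sets):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 6, 8, 4                    # tokens, hidden size, layers

# A toy ReFT method: f = 3 interventions <Phi_k, P_k, L_k>.
interventions = [
    {"layer": 1, "positions": [0],    "phi": lambda h: h + 1.0},
    {"layer": 1, "positions": [5],    "phi": lambda h: 2.0 * h},  # disjoint from [0]
    {"layer": 3, "positions": [0, 5], "phi": lambda h: -h},
]

h = rng.standard_normal((n, d))
h_before = h.copy()

for layer_idx in range(m):
    # (identity layers stand in for the frozen transformer blocks)
    for iv in (i for i in interventions if i["layer"] == layer_idx):
        for p in iv["positions"]:
            h[p] = iv["phi"](h[p])   # edit the rep right after layer L_k

# Position 0: +1 at layer 1, then negated at layer 3.
assert np.allclose(h[0], -(h_before[0] + 1.0))
# Position 5: doubled at layer 1, then negated at layer 3.
assert np.allclose(h[5], -2.0 * h_before[5])
# Non-intervened positions pass through the frozen model untouched.
assert np.allclose(h[1], h_before[1])
```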

For evaluating LoReFT against other PEFTs, experiments were conducted across four diverse NLP benchmarks covering more than 20 datasets.

Benchmarks used for the experiment were:

i) Commonsense reasoning

Evaluations performed across eight commonsense reasoning datasets, including BoolQ [6], PIQA [7], SIQA [8], HellaSwag [9], WinoGrande [10], ARC-e, ARC-c [11], and OBQA[12]. The task is formulated as a multiple-choice problem.

The table below shows an accuracy comparison of LLaMA-7B and LLaMA-13B against existing PEFT methods on eight commonsense reasoning datasets.

[Table: commonsense reasoning accuracies of LLaMA-7B and LLaMA-13B with LoReFT vs. existing PEFTs]

As can be observed from the results above, LoReFT outperforms other PEFTs on accuracy across seven of the eight commonsense reasoning datasets.

ii) Arithmetic reasoning

Evaluation was performed across four math word problem datasets: AQuA [13], GSM8K [14], MAWPS [15], and SVAMP [16]. Here, models need to generate a chain of thought [17] before the final answer.

The table below shows an accuracy comparison of LLaMA-7B and LLaMA-13B against existing PEFT methods on four arithmetic reasoning datasets.

[Table: arithmetic reasoning accuracies of LLaMA-7B and LLaMA-13B with LoReFT vs. existing PEFTs]

As can be observed from the results above, LoReFT performs significantly better on one dataset and on par with other PEFTs on the rest.

iii) Instruction-following

Evaluates whether models can follow human instructions. Here, UltraFeedback [18] was used as the training data and Alpaca-Eval v1.0 [19] as the evaluation dataset.

[Table: Alpaca-Eval v1.0 win rates of LoReFT vs. other PEFTs]

As can be observed from the results above, LoReFT has a significantly higher win rate than other PEFTs.

iv) Natural language understanding

Evaluates across eight datasets from the GLUE benchmark [20], covering tasks such as sentiment analysis and natural language inference.

[Table: GLUE results of LoReFT vs. other PEFTs]

As can be observed from the results above, LoReFT outperforms other methods on average across all evaluations.

The ReFT paper also releases pyreft, a Python library for training and sharing ReFTs.

  • This library is built on top of pyvene [21], a library for performing and training activation interventions on arbitrary PyTorch models.
  • Codebase : https://github.com/stanfordnlp/pyreft
  • PyPI release: https://pypi.org/project/pyreft/
  • Any pretrained LM available on HuggingFace is supported through pyreft for finetuning with ReFT methods, and finetuned models can easily be uploaded to HuggingFace.
  • The following example shows the steps to wrap a Llama-2 7B model with a single intervention on the residual-stream output of the 19th layer:
[Code listing: wrapping Llama-2 7B with a single layer-19 intervention using pyreft]
  • The wrapped model can then be trained for downstream tasks:
[Code listing: training the wrapped model]
  • More diverse models: only LLaMA-family models were explored in the ReFT paper. The effectiveness of ReFT on other model families remains to be studied.
  • Additional design considerations: the capabilities of ReFT are not fully explored due to the large hyperparameter search space, and the power of the orthogonality of the learned subspace has yet to be investigated.
  • ReFT, abstraction, and generation: the power of ReFT may come from creating new causal pathways or modifying the strength of existing ones, so further work is needed on structured ReFTs that modify complex causal pathways in language models.
  • Evaluation practices in PEFT research: the ReFT paper hyperparameter-tunes on development sets that do not overlap with the test sets. However, a considerable portion of the PEFT literature directly hill-climbs performance on test sets, which overfits to specific tasks, gives practitioners less certainty about real-world performance, and impedes fair comparison. More benchmarks are needed for evaluating PEFTs and ReFTs.
  • LoReFT achieves strong performance across benchmarks from four domains while being 10×–50× more parameter-efficient than prior state-of-the-art PEFTs.
  • LoReFT establishes new state-of-the-art performance on commonsense reasoning, instruction-following, and natural language understanding against the strongest PEFTs.