Improving instruction following capabilities using self-alignment

4 minute read

The introduction of GPT-3 revolutionized natural language processing by enabling few-shot learning through prompt engineering rather than fine-tuning. However, language models still struggle with zero-shot performance on tasks dissimilar from their pretraining data.

*Figure: FLAN instruction tuning*

As Wei et al. [1] showed with the FLAN model, instruction tuning, i.e. training models to follow natural language instructions, significantly improves zero-shot capabilities. Ouyang et al. [2] further combined instruction tuning with reinforcement learning from human feedback to create InstructGPT. Remarkably, an InstructGPT model with just 1.3 billion parameters substantially outperformed the 175-billion-parameter GPT-3 at zero-shot text generation. This demonstrates the efficacy of instruction tuning for few-shot and zero-shot learning: by training models to effectively leverage instructions, we can vastly improve performance on new tasks without requiring large amounts of training data.

However, annotating instruction-following datasets is challenging. Effective instructions often require very specific phrasing and structure to elicit high-quality outputs. If instruction tuning is what aligns language models to follow directions, how can we generate diverse, effective instructions without extensive human effort?

Self-alignment presents a promising solution: the model refines itself by rephrasing its own inputs into optimized instructions and critiques. Li et al. [3] propose a model called Humpback, which combines self-alignment with backtranslation to create a scalable framework for instruction tuning. Backtranslation, a well-established technique in machine translation, works like a game of ping pong between languages: an English sentence can be "pinged" to French, then "ponged" back to English, and the round trip yields synthetic training pairs that strengthen translation in both directions. Similarly, self-alignment uses iterative refinement of instructions to build language models capable of generalizable instruction following. Let's dive into the details.

*Figure: backtranslation*

Self-alignment with backtranslation

Continuing the machine translation analogy, our two "languages" are high-quality instructions (which are unknown) and an unlabelled corpus of human-written web text. The authors also curate a small seed dataset of high-quality instructions paired with their corresponding outputs to guide the self-alignment. To begin the self-alignment process, all we need are the following (a short sketch of these ingredients appears after the list):

  • Base language model (e.g. LLaMA 7B, 33B, or 65B)
  • Small seed dataset of (instruction, output) pairs
    • Can be used to model output given instruction, or instruction given output
  • Web corpus: unlabelled data (all topics; no instructions)
    • Is cleaned (deduplication, length and noise filtering)
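
A minimal sketch of these ingredients in Python; the data, length bounds, and helper names are illustrative assumptions, not taken from the paper's code:

```python
# Illustrative setup for instruction backtranslation (all data here is made up).

MIN_LEN, MAX_LEN = 50, 2000  # assumed character bounds for length/noise filtering

# Small seed set of human-written (instruction, output) pairs.
seed_data = [
    {"instruction": "Summarize the following article in one sentence.",
     "output": "Instruction tuning improves zero-shot generalization."},
]

# Unlabelled web corpus: plain text segments with no instructions attached.
raw_corpus = [
    "Backtranslation generates synthetic source sentences from target-side text.",
    "Backtranslation generates synthetic source sentences from target-side text.",
    "ok",  # too short; dropped by the length filter
]

def clean(corpus):
    """Deduplicate exactly and apply simple length-based noise filtering."""
    seen, cleaned = set(), []
    for text in corpus:
        if text in seen or not (MIN_LEN <= len(text) <= MAX_LEN):
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned

web_corpus = clean(raw_corpus)  # one segment survives
```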

*Figure: Humpback training pipeline*

Once we have the requirements, we follow these two steps:

Self augmentation:

  • Generate instructions for the unlabelled web corpus, producing candidate (instruction, output) training pairs for instruction tuning (see the sketch after this list).
  • Train a backward model $M := p(x \mid y)$ on the seed data, where $x$ is the instruction and $y$ is the text output.
  • For each unlabelled example $y$, use $M$ to generate a candidate instruction $\hat{x}$, yielding the augmented dataset $A = \{(\hat{x}, y)\}$.
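
A sketch of the self-augmentation step. The `generate` helper is a hypothetical stand-in for an actual LLM inference call, stubbed out here so the example runs:

```python
# Self-augmentation: predict a candidate instruction x_hat for each web text y.
# `generate` is a placeholder for real inference with the backward model p(x|y).

def generate(backward_model, prompt: str) -> str:
    """Stub for an LLM call; a real system samples from the backward model."""
    return "Explain how backtranslation creates synthetic training pairs."

def self_augment(backward_model, web_corpus):
    """Turn unlabelled texts into candidate (instruction, output) pairs."""
    augmented = []
    for y in web_corpus:
        prompt = f"Below is a response. Write the instruction it answers.\n\n{y}"
        x_hat = generate(backward_model, prompt)
        augmented.append({"instruction": x_hat, "output": y})
    return augmented

augmented = self_augment(backward_model=None,
                         web_corpus=["Some cleaned web text segment."])
```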

Self curation:

  • Self-select high-quality examples from the augmented data to use as training data for finetuning the base model to follow instructions (see the sketch after this list).
  • Start with a model $M_0$ finetuned on the seed (instruction, output) pairs.
  • Use the current model to predict a quality score $a_i$ for each augmented example, prompting it to rate quality on a 1-5 scale. Keep only the examples whose score meets a threshold $k$, giving the curated subset $A_k^{(i)}$.
  • The curated set $A_k^{(i-1)}$ is then used to finetune the next model $M_i$, and the process repeats.
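
A sketch of one curation round. The rating call is stubbed, and the cutoff of 4 on the 5-point scale is an illustrative, tunable choice:

```python
# Self-curation: keep only augmented pairs the current model rates highly.
# `rate_quality` stubs a prompted LLM call; the cutoff below is illustrative.

SCORE_THRESHOLD = 4  # keep examples rated >= 4 on the 1-5 scale (tunable)

def rate_quality(model, instruction: str, output: str) -> int:
    """Stub: prompt the current model M_i to rate the pair from 1 to 5."""
    return 5

def self_curate(model, augmented):
    """Filter augmented (instruction, output) pairs by predicted quality score."""
    return [
        ex for ex in augmented
        if rate_quality(model, ex["instruction"], ex["output"]) >= SCORE_THRESHOLD
    ]

curated = self_curate(model=None,
                      augmented=[{"instruction": "Summarize X.", "output": "X."}])
```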

These two steps are run iteratively, just as in backtranslation, until output quality stops improving. To let the model differentiate the seed instructions from the augmented instructions, distinct prefix prompts are prepended to each during finetuning. The full loop is sketched below.
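
Putting the pieces together, an end-to-end sketch: `finetune` is a stand-in for actual training, the prefix strings are illustrative rather than the paper's exact prompts, and all helpers are re-stubbed so the sketch runs standalone:

```python
# End-to-end loop: augment -> curate -> finetune, with prefix prompts tagging
# seed vs. augmented data. All helpers are stubs so the sketch runs standalone.

SEED_PREFIX = "Answer in the style of a helpful assistant."  # illustrative tags,
AUG_PREFIX = "Answer using knowledge from the web."          # not the exact prompts

def tag(examples, prefix):
    """Prepend a prefix prompt so the model can tell data sources apart."""
    return [{**ex, "instruction": f"{prefix}\n{ex['instruction']}"} for ex in examples]

def finetune(base_model, dataset):  # stub for actual training
    return {"base": base_model, "n_train": len(dataset)}

def self_augment(model, corpus):    # stubs for the steps sketched above
    return [{"instruction": "Describe this text.", "output": y} for y in corpus]

def self_curate(model, augmented):
    return augmented  # pretend every example scores above the threshold

seed_data = [{"instruction": "Summarize X.", "output": "X summarized."}]
web_corpus = ["Some cleaned web text segment."]

model = finetune("llama-65b", tag(seed_data, SEED_PREFIX))      # M_0
augmented = self_augment(model, web_corpus)                     # A
for i in range(1, 3):                                           # a few rounds
    curated = self_curate(model, augmented)                     # A_k^{(i-1)}
    train_set = tag(seed_data, SEED_PREFIX) + tag(curated, AUG_PREFIX)
    model = finetune("llama-65b", train_set)                    # M_i
```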

Observations

Augmented instructions tend to be longer, but the higher-quality augmented instructions are closer in length to the original seed instructions. The augmented data also increases task diversity compared to the seed instructions alone.

*Figure: task diversity of augmented vs. seed instructions*

A seed set is essential for self-curating high-quality instructions: starting from only a pretrained model, performance does not improve even with large quantities of augmented data. Increasing the number of examples in the high-quality seed set further improves the quality of the augmented dataset.

*Figure: effect of seed data on performance*

AlpacaEval [4] compares two models using a pairwise win rate judged by GPT-4: the fraction of prompts on which a model's output is preferred over a baseline's. Humpback (self-augmentation plus self-curation) outperforms non-distilled models by a large margin, while performing similarly to distilled or proprietary models. In human evaluation, annotators mostly prefer Humpback's generations.
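
Concretely, the win rate of model $A$ over baseline $B$ on $N$ evaluation prompts can be written as follows (tie handling varies by setup; this formalization is mine, not the post's):

$$\text{win rate}(A, B) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\text{judge prefers } A\text{'s output on prompt } i\right]$$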

*Figure: win rates against baseline models*

One major limitation of this methodology is that, because the unlabelled data comes from the web, finetuning may amplify the biases present in web data. Beyond bias, safety instructions should also be included in the seed data so that the augmented instructions produce more harmless responses.