Auto-regressive models

What are auto-regressive models?

Auto-regressive models are a class of generative models that model the probability of a sequence of tokens by conditioning each token’s probability distribution on the tokens that precede it. They are commonly used for tasks such as language modeling, machine translation, and image captioning.
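
Written out, this conditioning is the chain-rule factorization of the joint probability. The symbols used here (x_1, ..., x_T for the tokens and T for the sequence length) are notation assumed for illustration:

```latex
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
```

For t = 1 the conditioning set is empty, so the first factor is simply p(x_1).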

What can auto-regressive models do?

Auto-regressive models generate new sequences one token at a time: each token is sampled from the predicted probability distribution conditioned on the preceding tokens, then appended to the sequence before the next token is predicted. For example, an auto-regressive language model can generate coherent and diverse sentences by repeatedly sampling the next word given the preceding words in the sentence.
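
The sampling loop can be sketched in a few lines of Python. This is a minimal illustration, not a real language model: the probability table, the `<s>` and `</s>` marker tokens, and the function names `next_token_distribution` and `generate` are all hypothetical, and the toy table looks only at the last token, whereas a real auto-regressive model would condition on the whole prefix.

```python
import random

# Hypothetical toy probability table standing in for a trained model.
# "<s>" marks the start of a sequence and "</s>" marks the end.
TOY_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.7, "</s>": 0.3},
    "dog": {"barks": 0.7, "</s>": 0.3},
    "sleeps": {"</s>": 1.0},
    "barks": {"</s>": 1.0},
}


def next_token_distribution(prefix):
    """Return next-token probabilities given the tokens generated so far.

    Assumption: this toy version uses only the last token of the prefix;
    a real model would use every preceding token.
    """
    return TOY_PROBS[prefix[-1]]


def generate(rng, max_tokens=10):
    """Sample one sequence token by token until "</s>" or max_tokens."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = next_token_distribution(tokens)
        # Sample the next token in proportion to its predicted probability.
        next_token = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if next_token == "</s>":
            break
        # Append the sampled token so it conditions the next prediction.
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(generate(rng)))
```

Because each call samples rather than always taking the most likely token, repeated calls can return different sentences, and the sequence length is decided during generation by when the end marker is drawn.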

Some benefits of using auto-regressive models

Auto-regressive models offer several benefits for generative tasks:

  • Flexibility: Auto-regressive models can generate sequences of varying length and are not restricted to fixed-length inputs or outputs, although in practice many architectures cap how much preceding context they can condition on.

  • Diversity: Auto-regressive models can generate diverse outputs by sampling from the predicted probability distributions, enabling the generation of multiple plausible outputs for a given input.

  • Adaptability: Auto-regressive models can be fine-tuned for specific tasks or domains, allowing for the generation of high-quality outputs that are tailored to a particular use case.
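
To make the diversity point concrete, one common way to control how varied the samples are is temperature scaling, which the list above does not mention and which is shown here only as an assumed illustration. The sketch below rescales a next-token distribution and measures its entropy; the function names `apply_temperature` and `entropy` and the example probabilities are hypothetical.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a distribution so each p becomes proportional to p ** (1 / T).

    T < 1 sharpens the distribution (less diverse samples);
    T > 1 flattens it (more diverse samples); T = 1 leaves it unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space; zero-probability entries stay at zero.
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]  # hypothetical next-token probabilities
    for t in (0.5, 1.0, 2.0):
        scaled = apply_temperature(probs, t)
        print(t, [round(p, 3) for p in scaled], round(entropy(scaled), 3))
```

Sampling from the flattened distribution picks lower-probability tokens more often, which trades some coherence for variety.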

Resources to learn more about auto-regressive models

To learn more about auto-regressive models and their applications, you can explore the following resources:

  • The Illustrated Transformer, a visual guide to the Transformer model, whose decoder is a popular auto-regressive architecture

  • OpenAI’s GPT-3 model, one of the largest and most powerful auto-regressive language models at the time of its release

  • The Attention Mechanism, a key component of many auto-regressive models that enables them to selectively focus on different parts of the input sequence

  • The Image Transformer, an auto-regressive model that generates images by predicting each pixel given the previously generated pixels