DocumentationAPI ReferenceTutorialsGitHub Code ExamplesDiscord Community

Language Models

With such a variety of readily available fine-tuned models, it's difficult to choose the best one for your use case. This guide tells you about different types of models and when to use them.

What Are Language Models?

Language models are neural networks trained on large amounts of text to learn how a language works. If you think about the way you're using your native tongue, you'll observe that it's intuitive. You don't have to think about grammar rules when speaking, you just know and apply them intuitively. A language model is an attempt to enable machines to use the language in a similar way humans do.

Language models are no magic. The way a model works is that it decides how probable it is that a word is valid in a particular word sequence. By valid, we mean that it resembles the way humans talk, not that it complies with any grammar rules.

The latest advance in NLP is the transformer architecture which led to top-performing models, such as BERT. It differs from previously used architectures in that instead of processing passages of text word-by-word, transformer-based models can process whole sentences at once, which makes them much faster in training but also in inference. They can also learn the relations between different words in a sentence and how these relations affect the expected model output—a functionality called attention mechanism.

Fine-Tuned Models

A fine-tuned model is a language model that was trained on a large dataset to perform a specific NLP task, such as text classification or question answering. A lot of fine-tuned models are publicly available for free on model hubs such as Hugging Face. This means that you can just grab them from there and use them in your NLP app as a starting point. You no longer need to train your own model, which takes a lot of time and a lot of data. If you find that the fine-tuned model you're using is not good enough, you can always just fine-tune it, which is much easier than training it from scratch.

Large Language Models (LLMs)

Large language models (LLMs) are huge models trained on enormous amounts of data. Interacting with such a model resembles talking to another person. These models have general knowledge of the world. You can ask them almost anything, and they'll be able to answer.

LLMs are trained to perform many NLP tasks with little training data. Some examples of tasks they can handle are language translation, summarization, question answering, or text generation.

These models are called "large" because their architecture is much more complicated than the architecture of regular models. It consists of hundreds of layers and billions of parameters. This makes these models take up a lot of computational resources to run. They are also trained on a very large dataset, typically consisting of billions or trillions of words.

The process used for training LLMs is called unsupervised learning. The models are not explicitly given task-specific labels or examples to learn from. Instead, they learn by predicting the next word in a sequence of words based on the context. During training, they memorize a lot of the training data. That's where their knowledge comes from.

LLMs have their limitations as well. Their knowledge isn't updated, so they won't be able to answer questions about very recent events or questions where the answer changes over time. They'll answer with something that was correct in the past. An example question like that is "Who's the president of the USA?".

Some of the most well-known LLMs include flan-t5-base, flan-paLM, chinchilla, and GPT-3 variants, such as text-davinci-003.

You can use LLMs in our Haystack pipelines through PromptNode and OpenAIAnswerGenerator.

Model Naming Conventions

If you're new to models, it's helpful to understand the model naming convention as already the name gives you a hint about what the model can do. The first part of the name is the user that trained and uploaded the model. Then, there is the architecture that the model uses and the size of the model. The last part of the name is the dataset used to train the model. Here's a typical model name:


deepset is the user who trained and uploaded the model, roberta is the model architecture, base is the size, and squad2 is the dataset on which the model was trained.

Model Types

You don't need to have in-depth knowledge of the architecture to be able to use a model but it's helpful to have a rough overview of the most common model types out there and their key differences. This will help you to better understand which models to try out in your own pipeline.


The different generations of language models.

Model Size

Large models have better accuracy but it comes at a cost of speed and the resources it needs to run. Large models can be significantly slower than smaller ones and they use up a lot of your graphic card memory or even exceed its limit. Distilled models are a good compromise, as they preserve accuracy similar to the larger models but are much smaller and easier to deploy. To learn more about it, see Model Distillation.

Training Dataset

There are standard datasets used when training a model for a particular task. For example, SQuAD stands for Stanford Question Answering Dataset and is the standard dataset used for training English-language models for question answering.



When searching for a model for your task, you can check what dataset is typically used for it and then search for a model that was trained on this dataset.

Models in Haystack

Haystack loads models directly from Hugging Face. You can use publicly available models but also your private ones. To use a private model, pass your Hugging Face user access token in the use_auth_token parameter.

If you're trying to find the right model on Hugging Face, go to the Models menu and you'll find a list of tasks by which you can search for your model.


Different model categories on the Hugging Face Model Hub.

There are two types of pipeline nodes that use models: dense retrievers and readers. Readers use models for question answering, while retrievers use sentence similarity or DPR models.For information about choosing the models for pipeline nodes, see EmbeddingRetriever, DensePassageRetriever, and Reader.

To use a model, simply provide its Hugging Face location as a parameter to the node and deepset Cloud will take care of loading it. For example:

- name: Retriever 
    type: EmbeddingRetriever 
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 
      model_format: sentence_transformers
      pooling_strategy: cls_token 
      top_k: 20