Introduction to Haystack

Haystack is an open-source framework for building production-ready LLM applications, retrieval-augmented generative pipelines and state-of-the-art search systems that work intelligently over large document collections. Learn more about Haystack and how it works.


Haystack 2.0

Looking for documentation for Haystack 2.0-Beta? Visit the docs here!


Get Started

To skip the introductions and go directly to installing and creating a search app, see Get Started.

Haystack is an end-to-end framework that you can use to build powerful and production-ready pipelines with Large Language Models (LLMs) for different search use cases. Whether you want to perform retrieval-augmented generation (RAG), question answering, or semantic document search, you can use the state-of-the-art LLMs and NLP models in Haystack to provide custom search experiences and let your users query in natural language. Haystack is built in a modular fashion so that you can combine the best technology from providers such as OpenAI, Cohere, and SageMaker with open-source projects like Hugging Face's Transformers, Elasticsearch, or Milvus.

The Building Blocks of Haystack

Haystack is geared towards building great search systems that are customizable and production-ready. There are a lot of components you can use to build them.

Nodes

Haystack offers various nodes, each performing different kinds of tasks. These are often powered by the latest LLMs and transformer models. Code-wise, they are Python classes with methods you can directly call. For example, to perform question answering with a PromptNode, all you need to do is provide it with documents, a PromptTemplate designed for question answering, and a query.

Working on this level with Haystack nodes is a hands-on approach. It gives you a very direct way of manipulating inputs and inspecting outputs. This can be useful for exploration, prototyping, and debugging.

Below is an example of a single PromptNode that uses the "deepset/question-answering" prompt from the PromptHub:

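from haystack import Document
from haystack.nodes import PromptNode

prompt_node = PromptNode(model_name_or_path="gpt-4", api_key='YOUR_OPENAI_KEY')
# The documents the answer should be based on (placeholder content for illustration)
documents = [Document(content="Haystack is an open-source framework for building LLM applications.")]
result = prompt_node.prompt(prompt_template="deepset/question-answering", query="What is Haystack?", documents=documents)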

Pipelines

Haystack is built on the idea that great systems are more than the sum of their parts. By combining different nodes, you can create powerful and customizable systems. The pipeline is the key to making this modular approach work.

When adding nodes to a pipeline, you can define how data flows from one node to the next, until the pipeline has reached a final result. On top of simplifying data flow logic, this also allows for complex routing options, such as those involving decision nodes.
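The data-flow idea can be illustrated with a small pure-Python sketch. It is not Haystack's implementation: the `MiniPipeline` class, the `(payload, edge)` return convention, and the three toy nodes are hypothetical, and the `add_node(component, name, inputs)` shape and `"Node.output_N"` input names only imitate the Haystack 1.x style. Nodes run in the order they were added, each one receives the output of the node listed in its `inputs`, and a decision node chooses which outgoing edge the data follows.

```python
from typing import Callable, Dict, List, Tuple

# A node takes a payload dict and returns (new_payload, name_of_outgoing_edge).
Node = Callable[[dict], Tuple[dict, str]]


class MiniPipeline:
    """Hypothetical toy pipeline; nodes run in the order they were added."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_node(self, component: Node, name: str, inputs: List[str]) -> None:
        # Each input is "NodeName" (any edge) or "NodeName.output_N" (one edge only).
        self.nodes[name] = component
        self.inputs[name] = inputs

    def run(self, query: str) -> dict:
        # "Query" is the implicit root node; it always emits on edge "output_1".
        results: Dict[str, Tuple[dict, str]] = {"Query": ({"query": query}, "output_1")}
        last: dict = {}
        for name, component in self.nodes.items():
            for source in self.inputs[name]:
                src_name, _, wanted_edge = source.partition(".")
                if src_name not in results:
                    continue  # the upstream node was skipped by an earlier decision
                payload, edge = results[src_name]
                if wanted_edge and wanted_edge != edge:
                    continue  # a decision node routed the data down another edge
                results[name] = component(dict(payload))
                last = results[name][0]
                break
        return last  # payload of the last node that actually ran


def classifier(payload: dict) -> Tuple[dict, str]:
    # Decision node: questions go to output_1, keyword queries to output_2.
    edge = "output_1" if payload["query"].rstrip().endswith("?") else "output_2"
    return payload, edge


def answer_question(payload: dict) -> Tuple[dict, str]:
    payload["answer"] = f"QA branch handled: {payload['query']}"
    return payload, "output_1"


def keyword_search(payload: dict) -> Tuple[dict, str]:
    payload["answer"] = f"Search branch handled: {payload['query']}"
    return payload, "output_1"


p = MiniPipeline()
p.add_node(classifier, "Classifier", ["Query"])
p.add_node(answer_question, "QA", ["Classifier.output_1"])
p.add_node(keyword_search, "Search", ["Classifier.output_2"])
print(p.run("What did Einstein work on?")["answer"])
print(p.run("Einstein relativity")["answer"])
```

Only one branch runs per query, because `run` skips any node whose required edge is not the one the decision node chose.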

Below is an example of a PromptNode preceded by a Retriever, which provides it with documents. This is a simple example of a Retrieval-Augmented Generation (RAG) pipeline.

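from haystack import Pipeline

# Assumes `retriever` and `prompt_node` (with a question-answering prompt template) are already initialized
p = Pipeline()
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
result = p.run(query="What did Einstein work on?")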

Why Use Pipelines?

The value of chaining different nodes together is clearest when looking at a RAG pipeline, one of the most common systems built in Haystack. It uses a PromptNode to harness the language generation capabilities of an LLM, and a Retriever to provide that LLM with relevant context from a large document base.

The PromptNode is an interface to an LLM (like GPT-4, Llama 2, and so on) that lets you customize how you interact with that LLM. It does this through a PromptTemplate, so the response it generates depends on what the PromptTemplate was designed to do. For example, you may use a PromptTemplate that is designed to receive a query and answer it based on some documents. The PromptNode ensures that the LLM you chose receives the correct instructions.
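As a rough illustration of what a question-answering PromptTemplate does, the sketch below fills a template string with a query and a list of documents, producing the text that would be sent to an LLM. The template wording and the `fill_prompt` helper are hypothetical; Haystack's real PromptTemplate syntax and the text of the "deepset/question-answering" prompt may differ.

```python
from typing import List

# Hypothetical template text; the placeholders are filled with str.format.
QA_TEMPLATE = (
    "Answer the question using only the documents below.\n"
    "Documents:\n{documents}\n"
    "Question: {query}\n"
    "Answer:"
)


def fill_prompt(template: str, query: str, documents: List[str]) -> str:
    """Join the documents into one numbered block and substitute both placeholders."""
    joined = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return template.format(documents=joined, query=query)


prompt = fill_prompt(
    QA_TEMPLATE,
    query="What is Haystack?",
    documents=[
        "Haystack is an open-source framework for building LLM applications.",
        "Haystack pipelines combine nodes such as Retrievers and PromptNodes.",
    ],
)
print(prompt)
```

Swapping in a different template changes the task the LLM performs while the query and documents stay the same, which is the flexibility the PromptTemplate provides.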

The Retriever assists the PromptNode by acting as a lightweight filter that reduces the amount of context the LLM needs to accurately respond to a query. It scans through all documents in the database, quickly identifies the relevant ones, and dismisses the irrelevant ones. It ends up with a small set of candidate documents that it passes on to the PromptNode.
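The filtering step can be sketched with a deliberately simple score: count how many query terms appear in each document, dismiss documents with no overlap, and keep the `top_k` best. Real Haystack Retrievers rely on methods such as BM25 or dense embeddings over a DocumentStore; the `retrieve` function and its term-overlap score here are hypothetical stand-ins.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Score every document by query-term overlap and return the top_k matches."""
    query_terms = tokenize(query)
    scored = []
    for doc in documents:
        score = len(query_terms & tokenize(doc))
        if score > 0:  # dismiss documents that share no terms with the query
            scored.append((score, doc))
    # sorted() is stable, so ties keep their original document order.
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


docs = [
    "Einstein developed the theory of relativity.",
    "Canberra is the capital of Australia.",
    "Einstein worked on the photoelectric effect.",
    "Bananas are rich in potassium.",
]
print(retrieve("What did Einstein work on?", docs))
```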

Another example of a powerful component in Haystack is the Reader, also known as a Closed-Domain Question Answering system. Readers wrap powerful models that closely analyze documents and perform extractive question answering. The Readers in Haystack are built on the latest transformer-based language models and can be significantly sped up using GPU acceleration. However, because a Reader analyzes every document it receives in detail, it's not currently feasible to run it directly on a large collection of documents.

Haystack also provides ready-made pipelines for common NLP tasks. Below is an example of a ready-made pipeline, the ExtractiveQAPipeline, which combines a Retriever with a Reader.

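from haystack.pipelines import ExtractiveQAPipeline

p = ExtractiveQAPipeline(reader, retriever)  # assumes an initialized Reader and Retriever
result = p.run(query="What is the capital of Australia?")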

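You can't do question answering with a Retriever alone, because it only returns candidate documents rather than answers. With just a Reader, it would be unacceptably slow, because the Reader would have to analyze every document in the collection. The power of this system comes from combining the two nodes: the Retriever narrows the collection down, and the Reader extracts the answer from what remains.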
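The payoff of the combination can be shown by counting how many documents an expensive reader has to analyze. In this sketch, `read` is a hypothetical toy extractive reader (it returns the sentence with the most query-term overlap, where a real Reader would run a transformer model) and `retrieve` is a cheap term-overlap filter; neither is Haystack code. Run over the whole collection, the reader touches every document; placed after the retriever, it only sees the `top_k` candidates.

```python
import re
from typing import List, Set, Tuple

reader_calls = 0  # how many documents the "expensive" reader has analyzed


def terms(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int) -> List[str]:
    """Cheap filter: keep the top_k documents by query-term overlap."""
    q = terms(query)
    ranked = sorted(documents, key=lambda d: len(q & terms(d)), reverse=True)
    return ranked[:top_k]


def read(query: str, documents: List[str]) -> Tuple[str, int]:
    """Toy extractive reader: return the best-matching sentence and its overlap score."""
    global reader_calls
    q = terms(query)
    best, best_score = "", -1
    for doc in documents:
        reader_calls += 1  # a real Reader would run a transformer model here
        for sentence in doc.split(". "):
            score = len(q & terms(sentence))
            if score > best_score:
                best, best_score = sentence, score
    return best, best_score


collection = [f"Filler document number {i} about nothing in particular." for i in range(1000)]
collection.append("Canberra is the capital of Australia. Sydney is its largest city.")
query = "What is the capital of Australia?"

answer_alone, _ = read(query, collection)  # Reader only
calls_alone = reader_calls

reader_calls = 0
answer_combined, _ = read(query, retrieve(query, collection, top_k=5))  # Retriever + Reader
print(answer_combined, calls_alone, reader_calls)
```

Both setups find the same answer here, but the reader analyzes 1001 documents on its own versus 5 behind the retriever.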


Question Answering Tutorial

To start building your first generative question answering system, see our Generative QA with PromptNode Tutorial.

To start building your first extractive question answering system, see our Introductory Tutorial.

These pipelines are by no means the only ones that Haystack offers, but they're perhaps the most instructive for showing the gains from combining nodes. Many of the synergistic node combinations are covered by Ready-Made Pipelines, but we're sure there are many still to be discovered!

Agent

The Agent is a very versatile, prompt-based component that uses a large language model and employs reasoning to answer complex questions beyond the capabilities of extractive or generative question answering. It's particularly useful for multi-hop question answering scenarios where it must combine information from multiple sources to arrive at an answer.

When the Agent receives a query, it forms a plan of action consisting of the steps it has to complete. It then chooses the right tool for the first step and uses the output from each tool as input for the next, looping through its tools until it reaches the final answer.

The Agent can use Haystack pipelines, nodes, and web queries as tools to amplify its capabilities to solve the most complex search tasks.
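The loop can be sketched in plain Python. A scripted fake LLM stands in for the PromptNode, tools are ordinary functions looked up by name, each tool's output is appended to the transcript as an observation, and the loop stops when a reply matches a final-answer regex like the `final_answer_pattern` in the Agent example on this page. The `Tool: ... Input: ...` action format, `run_agent`, `make_scripted_llm`, and the scripted replies are hypothetical; Haystack's Agent uses its own prompt template and parsing.

```python
import re
from typing import Callable, Dict, List

FINAL_ANSWER = re.compile(r"Final Answer\s*:\s*(.*)")
# Hypothetical action format for this sketch: "Tool: <name> Input: <text>"
TOOL_CALL = re.compile(r"Tool:\s*(\w+)\s*Input:\s*(.*)")


def run_agent(
    query: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Ask the LLM, run the tool it picks, feed the observation back, stop on a final answer."""
    transcript = f"Question: {query}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_ANSWER.search(reply)
        if final:
            return final.group(1).strip()
        call = TOOL_CALL.search(reply)
        if call:
            observation = tools[call.group(1)](call.group(2).strip())
            # The tool's output becomes part of the input for the next step.
            transcript += f"Observation: {observation}\n"
    return "No final answer within max_steps"


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Fake LLM for the sketch: ignores the prompt and returns the replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


# Toy knowledge source standing in for a search pipeline or web query tool.
facts = {
    "father of the Princes in the Tower": "Edward IV",
    "Edward IV birth year": "1442",
}
tools = {"lookup": lambda q: facts.get(q, "unknown")}

llm = make_scripted_llm([
    "Thought: I need the father first. Tool: lookup Input: father of the Princes in the Tower",
    "Thought: Now I need his birth year. Tool: lookup Input: Edward IV birth year",
    "Final Answer: 1442",
])
answer = run_agent("What year was the father of the Princes in the Tower born?", llm, tools)
print(answer)
```

The `max_steps` cap keeps the loop from running forever if the model never produces a final answer.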

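from haystack.agents import Agent

agent = Agent(
    prompt_node=prompt_node,  # assumes an initialized PromptNode; tools are omitted here
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)
hotpot_questions = [
    "What year was the father of the Princes in the Tower born?",
    "Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.",
    "Where was the actress who played the niece in the Priest film born?",
    "Which author is English: John Braine or Studs Terkel?",
]
for question in hotpot_questions:
    result = agent.run(query=question)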

REST API

To deploy a search system, you need more than just a Python script. You need a service that can stay on, handle requests as they come in, and be callable by many different applications. For this, Haystack comes with a REST API designed to work in production environments.

When set up like this, you can load Pipelines from YAML files, interact with Pipelines via HTTP requests, and connect Haystack to user-facing GUIs.

curl -X 'POST' \
  '' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "Who is the father of Arya Stark?",
  "params": {}
}'

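The same kind of request as the curl example above can be built from Python with only the standard library. This sketch builds the request without sending it; the `build_query_request` helper is hypothetical, and the `http://localhost:8000/query` URL is an assumption, so substitute the host and route your own deployment exposes. Actually sending the request requires a running REST API.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(url: str, query: str, params: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request like the curl example."""
    body = json.dumps({"query": query, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"accept": "application/json", "Content-Type": "application/json"},
    )


# Hypothetical endpoint: replace with the host and route of your own deployment.
request = build_query_request("http://localhost:8000/query", "Who is the father of Arya Stark?")
print(request.get_method(), request.full_url)

# Sending it requires a running REST API, for example:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```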