TransformersZeroShotTextRouter
Use this component to route text input to various output connections based on its user-defined categorization label.
| | |
| --- | --- |
| Most common position in a pipeline | Flexible |
| Mandatory init variables | "labels": A list of labels for classification. "token": The Hugging Face API token; can be set with the HF_API_TOKEN or HF_TOKEN env var. |
| Mandatory run variables | "text": The text to be routed to one of the specified outputs, based on the label it is categorized into |
| Output variables | One output connection per label: a dictionary with the label as key and the routed text as value |
| API reference | Routers |
| GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/routers/zero_shot_text_router.py |
Overview
TransformersZeroShotTextRouter routes text input to different output connections based on its categorization label. This is especially useful for directing queries to the appropriate components in a pipeline according to their category. You define the labels used for this categorization.

TransformersZeroShotTextRouter uses the MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33 zero-shot text classification model by default. You can set another model of your choice with the model parameter.

To use TransformersZeroShotTextRouter, you need to provide the mandatory labels parameter: a list of strings with the possible class labels to classify each sequence into.
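For example, a router that distinguishes queries from passages could be initialized like this (a minimal sketch; the model argument is optional and shown here only to illustrate overriding the default):

from haystack.components.routers import TransformersZeroShotTextRouter

# Initialize the router with the candidate labels.
# The model argument is optional and defaults to MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33.
text_router = TransformersZeroShotTextRouter(
    labels=["passage", "query"],
    model="MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33",
)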
To see the full list of parameters, check out our API reference.
Usage
On its own
The TransformersZeroShotTextRouter isn't very useful on its own, as its main strength lies in routing text to the most appropriate components within a pipeline. See the following section for a complete usage example.
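You can still run it in isolation to inspect the routing output. This is a minimal sketch; which label is picked depends on the underlying model:

from haystack.components.routers import TransformersZeroShotTextRouter

text_router = TransformersZeroShotTextRouter(labels=["passage", "query"])
text_router.warm_up()  # load the zero-shot classification model

result = text_router.run(text="What is the capital of Germany?")
print(result)
# The result maps the predicted label to the input text,
# e.g. {'query': 'What is the capital of Germany?'}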
In a pipeline
Below is an example of a simple pipeline that routes incoming text to the appropriate branch of the pipeline.

We first create an InMemoryDocumentStore and populate it with documents about Germany and France, embedding these documents with SentenceTransformersDocumentEmbedder.

We then create a retrieval pipeline with TransformersZeroShotTextRouter, which categorizes incoming text as either "passage" or "query" based on these predefined labels. Depending on the categorization, the text is processed by an embedder tailored for passages or for queries, respectively. These embedders generate the embeddings that InMemoryEmbeddingRetriever uses to find relevant documents in the Document Store.

Finally, the pipeline is executed with the sample text "What is the capital of Germany?", which is categorized as "query" and routed to the query embedder and then the query retriever, returning the relevant results.
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.core.pipeline import Pipeline
from haystack.components.routers import TransformersZeroShotTextRouter
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever
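# Create the Document Store and an embedder to index the example documents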
document_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
doc_embedder.warm_up()
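# Documents about Germany and France to be embedded and stored in the Document Store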
docs = [
    Document(
        content="Germany, officially the Federal Republic of Germany, is a country in the western region of "
        "Central Europe. The nation's capital and most populous city is Berlin and its main financial centre "
        "is Frankfurt; the largest urban area is the Ruhr."
    ),
    Document(
        content="France, officially the French Republic, is a country located primarily in Western Europe. "
        "France is a unitary semi-presidential republic with its capital in Paris, the country's largest city "
        "and main cultural and commercial centre; other major urban areas include Marseille, Lyon, Toulouse, "
        "Lille, Bordeaux, Strasbourg, Nantes and Nice."
    )
]
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])
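# Build the routing pipeline: the router sends incoming text to either the passage branch or the query branch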
p = Pipeline()
p.add_component(instance=TransformersZeroShotTextRouter(labels=["passage", "query"]), name="text_router")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="passage: "),
    name="passage_embedder"
)
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="query: "),
    name="query_embedder"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="query_retriever"
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="passage_retriever"
)
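# Route text labeled "passage" through the passage embedder and retriever,
# and text labeled "query" through the query embedder and retriever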
p.connect("text_router.passage", "passage_embedder.text")
p.connect("passage_embedder.embedding", "passage_retriever.query_embedding")
p.connect("text_router.query", "query_embedder.text")
p.connect("query_embedder.embedding", "query_retriever.query_embedding")
# Query Example
result = p.run({"text_router": {"text": "What is the capital of Germany?"}})
print(result)
>>{'query_retriever': {'documents': [Document(id=32d393dd8ee60648ae7e630cfe34b1922e747812ddf9a2c8b3650e66e0ecdb5a,
content: 'Germany, officially the Federal Republic of Germany, is a country in the western region of Central E...',
score: 0.8625669285150891), Document(id=c17102d8d818ce5cdfee0288488c518f5c9df238a9739a080142090e8c4cb3ba,
content: 'France, officially the French Republic, is a country located primarily in Western Europe. France is ...',
score: 0.7637571978602222)]}}
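A passage-style input takes the other branch instead. For example (illustrative; the label assignment depends on the model):

# Passage Example
result = p.run({"text_router": {"text": "Berlin is the capital and most populous city of Germany."}})
# The text is expected to be categorized as "passage" and routed through passage_embedder,
# so the retrieved documents appear under the "passage_retriever" key of the result.
print(result)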
Additional References
📓 Tutorial: Query Classification with TransformersTextRouter and TransformersZeroShotTextRouter