TransformersZeroShotTextRouter
Use this component to route text to different output connections based on which of your user-defined labels it is classified into.
Most common position in a pipeline | Flexible |
Mandatory init variables | "labels": A list of labels for classification; "token": The Hugging Face API token, read from the HF_API_TOKEN or HF_TOKEN environment variable by default |
Mandatory run variables | "text": The text to route to one of the outputs, depending on which label it is classified into |
Output variables | A dictionary with the predicted label as key and the input text as value; each label in "labels" is its own output connection |
API reference | Routers |
GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/routers/zero_shot_text_router.py |
Overview
TransformersZeroShotTextRouter routes text input to different output connections based on the label it is classified into. This is especially useful for directing queries to the pipeline components that are built to handle their category. You define the labels used for classification.
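Conceptually, the router scores the text against every label and emits it only on the output named after the best-scoring label. The following is a minimal pure-Python sketch of that contract. It assumes a hypothetical `score_labels` function that stands in for the zero-shot model, and the keyword-free `toy_scores` classifier is invented purely for illustration. This is not Haystack's implementation.

```python
from typing import Callable, Dict, List


def route(
    text: str,
    labels: List[str],
    score_labels: Callable[[str, List[str]], List[float]],
) -> Dict[str, str]:
    """Return {best_label: text}, mimicking one output connection per label."""
    scores = score_labels(text, labels)
    if len(scores) != len(labels):
        raise ValueError("expected exactly one score per label")
    # Pick the index of the highest score (argmax); ties go to the first label.
    best = max(range(len(labels)), key=lambda i: scores[i])
    # Only the winning label gets a value, so only that output "fires".
    return {labels[best]: text}


def toy_scores(text: str, labels: List[str]) -> List[float]:
    """Hypothetical stand-in classifier: a trailing '?' suggests a query."""
    is_question = text.rstrip().endswith("?")
    return [1.0 if (label == "query") == is_question else 0.0 for label in labels]


print(route("What is the capital of Germany?", ["passage", "query"], toy_scores))
```

Swapping `toy_scores` for a real zero-shot classifier changes only how the scores are produced. The output shape stays a one-entry dictionary.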
TransformersZeroShotTextRouter uses the MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33 zero-shot text classification model by default. You can set another model with the model parameter.
To use TransformersZeroShotTextRouter, you need to provide the mandatory labels parameter: a list of strings with the possible class labels that each input text can be classified into.
To see the full list of parameters, check out our API reference.
Usage
On its own
TransformersZeroShotTextRouter can run on its own, but in that case it only returns the predicted label together with the text. Its real value comes from working within a pipeline, where the output connection for each label feeds the components that should handle that kind of text. See the following section for a complete example.
In a pipeline
Below is an example of a simple pipeline that sends input text down the branch that matches its label.
We first create an InMemoryDocumentStore and populate it with documents about Germany and France, embedding these documents with SentenceTransformersDocumentEmbedder.
We then create a retrieval pipeline with TransformersZeroShotTextRouter, which categorizes incoming text as either "passage" or "query" based on these predefined labels. Depending on the category, the text is processed by an Embedder configured for passages or one configured for queries. The resulting embedding is then used by an InMemoryEmbeddingRetriever to find relevant documents in the Document Store.
Finally, the pipeline is run with the sample text "What is the capital of Germany?". The router categorizes it as "query" and sends it to query_embedder and then query_retriever, which returns the relevant results.
```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.routers import TransformersZeroShotTextRouter
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

# Embed the sample documents and write them to the Document Store
document_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
doc_embedder.warm_up()

docs = [
    Document(
        content="Germany, officially the Federal Republic of Germany, is a country in the western region of "
        "Central Europe. The nation's capital and most populous city is Berlin and its main financial centre "
        "is Frankfurt; the largest urban area is the Ruhr."
    ),
    Document(
        content="France, officially the French Republic, is a country located primarily in Western Europe. "
        "France is a unitary semi-presidential republic with its capital in Paris, the country's largest city "
        "and main cultural and commercial centre; other major urban areas include Marseille, Lyon, Toulouse, "
        "Lille, Bordeaux, Strasbourg, Nantes and Nice."
    ),
]

docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

# The router exposes one output connection per label: "passage" and "query"
p = Pipeline()
p.add_component(instance=TransformersZeroShotTextRouter(labels=["passage", "query"]), name="text_router")

# One embedder per branch, each with the e5 prefix that matches its input type
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="passage: "),
    name="passage_embedder",
)
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2", prefix="query: "),
    name="query_embedder",
)

# One retriever per branch, both reading from the same Document Store
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="query_retriever",
)
p.add_component(
    instance=InMemoryEmbeddingRetriever(document_store=document_store),
    name="passage_retriever",
)

# Each router output feeds its own embedder, which feeds its own retriever
p.connect("text_router.passage", "passage_embedder.text")
p.connect("passage_embedder.embedding", "passage_retriever.query_embedding")
p.connect("text_router.query", "query_embedder.text")
p.connect("query_embedder.embedding", "query_retriever.query_embedding")

# Query example: the router classifies this text as "query"
result = p.run({"text_router": {"text": "What is the capital of Germany?"}})
print(result)
# {'query_retriever': {'documents': [Document(id=32d393dd8ee60648ae7e630cfe34b1922e747812ddf9a2c8b3650e66e0ecdb5a,
# content: 'Germany, officially the Federal Republic of Germany, is a country in the western region of Central E...',
# score: 0.8625669285150891), Document(id=c17102d8d818ce5cdfee0288488c518f5c9df238a9739a080142090e8c4cb3ba,
# content: 'France, officially the French Republic, is a country located primarily in Western Europe. France is ...',
# score: 0.7637571978602222)]}}
```
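In the output, only query_retriever appears. The router emitted the text on its query connection alone, so passage_embedder and passage_retriever never received input and did not run. The following is a toy pure-Python model of that rule. It assumes hypothetical branch functions in place of the real embedder and retriever components, and it does not reflect how Haystack's Pipeline is implemented internally.

```python
from typing import Callable, Dict


def dispatch(router_output: Dict[str, str], branches: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Run only the branches that received text from the router."""
    results = {}
    for label, branch in branches.items():
        if label in router_output:
            # This branch got a value on its connection, so it runs.
            results[f"{label}_retriever"] = branch(router_output[label])
        # Branches without input are skipped and add no key to the results.
    return results


# Hypothetical stand-ins for the embedder + retriever chain of each branch.
branches = {
    "passage": lambda text: f"passage branch ran on: {text}",
    "query": lambda text: f"query branch ran on: {text}",
}

result = dispatch({"query": "What is the capital of Germany?"}, branches)
print(result)
```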
Additional References
📓 Tutorial: Query Classification with TransformersTextRouter and TransformersZeroShotTextRouter