Version: 2.29-unstable

TextEmbeddingRetriever

Wraps an embedding-based retriever with a text embedder into a single component that accepts a text query.

Most common position in a pipeline: In query pipelines:
- In a RAG pipeline, before a ChatPromptBuilder
- In a semantic search pipeline, as the last component
- As a retriever inside MultiRetriever

Mandatory init variables:
- retriever: An embedding-based Retriever
- text_embedder: A Text Embedder component

Mandatory run variables:
- query: A query string

Output variables:
- documents: A list of retrieved documents sorted by relevance score

API reference: Retrievers
GitHub link: https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/text_embedding_retriever.py
Package name: haystack-ai

Overview

TextEmbeddingRetriever bundles a text embedder and an embedding-based retriever into a single component. It accepts a plain text query, converts it to an embedding internally, and returns documents sorted by relevance score.

You can use it anywhere an embedding-based retriever fits: in RAG pipelines before a prompt builder, as the final component in a semantic search pipeline, or as a drop-in retriever inside MultiRetriever.
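The embed-then-retrieve flow described in this overview can be illustrated with a small, dependency-free sketch. Everything here is hypothetical and for illustration only: `ToyTextEmbedder`, `ToyEmbeddingRetriever`, and `ToyTextEmbeddingRetriever` are stand-ins, not Haystack classes, and the keyword-count "embedding" is a placeholder for a real model. The sketch only assumes the usual conventions that a text embedder's `run(text=...)` returns an `"embedding"` and an embedding retriever's `run(query_embedding=...)` returns `"documents"`; it is not the library's actual implementation.

```python
from typing import Any


class ToyTextEmbedder:
    """Hypothetical embedder: counts occurrences of a few keywords."""

    VOCAB = ("solar", "wind", "geothermal")

    def run(self, text: str) -> dict[str, Any]:
        words = text.lower().split()
        return {"embedding": [float(words.count(term)) for term in self.VOCAB]}


class ToyEmbeddingRetriever:
    """Hypothetical retriever: ranks stored texts by dot product with the query embedding."""

    def __init__(self, contents: list[str], embedder: ToyTextEmbedder, top_k: int = 2):
        self.top_k = top_k
        # Pre-compute one embedding per stored text.
        self.index = [(c, embedder.run(text=c)["embedding"]) for c in contents]

    def run(self, query_embedding: list[float]) -> dict[str, Any]:
        scored = [
            {
                "content": content,
                "score": sum(q * d for q, d in zip(query_embedding, emb)),
            }
            for content, emb in self.index
        ]
        # Highest score first, then keep only the top_k results.
        scored.sort(key=lambda doc: doc["score"], reverse=True)
        return {"documents": scored[: self.top_k]}


class ToyTextEmbeddingRetriever:
    """Hypothetical wrapper: takes a text query, embeds it, then retrieves."""

    def __init__(self, retriever: ToyEmbeddingRetriever, text_embedder: ToyTextEmbedder):
        self.retriever = retriever
        self.text_embedder = text_embedder

    def run(self, query: str) -> dict[str, Any]:
        # Step 1: turn the text query into an embedding.
        embedding = self.text_embedder.run(text=query)["embedding"]
        # Step 2: hand the embedding to the wrapped retriever.
        return self.retriever.run(query_embedding=embedding)


embedder = ToyTextEmbedder()
contents = [
    "Solar energy is harnessed from the sun.",
    "Wind energy is generated by wind turbines.",
    "Geothermal energy is heat from below the earth's surface.",
]
wrapper = ToyTextEmbeddingRetriever(
    retriever=ToyEmbeddingRetriever(contents, embedder, top_k=2),
    text_embedder=embedder,
)
result = wrapper.run(query="geothermal energy")
```

The caller only ever passes a string; the embedding step stays an internal detail of the wrapper, which is what lets such a component sit wherever a text-query retriever is expected.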

Usage

On its own

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.retrievers import (
    InMemoryEmbeddingRetriever,
    TextEmbeddingRetriever,
)
from haystack.components.writers import DocumentWriter

documents = [
    Document(
        content="Renewable energy is energy that is collected from renewable resources.",
    ),
    Document(
        content="Solar energy is a type of green energy that is harnessed from the sun.",
    ),
    Document(
        content="Wind energy is another type of green energy that is generated by wind turbines.",
    ),
    Document(
        content="Geothermal energy is heat that comes from the sub-surface of the earth.",
    ),
]

doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
doc_writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
doc_writer.run(documents=doc_embedder.run(documents)["documents"])

retriever = TextEmbeddingRetriever(
    retriever=InMemoryEmbeddingRetriever(document_store=doc_store, top_k=2),
    text_embedder=SentenceTransformersTextEmbedder(
        model="sentence-transformers/all-MiniLM-L6-v2",
    ),
)

result = retriever.run(query="Geothermal energy")
for doc in result["documents"]:
    print(f"Content: {doc.content}, Score: {doc.score}")
```

As part of MultiRetriever

TextEmbeddingRetriever is most commonly used as one of the retrievers inside a MultiRetriever:

```python
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
    MultiRetriever,
    TextEmbeddingRetriever,
)

# doc_store is the populated InMemoryDocumentStore from the "On its own" example.
retriever = MultiRetriever(
    retrievers={
        "bm25": InMemoryBM25Retriever(document_store=doc_store),
        "embedding": TextEmbeddingRetriever(
            retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
            text_embedder=SentenceTransformersTextEmbedder(
                model="sentence-transformers/all-MiniLM-L6-v2",
            ),
        ),
    },
)
```