TextEmbeddingRetriever
Combines an embedding-based retriever and a text embedder into a single component that accepts a text query.
| Most common position in a pipeline | In query pipelines: (1) in a RAG pipeline, before a ChatPromptBuilder; (2) in a semantic search pipeline, as the last component; (3) as a retriever inside MultiRetriever |
| Mandatory init variables | retriever: An embedding-based Retriever; text_embedder: A Text Embedder component |
| Mandatory run variables | query: A query string |
| Output variables | documents: A list of retrieved documents sorted by relevance score |
| API reference | Retrievers |
| GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/text_embedding_retriever.py |
| Package name | haystack-ai |
Overview
TextEmbeddingRetriever bundles a text embedder and an embedding-based retriever into a single component. It accepts a plain text query, converts it to an embedding internally, and returns documents sorted by relevance score.
You can use it anywhere an embedding-based retriever fits: in RAG pipelines before a prompt builder, as the final component in a semantic search pipeline, or as a drop-in retriever inside MultiRetriever.
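To make the bundling described above concrete, here is a minimal, dependency-free sketch of the wrapping pattern. It is not Haystack's implementation: every class in it (`ToyDocument`, `ToyTextEmbedder`, `ToyEmbeddingRetriever`, `ToyTextEmbeddingRetriever`) is a hypothetical stand-in, and the "embedding" is a simple keyword count rather than a model output. It only illustrates the flow of a text query being embedded and then handed to an embedding-based retriever.

```python
import math
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document with an embedding and a score."""
    content: str
    embedding: list
    score: Optional[float] = None


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyTextEmbedder:
    """Hypothetical embedder: counts keyword occurrences instead of calling a model."""
    VOCAB = ("solar", "wind", "geothermal")

    def run(self, text):
        words = text.lower().split()
        return {"embedding": [float(words.count(term)) for term in self.VOCAB]}


class ToyEmbeddingRetriever:
    """Hypothetical retriever: ranks stored documents by cosine similarity."""

    def __init__(self, documents, top_k=2):
        self.documents = documents
        self.top_k = top_k

    def run(self, query_embedding):
        scored = [
            replace(doc, score=cosine_similarity(query_embedding, doc.embedding))
            for doc in self.documents
        ]
        scored.sort(key=lambda doc: doc.score, reverse=True)
        return {"documents": scored[: self.top_k]}


class ToyTextEmbeddingRetriever:
    """Sketch of the wrapper: embed the text query, then delegate to the retriever."""

    def __init__(self, retriever, text_embedder):
        self.retriever = retriever
        self.text_embedder = text_embedder

    def run(self, query):
        embedding = self.text_embedder.run(text=query)["embedding"]
        return self.retriever.run(query_embedding=embedding)


embedder = ToyTextEmbedder()
texts = [
    "Solar energy comes from the sun",
    "Wind energy comes from wind turbines",
    "Geothermal energy is heat from the earth",
]
docs = [ToyDocument(text, embedder.run(text)["embedding"]) for text in texts]

wrapper = ToyTextEmbeddingRetriever(
    retriever=ToyEmbeddingRetriever(docs, top_k=2),
    text_embedder=embedder,
)
result = wrapper.run(query="geothermal energy")
for doc in result["documents"]:
    print(f"Content: {doc.content}, Score: {doc.score}")
```

The caller only ever passes a string to `run`; the embedding step stays hidden inside the wrapper, which is the convenience the real component is described as providing.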
Usage
On its own
```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.retrievers import (
    InMemoryEmbeddingRetriever,
    TextEmbeddingRetriever,
)
from haystack.components.writers import DocumentWriter

documents = [
    Document(
        content="Renewable energy is energy that is collected from renewable resources.",
    ),
    Document(
        content="Solar energy is a type of green energy that is harnessed from the sun.",
    ),
    Document(
        content="Wind energy is another type of green energy that is generated by wind turbines.",
    ),
    Document(
        content="Geothermal energy is heat that comes from the sub-surface of the earth.",
    ),
]

# Embed the documents and write them to the document store.
doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
doc_writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
doc_writer.run(documents=doc_embedder.run(documents)["documents"])

# Use the same model for queries as for documents so the embeddings are comparable.
retriever = TextEmbeddingRetriever(
    retriever=InMemoryEmbeddingRetriever(document_store=doc_store, top_k=2),
    text_embedder=SentenceTransformersTextEmbedder(
        model="sentence-transformers/all-MiniLM-L6-v2",
    ),
)

# Pass a plain text query; the component embeds it internally.
result = retriever.run(query="Geothermal energy")
for doc in result["documents"]:
    print(f"Content: {doc.content}, Score: {doc.score}")
```
As part of MultiRetriever
TextEmbeddingRetriever is most commonly used as one of the retrievers inside a MultiRetriever:
```python
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
    MultiRetriever,
    TextEmbeddingRetriever,
)

# doc_store is the populated InMemoryDocumentStore from the previous example.
retriever = MultiRetriever(
    retrievers={
        "bm25": InMemoryBM25Retriever(document_store=doc_store),
        "embedding": TextEmbeddingRetriever(
            retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
            text_embedder=SentenceTransformersTextEmbedder(
                model="sentence-transformers/all-MiniLM-L6-v2",
            ),
        ),
    },
)
```