ValkeyEmbeddingRetriever
This is an embedding Retriever compatible with the Valkey Document Store.
| Most common position in a pipeline | 1. After a Text Embedder and before a ChatPromptBuilder or PromptBuilder in a RAG pipeline 2. The last component in a semantic search pipeline 3. After a Text Embedder and before an ExtractiveReader in an extractive QA pipeline |
| Mandatory init variables | document_store: An instance of a ValkeyDocumentStore |
| Mandatory run variables | query_embedding: A list of floats |
| Output variables | documents: A list of documents |
| API reference | Valkey |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/valkey |
Overview
The ValkeyEmbeddingRetriever is an embedding-based Retriever compatible with the ValkeyDocumentStore. It compares the query and Document embeddings and fetches the Documents most relevant to the query from the ValkeyDocumentStore based on vector similarity.
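Under the hood, ranking is driven by a vector similarity metric such as cosine similarity. The snippet below is only a conceptual sketch of how a similarity score orders Documents; the actual scoring happens inside Valkey, not in Python code you write.

# Conceptual sketch only: cosine similarity between a query embedding and Document embeddings
from math import sqrt

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

query_embedding = [0.2, 0.1, 0.9]
doc_embeddings = {"doc_1": [0.3, 0.0, 0.8], "doc_2": [0.9, 0.4, 0.1]}

# Rank Documents by similarity to the query, most similar first ("doc_1" here)
ranked = sorted(doc_embeddings, key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]), reverse=True)
print(ranked)  # ['doc_1', 'doc_2']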
Parameters
When using the ValkeyEmbeddingRetriever in your system, ensure the query and Document embeddings are available. You can do so by adding a Document embedder to your indexing pipeline and a text embedder to your query pipeline.
In addition to the query_embedding, the ValkeyEmbeddingRetriever accepts other optional parameters, including top_k (the maximum number of Documents to retrieve) and filters to narrow down the search space.
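For illustration, a call with these parameters might look like the sketch below. It assumes a configured retriever like the ones created in the Usage section, and that filters follow the standard Haystack filter syntax; the meta.language field is made up for the example.

# Sketch: retrieve at most 3 Documents whose (hypothetical) meta.language field is "en"
retriever.run(
    query_embedding=[0.1] * 768,
    top_k=3,
    filters={"field": "meta.language", "operator": "==", "value": "en"},
)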
Usage
Installation
To start using Valkey with Haystack, install the package with:
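# package name assumed from the <integration>-haystack naming convention
pip install valkey-haystack

The package name above is an assumption based on the naming convention used across the haystack-core-integrations repository; check the GitHub link above for the exact name.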
On its own
This Retriever needs an instance of ValkeyDocumentStore and indexed Documents to run.
from haystack import Document
from haystack_integrations.document_stores.valkey import ValkeyDocumentStore
from haystack_integrations.components.retrievers.valkey import ValkeyEmbeddingRetriever

# Assumes a Valkey instance running locally on the default port
document_store = ValkeyDocumentStore(
    nodes_list=[("localhost", 6379)],
    index_name="my_documents",
    embedding_dim=768,
    distance_metric="cosine",
)

# Write a Document with a fake embedding so the retriever has something to return
document_store.write_documents([Document(content="Valkey is a high-performance key/value datastore.", embedding=[0.1] * 768)])

retriever = ValkeyEmbeddingRetriever(document_store=document_store)

# Using a fake query vector to keep the example simple
retriever.run(query_embedding=[0.1] * 768)
In a Pipeline
from haystack import Document, Pipeline
from haystack.components.embedders import (
SentenceTransformersDocumentEmbedder,
SentenceTransformersTextEmbedder,
)
from haystack.components.writers import DocumentWriter
from haystack_integrations.document_stores.valkey import ValkeyDocumentStore
from haystack_integrations.components.retrievers.valkey import ValkeyEmbeddingRetriever
document_store = ValkeyDocumentStore(
nodes_list=[("localhost", 6379)],
index_name="my_documents",
embedding_dim=768,
distance_metric="cosine",
)
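
# A few example Documents to index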
documents = [
Document(content="There are over 7,000 languages spoken around the world today."),
Document(
content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
),
Document(
content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
),
]
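
# Indexing pipeline: embed the Documents and write them to the document store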
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": documents}})
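
# Query pipeline: embed the query text and retrieve the most similar Documents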
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
"retriever",
ValkeyEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query = "How many languages are there?"
result = query_pipeline.run({"text_embedder": {"text": query}})
print(result["retriever"]["documents"][0])
For a full RAG example with ValkeyEmbeddingRetriever, see the ValkeyDocumentStore documentation.