Version: 2.30

ArangoEmbeddingRetriever

An embedding-based Retriever compatible with the ArangoDB Document Store.

| Property | Value |
| --- | --- |
| Most common position in a pipeline | 1. After a Text Embedder and before a PromptBuilder in a RAG pipeline; 2. The last component in a semantic search pipeline |
| Mandatory init variables | document_store: An instance of an ArangoDocumentStore |
| Mandatory run variables | query_embedding: A vector representing the query (a list of floats) |
| Output variables | documents: A list of documents |
| API reference | ArangoDB |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/arangodb |
| Package name | arangodb-haystack |

Overview

The ArangoEmbeddingRetriever retrieves documents from an ArangoDocumentStore using ArangoDB's AQL vector functions. It compares the query embedding with document embeddings and returns the most similar documents.
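Conceptually, this is a nearest-neighbour search. The following is a minimal pure-Python sketch, assuming cosine similarity and an exhaustive scan over every stored embedding; the helper names cosine_similarity and rank_by_similarity are hypothetical. It is not the integration's code: ArangoDB performs the comparison server-side in AQL, and with a vector index the search may be approximate rather than exhaustive.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: list[dict], top_k: int) -> list[dict]:
    """Exhaustive scan: score every document, then keep the top_k highest scores."""
    scored = [{**doc, "score": cosine_similarity(query, doc["embedding"])} for doc in docs]
    # A higher cosine similarity means a closer match, so sort in descending order.
    scored.sort(key=lambda doc: doc["score"], reverse=True)
    return scored[:top_k]


docs = [
    {"content": "There are over 7,000 languages spoken around the world today.",
     "embedding": [0.1, 0.2, 0.3]},
    {"content": "Elephants have been observed to recognize themselves in mirrors.",
     "embedding": [0.8, 0.1, 0.5]},
]

best = rank_by_similarity([0.1, 0.2, 0.3], docs, top_k=1)
print(best[0]["content"])  # the languages sentence: its embedding equals the query
```

An exhaustive scan is the simplest correct baseline; a vector index trades some exactness for speed on large collections.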

In addition to query_embedding, the retriever accepts optional filters to narrow the search space and top_k to limit the number of results. Both can be set at initialization and overridden per call to run().
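The interplay between init-time defaults and per-call overrides can be sketched with a toy, in-memory retriever. Everything here is hypothetical: ToyRetriever, its documents argument, and the matches helper are illustrative stand-ins, not the integration's API. Haystack filters are dictionaries of the form {"field": ..., "operator": ..., "value": ...} that can also be nested under logical operators; this sketch evaluates only a single "==" comparison on a meta field, replaces (rather than merges) the default filters when new ones are passed, and scores with a plain dot product for brevity.

```python
from typing import Any, Optional


def matches(meta: dict[str, Any], condition: Optional[dict[str, Any]]) -> bool:
    """Evaluate one Haystack-style comparison filter; this sketch handles only '=='."""
    if condition is None:
        return True  # no filter: every document is a candidate
    if condition["operator"] != "==":
        raise NotImplementedError("this sketch only supports the '==' operator")
    key = condition["field"].removeprefix("meta.")
    return meta.get(key) == condition["value"]


class ToyRetriever:
    """Hypothetical in-memory retriever: filters and top_k set at init, overridable per run()."""

    def __init__(self, documents: list[dict], filters: Optional[dict] = None, top_k: int = 10):
        self.documents = documents
        self.filters = filters
        self.top_k = top_k

    def run(self, query_embedding: list[float], filters: Optional[dict] = None,
            top_k: Optional[int] = None) -> dict:
        # Per-call values replace the init-time defaults; None means "use the default".
        filters = filters if filters is not None else self.filters
        top_k = top_k if top_k is not None else self.top_k
        # 1. Filters narrow the search space before any scoring happens.
        candidates = [d for d in self.documents if matches(d["meta"], filters)]
        # 2. Score what is left (dot product, for brevity) and keep the top_k best.
        ranked = sorted(
            candidates,
            key=lambda d: sum(q * e for q, e in zip(query_embedding, d["embedding"])),
            reverse=True,
        )
        return {"documents": ranked[:top_k]}


docs = [
    {"content": "languages", "meta": {"topic": "linguistics"}, "embedding": [0.1, 0.2, 0.3]},
    {"content": "elephants", "meta": {"topic": "animals"}, "embedding": [0.8, 0.1, 0.5]},
    {"content": "waves", "meta": {"topic": "nature"}, "embedding": [0.2, 0.9, 0.1]},
]
retriever = ToyRetriever(docs, top_k=2)
query = [0.1, 0.2, 0.3]
animals_only = {"field": "meta.topic", "operator": "==", "value": "animals"}

default_run = [d["content"] for d in retriever.run(query)["documents"]]
filtered_run = [d["content"] for d in retriever.run(query, filters=animals_only)["documents"]]
wider_run = [d["content"] for d in retriever.run(query, top_k=3)["documents"]]
print(default_run)   # ['elephants', 'waves']  (init default top_k=2)
print(filtered_run)  # ['elephants']  (filter leaves one candidate)
print(wider_run)     # ['elephants', 'waves', 'languages']  (per-call top_k=3)
```

Because overrides are passed per call, the retriever's stored defaults stay unchanged between runs.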

The embedding dimension and similarity function (cosine, dot_product, or l2) are configured on the ArangoDocumentStore at initialization time.
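The choice of similarity function can change which document ranks first. This small worked example assumes the textbook definitions of the three metrics (the exact AQL expressions ArangoDB evaluates may differ, for instance in whether l2 is reported as a distance or a squared distance) and scores the two embeddings from the "On its own" example against the query [0.1, 0.2, 0.3]. For cosine and dot_product a higher score is better; for l2, a lower distance is better.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dot_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2(a: list[float], b: list[float]) -> float:
    # math.dist (Python 3.8+) returns the Euclidean distance between two points.
    return math.dist(a, b)


query = [0.1, 0.2, 0.3]
docs = {"languages": [0.1, 0.2, 0.3], "elephants": [0.8, 0.1, 0.5]}

# Each metric maps to (function, higher_is_better).
metrics = {"cosine": (cosine, True), "dot_product": (dot_product, True), "l2": (l2, False)}

winners = {}
for name, (fn, higher_is_better) in metrics.items():
    scores = {doc: fn(query, emb) for doc, emb in docs.items()}
    pick = max if higher_is_better else min
    winners[name] = pick(scores, key=scores.get)
    print(name, {doc: round(s, 3) for doc, s in scores.items()}, "->", winners[name])

# cosine -> languages, dot_product -> elephants, l2 -> languages
```

Here dot_product prefers the elephant sentence only because its vector has the larger magnitude (0.25 versus 0.14), even though the languages vector points in exactly the same direction as the query. With unit-length embeddings, dot_product and cosine produce the same ranking.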

Installation

```shell
pip install arangodb-haystack
```

Ensure ArangoDB 3.12+ is running with the vector index enabled, for example via Docker:

```shell
docker run -d -p 8529:8529 \
  -e ARANGO_ROOT_PASSWORD=test-password \
  arangodb:3.12 arangod --vector-index
```

Usage

On its own

```python
from haystack import Document
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
from haystack_integrations.components.retrievers.arangodb import (
    ArangoEmbeddingRetriever,
)

# Toy 3-dimensional embeddings keep the example small; embedding_dimension
# must match the length of every stored embedding.
document_store = ArangoDocumentStore(
    host="http://localhost:8529",
    embedding_dimension=3,
    recreate_collection=True,
)
document_store.write_documents(
    [
        Document(
            content="There are over 7,000 languages spoken around the world today.",
            embedding=[0.1, 0.2, 0.3],
        ),
        Document(
            content="Elephants have been observed to recognize themselves in mirrors.",
            embedding=[0.8, 0.1, 0.5],
        ),
    ],
)

# Return only the single most similar document.
retriever = ArangoEmbeddingRetriever(document_store=document_store, top_k=1)
result = retriever.run(query_embedding=[0.1, 0.2, 0.3])
print(result["documents"][0].content)
```

In a pipeline

```python
from haystack import Document, Pipeline
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
from haystack_integrations.components.retrievers.arangodb import (
    ArangoEmbeddingRetriever,
)

# all-MiniLM-L6-v2 produces 384-dimensional embeddings, so the store must match.
document_store = ArangoDocumentStore(
    host="http://localhost:8529",
    embedding_dimension=384,
    recreate_collection=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to recognize themselves in mirrors."),
    Document(content="Bioluminescent waves can be seen in the Maldives and Puerto Rico."),
]

# Indexing: embed the documents outside a pipeline, then write them to the store.
document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
document_embedder.warm_up()  # load the model before calling run() directly
documents_with_embeddings = document_embedder.run(documents=documents)

document_store.write_documents(
    documents_with_embeddings["documents"],
    policy=DuplicatePolicy.OVERWRITE,
)

# Querying: embed the question with the same model, then retrieve.
query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
query_pipeline.add_component(
    "retriever",
    ArangoEmbeddingRetriever(document_store=document_store, top_k=3),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run(
    {"text_embedder": {"text": "How many languages are there?"}},
)
print(result["retriever"]["documents"][0].content)
```