Version: 3.1-unstable

ArangoEmbeddingRetriever

An embedding-based Retriever compatible with the ArangoDB Document Store.


Most common position in a pipeline	1. After a Text Embedder and before a `PromptBuilder` in a RAG pipeline; 2. As the last component in a semantic search pipeline
Mandatory init variables	`document_store`: An instance of an ArangoDocumentStore
Mandatory run variables	`query_embedding`: A vector representing the query (a list of floats)
Output variables	`documents`: A list of documents
API reference	ArangoDB
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/arangodb
Package name	`arangodb-haystack`

Overview

The ArangoEmbeddingRetriever retrieves documents from an ArangoDocumentStore using ArangoDB's AQL vector functions. It compares the query embedding with document embeddings and returns the most similar documents.
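Conceptually, retrieval ranks the stored embeddings by their similarity to the query embedding and keeps the best `top_k`. The sketch below does this as an exact scan in plain Python using cosine similarity. It only illustrates the idea and is not how the integration builds its AQL query; a vector index in ArangoDB may perform approximate rather than exact search. The names `cosine_similarity` and `top_k_most_similar` are hypothetical.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_most_similar(
    query: list[float], embeddings: dict[str, list[float]], top_k: int
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs for the top_k highest-scoring documents."""
    scored = (
        (doc_id, cosine_similarity(query, embedding))
        for doc_id, embedding in embeddings.items()
    )
    # nlargest keeps only top_k items instead of sorting the whole collection.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


embeddings = {
    "languages": [0.1, 0.2, 0.3],
    "elephants": [0.8, 0.1, 0.5],
}
print(top_k_most_similar([0.1, 0.2, 0.3], embeddings, top_k=1))
```

Using `heapq.nlargest` avoids sorting every document when only a few results are needed. A database-side vector index goes further and avoids scoring every document at all.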

In addition to `query_embedding`, the retriever accepts optional `filters` to narrow the search space and `top_k` to limit the number of results. Both can be set at initialization and overridden per call to `run()`.
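The override rule can be made concrete with a hypothetical in-memory stand-in. A `filters` or `top_k` value passed at call time takes precedence over the one given at initialization, and a filter narrows the candidate set before ranking. The filter dictionaries follow Haystack's standard filter syntax (`field`, `operator`, `value`, and logical `conditions`). This sketch evaluates only `==` and `AND`, and it assumes that runtime filters replace the init filters rather than merging with them. Check the API reference for the retriever's actual filter handling. `matches` and `ToyRetrieverParams` are illustrative names, not part of the integration.

```python
from typing import Any, Optional


def matches(meta: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Evaluate a tiny subset of Haystack-style filters ("==" and "AND")."""
    operator = condition["operator"]
    if operator == "AND":
        return all(matches(meta, c) for c in condition["conditions"])
    if operator == "==":
        # Haystack filters address metadata fields as "meta.<name>".
        name = condition["field"].removeprefix("meta.")
        return meta.get(name) == condition["value"]
    raise ValueError(f"Unsupported operator in this sketch: {operator}")


class ToyRetrieverParams:
    """Hypothetical holder for init-time defaults, mirroring the override rule."""

    def __init__(self, filters: Optional[dict[str, Any]] = None, top_k: int = 10):
        self.filters = filters
        self.top_k = top_k

    def resolve(
        self, filters: Optional[dict[str, Any]] = None, top_k: Optional[int] = None
    ) -> tuple[Optional[dict[str, Any]], int]:
        # A value passed at call time wins; otherwise fall back to the init value.
        return (
            filters if filters is not None else self.filters,
            top_k if top_k is not None else self.top_k,
        )


docs = [
    {"id": "a", "meta": {"topic": "language", "lang": "en"}},
    {"id": "b", "meta": {"topic": "animals", "lang": "en"}},
    {"id": "c", "meta": {"topic": "animals", "lang": "de"}},
]

params = ToyRetrieverParams(
    filters={"field": "meta.lang", "operator": "==", "value": "en"}, top_k=5
)

# Override only the filters for this call; top_k stays at the init value of 5.
filters, top_k = params.resolve(
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.topic", "operator": "==", "value": "animals"},
            {"field": "meta.lang", "operator": "==", "value": "de"},
        ],
    }
)
candidates = [d["id"] for d in docs if filters is None or matches(d["meta"], filters)]
print(candidates, top_k)  # ['c'] 5
```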

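The embedding dimension and the similarity function (`cosine`, `dot_product`, or `l2`) are configured on the `ArangoDocumentStore` at initialization, not on the retriever. The embedding dimension must match the output size of your embedding model, for example 384 for `sentence-transformers/all-MiniLM-L6-v2`.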
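For reference, here are the standard definitions of the three similarity functions for a query embedding $q$ and a document embedding $d$ of dimension $n$. For cosine and dot product, a higher score means more similar; for L2 (Euclidean distance), a lower value means more similar. This page does not cover how the integration maps these onto AQL functions or score values.

```latex
\operatorname{cosine}(q, d) = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}},
\qquad
\operatorname{dot\_product}(q, d) = \sum_{i=1}^{n} q_i d_i,
\qquad
\operatorname{l2}(q, d) = \sqrt{\sum_{i=1}^{n} (q_i - d_i)^2}

% If \lVert q \rVert = \lVert d \rVert = 1, then dot_product = cosine and
\operatorname{l2}(q, d)^2 = 2 - 2\,\operatorname{cosine}(q, d)
```

When every embedding is normalized to unit length, the three functions therefore produce the same ranking. They can disagree only when vector lengths vary.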

Installation

```shell
pip install arangodb-haystack
```

Ensure ArangoDB 3.12+ is running with the vector index enabled, for example via Docker:

```shell
docker run -d -p 8529:8529 \
  -e ARANGO_ROOT_PASSWORD=test-password \
  arangodb:3.12 arangod --vector-index
```
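You can confirm the server version before connecting. ArangoDB's HTTP API exposes `GET /_api/version`, which returns JSON with a `version` field. The sketch below uses only the Python standard library. It assumes the root credentials from the Docker command above (`ARANGO_ROOT_PASSWORD=test-password`), fetches the version, and checks that it is 3.12 or later. It cannot tell you whether the `--vector-index` startup option is active. The helper names are hypothetical.

```python
import base64
import json
import re
import urllib.request


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as "3.12.4" or "3.12.0-devel"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_vector_index(version: str) -> bool:
    """True if the server meets the 3.12+ requirement stated on this page."""
    return parse_major_minor(version) >= (3, 12)


def fetch_server_version(
    host: str = "http://localhost:8529",
    username: str = "root",
    password: str = "test-password",  # assumption: matches ARANGO_ROOT_PASSWORD above
) -> str:
    """Call GET /_api/version with HTTP basic auth and return the version string."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{host}/_api/version", headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["version"]
```

With the container running, `supports_vector_index(fetch_server_version())` should return `True`.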

The pipeline example on this page uses Sentence Transformers embedders, which have moved to the `sentence-transformers-haystack` package. Install it to run that example:

```shell
pip install sentence-transformers-haystack
```

Usage

On its own

```python
from haystack import Document
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
from haystack_integrations.components.retrievers.arangodb import (
    ArangoEmbeddingRetriever,
)

document_store = ArangoDocumentStore(
    host="http://localhost:8529",
    embedding_dimension=3,
    recreate_collection=True,
)
document_store.write_documents(
    [
        Document(
            content="There are over 7,000 languages spoken around the world today.",
            embedding=[0.1, 0.2, 0.3],
        ),
        Document(
            content="Elephants have been observed to recognize themselves in mirrors.",
            embedding=[0.8, 0.1, 0.5],
        ),
    ],
)

retriever = ArangoEmbeddingRetriever(document_store=document_store, top_k=1)
result = retriever.run(query_embedding=[0.1, 0.2, 0.3])
print(result["documents"][0].content)
```
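Which sentence this example prints depends on the similarity function configured on the document store, assuming the store ranks by that function's raw score. The plain-Python check below is not part of the integration. It scores the query `[0.1, 0.2, 0.3]` against both example embeddings with each of the three functions. Cosine and L2 favour the languages document, which is an exact match. The raw dot product favours the elephants document because its vector is longer.

```python
import math

query = [0.1, 0.2, 0.3]
docs = {
    "languages": [0.1, 0.2, 0.3],
    "elephants": [0.8, 0.1, 0.5],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# For cosine and dot product a higher score wins; for L2 distance a lower one wins.
best = {
    "cosine": max(docs, key=lambda name: cosine(query, docs[name])),
    "dot_product": max(docs, key=lambda name: dot(query, docs[name])),
    "l2": min(docs, key=lambda name: l2(query, docs[name])),
}
print(best)
# {'cosine': 'languages', 'dot_product': 'elephants', 'l2': 'languages'}
```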

In a pipeline

```python
from haystack import Document, Pipeline
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.embedders.sentence_transformers import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
from haystack_integrations.components.retrievers.arangodb import (
    ArangoEmbeddingRetriever,
)

# embedding_dimension must match the model's output size (384 for all-MiniLM-L6-v2).
document_store = ArangoDocumentStore(
    host="http://localhost:8529",
    embedding_dimension=384,
    recreate_collection=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to recognize themselves in mirrors.",
    ),
    Document(
        content="Bioluminescent waves can be seen in the Maldives and Puerto Rico.",
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
# Load the model explicitly, since this embedder runs outside a pipeline.
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents=documents)

document_store.write_documents(
    documents_with_embeddings["documents"],
    policy=DuplicatePolicy.OVERWRITE,
)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
query_pipeline.add_component(
    "retriever",
    ArangoEmbeddingRetriever(document_store=document_store, top_k=3),
)
# Feed the query embedding produced by the text embedder into the retriever.
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run(
    {"text_embedder": {"text": "How many languages are there?"}},
)
print(result["retriever"]["documents"][0].content)
```
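Because `filters` and `top_k` can be overridden per call, you can pass them to the retriever through `Pipeline.run()`, which takes a dictionary keyed by component name. The helper below is a hypothetical convenience that builds that dictionary and adds retriever overrides only when you set them, so the init-time values apply otherwise.

```python
from typing import Any, Optional


def build_query_inputs(
    question: str,
    top_k: Optional[int] = None,
    filters: Optional[dict[str, Any]] = None,
) -> dict[str, dict[str, Any]]:
    """Build the input dict for Pipeline.run(), keyed by component name."""
    inputs: dict[str, dict[str, Any]] = {"text_embedder": {"text": question}}
    retriever_inputs: dict[str, Any] = {}
    # Only pass overrides that were set, so the retriever's init values apply otherwise.
    if top_k is not None:
        retriever_inputs["top_k"] = top_k
    if filters is not None:
        retriever_inputs["filters"] = filters
    if retriever_inputs:
        inputs["retriever"] = retriever_inputs
    return inputs


print(build_query_inputs("How many languages are there?", top_k=1))
# {'text_embedder': {'text': 'How many languages are there?'}, 'retriever': {'top_k': 1}}
```

With the pipeline above, `query_pipeline.run(build_query_inputs("How many languages are there?", top_k=1))` should return one document instead of three.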
