Version: 2.27

ArangoDB

haystack_integrations.components.retrievers.arangodb.embedding_retriever

ArangoEmbeddingRetriever

Retrieves documents from an ArangoDocumentStore using vector similarity on embeddings.

The similarity function (cosine, dot product, or L2) is configured on the ArangoDocumentStore through its similarity_function parameter, not on the retriever.

Example usage:

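python
from haystack.utils import Secret
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
from haystack_integrations.components.retrievers.arangodb import ArangoEmbeddingRetriever

# The password defaults to the ARANGO_PASSWORD environment variable.
store = ArangoDocumentStore(
    host="http://localhost:8529",
    database="haystack",
    username=Secret.from_token("root"),  # username is a Secret, not a plain str
    collection_name="docs",
    embedding_dimension=768,
)
retriever = ArangoEmbeddingRetriever(document_store=store, top_k=5)
# The query vector must have embedding_dimension (768) entries.
result = retriever.run(query_embedding=[0.1, 0.2, ...])
documents = result["documents"]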

__init__

python
__init__(
    *,
    document_store: ArangoDocumentStore,
    top_k: int = 10,
    filters: dict[str, Any] | None = None
) -> None

Creates a new ArangoEmbeddingRetriever.

Parameters:

  • document_store (ArangoDocumentStore) – The ArangoDocumentStore to retrieve documents from.
  • top_k (int) – Maximum number of documents to return.
  • filters (dict[str, Any] | None) – Optional Haystack metadata filters applied at retrieval time.

run

python
run(
    query_embedding: list[float],
    top_k: int | None = None,
    filters: dict[str, Any] | None = None,
) -> dict[str, list[Document]]

Retrieves documents most similar to query_embedding.

Parameters:

  • query_embedding (list[float]) – The query vector.
  • top_k (int | None) – Overrides the instance-level top_k for this call.
  • filters (dict[str, Any] | None) – Overrides the instance-level filters for this call.

Returns:

  • dict[str, list[Document]] – A dictionary whose documents key holds a list of Document objects sorted by similarity score.

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

python
from_dict(data: dict[str, Any]) -> ArangoEmbeddingRetriever

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

  • ArangoEmbeddingRetriever – Deserialized component.

haystack_integrations.document_stores.arangodb.document_store

ArangoDocumentStore

A Haystack DocumentStore backed by ArangoDB.

Documents are stored in an ArangoDB collection. Vector similarity search runs through AQL vector functions and requires ArangoDB 3.12 or later.

Example usage:

python
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
from haystack.utils import Secret

store = ArangoDocumentStore(
    host="http://localhost:8529",
    database="haystack",
    username=Secret.from_env_var("ARANGO_USERNAME", strict=False),
    password=Secret.from_env_var("ARANGO_PASSWORD"),
    collection_name="documents",
    embedding_dimension=768,
)

__init__

python
__init__(
    *,
    host: str = "http://localhost:8529",
    database: str = "haystack",
    username: Secret = Secret.from_env_var("ARANGO_USERNAME", strict=False),
    password: Secret = Secret.from_env_var("ARANGO_PASSWORD"),
    collection_name: str = "haystack_documents",
    embedding_dimension: int = 768,
    recreate_collection: bool = False,
    similarity_function: Literal["cosine", "dot_product", "l2"] = "cosine"
) -> None

Creates a new ArangoDocumentStore instance.

Parameters:

  • host (str) – ArangoDB server URL, e.g. http://localhost:8529.
  • database (str) – Name of the ArangoDB database to use. Created if it does not exist.
  • username (Secret) – ArangoDB username as a Secret. Defaults to ARANGO_USERNAME env var, falling back to root if the variable is not set.
  • password (Secret) – ArangoDB password as a Secret. Defaults to ARANGO_PASSWORD env var.
  • collection_name (str) – Name of the collection to store documents in.
  • embedding_dimension (int) – Dimensionality of document embeddings.
  • recreate_collection (bool) – If True, drop and recreate the collection on startup.
  • similarity_function (Literal['cosine', 'dot_product', 'l2']) – Vector similarity function to use for embedding retrieval. One of "cosine" (default), "dot_product", or "l2".
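The username default described above (read ARANGO_USERNAME, fall back to root when it is not set) amounts to the logic below. resolve_username is a hypothetical helper written with os.environ to mirror the described behavior. The store itself uses Haystack's Secret.from_env_var(..., strict=False), which resolves to None for an unset variable, so the fallback to root presumably happens inside the store. The usage lines read a hypothetical DEMO_ARANGO_USERNAME variable so a real ARANGO_USERNAME is left untouched.

```python
import os


def resolve_username(env_var: str = "ARANGO_USERNAME", fallback: str = "root") -> str:
    """Hypothetical helper: use the env var when it is set, otherwise the fallback user."""
    value = os.environ.get(env_var)
    # An unset variable (None) means "not configured", so use the fallback.
    return value if value is not None else fallback


os.environ.pop("DEMO_ARANGO_USERNAME", None)
print(resolve_username("DEMO_ARANGO_USERNAME"))  # root
os.environ["DEMO_ARANGO_USERNAME"] = "haystack_user"
print(resolve_username("DEMO_ARANGO_USERNAME"))  # haystack_user
```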

count_documents

python
count_documents() -> int

Returns the number of documents in the store.

Returns:

  • int – Document count.

filter_documents

python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Returns documents matching the provided filters.

Parameters:

  • filters (dict[str, Any] | None) – Haystack metadata filters. If None, all documents are returned.

Returns:

  • list[Document] – List of matching Document objects.
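Haystack metadata filters are nested dictionaries: a comparison has field, operator, and value keys, and a logical node has an operator (AND, OR, or NOT) and a list of conditions. The evaluator below is a hypothetical in-memory sketch that supports only a few comparison operators so the semantics are easy to follow. ArangoDocumentStore presumably translates filters into AQL on the server, and that translation is not shown here.

```python
from typing import Any


def _get_field(doc: dict[str, Any], path: str) -> Any:
    # Resolve dotted paths such as "meta.year" against nested dicts.
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Hypothetical evaluator for a subset of Haystack filter operators."""
    op = flt["operator"]
    if "conditions" in flt:  # logical node
        results = [matches(doc, c) for c in flt["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            return not all(results)
        raise ValueError(f"Unsupported logical operator: {op}")
    value = _get_field(doc, flt["field"])
    if op == "==":
        return value == flt["value"]
    if op == ">=":
        return value is not None and value >= flt["value"]
    if op == "in":
        return value in flt["value"]
    raise ValueError(f"Unsupported comparison operator: {op}")


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
docs = [
    {"id": "1", "meta": {"lang": "en", "year": 2021}},
    {"id": "2", "meta": {"lang": "en", "year": 2019}},
    {"id": "3", "meta": {"lang": "de", "year": 2023}},
]
print([d["id"] for d in docs if matches(d, filters)])  # ['1']
```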

write_documents

python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Writes documents to the store.

Parameters:

  • documents (list[Document]) – Documents to write.
  • policy (DuplicatePolicy) – How to handle documents whose IDs already exist in the store: OVERWRITE, SKIP, or FAIL. The default, DuplicatePolicy.NONE, behaves as FAIL.

Returns:

  • int – Number of documents written.

Raises:

  • ValueError – If documents contains non-Document objects.
  • DuplicateDocumentError – If a duplicate is found and policy is FAIL.
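The duplicate-handling policies can be illustrated with a dictionary keyed by document ID. DuplicatePolicy and DuplicateDocumentError below are local stand-ins mirroring the Haystack names, and write_to_dict is hypothetical. Treating NONE like FAIL is an assumption based on the parameter description above. The sketch also assumes skipped documents are not counted as written, and it checks for clashes before writing anything under FAIL; whether the real store behaves this way is not specified here.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    """Local stand-in mirroring Haystack's DuplicatePolicy members."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateDocumentError(Exception):
    """Local stand-in for Haystack's DuplicateDocumentError."""


def write_to_dict(
    store: dict[str, str], docs: dict[str, str], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int:
    """Hypothetical writer: store maps document ID to content; returns the number written."""
    # Assumption: NONE falls back to FAIL.
    if policy is DuplicatePolicy.NONE:
        policy = DuplicatePolicy.FAIL
    # Under FAIL, check every ID first so a failed call writes nothing (a choice made for this sketch).
    if policy is DuplicatePolicy.FAIL:
        clashes = [doc_id for doc_id in docs if doc_id in store]
        if clashes:
            raise DuplicateDocumentError(f"Documents already exist: {clashes}")
    written = 0
    for doc_id, content in docs.items():
        if doc_id in store and policy is DuplicatePolicy.SKIP:
            continue  # keep the existing copy and do not count it
        store[doc_id] = content  # new document, or OVERWRITE replaces the old copy
        written += 1
    return written


store = {"a": "old"}
print(write_to_dict(store, {"a": "new", "b": "x"}, DuplicatePolicy.SKIP))  # 1
print(store)  # {'a': 'old', 'b': 'x'}
print(write_to_dict(store, {"a": "new"}, DuplicatePolicy.OVERWRITE))  # 1
```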

delete_documents

python
delete_documents(document_ids: list[str]) -> None

Deletes documents by their IDs.

Parameters:

  • document_ids (list[str]) – List of document IDs to delete.

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

python
from_dict(data: dict[str, Any]) -> ArangoDocumentStore

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

  • ArangoDocumentStore – Deserialized component.