ArangoDB
haystack_integrations.components.retrievers.arangodb.embedding_retriever
ArangoEmbeddingRetriever
Retrieves documents from an ArangoDocumentStore using vector similarity on embeddings.
The similarity function is configured on the ArangoDocumentStore (cosine, dot product, or L2).
Example usage:
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
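from haystack_integrations.components.retrievers.arangodb import ArangoEmbeddingRetriever
from haystack.utils import Secret
# username is typed as Secret in the store's signature; password defaults to the ARANGO_PASSWORD env var
store = ArangoDocumentStore(host="http://localhost:8529", database="haystack",
    username=Secret.from_token("root"), collection_name="docs", embedding_dimension=768)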
retriever = ArangoEmbeddingRetriever(document_store=store, top_k=5)
result = retriever.run(query_embedding=[0.1, 0.2, ...])
__init__
__init__(
    *,
    document_store: ArangoDocumentStore,
    top_k: int = 10,
    filters: dict[str, Any] | None = None
) -> None
Creates a new ArangoEmbeddingRetriever.
Parameters:
- document_store (ArangoDocumentStore) – The ArangoDocumentStore to retrieve documents from.
- top_k (int) – Maximum number of documents to return.
- filters (dict[str, Any] | None) – Optional Haystack metadata filters applied at retrieval time.
run
run(
    query_embedding: list[float],
    top_k: int | None = None,
    filters: dict[str, Any] | None = None,
) -> dict[str, list[Document]]
Retrieves documents most similar to query_embedding.
Parameters:
- query_embedding (list[float]) – The query vector.
- top_k (int | None) – Overrides the instance-level top_k for this call.
- filters (dict[str, Any] | None) – Overrides the instance-level filters for this call.
Returns:
dict[str, list[Document]] – A dictionary with the key documents, holding a list of Document objects sorted by score.
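The override behavior described for run can be sketched in plain Python. This is a minimal illustration, not the retriever's code: ToyRetriever and its resolve method are hypothetical names, and ranking by descending score is an assumption, since the text only says the results are sorted by score.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyRetriever:
    """Hypothetical stand-in that only models how per-call arguments override defaults."""

    top_k: int = 10
    filters: Optional[dict[str, Any]] = None

    def resolve(self, top_k: Optional[int] = None,
                filters: Optional[dict[str, Any]] = None):
        # A per-call value wins whenever it is given; None means "use the instance default".
        effective_top_k = top_k if top_k is not None else self.top_k
        effective_filters = filters if filters is not None else self.filters
        return effective_top_k, effective_filters


retriever = ToyRetriever(top_k=5)
effective_top_k, effective_filters = retriever.resolve(top_k=2)

# Toy (document id, score) pairs; assumed ordering is highest score first.
scored = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
result = {"documents": [doc_id for doc_id, _ in ranked[:effective_top_k]]}
```

In this sketch a per-call filters value replaces the instance-level one rather than being merged with it, which matches the word "overrides" but is not confirmed beyond that.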
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
ArangoEmbeddingRetriever – Deserialized component.
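A round trip through to_dict and from_dict can be pictured with the following sketch. ToyComponent is a hypothetical class, and the "type" plus "init_parameters" layout is the convention commonly used by Haystack components; the exact dictionary produced by ArangoEmbeddingRetriever (for example, how the nested document store is represented) is not specified here and is assumed away.

```python
import json
from typing import Any, Optional


class ToyComponent:
    """Hypothetical component used only to illustrate the serialization round trip."""

    def __init__(self, *, top_k: int = 10, filters: Optional[dict[str, Any]] = None):
        self.top_k = top_k
        self.filters = filters

    def to_dict(self) -> dict[str, Any]:
        # Store the import path of the class plus the arguments needed to rebuild it.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"top_k": self.top_k, "filters": self.filters},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToyComponent":
        # Rebuild the instance from the saved constructor arguments.
        return cls(**data["init_parameters"])


original = ToyComponent(top_k=7)
data = original.to_dict()
# The dictionary contains only plain values, so it survives a JSON round trip.
restored = ToyComponent.from_dict(json.loads(json.dumps(data)))
```

Keeping only constructor arguments in the dictionary is what lets a pipeline definition be saved to a file and rebuilt later.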
haystack_integrations.document_stores.arangodb.document_store
ArangoDocumentStore
A Haystack DocumentStore backed by ArangoDB.
Documents are stored in an ArangoDB collection and support vector similarity search via AQL vector functions (requires ArangoDB 3.12+).
Example usage:
from haystack_integrations.document_stores.arangodb import ArangoDocumentStore
from haystack.utils import Secret
store = ArangoDocumentStore(
    host="http://localhost:8529",
    database="haystack",
    username=Secret.from_env_var("ARANGO_USERNAME", strict=False),
    password=Secret.from_env_var("ARANGO_PASSWORD"),
    collection_name="documents",
    embedding_dimension=768,
)
__init__
__init__(
    *,
    host: str = "http://localhost:8529",
    database: str = "haystack",
    username: Secret = Secret.from_env_var("ARANGO_USERNAME", strict=False),
    password: Secret = Secret.from_env_var("ARANGO_PASSWORD"),
    collection_name: str = "haystack_documents",
    embedding_dimension: int = 768,
    recreate_collection: bool = False,
    similarity_function: Literal["cosine", "dot_product", "l2"] = "cosine"
) -> None
Creates a new ArangoDocumentStore instance.
Parameters:
- host (str) – ArangoDB server URL, e.g. http://localhost:8529.
- database (str) – Name of the ArangoDB database to use. Created if it does not exist.
- username (Secret) – ArangoDB username as a Secret. Defaults to the ARANGO_USERNAME env var, falling back to root if the variable is not set.
- password (Secret) – ArangoDB password as a Secret. Defaults to the ARANGO_PASSWORD env var.
- collection_name (str) – Name of the collection to store documents in.
- embedding_dimension (int) – Dimensionality of document embeddings.
- recreate_collection (bool) – If True, drop and recreate the collection on startup.
- similarity_function (Literal['cosine', 'dot_product', 'l2']) – Vector similarity function to use for embedding retrieval. One of "cosine" (default), "dot_product", or "l2".
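To make the three similarity_function options concrete, here is a minimal pure-Python sketch of what each name conventionally computes. It is not the store's implementation, which relies on ArangoDB's AQL vector functions; the helper names dot_product, cosine, and l2_distance are illustrative only.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    # Sum of element-wise products; larger means more similar.
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Dot product of the vectors after normalizing by their lengths; ignores magnitude.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def l2_distance(a: list[float], b: list[float]) -> float:
    # Euclidean distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # points the same way as the query, but is longer
orthogonal = [0.0, 1.0]      # at a right angle to the query
```

Note that cosine treats same_direction as a perfect match despite its different length, while dot product and L2 are both sensitive to magnitude. For L2, a lower value indicates a closer match, the opposite of the other two.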
count_documents
Returns the number of documents in the store.
Returns:
int – Document count.
filter_documents
Returns documents matching the provided filters.
Parameters:
- filters (dict[str, Any] | None) – Haystack metadata filters. If None, all documents are returned.
Returns:
list[Document] – List of matching Document objects.
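As an illustration of what a Haystack metadata filter looks like, the sketch below builds one in the usual field / operator / value form and evaluates it against plain dictionaries. The matches and get_field helpers are hypothetical and cover only a few operators; the store itself presumably translates filters into database queries rather than evaluating them in Python.

```python
from typing import Any, Optional


def get_field(doc: dict[str, Any], path: str) -> Any:
    # Follow a dotted path such as "meta.lang"; return None if any part is missing.
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(doc: dict[str, Any], filters: Optional[dict[str, Any]]) -> bool:
    if filters is None:
        return True  # no filters: every document matches
    op = filters["operator"]
    if op in ("AND", "OR"):
        results = [matches(doc, cond) for cond in filters["conditions"]]
        return all(results) if op == "AND" else any(results)
    actual, expected = get_field(doc, filters["field"]), filters["value"]
    if op == "==":
        return actual == expected
    if op == "!=":
        return actual != expected
    if op == ">":
        return actual is not None and actual > expected
    raise ValueError(f"operator not covered by this sketch: {op}")


docs = [
    {"id": "1", "meta": {"lang": "en", "year": 2023}},
    {"id": "2", "meta": {"lang": "de", "year": 2023}},
    {"id": "3", "meta": {"lang": "en", "year": 2019}},
]
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">", "value": 2020},
    ],
}
matching_ids = [d["id"] for d in docs if matches(d, filters)]
all_ids = [d["id"] for d in docs if matches(d, None)]
```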
write_documents
write_documents(
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
Writes documents to the store.
Parameters:
- documents (list[Document]) – Documents to write.
- policy (DuplicatePolicy) – How to handle duplicates: OVERWRITE, SKIP, or FAIL. The signature default, DuplicatePolicy.NONE, is treated as FAIL.
Returns:
int – Number of documents written.
Raises:
- ValueError – If documents contains non-Document objects.
- DuplicateDocumentError – If a duplicate is found and policy is FAIL.
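The three duplicate-handling policies can be illustrated with an in-memory dictionary. This is a sketch of the semantics only: ToyPolicy, ToyDuplicateError, and write are hypothetical stand-ins for DuplicatePolicy, DuplicateDocumentError, and write_documents, and the assumption that skipped documents do not count toward the return value is mine.

```python
from enum import Enum


class ToyPolicy(Enum):
    OVERWRITE = "overwrite"
    SKIP = "skip"
    FAIL = "fail"


class ToyDuplicateError(Exception):
    """Raised when an ID already exists and the policy is FAIL."""


def write(store: dict[str, str], documents: list[tuple[str, str]],
          policy: ToyPolicy = ToyPolicy.FAIL) -> int:
    written = 0
    for doc_id, content in documents:
        if doc_id in store:
            if policy is ToyPolicy.SKIP:
                continue  # keep the existing document untouched
            if policy is ToyPolicy.FAIL:
                raise ToyDuplicateError(f"document {doc_id!r} already exists")
            # OVERWRITE falls through and replaces the stored content
        store[doc_id] = content
        written += 1
    return written


store: dict[str, str] = {}
first = write(store, [("a", "v1")])                                   # new ID, written
skipped = write(store, [("a", "v2"), ("b", "v1")], ToyPolicy.SKIP)    # "a" kept, "b" added
after_skip = store["a"]
replaced = write(store, [("a", "v3")], ToyPolicy.OVERWRITE)           # "a" replaced
try:
    write(store, [("a", "v4")], ToyPolicy.FAIL)
    failed = False
except ToyDuplicateError:
    failed = True
```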
delete_documents
Deletes documents by their IDs.
Parameters:
- document_ids (list[str]) – List of document IDs to delete.
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
ArangoDocumentStore – Deserialized component.