ArcadeDB
haystack_integrations.components.retrievers.arcadedb.embedding_retriever
ArcadeDBEmbeddingRetriever
Retrieve documents from ArcadeDB using vector similarity (LSM_VECTOR / HNSW index).
Usage example:
from haystack import Document
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
document_store = ArcadeDBDocumentStore(database="mydb")
retriever = ArcadeDBEmbeddingRetriever(document_store=document_store, top_k=5)
# Embed the documents, then add them to the DocumentStore
# (vector search can only match documents that carry an embedding)
documents = [
    Document(content="My name is Carla and I live in Berlin"),
    Document(content="My name is Paul and I live in New York"),
    Document(content="My name is Silvano and I live in Matera"),
    Document(content="My name is Usagi Tsukino and I live in Tokyo"),
]
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents = document_embedder.run(documents=documents)["documents"]
document_store.write_documents(documents)
# Embed the query with the same model, then retrieve
text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()
query_embedding = text_embedder.run("Who lives in Berlin?")["embedding"]
result = retriever.run(query_embedding=query_embedding)
for doc in result["documents"]:
    print(doc.content)
init
__init__(
*,
document_store: ArcadeDBDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
filter_policy: FilterPolicy = FilterPolicy.REPLACE
)
Create an ArcadeDBEmbeddingRetriever.
Parameters:
- document_store (ArcadeDBDocumentStore) – An instance of ArcadeDBDocumentStore.
- filters (dict[str, Any] | None) – Default filters applied to every retrieval call.
- top_k (int) – Maximum number of documents to return.
- filter_policy (FilterPolicy) – How runtime filters interact with default filters.
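The interaction between default and runtime filters can be sketched in plain Python. This is an illustration under stated assumptions, not the integration's code: `resolve_filters` is a hypothetical helper, the local `FilterPolicy` enum only mirrors the member names of Haystack's enum, and the `MERGE` branch simply ANDs both filters together, which is simpler than Haystack's own merge logic.

```python
from enum import Enum
from typing import Any, Optional


class FilterPolicy(Enum):
    """Local stand-in that mirrors the member names of Haystack's FilterPolicy."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: pick the filters that a retrieval call would use."""
    if not runtime_filters:
        # Nothing passed at run time: fall back to the defaults.
        return init_filters
    if policy is FilterPolicy.REPLACE or not init_filters:
        # REPLACE: runtime filters win outright.
        return runtime_filters
    # MERGE (simplified assumption): require both filters to match.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


# Hypothetical metadata fields, used only for illustration.
default_filters = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2020}

replaced = resolve_filters(FilterPolicy.REPLACE, default_filters, runtime_filters)
merged = resolve_filters(FilterPolicy.MERGE, default_filters, runtime_filters)
```

With `REPLACE`, a call that passes no runtime filters still uses the defaults; the defaults are only discarded when runtime filters are supplied.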
run
run(
query_embedding: list[float],
filters: dict[str, Any] | None = None,
top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents by vector similarity.
Parameters:
- query_embedding (list[float]) – The embedding vector to search with.
- filters (dict[str, Any] | None) – Optional filters to narrow results.
- top_k (int | None) – Maximum number of documents to return.
Returns:
dict[str, list[Document]] – A dictionary with the following keys:
- documents: List of Documents most similar to the given query_embedding.
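To make "most similar" and `top_k` concrete, here is a minimal brute-force sketch of top-k ranking by cosine similarity. It is only a conceptual illustration: the store answers queries through its vector index rather than by scanning every vector, and the function names and toy two-dimensional vectors below are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(
    query_embedding: list[float],
    candidates: dict[str, list[float]],
    top_k: int,
) -> list[str]:
    """Return the IDs of the top_k candidates, most similar first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), doc_id)
        for doc_id, embedding in candidates.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy embeddings keyed by a hypothetical document ID.
candidates = {
    "berlin": [1.0, 0.0],
    "tokyo": [0.0, 1.0],
    "matera": [0.7, 0.7],
}
nearest = top_k_by_cosine([0.9, 0.1], candidates, top_k=2)
```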
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
ArcadeDBEmbeddingRetriever – Deserialized component.
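The `to_dict` / `from_dict` pair is meant to round-trip a component through a plain dictionary. The sketch below shows that idea with a hypothetical stand-in class, `SketchRetriever`. The `type` plus `init_parameters` layout follows the general Haystack serialization convention; the exact keys the ArcadeDB retriever emits are an assumption here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchRetriever:
    """Hypothetical stand-in for a component, not the real retriever."""

    top_k: int = 10
    filters: Optional[dict[str, Any]] = None

    def to_dict(self) -> dict[str, Any]:
        # Store the import path of the class plus the arguments needed to rebuild it.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"top_k": self.top_k, "filters": self.filters},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SketchRetriever":
        # Rebuild the instance from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchRetriever(top_k=5)
serialized = original.to_dict()
restored = SketchRetriever.from_dict(serialized)
```

A round trip should yield an equivalent component, which is what allows pipelines to be saved and reloaded.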
haystack_integrations.document_stores.arcadedb.document_store
ArcadeDB DocumentStore for Haystack 2.x — document storage + vector search via HTTP/JSON API.
ArcadeDBDocumentStore
An ArcadeDB-backed DocumentStore for Haystack 2.x.
Uses ArcadeDB's HTTP/JSON API for all operations — no special drivers required. Supports HNSW vector search (LSM_VECTOR) and SQL metadata filtering.
Usage example:
from haystack.dataclasses.document import Document
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=5,  # must match the length of the embeddings written below
)
document_store.write_documents([
    Document(content="This is first", embedding=[0.0] * 5),
    Document(content="This is second", embedding=[0.1, 0.2, 0.3, 0.4, 0.5]),
])
init
__init__(
*,
url: str = "http://localhost:2480",
database: str = "haystack",
username: Secret = Secret.from_env_var("ARCADEDB_USERNAME", strict=False),
password: Secret = Secret.from_env_var("ARCADEDB_PASSWORD", strict=False),
type_name: str = "Document",
embedding_dimension: int = 768,
similarity_function: str = "cosine",
recreate_type: bool = False,
create_database: bool = True
)
Create an ArcadeDBDocumentStore instance.
Parameters:
- url (str) – ArcadeDB HTTP endpoint.
- database (str) – Database name.
- username (Secret) – HTTP Basic Auth username (default: ARCADEDB_USERNAME env var).
- password (Secret) – HTTP Basic Auth password (default: ARCADEDB_PASSWORD env var).
- type_name (str) – Vertex type name for documents.
- embedding_dimension (int) – Vector dimension for the HNSW index.
- similarity_function (str) – Distance metric: "cosine", "euclidean", or "dot".
- recreate_type (bool) – If True, drop and recreate the type on initialization.
- create_database (bool) – If True, create the database if it doesn't exist.
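The three `similarity_function` options can rank the same vectors differently. The sketch below uses the textbook definitions of each metric on toy vectors; how ArcadeDB computes or normalizes scores internally is not covered by this reference, so treat the helper functions as illustrative assumptions.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: depends only on direction, not on length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller means closer in space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 1.0]
same_direction = [3.0, 3.0]  # parallel to the query, but three times longer
nearby_point = [1.0, 0.9]    # slightly off-direction, but very close in space

# Cosine and dot product favour the parallel vector ...
prefers_parallel = cosine(query, same_direction) > cosine(query, nearby_point)
# ... while Euclidean distance favours the nearby point.
prefers_nearby = euclidean(query, nearby_point) < euclidean(query, same_direction)
```

Embedding models are usually trained with one metric in mind, so the choice normally follows the model's documentation.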
to_dict
Serializes the DocumentStore to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the DocumentStore from a dictionary.
Parameters:
- data (dict[str, Any]) – The dictionary to deserialize from.
Returns:
ArcadeDBDocumentStore – The deserialized DocumentStore.
count_documents
Returns how many documents are present in the document store.
Returns:
int – Number of documents in the document store.
filter_documents
Return documents matching the given filters.
Parameters:
- filters (dict[str, Any] | None) – Haystack filter dictionary.
Returns:
list[Document] – List of matching documents.
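A Haystack filter dictionary is a nested structure of comparisons (`field`, `operator`, `value`) combined by logical operators (`operator`, `conditions`). The sketch below shows the shape of such a dictionary and evaluates it in plain Python. The `matches` helper and the `meta.city` / `meta.year` fields are hypothetical; the store itself filters on the database side.

```python
import operator
from typing import Any

# Comparison operators supported by this sketch.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(filters: dict[str, Any], meta: dict[str, Any]) -> bool:
    """Hypothetical helper: evaluate a filter dictionary against one metadata dict."""
    if "conditions" in filters:
        # Logical node: combine the results of the nested conditions.
        results = [matches(condition, meta) for condition in filters["conditions"]]
        if filters["operator"] == "AND":
            return all(results)
        if filters["operator"] == "OR":
            return any(results)
        if filters["operator"] == "NOT":
            return not all(results)
        raise ValueError(f"Unknown logical operator: {filters['operator']}")
    # Comparison node: look up the field (without its "meta." prefix) and compare.
    field = filters["field"].removeprefix("meta.")
    return COMPARATORS[filters["operator"]](meta.get(field), filters["value"])


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.city", "operator": "==", "value": "Berlin"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}

berlin_match = matches(filters, {"city": "Berlin", "year": 2023})
tokyo_match = matches(filters, {"city": "Tokyo", "year": 2023})
```

The same dictionary shape is what `filter_documents` and the retriever's `filters` parameter accept.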
write_documents
write_documents(
documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
Write documents to the store.
Parameters:
- documents (list[Document]) – List of Haystack Documents to write.
- policy (DuplicatePolicy) – How to handle duplicate document IDs.
Returns:
int – Number of documents written.
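The effect of a duplicate policy on the returned count can be sketched with an in-memory dictionary. This is an illustration, not the store's implementation: the local `DuplicatePolicy` enum only mirrors the member names of Haystack's enum, the `write_documents` function here is a hypothetical stand-in, and treating `NONE` like `FAIL` is an assumption, since each store chooses its own default behaviour.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    """Local stand-in that mirrors the member names of Haystack's DuplicatePolicy."""

    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write_documents(
    store: dict[str, str],
    documents: list[tuple[str, str]],
    policy: DuplicatePolicy,
) -> int:
    """Hypothetical sketch: write (id, content) pairs and return how many were written."""
    written = 0
    for doc_id, content in documents:
        if doc_id in store:
            if policy is DuplicatePolicy.SKIP:
                continue  # keep the existing document and do not count this one
            if policy in (DuplicatePolicy.FAIL, DuplicatePolicy.NONE):
                # Assumption: NONE behaves like FAIL in this sketch.
                raise ValueError(f"Document {doc_id!r} already exists")
        store[doc_id] = content  # new ID, or OVERWRITE of an existing one
        written += 1
    return written


incoming = [("a", "new"), ("b", "fresh")]

skip_store = {"a": "old"}
skipped_count = write_documents(skip_store, incoming, DuplicatePolicy.SKIP)

overwrite_store = {"a": "old"}
overwritten_count = write_documents(overwrite_store, incoming, DuplicatePolicy.OVERWRITE)
```

Under `SKIP` the existing document `"a"` is left untouched and only `"b"` is counted, while `OVERWRITE` replaces `"a"` and counts both.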
delete_documents
Delete documents by their IDs.
Parameters:
- document_ids (list[str]) – List of document IDs to delete.