ArcadeDB
haystack_integrations.components.retrievers.arcadedb.embedding_retriever
ArcadeDBEmbeddingRetriever
Retrieve documents from ArcadeDB using vector similarity (LSM_VECTOR / HNSW index).
Usage example:
from haystack import Document
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

document_store = ArcadeDBDocumentStore(database="mydb")
retriever = ArcadeDBEmbeddingRetriever(document_store=document_store, top_k=5)

documents = [
    Document(content="My name is Carla and I live in Berlin"),
    Document(content="My name is Paul and I live in New York"),
    Document(content="My name is Silvano and I live in Matera"),
    Document(content="My name is Usagi Tsukino and I live in Tokyo"),
]

# Embed the documents so that vector search has something to match against
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents = document_embedder.run(documents)["documents"]

# Add documents to the DocumentStore
document_store.write_documents(documents)

# Embed the query and retrieve the most similar documents
embedder = SentenceTransformersTextEmbedder()
embedder.warm_up()
query_embedding = embedder.run("Who lives in Berlin?")["embedding"]
result = retriever.run(query_embedding=query_embedding)
for doc in result["documents"]:
    print(doc.content)
init
__init__(
*,
document_store: ArcadeDBDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
filter_policy: FilterPolicy = FilterPolicy.REPLACE
) -> None
Create an ArcadeDBEmbeddingRetriever.
Parameters:
- document_store (ArcadeDBDocumentStore) – An instance of ArcadeDBDocumentStore.
- filters (dict[str, Any] | None) – Default filters applied to every retrieval call.
- top_k (int) – Maximum number of documents to return.
- filter_policy (FilterPolicy) – How runtime filters interact with default filters.
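The interaction between default and runtime filters can be pictured with a small sketch. This is not the retriever's code: `Policy` and `resolve_filters` are hypothetical stand-ins, and the merge branch is deliberately simplified to a plain AND of both filter dictionaries, which may differ from what Haystack does in detail.

```python
from enum import Enum
from typing import Any, Optional


class Policy(Enum):
    """Hypothetical stand-in for Haystack's FilterPolicy."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: Policy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Pick the filters used for one retrieval call (simplified assumption)."""
    # Without runtime filters, the defaults given at construction time apply.
    if runtime_filters is None:
        return init_filters
    # REPLACE: runtime filters take over completely.
    if policy is Policy.REPLACE or init_filters is None:
        return runtime_filters
    # MERGE (simplified): require both filter sets to match.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
```

With the default `FilterPolicy.REPLACE`, passing `filters` to `run` means the filters given to `__init__` are ignored for that call.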
run
run(
query_embedding: list[float],
filters: dict[str, Any] | None = None,
top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents by vector similarity.
Parameters:
- query_embedding (list[float]) – The embedding vector to search with.
- filters (dict[str, Any] | None) – Optional filters to narrow results.
- top_k (int | None) – Maximum number of documents to return.
Returns:
dict[str, list[Document]] – A dictionary with the following keys: documents: List of Documents most similar to the given query_embedding.
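As a mental model of what `run` returns, the sketch below ranks a handful of in-memory vectors by cosine similarity and truncates to `top_k`. It is an exact brute-force search for illustration only; ArcadeDB answers the query through its vector index, and the helper names `cosine_similarity` and `retrieve` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_embedding: list[float],
    docs: list[tuple[str, list[float]]],
    top_k: int = 10,
) -> dict[str, list[str]]:
    """Return the contents of the top_k (content, embedding) pairs, best first."""
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query_embedding, doc[1]),
        reverse=True,
    )
    # Same result shape as the retriever: a dict with a "documents" key.
    return {"documents": [content for content, _ in ranked[:top_k]]}
```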
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
ArcadeDBEmbeddingRetriever – Deserialized component.
haystack_integrations.document_stores.arcadedb.document_store
ArcadeDB DocumentStore for Haystack 2.x — document storage + vector search via HTTP/JSON API.
ArcadeDBDocumentStore
An ArcadeDB-backed DocumentStore for Haystack 2.x.
Uses ArcadeDB's HTTP/JSON API for all operations — no special drivers required. Supports HNSW vector search (LSM_VECTOR) and SQL metadata filtering.
Usage example:
from haystack.dataclasses.document import Document
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=5,  # must match the length of the embeddings written below
)
document_store.write_documents([
    Document(content="This is first", embedding=[0.0] * 5),
    Document(content="This is second", embedding=[0.1, 0.2, 0.3, 0.4, 0.5]),
])
init
__init__(
*,
url: str = "http://localhost:2480",
database: str = "haystack",
username: Secret = Secret.from_env_var("ARCADEDB_USERNAME", strict=False),
password: Secret = Secret.from_env_var("ARCADEDB_PASSWORD", strict=False),
type_name: str = "Document",
embedding_dimension: int = 768,
similarity_function: str = "cosine",
recreate_type: bool = False,
create_database: bool = True
) -> None
Create an ArcadeDBDocumentStore instance.
Parameters:
- url (str) – ArcadeDB HTTP endpoint.
- database (str) – Database name.
- username (Secret) – HTTP Basic Auth username (default: ARCADEDB_USERNAME env var).
- password (Secret) – HTTP Basic Auth password (default: ARCADEDB_PASSWORD env var).
- type_name (str) – Vertex type name for documents.
- embedding_dimension (int) – Vector dimension for the HNSW index.
- similarity_function (str) – Distance metric: "cosine", "euclidean", or "dot".
- recreate_type (bool) – If True, drop and recreate the type on initialization.
- create_database (bool) – If True, create the database if it doesn't exist.
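The choice of similarity_function changes which stored vector counts as nearest. The sketch below uses the textbook definitions of the three metrics in plain Python; it illustrates the concepts and makes no claim about how ArcadeDB computes or normalizes scores internally. The example vectors are made up.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: depends only on direction, not on length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
short_aligned = [0.5, 0.0]  # same direction as the query, but short
long_off_axis = [3.0, 3.0]  # 45 degrees off, but long

# Cosine and Euclidean prefer short_aligned; the dot product prefers long_off_axis.
print(cosine(query, short_aligned), cosine(query, long_off_axis))
print(euclidean(query, short_aligned), euclidean(query, long_off_axis))
print(dot(query, short_aligned), dot(query, long_off_axis))
```

If embeddings are normalized to unit length, cosine and dot product yield the same ranking, so the distinction matters mostly for unnormalized vectors.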
to_dict
Serializes the DocumentStore to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the DocumentStore from a dictionary.
Parameters:
- data (dict[str, Any]) – The dictionary to deserialize from.
Returns:
ArcadeDBDocumentStore – The deserialized DocumentStore.
count_documents
Returns how many documents are present in the document store.
Returns:
int – Number of documents in the document store.
filter_documents
Return documents matching the given filters.
Parameters:
- filters (dict[str, Any] | None) – Haystack filter dictionary.
Returns:
list[Document] – List of matching documents.
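A Haystack filter dictionary is either a comparison (`field`, `operator`, `value`) or a logical node (`operator`, `conditions`). The sketch below evaluates such a dictionary against plain Python dicts to show the semantics. It is an in-memory illustration with hypothetical helpers (`lookup`, `matches`), covers only AND/OR and a few comparison operators, and treats missing fields as non-matching; the document store itself translates filters into SQL.

```python
import operator
from typing import Any

# Comparison operators supported by this sketch.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
}


def lookup(doc: dict[str, Any], field: str) -> Any:
    """Resolve a dotted path such as 'meta.year'; None if any part is missing."""
    value: Any = doc
    for part in field.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if the document satisfies the filter dictionary."""
    op = flt["operator"]
    if op == "AND":
        return all(matches(doc, cond) for cond in flt["conditions"])
    if op == "OR":
        return any(matches(doc, cond) for cond in flt["conditions"])
    value = lookup(doc, flt["field"])
    if value is None:  # simplification: missing fields never match
        return False
    return COMPARATORS[op](value, flt["value"])


# Example filter: English documents from 2020 onward.
recent_english = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
```

A dictionary shaped like `recent_english` is what `filter_documents` and the other filter-based methods expect.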
write_documents
write_documents(
documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
Write documents to the store.
Parameters:
- documents (list[Document]) – List of Haystack Documents to write.
- policy (DuplicatePolicy) – How to handle duplicate document IDs.
Returns:
int – Number of documents written.
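The effect of the duplicate policy on the returned count can be sketched with an in-memory dict keyed by document ID. `Policy` and `write` are hypothetical stand-ins, not the store's code. The sketch covers the SKIP, OVERWRITE, and FAIL policies; with DuplicatePolicy.NONE, Haystack leaves the choice to the document store, so that case is not modelled here.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical stand-in for three of Haystack's DuplicatePolicy values."""

    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(store: dict[str, str], docs: list[tuple[str, str]], policy: Policy) -> int:
    """Write (id, content) pairs into the store; return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is Policy.SKIP:
                continue  # keep the existing document and do not count this one
            if policy is Policy.FAIL:
                raise ValueError(f"Document {doc_id!r} already exists")
        # New ID, or OVERWRITE on an existing ID
        store[doc_id] = content
        written += 1
    return written
```

Under SKIP the return value can therefore be smaller than `len(documents)`.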
delete_documents
Delete documents by their IDs.
Parameters:
- document_ids (list[str]) – List of document IDs to delete.
delete_all_documents
Deletes all documents in the document store.
delete_by_filter
Deletes all documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see Haystack metadata filtering.
Returns:
int – The number of documents deleted.
update_by_filter
Updates the metadata of all documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see Haystack metadata filtering.
- meta (dict[str, Any]) – The metadata fields to update.
Returns:
int – The number of documents updated.
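One way to picture update_by_filter is as "select, then patch metadata". The sketch below works on plain dicts and uses a Python callable in place of a Haystack filter dictionary; `update_by_predicate` is a hypothetical helper. It assumes the given fields are merged into the existing metadata and other fields are left untouched, which the reference above does not state explicitly.

```python
from typing import Any, Callable


def update_by_predicate(
    docs: list[dict[str, Any]],
    predicate: Callable[[dict[str, Any]], bool],
    meta: dict[str, Any],
) -> int:
    """Merge `meta` into every matching document's metadata; return the count."""
    updated = 0
    for doc in docs:
        if predicate(doc):
            # Assumption: fields not named in `meta` are preserved.
            doc["meta"].update(meta)
            updated += 1
    return updated
```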
count_documents_by_filter
Counts the number of documents matching the provided filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to the documents.
Returns:
int – The number of documents that match the filters.
count_unique_metadata_by_filter
count_unique_metadata_by_filter(
filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
Counts unique values for each metadata field in documents matching the provided filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to the document list.
- metadata_fields (list[str]) – Metadata fields for which to count unique values.
Returns:
dict[str, int] – A dictionary where keys are metadata field names and values are the counts of unique values for that field.
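The shape of the result is easiest to see on a small example. The sketch below counts distinct values per field over a list of metadata dicts that are assumed to have already passed the filter; `count_unique` is a hypothetical helper, and skipping documents that lack a field is an assumption of this sketch.

```python
from typing import Any


def count_unique(metas: list[dict[str, Any]], metadata_fields: list[str]) -> dict[str, int]:
    """Count distinct values per field; documents lacking a field are skipped."""
    return {
        field: len({meta[field] for meta in metas if field in meta})
        for field in metadata_fields
    }


metas = [
    {"lang": "en", "year": 2021},
    {"lang": "en", "year": 2018},
    {"lang": "de"},
]
print(count_unique(metas, ["lang", "year"]))
```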
get_metadata_fields_info
Returns the metadata fields and their corresponding types based on sampled documents.
Returns:
dict[str, dict[str, str]] – A dictionary mapping field names to dictionaries with a type key.
get_metadata_field_min_max
For a given metadata field, finds its min and max values.
Parameters:
- metadata_field (str) – The metadata field to inspect.
Returns:
dict[str, Any] – A dictionary with min and max keys and their corresponding values.
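A minimal sketch of the returned structure, computed over plain metadata dicts. `field_min_max` is a hypothetical helper; returning None for both keys when no document carries the field is an assumption of this sketch, not documented behaviour.

```python
from typing import Any


def field_min_max(metas: list[dict[str, Any]], metadata_field: str) -> dict[str, Any]:
    """Return the smallest and largest value of one metadata field."""
    values = [m[metadata_field] for m in metas if m.get(metadata_field) is not None]
    if not values:
        # Assumption: no values means no bounds.
        return {"min": None, "max": None}
    return {"min": min(values), "max": max(values)}
```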
get_metadata_field_unique_values
get_metadata_field_unique_values(
metadata_field: str,
search_term: str | None = None,
from_: int = 0,
size: int = 10,
) -> tuple[list[str], int]
Retrieves unique values for a field matching a search term or all possible values if no search term is given.
Parameters:
- metadata_field (str) – The metadata field to inspect.
- search_term (str | None) – Optional case-insensitive substring search term.
- from_ (int) – The starting index for pagination.
- size (int) – The number of values to return.
Returns:
tuple[list[str], int] – A tuple containing the paginated values and the total count.
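The interplay of search_term, from_, and size can be sketched on in-memory data. `unique_values` is a hypothetical helper: it assumes values are returned in sorted order and that the total count refers to the unique values remaining after the search term is applied, neither of which is spelled out above.

```python
from typing import Any, Optional


def unique_values(
    metas: list[dict[str, Any]],
    metadata_field: str,
    search_term: Optional[str] = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Return one page of distinct values plus the total number of matches."""
    # Distinct values as strings, sorted for a stable page order (assumption).
    values = sorted({str(m[metadata_field]) for m in metas if metadata_field in m})
    if search_term is not None:
        needle = search_term.lower()
        # Case-insensitive substring match
        values = [v for v in values if needle in v.lower()]
    return values[from_ : from_ + size], len(values)
```

Paging through all values then means calling the function repeatedly with `from_` advanced by `size` until `from_` reaches the returned total.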