Version: 2.31-unstable

ArcadeDBEmbeddingRetriever

An embedding-based Retriever compatible with the ArcadeDB Document Store. It uses ArcadeDB's LSM_VECTOR (HNSW) index for vector similarity search.


Most common position in a pipeline	1. After a Text Embedder and before a ChatPromptBuilder in a RAG pipeline; 2. As the last component in a semantic search pipeline
Mandatory init variables	`document_store`: An instance of ArcadeDBDocumentStore
Mandatory run variables	`query_embedding`: A vector representing the query (a list of floats)
Output variables	`documents`: A list of documents
API reference	ArcadeDB
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/arcadedb
Package name	`arcadedb-haystack`

Overview

The ArcadeDBEmbeddingRetriever retrieves documents from an ArcadeDBDocumentStore by comparing the query embedding with the stored document embeddings through the store's HNSW index. It accepts optional `filters` to restrict results by metadata and `top_k` to limit the number of documents returned. Documents must be stored with embeddings, so use a Document Embedder in your indexing pipeline and a Text Embedder in your query pipeline. Both embedders should use the same model so that query and document vectors have the same dimension and are comparable.
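To make the roles of `top_k` and metadata filtering concrete, here is a minimal brute-force sketch of embedding retrieval in plain Python. It is a conceptual illustration only, not how ArcadeDB implements search: the HNSW index approximates this ranking without scanning every document. The use of cosine similarity, the `brute_force_retrieve` helper, and the simple equality-based `meta_filter` are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_retrieve(query_embedding, docs, top_k=10, meta_filter=None):
    """Hypothetical helper: filter by exact metadata match, rank, keep top_k."""
    candidates = [
        d
        for d in docs
        if meta_filter is None
        or all(d["meta"].get(key) == value for key, value in meta_filter.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy two-dimensional embeddings, chosen only to make the ranking easy to follow.
docs = [
    {"content": "languages", "embedding": [1.0, 0.0], "meta": {"topic": "linguistics"}},
    {"content": "elephants", "embedding": [0.0, 1.0], "meta": {"topic": "animals"}},
    {"content": "waves", "embedding": [0.7, 0.7], "meta": {"topic": "nature"}},
]

results = brute_force_retrieve([0.9, 0.1], docs, top_k=2)
print([d["content"] for d in results])  # ['languages', 'waves']
```

With `meta_filter={"topic": "animals"}`, the same call would return only the "elephants" document, because filtering happens before ranking in this sketch.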

Installation

shell

pip install arcadedb-haystack

Make sure that an ArcadeDB server is running (for example, in a Docker container) and that the credentials are set in the ARCADEDB_USERNAME and ARCADEDB_PASSWORD environment variables.
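A quick pre-flight check can catch missing credentials before the document store tries to connect. The sketch below assumes the two names above are read from the process environment; `missing_credentials` is a hypothetical helper written for this example and is not part of `arcadedb-haystack`.

```python
import os

REQUIRED_VARS = ("ARCADEDB_USERNAME", "ARCADEDB_PASSWORD")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Set these variables before connecting: " + ", ".join(missing))
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.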

Usage

On its own

python

from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
from haystack_integrations.components.retrievers.arcadedb import (
    ArcadeDBEmbeddingRetriever,
)

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=768,
)
retriever = ArcadeDBEmbeddingRetriever(document_store=document_store, top_k=5)

# Placeholder query embedding; in practice, pass the output of a Text Embedder.
# Its length must match the store's embedding_dimension (768 here).
result = retriever.run(query_embedding=[0.1] * 768)
for doc in result["documents"]:
    print(doc.content)
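Metadata filters in Haystack 2.x are plain dictionaries built from comparison conditions (`field`, `operator`, `value`) and logical groups (`operator`, `conditions`). The sketch below builds one such filter. The helper names `meta_equals` and `all_of` and the metadata fields `language` and `year` are hypothetical, and passing the result as `filters=` to the retriever's `run()` is an assumption based on how other Haystack retrievers behave; check the API reference for the exact signature.

```python
def meta_equals(field, value):
    """Comparison condition: the metadata field must equal the given value."""
    return {"field": f"meta.{field}", "operator": "==", "value": value}


def all_of(*conditions):
    """Logical group: every condition must hold."""
    return {"operator": "AND", "conditions": list(conditions)}


# Match English documents from 2020 or later (field names are illustrative).
filters = all_of(
    meta_equals("language", "en"),
    {"field": "meta.year", "operator": ">=", "value": 2020},
)

# Assumed usage, not verified against the ArcadeDB retriever:
# result = retriever.run(query_embedding=[0.1] * 768, filters=filters)
print(filters)
```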

In a pipeline

python

from haystack import Document, Pipeline
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore
from haystack_integrations.components.retrievers.arcadedb import (
    ArcadeDBEmbeddingRetriever,
)

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=768,
    recreate_type=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to recognize themselves in mirrors.",
    ),
    Document(
        content="Bioluminescent waves can be seen in the Maldives and Puerto Rico.",
    ),
]

# Indexing: embed the documents, then write them to the store.
document_embedder = SentenceTransformersDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(
    documents_with_embeddings["documents"],
    policy=DuplicatePolicy.OVERWRITE,
)

# Querying: embed the query text, then retrieve the closest documents.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    ArcadeDBEmbeddingRetriever(document_store=document_store, top_k=3),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run(
    {"text_embedder": {"text": "How many languages are there?"}},
)
print(result["retriever"]["documents"][0])
