API reference	Astra
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/astra

DataStax Astra DB is a serverless vector database built on Apache Cassandra that supports vector search and auto-scaling. You can deploy it on AWS, GCP, or Azure and expand to one or more regions within those clouds for multi-region availability, low-latency data access, data sovereignty, and reduced cloud vendor lock-in. For more information, see the DataStax documentation.

Initialization

Once you have an Astra DB account and have created a database, install the astra-haystack integration:

pip install astra-haystack

From the database configuration in the Astra DB web UI, you need the database’s API endpoint and a generated application token.

You also need a collection name and a namespace. When you create the collection, you also set its embedding dimension and similarity metric. The namespace organizes data within a database; in Apache Cassandra it is called a keyspace.
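As a rough illustration, the collection settings described above can be gathered and sanity-checked in one place before the store is created. The helper `astra_store_settings`, the set of accepted metric names, and the placeholder values are assumptions made for this sketch. The keyword names `collection_name`, `namespace`, `embedding_dimension`, and `similarity` are assumed to match the `AstraDocumentStore` constructor, so check them against the API reference for your installed version.

```python
# Hypothetical helper: not part of astra-haystack.
# The metric names below are an assumption; confirm them in the Astra DB docs.
ASSUMED_METRICS = {"cosine", "dot_product", "euclidean"}


def astra_store_settings(collection_name, namespace, embedding_dimension, similarity="cosine"):
    """Return keyword arguments describing the collection to connect to."""
    if not collection_name:
        raise ValueError("collection_name must not be empty")
    if embedding_dimension <= 0:
        raise ValueError("embedding_dimension must be a positive integer")
    if similarity not in ASSUMED_METRICS:
        raise ValueError(f"unexpected similarity metric: {similarity!r}")
    return {
        "collection_name": collection_name,
        "namespace": namespace,
        "embedding_dimension": embedding_dimension,
        "similarity": similarity,
    }


# Placeholder values; replace them with your own collection and namespace.
settings = astra_store_settings("my_collection", "default_keyspace", 768)
print(settings)

# With the integration installed and credentials in the environment:
# document_store = AstraDocumentStore(**settings)
```

The embedding dimension has to match the output size of whichever embedding model you use, so it is worth fixing it next to the collection name rather than scattering it through the code.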

Then, in Haystack, initialize an AstraDocumentStore object that’s connected to the Astra DB instance, and write documents to it.

We strongly encourage passing authentication data through environment variables: make sure to populate the environment variables ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN before running the following example.

from haystack import Document
from haystack_integrations.document_stores.astra import AstraDocumentStore

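# With no arguments, the store reads ASTRA_DB_API_ENDPOINT and
# ASTRA_DB_APPLICATION_TOKEN from the environment.
document_store = AstraDocumentStore()

# Write two documents, then print how many the store now holds.
document_store.write_documents([
    Document(content="This is first"),
    Document(content="This is second"),
])
print(document_store.count_documents())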
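Because the example above depends on two environment variables, a small pre-flight check can make a missing value easier to diagnose than a failed connection. The following is a minimal sketch using only the standard library; the function name `missing_astra_env_vars` is hypothetical and not part of the integration.

```python
import os

# The two variables named in this guide.
REQUIRED_ENV_VARS = ("ASTRA_DB_API_ENDPOINT", "ASTRA_DB_APPLICATION_TOKEN")


def missing_astra_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_astra_env_vars()
if missing:
    print("Set these variables before creating the store:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.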

Supported Retrievers

AstraEmbeddingRetriever: An embedding-based Retriever that fetches documents from the Document Store based on a query embedding provided to the Retriever.
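A query pipeline built around this Retriever might look like the sketch below. It assumes the import path `haystack_integrations.components.retrievers.astra`, the `document_store` and `top_k` constructor arguments, and the `query_embedding` input name; verify these against the API reference. The choice of `SentenceTransformersTextEmbedder` is only an example (it needs the sentence-transformers package), and the function name `build_query_pipeline` is hypothetical.

```python
def build_query_pipeline(document_store, top_k=5):
    """Assemble a text embedder -> AstraEmbeddingRetriever pipeline."""
    # Imports live inside the function so this sketch can be loaded
    # even where haystack and astra-haystack are not installed.
    from haystack import Pipeline
    from haystack.components.embedders import SentenceTransformersTextEmbedder
    from haystack_integrations.components.retrievers.astra import AstraEmbeddingRetriever

    pipeline = Pipeline()
    pipeline.add_component("embedder", SentenceTransformersTextEmbedder())
    pipeline.add_component(
        "retriever",
        AstraEmbeddingRetriever(document_store=document_store, top_k=top_k),
    )
    # The embedder's "embedding" output feeds the retriever's "query_embedding" input.
    pipeline.connect("embedder.embedding", "retriever.query_embedding")
    return pipeline


# Usage, assuming `document_store` already holds documents with embeddings:
# pipeline = build_query_pipeline(document_store)
# result = pipeline.run({"embedder": {"text": "What is Astra DB?"}})
# print(result["retriever"]["documents"])
```

The query embedder must produce vectors of the same dimension as the collection and should be the same model that embedded the stored documents.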

Indexing Warnings

When you create an Astra DB Document Store, you might see one of these warnings:

Astra DB collection ... is detected as having indexing turned on for all fields (either created manually or by older versions of this plugin). This implies stricter limitations on the amount of text each string in a document can store. Consider indexing anew on a fresh collection to be able to store longer texts.

Or:

Astra DB collection ... is detected as having the following indexing policy: {...}. This does not match the requested indexing policy for this object: {...}. In particular, there may be stricter limitations on the amount of text each string in a document can store. Consider indexing anew on a fresh collection to be able to store longer texts.

Why You See This Warning

The collection already exists and is configured to index all fields for search, possibly because you created it earlier or an older version of the plugin did. When Haystack creates a collection itself, it applies an indexing policy optimized for your intended use. This policy lets you store longer texts and avoids indexing fields you won’t filter on, which also reduces write overhead. An existing collection keeps the policy it was created with, so Haystack warns you about the mismatch.

Common Causes

You created the collection outside Haystack (for example, in the Astra UI or with AstraPy’s Database.create_collection()).
You created the collection with an older version of the plugin.

Impact

This is only a warning. Your application keeps running unless you try to store very long text fields. If you do, Astra DB returns an indexing error.
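To judge whether this affects you, you can measure your texts before writing them. The sketch below reports which strings exceed a byte limit that you supply; the function name `find_long_texts` is hypothetical, and the limit used here is a placeholder for illustration, not a documented Astra DB value.

```python
def find_long_texts(texts, max_bytes):
    """Return (index, size_in_bytes) for each text whose UTF-8 size exceeds max_bytes."""
    oversized = []
    for index, text in enumerate(texts):
        size = len(text.encode("utf-8"))
        if size > max_bytes:
            oversized.append((index, size))
    return oversized


# Placeholder limit for illustration only; look up the real limit in the Astra DB docs.
EXAMPLE_LIMIT_BYTES = 8000
texts = ["short text", "x" * 10000]
print(find_long_texts(texts, EXAMPLE_LIMIT_BYTES))  # prints [(1, 10000)]
```

Sizes are measured in UTF-8 bytes rather than characters, since non-ASCII text takes more than one byte per character.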

Solutions

Recommended: Drop and recreate the collection if you can repopulate it. Then rerun your Haystack application so it creates the collection with the optimized indexing policy.
Ignore the warning if you’re sure you won’t store very long text fields.
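If you choose to drop the collection, the step might be scripted roughly as follows. This is a sketch under assumptions: the wrapper `drop_collection` and its `confirmed` flag are hypothetical, and the commented usage assumes AstraPy’s `DataAPIClient.get_database()` and `Database.drop_collection()`; verify both against your installed AstraPy version.

```python
def drop_collection(database, collection_name, confirmed=False):
    """Drop a collection so Haystack can recreate it on the next run.

    `database` is any object exposing drop_collection(name), such as an
    AstraPy Database (assumed; verify against your AstraPy version).
    """
    if not confirmed:
        # Dropping deletes every document in the collection, so require opt-in.
        return False
    database.drop_collection(collection_name)
    return True


# Usage with AstraPy (assumed API, placeholders for credentials):
# from astrapy import DataAPIClient
# database = DataAPIClient("<token>").get_database("<api endpoint>")
# drop_collection(database, "my_collection", confirmed=True)
```

After the collection is gone, rerunning the Haystack application recreates it, and you then need to write your documents again.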

Additional References

🧑‍🍳 Cookbook: Using AstraDB as a data store in your Haystack pipelines