
Astra integration for Haystack

Module haystack_integrations.components.retrievers.astra.retriever

AstraEmbeddingRetriever

A component for retrieving documents from an AstraDocumentStore.

Usage example:

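from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.astra import AstraDocumentStore
from haystack_integrations.components.retrievers.astra import AstraEmbeddingRetriever

# api_endpoint and token are Secret values and collection_name is a string
# (see AstraDocumentStore.__init__); define them before running this example.
document_store = AstraDocumentStore(
    api_endpoint=api_endpoint,
    token=token,
    collection_name=collection_name,
    duplicates_policy=DuplicatePolicy.SKIP,
    embedding_dimension=384,
)

retriever = AstraEmbeddingRetriever(document_store=document_store)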

AstraEmbeddingRetriever.__init__

def __init__(document_store: AstraDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10)

Arguments:

  • document_store: the AstraDocumentStore instance to retrieve documents from.
  • filters: a dictionary with filters to narrow down the search space.
  • top_k: the maximum number of documents to retrieve.

AstraEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None)

Retrieve documents from the AstraDocumentStore.

Arguments:

  • query_embedding: a list of floats representing the query embedding.
  • filters: filters to narrow down the search space.
  • top_k: the maximum number of documents to retrieve.

Returns:

a dictionary with the following keys:

  • documents: A list of documents retrieved from the AstraDocumentStore.
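Both `__init__` and `run` accept `filters` and `top_k`, and in `run` they default to `None`. The following is a minimal sketch of how such a fallback is commonly resolved, assuming that a value passed to `run` takes precedence over the one given at construction time. The source does not state this explicitly, and `resolve_run_args` is a hypothetical helper, not part of the integration.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_run_args(
    init_filters: Optional[Dict[str, Any]],
    init_top_k: int,
    run_filters: Optional[Dict[str, Any]] = None,
    run_top_k: Optional[int] = None,
) -> Tuple[Optional[Dict[str, Any]], int]:
    """Hypothetical helper: prefer run-time values, else fall back to init-time ones."""
    filters = run_filters if run_filters is not None else init_filters
    top_k = run_top_k if run_top_k is not None else init_top_k
    return filters, top_k


# Nothing passed at run time: the init-time defaults apply.
defaults = resolve_run_args(None, 10)

# top_k passed at run time: it replaces the init-time value (assumed behavior).
overridden = resolve_run_args(None, 10, run_top_k=3)
```

Under this assumption, a retriever built with `top_k=10` can still be asked for fewer documents on a per-call basis without being rebuilt.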

AstraEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AstraEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AstraEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
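The `to_dict` and `from_dict` methods are meant to round-trip a component through a plain dictionary. The sketch below illustrates that round trip with a hypothetical stand-in class, `SketchRetriever`. It is not the real AstraEmbeddingRetriever, and the `type` / `init_parameters` layout is an assumption based on the usual Haystack serialization convention, not something this page specifies.

```python
from typing import Any, Dict, Optional


class SketchRetriever:
    """Hypothetical stand-in used only to illustrate the round trip."""

    def __init__(self, filters: Optional[Dict[str, Any]] = None, top_k: int = 10):
        self.filters = filters
        self.top_k = top_k

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: fully qualified class name plus constructor arguments.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"filters": self.filters, "top_k": self.top_k},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchRetriever":
        # Rebuild the object from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchRetriever(top_k=5)
restored = SketchRetriever.from_dict(original.to_dict())
```

The restored object carries the same constructor arguments as the original, which is what makes it possible to save and reload a configured component.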

Module haystack_integrations.document_stores.astra.document_store

AstraDocumentStore

A document store for Haystack backed by Astra DB.

Usage example:

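from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.astra import AstraDocumentStore

# api_endpoint and token are Secret values and collection_name is a string
# (see AstraDocumentStore.__init__); define them before running this example.
document_store = AstraDocumentStore(
    api_endpoint=api_endpoint,
    token=token,
    collection_name=collection_name,
    duplicates_policy=DuplicatePolicy.SKIP,
    embedding_dimension=384,
)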

AstraDocumentStore.__init__

def __init__(
        api_endpoint: Secret = Secret.from_env_var("ASTRA_DB_API_ENDPOINT"),
        token: Secret = Secret.from_env_var("ASTRA_DB_APPLICATION_TOKEN"),
        collection_name: str = "documents",
        embedding_dimension: int = 768,
        duplicates_policy: DuplicatePolicy = DuplicatePolicy.NONE,
        similarity: str = "cosine",
        namespace: Optional[str] = None)

The connection to Astra DB is established and managed through the JSON API.

The required credentials (API endpoint and application token) can be generated through the Astra DB UI by opening the Connect tab, then selecting JSON API and Generate Configuration.

Arguments:

  • api_endpoint: the Astra DB API endpoint.
  • token: the Astra DB application token.
  • collection_name: the name of the collection to use within the keyspace of the current Astra DB.
  • embedding_dimension: the dimension of the embedding vectors.
  • duplicates_policy: how to handle a Document whose ID already exists in the store. One of the DuplicatePolicy options NONE, SKIP, OVERWRITE, or FAIL:
  • DuplicatePolicy.NONE: the default policy. If a Document with the same ID already exists, it is skipped and not written.
  • DuplicatePolicy.SKIP: if a Document with the same ID already exists, it is skipped and not written.
  • DuplicatePolicy.OVERWRITE: if a Document with the same ID already exists, it is overwritten.
  • DuplicatePolicy.FAIL: if a Document with the same ID already exists, an error is raised.
  • similarity: the similarity function used to compare document vectors. Defaults to "cosine".
  • namespace: an optional Astra DB namespace to use. Defaults to None.

Raises:

  • ValueError: if the API endpoint or token is not set.

AstraDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AstraDocumentStore"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

AstraDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AstraDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE)

Indexes documents for later queries.

Arguments:

  • documents: a list of Haystack Document objects.
  • policy: how to handle a Document whose ID already exists in the store. One of the DuplicatePolicy options NONE, SKIP, OVERWRITE, or FAIL:
  • DuplicatePolicy.NONE: the default policy. If a Document with the same ID already exists, it is skipped and not written.
  • DuplicatePolicy.SKIP: if a Document with the same ID already exists, it is skipped and not written.
  • DuplicatePolicy.OVERWRITE: if a Document with the same ID already exists, it is overwritten.
  • DuplicatePolicy.FAIL: if a Document with the same ID already exists, an error is raised.

Raises:

  • ValueError: if the documents are not of type Document or dict.
  • DuplicateDocumentError: if a document with the same ID already exists and policy is set to FAIL.
  • Exception: if the document ID is not a string or if id and _id are both present in the document.

Returns:

number of documents written.
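The duplicate policies described above can be illustrated with a small in-memory sketch. Everything here is a stand-in for illustration: `Policy` mimics Haystack's DuplicatePolicy, the dictionary replaces Astra DB, and `ValueError` takes the place of DuplicateDocumentError. NONE is treated like SKIP, following the description in the argument list.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Policy(Enum):
    """Stand-in for Haystack's DuplicatePolicy, for illustration only."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(store: Dict[str, str], docs: List[Tuple[str, str]], policy: Policy) -> int:
    """Write (id, content) pairs into a dict and return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy in (Policy.NONE, Policy.SKIP):
                continue  # keep the existing document untouched
            if policy is Policy.FAIL:
                raise ValueError(f"document {doc_id!r} already exists")
        store[doc_id] = content  # new ID, or an existing ID under OVERWRITE
        written += 1
    return written


store = {"a": "old"}

# "a" already exists and is skipped; only "b" is written.
skipped = write(store, [("a", "new"), ("b", "fresh")], Policy.SKIP)

# "a" already exists and is replaced.
overwritten = write(store, [("a", "new")], Policy.OVERWRITE)
```

The return value counts only the documents actually written, so a skipped duplicate does not contribute to it in this sketch.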

AstraDocumentStore.count_documents

def count_documents() -> int

Counts the number of documents in the document store.

Returns:

the number of documents in the document store.

AstraDocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns at most 1000 documents that match the filter.

Arguments:

  • filters: filters to apply.

Raises:

  • AstraDocumentStoreFilterError: if the filter is invalid or not supported by this class.

Returns:

matching documents.

AstraDocumentStore.get_documents_by_id

def get_documents_by_id(ids: List[str]) -> List[Document]

Gets documents by their IDs.

Arguments:

  • ids: the IDs of the documents to retrieve.

Returns:

the matching documents.

AstraDocumentStore.get_document_by_id

def get_document_by_id(document_id: str) -> Document

Gets a document by its ID.

Arguments:

  • document_id: the ID to filter by

Raises:

  • MissingDocumentError: if the document is not found

Returns:

the found document

AstraDocumentStore.search

def search(query_embedding: List[float],
           top_k: int,
           filters: Optional[Dict[str, Any]] = None) -> List[Document]

Performs a similarity search for a single query embedding.

Arguments:

  • query_embedding: a list of floats representing the query embedding.
  • top_k: the number of results to return.
  • filters: filters to apply during search.

Returns:

matching documents.
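To make the roles of `query_embedding` and `top_k` concrete, here is a minimal local sketch of a cosine-similarity ranking, cosine being the default `similarity` of the store. In practice the search runs inside Astra DB; the functions `cosine` and `top_k_by_cosine` and the sample vectors are hypothetical and only illustrate the ranking idea.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_by_cosine(
    query: List[float], vectors: Dict[str, List[float]], top_k: int
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the best top_k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional embeddings keyed by document ID.
vectors = {"doc-1": [1.0, 0.0], "doc-2": [0.0, 1.0], "doc-3": [1.0, 1.0]}

# The query points almost along the first axis, so doc-1 ranks first, then doc-3.
results = top_k_by_cosine([1.0, 0.1], vectors, top_k=2)
```

Note that the query is a single vector (a flat list of floats), matching the `List[float]` type in the signature.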

AstraDocumentStore.delete_documents

def delete_documents(document_ids: Optional[List[str]] = None,
                     delete_all: Optional[bool] = None) -> None

Deletes documents from the document store.

Arguments:

  • document_ids: IDs of the documents to delete.
  • delete_all: if True, delete all documents.

Raises:

  • MissingDocumentError: if no document was deleted but document IDs were provided.
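The interaction between `document_ids`, `delete_all`, and the error condition can be sketched in memory as follows. This is an illustration under stated assumptions, not the integration's implementation: the dictionary replaces Astra DB, `KeyError` stands in for MissingDocumentError, and both the precedence of `delete_all` and the no-op when neither argument is given are guesses.

```python
from typing import Dict, List, Optional


def delete_documents(
    store: Dict[str, str],
    document_ids: Optional[List[str]] = None,
    delete_all: Optional[bool] = None,
) -> None:
    """In-memory illustration; KeyError stands in for MissingDocumentError."""
    if delete_all:
        store.clear()  # assumed to take precedence over document_ids
        return
    deleted = 0
    for doc_id in document_ids or []:
        if doc_id in store:
            del store[doc_id]
            deleted += 1
    # Error only when IDs were given and none of them matched anything.
    if document_ids and deleted == 0:
        raise KeyError("none of the given document IDs were found")


store = {"a": "first", "b": "second", "c": "third"}

# "a" is removed; "missing" is ignored because at least one ID matched.
delete_documents(store, document_ids=["a", "missing"])
remaining_after_ids = sorted(store)

# Remove everything that is left.
delete_documents(store, delete_all=True)
```

The error is raised only when IDs were supplied and nothing at all was deleted, which mirrors the wording of the Raises entry.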

Module haystack_integrations.document_stores.astra.errors

AstraDocumentStoreError

Parent class for all AstraDocumentStore errors.

AstraDocumentStoreFilterError

Raised when an invalid filter is passed to AstraDocumentStore.

AstraDocumentStoreConfigError

Raised when an invalid configuration is passed to AstraDocumentStore.