DocumentationAPI ReferenceπŸ““ TutorialsπŸ§‘β€πŸ³ Cookbook🀝 IntegrationsπŸ’œ Discord

Weaviate integration for Haystack

Module haystack_integrations.document_stores.weaviate.auth

SupportedAuthTypes

Supported auth credentials for WeaviateDocumentStore.

AuthCredentials

Base class for all auth credentials supported by WeaviateDocumentStore. Can be used to deserialize from dict any of the supported auth credentials.

AuthCredentials.to_dict

def to_dict() -> Dict[str, Any]

Converts the object to a dictionary representation for serialization.

AuthCredentials.from_dict

@staticmethod
def from_dict(data: Dict[str, Any]) -> "AuthCredentials"

Converts a dictionary representation to an auth credentials object.

AuthCredentials.resolve_value

@abstractmethod
def resolve_value()

Resolves all the secrets in the auth credentials object and returns the corresponding Weaviate object. All subclasses must implement this method.

AuthApiKey

AuthCredentials for API key authentication. By default it will load api_key from the environment variable WEAVIATE_API_KEY.

AuthBearerToken

AuthCredentials for Bearer token authentication. By default it will load access_token from the environment variable WEAVIATE_ACCESS_TOKEN, and refresh_token from the environment variable WEAVIATE_REFRESH_TOKEN. WEAVIATE_REFRESH_TOKEN environment variable is optional.

AuthClientCredentials

AuthCredentials for client credentials authentication. By default it will load client_secret from the environment variable WEAVIATE_CLIENT_SECRET, and scope from the environment variable WEAVIATE_SCOPE. WEAVIATE_SCOPE environment variable is optional, if set it can either be a string or a list of space separated strings. e.g "scope1" or "scope1 scope2".

AuthClientPassword

AuthCredentials for username and password authentication. By default it will load username from the environment variable WEAVIATE_USERNAME, password from the environment variable WEAVIATE_PASSWORD, and scope from the environment variable WEAVIATE_SCOPE. WEAVIATE_SCOPE environment variable is optional, if set it can either be a string or a list of space separated strings. e.g "scope1" or "scope1 scope2".

Module haystack_integrations.document_stores.weaviate.document_store

WeaviateDocumentStore

WeaviateDocumentStore is a Document Store for Weaviate. It can be used with Weaviate Cloud Services or self-hosted instances.

Usage example with Weaviate Cloud Services:

import os
from haystack_integrations.document_stores.weaviate.auth import AuthApiKey
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore

os.environ["WEAVIATE_API_KEY"] = "MY_API_KEY

document_store = WeaviateDocumentStore(
    url="rAnD0mD1g1t5.something.weaviate.cloud",
    auth_client_secret=AuthApiKey(),
)

Usage example with self-hosted Weaviate:

from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore

document_store = WeaviateDocumentStore(url="http://localhost:8080")

WeaviateDocumentStore.__init__

def __init__(*,
             url: Optional[str] = None,
             collection_settings: Optional[Dict[str, Any]] = None,
             auth_client_secret: Optional[AuthCredentials] = None,
             additional_headers: Optional[Dict] = None,
             embedded_options: Optional[EmbeddedOptions] = None,
             additional_config: Optional[AdditionalConfig] = None,
             grpc_port: int = 50051,
             grpc_secure: bool = False)

Create a new instance of WeaviateDocumentStore and connects to the Weaviate instance.

Arguments:

  • url: The URL to the weaviate instance.
  • collection_settings: The collection settings to use. If None, it will use a collection named default with the following properties:
  • _original_id: text
  • content: text
  • dataframe: text
  • blob_data: blob
  • blob_mime_type: text
  • score: number The Document meta fields are omitted in the default collection settings as we can't make assumptions on the structure of the meta field. We heavily recommend to create a custom collection with the correct meta properties for your use case. Another option is relying on the automatic schema generation, but that's not recommended for production use. See the official Weaviate documentation<https://weaviate.io/developers/weaviate/manage-data/collections>_ for more information on collections and their properties.
  • auth_client_secret: Authentication credentials. Can be one of the following types depending on the authentication mode:
  • AuthBearerToken to use existing access and (optionally, but recommended) refresh tokens
  • AuthClientPassword to use username and password for oidc Resource Owner Password flow
  • AuthClientCredentials to use a client secret for oidc client credential flow
  • AuthApiKey to use an API key
  • additional_headers: Additional headers to include in the requests. Can be used to set OpenAI/HuggingFace keys. OpenAI/HuggingFace key looks like this:
{"X-OpenAI-Api-Key": "<THE-KEY>"}, {"X-HuggingFace-Api-Key": "<THE-KEY>"}
  • embedded_options: If set, create an embedded Weaviate cluster inside the client. For a full list of options see weaviate.embedded.EmbeddedOptions.
  • additional_config: Additional and advanced configuration options for weaviate.
  • grpc_port: The port to use for the gRPC connection.
  • grpc_secure: Whether to use a secure channel for the underlying gRPC API.

WeaviateDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

WeaviateDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "WeaviateDocumentStore"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

WeaviateDocumentStore.count_documents

def count_documents() -> int

Returns the number of documents present in the DocumentStore.

WeaviateDocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol documentation.

Arguments:

  • filters: The filters to apply to the document list.

Returns:

A list of Documents that match the given filters.

WeaviateDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Writes documents to Weaviate using the specified policy. We recommend using a OVERWRITE policy as it's faster than other policies for Weaviate since it uses the batch API. We can't use the batch API for other policies as it doesn't return any information whether the document already exists or not. That prevents us from returning errors when using the FAIL policy or skipping a Document when using the SKIP policy.

WeaviateDocumentStore.delete_documents

def delete_documents(document_ids: List[str]) -> None

Deletes all documents with matching document_ids from the DocumentStore.

Arguments:

  • document_ids: The object_ids to delete.

Module haystack_integrations.components.retrievers.weaviate.bm25_retriever

WeaviateBM25Retriever

A component for retrieving documents from Weaviate using the BM25 algorithm.

Example usage:

from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore
from haystack_integrations.components.retrievers.weaviate.bm25_retriever import WeaviateBM25Retriever

document_store = WeaviateDocumentStore(url="http://localhost:8080")
retriever = WeaviateBM25Retriever(document_store=document_store)
retriever.run(query="How to make a pizza", top_k=3)

WeaviateBM25Retriever.__init__

def __init__(*,
             document_store: WeaviateDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)

Create a new instance of WeaviateBM25Retriever.

Arguments:

  • document_store: Instance of WeaviateDocumentStore that will be used from this retriever.
  • filters: Custom filters applied when running the retriever
  • top_k: Maximum number of documents to return
  • filter_policy: Policy to determine how filters are applied.

WeaviateBM25Retriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

WeaviateBM25Retriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "WeaviateBM25Retriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

WeaviateBM25Retriever.run

@component.output_types(documents=List[Document])
def run(query: str,
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None)

Retrieves documents from Weaviate using the BM25 algorithm.

Arguments:

  • query: The query text.
  • filters: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
  • top_k: The maximum number of documents to return.

Module haystack_integrations.components.retrievers.weaviate.embedding_retriever

WeaviateEmbeddingRetriever

A retriever that uses Weaviate's vector search to find similar documents based on the embeddings of the query.

WeaviateEmbeddingRetriever.__init__

def __init__(*,
             document_store: WeaviateDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             distance: Optional[float] = None,
             certainty: Optional[float] = None,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)

Creates a new instance of WeaviateEmbeddingRetriever.

Arguments:

  • document_store: Instance of WeaviateDocumentStore that will be used from this retriever.
  • filters: Custom filters applied when running the retriever.
  • top_k: Maximum number of documents to return.
  • distance: The maximum allowed distance between Documents' embeddings.
  • certainty: Normalized distance between the result item and the search vector.
  • filter_policy: Policy to determine how filters are applied.

Raises:

WeaviateEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

WeaviateEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "WeaviateEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

WeaviateEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
        distance: Optional[float] = None,
        certainty: Optional[float] = None)

Retrieves documents from Weaviate using the vector search.

Arguments:

  • query_embedding: Embedding of the query.
  • filters: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
  • top_k: The maximum number of documents to return.
  • distance: The maximum allowed distance between Documents' embeddings.
  • certainty: Normalized distance between the result item and the search vector.

Raises: