Weaviate integration for Haystack
Module haystack_integrations.document_stores.weaviate.auth
SupportedAuthTypes
Supported auth credentials for WeaviateDocumentStore.
AuthCredentials
Base class for all auth credentials supported by WeaviateDocumentStore. It can be used to deserialize any of the supported auth credentials from a dict.
AuthCredentials.to_dict
def to_dict() -> Dict[str, Any]
Converts the object to a dictionary representation for serialization.
AuthCredentials.from_dict
def from_dict(data: Dict[str, Any]) -> "AuthCredentials"
Converts a dictionary representation to an auth credentials object.
AuthCredentials.resolve_value
def resolve_value()
Resolves all the secrets in the auth credentials object and returns the corresponding Weaviate object. All subclasses must implement this method.
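The split between serialization and resolution can be pictured with a stdlib-only sketch. The names EnvVarCredential, ApiKeyCredential, CREDENTIAL_TYPES and the dictionary layout are hypothetical and chosen for illustration; they are not the integration's actual classes or serialized format. The idea shown: to_dict records which environment variable holds the secret rather than the secret itself, from_dict picks the concrete class from a type tag, and resolve_value reads the environment only when the credential is needed.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class EnvVarCredential:
    """Hypothetical stand-in for an auth credential backed by one env var."""

    env_var: str
    type_tag: str = "base"

    def to_dict(self) -> Dict[str, Any]:
        # Only the variable name is serialized, never the secret value.
        return {"type": self.type_tag, "init_parameters": {"env_var": self.env_var}}

    @staticmethod
    def from_dict(data: Dict[str, Any]) -> "EnvVarCredential":
        # Dispatch on the type tag so the base class can rebuild any subclass.
        cls = CREDENTIAL_TYPES[data["type"]]
        return cls(**data["init_parameters"])

    def resolve_value(self) -> str:
        # The secret is read from the environment at resolution time.
        value = os.environ.get(self.env_var)
        if value is None:
            raise ValueError(f"Environment variable {self.env_var} is not set")
        return value


@dataclass
class ApiKeyCredential(EnvVarCredential):
    """Hypothetical API key credential with a default env var name."""

    env_var: str = "WEAVIATE_API_KEY"
    type_tag: str = "api_key"


# Registry used by from_dict to map type tags to classes (illustrative only).
CREDENTIAL_TYPES: Dict[str, Type[EnvVarCredential]] = {"api_key": ApiKeyCredential}

os.environ["WEAVIATE_API_KEY"] = "MY_API_KEY"
serialized = ApiKeyCredential().to_dict()
restored = EnvVarCredential.from_dict(serialized)
```

Because the serialized form carries only the variable name, it can be stored in a pipeline definition without leaking the key.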
AuthApiKey
AuthCredentials for API key authentication.
By default it will load api_key from the environment variable WEAVIATE_API_KEY.
AuthBearerToken
AuthCredentials for Bearer token authentication.
By default it will load access_token from the environment variable WEAVIATE_ACCESS_TOKEN, and refresh_token from the environment variable WEAVIATE_REFRESH_TOKEN. The WEAVIATE_REFRESH_TOKEN environment variable is optional.
AuthClientCredentials
AuthCredentials for client credentials authentication.
By default it will load client_secret from the environment variable WEAVIATE_CLIENT_SECRET, and scope from the environment variable WEAVIATE_SCOPE. The WEAVIATE_SCOPE environment variable is optional. If set, it can be either a single string or a list of space-separated strings, e.g. "scope1" or "scope1 scope2".
AuthClientPassword
AuthCredentials for username and password authentication.
By default it will load username from the environment variable WEAVIATE_USERNAME, password from the environment variable WEAVIATE_PASSWORD, and scope from the environment variable WEAVIATE_SCOPE. The WEAVIATE_SCOPE environment variable is optional. If set, it can be either a single string or a list of space-separated strings, e.g. "scope1" or "scope1 scope2".
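The two accepted forms of WEAVIATE_SCOPE can be illustrated with a small sketch. The helper read_scope is hypothetical and not part of the integration, which may normalize the value differently; it only shows how a single scope and a space-separated list can be handled the same way.

```python
import os
from typing import List, Optional


def read_scope(env_var: str = "WEAVIATE_SCOPE") -> Optional[List[str]]:
    """Return the scopes from the environment, or None when the variable is unset or blank."""
    raw = os.environ.get(env_var)
    if raw is None or not raw.strip():
        return None
    # str.split() with no argument splits on any run of whitespace,
    # so "scope1" and "scope1 scope2" both yield a list of scopes.
    return raw.split()
```

With this approach a single scope simply becomes a one-element list, so downstream code needs only one code path.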
Module haystack_integrations.document_stores.weaviate.document_store
WeaviateDocumentStore
WeaviateDocumentStore is a Document Store for Weaviate. It can be used with Weaviate Cloud Services or self-hosted instances.
Usage example with Weaviate Cloud Services:
import os

from haystack_integrations.document_stores.weaviate.auth import AuthApiKey
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore

os.environ["WEAVIATE_API_KEY"] = "MY_API_KEY"

document_store = WeaviateDocumentStore(
    url="rAnD0mD1g1t5.something.weaviate.cloud",
    auth_client_secret=AuthApiKey(),
)
Usage example with self-hosted Weaviate:
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore
document_store = WeaviateDocumentStore(url="http://localhost:8080")
WeaviateDocumentStore.__init__
def __init__(*,
             url: Optional[str] = None,
             collection_settings: Optional[Dict[str, Any]] = None,
             auth_client_secret: Optional[AuthCredentials] = None,
             additional_headers: Optional[Dict] = None,
             embedded_options: Optional[EmbeddedOptions] = None,
             additional_config: Optional[AdditionalConfig] = None,
             grpc_port: int = 50051,
             grpc_secure: bool = False)
Creates a new instance of WeaviateDocumentStore and connects to the Weaviate instance.
Arguments:
url
: The URL to the Weaviate instance.
collection_settings
: The collection settings to use. If None, it will use a collection named default with the following properties:
- _original_id: text
- content: text
- dataframe: text
- blob_data: blob
- blob_mime_type: text
- score: number
The Document meta fields are omitted in the default collection settings as we can't make assumptions about the structure of the meta field. We strongly recommend creating a custom collection with the correct meta properties for your use case. Another option is relying on the automatic schema generation, but that's not recommended for production use. See the official Weaviate documentation (https://weaviate.io/developers/weaviate/manage-data/collections) for more information on collections and their properties.
auth_client_secret
: Authentication credentials. Can be one of the following types depending on the authentication mode:
- AuthBearerToken to use existing access and (optionally, but recommended) refresh tokens
- AuthClientPassword to use username and password for the OIDC Resource Owner Password flow
- AuthClientCredentials to use a client secret for the OIDC client credentials flow
- AuthApiKey to use an API key
additional_headers
: Additional headers to include in the requests. Can be used to set OpenAI/HuggingFace keys. An OpenAI/HuggingFace key looks like this:
{"X-OpenAI-Api-Key": "<THE-KEY>"}, {"X-HuggingFace-Api-Key": "<THE-KEY>"}
embedded_options
: If set, create an embedded Weaviate cluster inside the client. For a full list of options see weaviate.embedded.EmbeddedOptions.
additional_config
: Additional and advanced configuration options for Weaviate.
grpc_port
: The port to use for the gRPC connection.
grpc_secure
: Whether to use a secure channel for the underlying gRPC API.
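Since a custom collection with explicit meta properties is recommended, the following sketch shows one way such a collection_settings dictionary could be assembled. It assumes the dictionary follows Weaviate's class-definition layout ("class", "properties", "name", "dataType"); the helper build_collection_settings, the collection name "Articles", and the meta fields category and year are hypothetical. Check the Weaviate documentation on collections for the exact keys supported by your version.

```python
from typing import Any, Dict, List

# The properties listed for the default collection.
DEFAULT_PROPERTIES: List[Dict[str, Any]] = [
    {"name": "_original_id", "dataType": ["text"]},
    {"name": "content", "dataType": ["text"]},
    {"name": "dataframe", "dataType": ["text"]},
    {"name": "blob_data", "dataType": ["blob"]},
    {"name": "blob_mime_type", "dataType": ["text"]},
    {"name": "score", "dataType": ["number"]},
]


def build_collection_settings(name: str, meta_properties: Dict[str, str]) -> Dict[str, Any]:
    """Combine the default properties with explicitly typed meta properties."""
    extra = [
        {"name": key, "dataType": [data_type]}
        for key, data_type in meta_properties.items()
    ]
    # Assumed layout: a Weaviate class definition with a name and a property list.
    return {"class": name, "properties": DEFAULT_PROPERTIES + extra}


# Hypothetical collection with two meta fields.
settings = build_collection_settings("Articles", {"category": "text", "year": "int"})
```

The resulting dictionary would then be passed as collection_settings=settings when constructing the WeaviateDocumentStore.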
WeaviateDocumentStore.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
WeaviateDocumentStore.from_dict
def from_dict(cls, data: Dict[str, Any]) -> "WeaviateDocumentStore"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
WeaviateDocumentStore.count_documents
def count_documents() -> int
Returns the number of documents present in the DocumentStore.
WeaviateDocumentStore.filter_documents
def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]
Returns the documents that match the filters provided.
For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol documentation.
Arguments:
filters
: The filters to apply to the document list.
Returns:
A list of Documents that match the given filters.
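As an illustration of the filters argument, the sketch below builds a filter dictionary in the comparison/logical layout used by the Haystack 2.x filter syntax, as we understand it. The helpers comparison and all_of and the meta fields category and year are hypothetical; consult the DocumentStore.filter_documents() protocol documentation for the authoritative specification.

```python
from typing import Any, Dict


def comparison(field: str, operator: str, value: Any) -> Dict[str, Any]:
    """Build a single comparison condition on one document field."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Combine conditions so that every one of them must match."""
    return {"operator": "AND", "conditions": list(conditions)}


# Documents whose meta.category is "news" and whose meta.year is 2023 or later.
filters = all_of(
    comparison("meta.category", "==", "news"),
    comparison("meta.year", ">=", 2023),
)
```

The dictionary would be passed as document_store.filter_documents(filters=filters).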
WeaviateDocumentStore.write_documents
def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int
Writes documents to Weaviate using the specified policy. We recommend using the OVERWRITE policy, as it is faster than the other policies for Weaviate because it uses the batch API. We can't use the batch API for other policies, as it doesn't return any information about whether a document already exists. That prevents us from returning errors when using the FAIL policy or skipping a Document when using the SKIP policy.
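The reasoning behind this recommendation can be sketched with an in-memory dictionary standing in for Weaviate. The Policy enum and the write function are hypothetical and are not the integration's implementation; they only show why OVERWRITE can be done in one bulk operation while SKIP and FAIL need a per-document existence check.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Policy(Enum):
    """Hypothetical stand-in for the duplicate policies discussed above."""

    OVERWRITE = "overwrite"
    SKIP = "skip"
    FAIL = "fail"


def write(store: Dict[str, str], docs: List[Tuple[str, str]], policy: Policy) -> int:
    """Write (id, content) pairs into an in-memory store and return the number written."""
    if policy is Policy.OVERWRITE:
        # No existence check is needed, so everything goes in one bulk operation.
        store.update(docs)
        return len(docs)
    written = 0
    for doc_id, content in docs:
        # SKIP and FAIL must know whether the id already exists, one document at a time.
        if doc_id in store:
            if policy is Policy.FAIL:
                raise ValueError(f"Document {doc_id} already exists")
            continue
        store[doc_id] = content
        written += 1
    return written
```

In this sketch the per-document branch is the expensive path, which mirrors the cost of giving up the batch API.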
WeaviateDocumentStore.delete_documents
def delete_documents(document_ids: List[str]) -> None
Deletes all documents with matching document_ids from the DocumentStore.
Arguments:
document_ids
: The object_ids to delete.
Module haystack_integrations.components.retrievers.weaviate.bm25_retriever
WeaviateBM25Retriever
A component for retrieving documents from Weaviate using the BM25 algorithm.
Example usage:
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore
from haystack_integrations.components.retrievers.weaviate.bm25_retriever import WeaviateBM25Retriever
document_store = WeaviateDocumentStore(url="http://localhost:8080")
retriever = WeaviateBM25Retriever(document_store=document_store)
retriever.run(query="How to make a pizza", top_k=3)
WeaviateBM25Retriever.__init__
def __init__(*,
             document_store: WeaviateDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)
Create a new instance of WeaviateBM25Retriever.
Arguments:
document_store
: Instance of WeaviateDocumentStore that will be used by this retriever.
filters
: Custom filters applied when running the retriever.
top_k
: Maximum number of documents to return.
filter_policy
: Policy to determine how filters are applied.
WeaviateBM25Retriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
WeaviateBM25Retriever.from_dict
def from_dict(cls, data: Dict[str, Any]) -> "WeaviateBM25Retriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
WeaviateBM25Retriever.run
@component.output_types(documents=List[Document])
def run(query: str,
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None)
Retrieves documents from Weaviate using the BM25 algorithm.
Arguments:
query
: The query text.
filters
: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
top_k
: The maximum number of documents to return.
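How the filter policy decides between initialization filters and runtime filters can be pictured with a simplified sketch. SketchFilterPolicy and resolve_filters are hypothetical names, and the merge step here simply joins both filters with AND; the actual merge logic in Haystack may combine them differently.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Optional[Dict[str, Any]]


class SketchFilterPolicy(Enum):
    """Hypothetical stand-in for the two filter policies."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: SketchFilterPolicy, init_filters: Filters, runtime_filters: Filters
) -> Filters:
    """Pick the filters used for a query from init-time and run-time filters."""
    if runtime_filters is None:
        # Nothing was passed at run time, so the init-time filters apply.
        return init_filters
    if policy is SketchFilterPolicy.REPLACE or init_filters is None:
        # REPLACE: run-time filters take over completely.
        return runtime_filters
    # MERGE (simplified assumption): both sets of filters must match.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
```

Under REPLACE, a caller who passes runtime filters should therefore repeat any initialization filters that still need to apply.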
Module haystack_integrations.components.retrievers.weaviate.embedding_retriever
WeaviateEmbeddingRetriever
A retriever that uses Weaviate's vector search to find similar documents based on the embeddings of the query.
WeaviateEmbeddingRetriever.__init__
def __init__(*,
             document_store: WeaviateDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             distance: Optional[float] = None,
             certainty: Optional[float] = None,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)
Creates a new instance of WeaviateEmbeddingRetriever.
Arguments:
document_store
: Instance of WeaviateDocumentStore that will be used by this retriever.
filters
: Custom filters applied when running the retriever.
top_k
: Maximum number of documents to return.
distance
: The maximum allowed distance between Documents' embeddings.
certainty
: Normalized distance between the result item and the search vector.
filter_policy
: Policy to determine how filters are applied.
Raises:
ValueError
: If both distance and certainty are provided. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about the distance and certainty parameters.
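The mutual exclusion of distance and certainty, and the relation between the two, can be sketched as follows. Both function names are hypothetical and the error message is illustrative. The conversion assumes cosine distance, for which Weaviate's documentation describes certainty as 1 - distance / 2 to the best of our reading; it does not apply to other distance metrics.

```python
from typing import Optional


def check_thresholds(distance: Optional[float], certainty: Optional[float]) -> None:
    """Reject a configuration that sets both thresholds at once."""
    if distance is not None and certainty is not None:
        raise ValueError("Can't use 'distance' and 'certainty' parameters together")


def certainty_from_cosine_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0
```

Since the two values express the same threshold on different scales, supplying both would be redundant at best and contradictory at worst, which is why only one is accepted.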
WeaviateEmbeddingRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
WeaviateEmbeddingRetriever.from_dict
def from_dict(cls, data: Dict[str, Any]) -> "WeaviateEmbeddingRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
WeaviateEmbeddingRetriever.run
@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
        distance: Optional[float] = None,
        certainty: Optional[float] = None)
Retrieves documents from Weaviate using the vector search.
Arguments:
query_embedding
: Embedding of the query.
filters
: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
top_k
: The maximum number of documents to return.
distance
: The maximum allowed distance between Documents' embeddings.
certainty
: Normalized distance between the result item and the search vector.
Raises:
ValueError
: If both distance and certainty are provided. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about the distance and certainty parameters.
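To make the roles of query_embedding, top_k, and distance concrete, here is a brute-force sketch over an in-memory index. It is not how Weaviate searches (Weaviate uses a vector index rather than a linear scan), and the functions cosine_distance and search as well as the toy index are hypothetical; the sketch only illustrates the parameter semantics under the assumption of cosine distance.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search(
    index: Dict[str, List[float]],
    query_embedding: List[float],
    top_k: int = 10,
    distance: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Rank documents by distance to the query, closest first."""
    scored = [(doc_id, cosine_distance(query_embedding, vec)) for doc_id, vec in index.items()]
    if distance is not None:
        # Drop documents farther away than the maximum allowed distance.
        scored = [item for item in scored if item[1] <= distance]
    scored.sort(key=lambda item: item[1])
    # Keep at most top_k results.
    return scored[:top_k]


# Toy index with two-dimensional embeddings.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = search(index, [1.0, 0.0], top_k=2)
```

Note that the distance threshold is applied before the top_k cut, so a strict threshold can return fewer than top_k documents.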