WeaviateDocumentStore
Weaviate is a multi-purpose vector DB that can store both embeddings and data objects, making it a good choice for multi-modality.
The WeaviateDocumentStore can connect to any Weaviate instance, whether it's running on Weaviate Cloud Services, Kubernetes, or a local Docker container.
Installation
Install the Weaviate Haystack integration with:
pip install weaviate-haystack
Initialization
The quickest way to use the WeaviateDocumentStore is to start a local Docker container. This is what a minimal docker-compose.yml could look like:
---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.24.5
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
...
In this example, we explicitly enable access without authentication, so we won’t need to set any username, password, or API key to connect to our local instance. This is strongly discouraged for production use. See the Authorization section for detailed information.
Let’s start our container with docker compose up -d and then initialize our Document Store with:
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore
from haystack import Document

document_store = WeaviateDocumentStore(url="http://localhost:8080")
document_store.write_documents([
    Document(content="This is first"),
    Document(content="This is second"),
])
print(document_store.count_documents())
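The container may need a few seconds before it accepts connections, so it can help to confirm that it is up before initializing the Document Store. The following is a minimal sketch using only the Python standard library. It assumes the instance exposes Weaviate's readiness endpoint at /v1/.well-known/ready on the URL used above. The helper names readiness_url and is_weaviate_ready are hypothetical and not part of the integration.

```python
import http.client
import urllib.error
import urllib.request


def readiness_url(base_url: str) -> str:
    """Build the readiness-probe URL for a Weaviate base URL."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_weaviate_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the readiness probe answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(readiness_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, malformed reply, or a non-2xx status.
        return False
```

Calling is_weaviate_ready() in a short retry loop right after docker compose up -d is one way to avoid a connection error on the first write.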
Authorization
We provide some utility classes in the auth package to handle authorization using different credentials. Each class stores its own secrets and retrieves them from environment variables when required.
The default environment variables for the classes are:
AuthApiKey: WEAVIATE_API_KEY
AuthBearerToken: WEAVIATE_ACCESS_TOKEN, WEAVIATE_REFRESH_TOKEN
AuthClientCredentials: WEAVIATE_CLIENT_SECRET, WEAVIATE_SCOPE
AuthClientPassword: WEAVIATE_USERNAME, WEAVIATE_PASSWORD, WEAVIATE_SCOPE
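Because the secrets are read from the environment, a quick pre-flight check can show which of the default variables are unset before you create an auth object. The sketch below uses only the standard library and mirrors the defaults listed above. The helper unset_env_vars is hypothetical, and it only reports which variables are unset. It makes no claim about which of them a given class treats as mandatory.

```python
import os

# Default environment variables per auth class, copied from the list above.
DEFAULT_ENV_VARS = {
    "AuthApiKey": ["WEAVIATE_API_KEY"],
    "AuthBearerToken": ["WEAVIATE_ACCESS_TOKEN", "WEAVIATE_REFRESH_TOKEN"],
    "AuthClientCredentials": ["WEAVIATE_CLIENT_SECRET", "WEAVIATE_SCOPE"],
    "AuthClientPassword": ["WEAVIATE_USERNAME", "WEAVIATE_PASSWORD", "WEAVIATE_SCOPE"],
}


def unset_env_vars(auth_class_name, environ=None):
    """Return the default variables for the given auth class that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in DEFAULT_ENV_VARS[auth_class_name] if not env.get(name)]
```

For example, unset_env_vars("AuthApiKey") returns an empty list once WEAVIATE_API_KEY is exported in the current shell.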
You can change which environment variables are used if needed. In the following snippet, we instruct AuthApiKey to look for MY_ENV_VAR.
from haystack_integrations.document_stores.weaviate.auth import AuthApiKey
from haystack.utils.auth import Secret
AuthApiKey(api_key=Secret.from_env_var("MY_ENV_VAR"))
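To show where such an auth object might be used, here is a hedged sketch that passes it to the Document Store. It assumes the constructor accepts the credentials through a keyword argument named auth_client_secret, which this page does not state, so check the integration's API reference before relying on it. The function build_authenticated_store is a hypothetical wrapper. The imports sit inside it only so that the snippet can be loaded without the integration installed.

```python
def build_authenticated_store(url: str, env_var: str = "MY_ENV_VAR"):
    """Create a WeaviateDocumentStore that authenticates with an API key from env_var."""
    # Imported lazily so that loading this snippet does not require the packages.
    from haystack.utils.auth import Secret
    from haystack_integrations.document_stores.weaviate.auth import AuthApiKey
    from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore

    # The secret is read from the named environment variable.
    auth = AuthApiKey(api_key=Secret.from_env_var(env_var))

    # Assumption: auth_client_secret is the keyword for passing credentials.
    return WeaviateDocumentStore(url=url, auth_client_secret=auth)
```

With MY_ENV_VAR exported, build_authenticated_store("https://my-instance.example") would return a store configured for that instance; the URL here is a placeholder.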
Supported Retrievers
WeaviateBM25Retriever: A keyword-based Retriever that fetches documents matching a query from the Document Store.
WeaviateEmbeddingRetriever: Compares the query and document embeddings and fetches the documents most relevant to the query.
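As an illustration of the keyword-based option, the sketch below wraps a WeaviateBM25Retriever in a small function. It assumes the retriever is importable from haystack_integrations.components.retrievers.weaviate, takes document_store and top_k keyword arguments, and returns its results under a "documents" key from run(query=...). None of these details are stated on this page. The function search_by_keyword is hypothetical, and its import is kept inside the function so the snippet can be loaded without the integration installed.

```python
def search_by_keyword(document_store, query: str, top_k: int = 5):
    """Return up to top_k documents matching the query via BM25 keyword search."""
    # Assumption: this import path and these keyword arguments match the integration.
    from haystack_integrations.components.retrievers.weaviate import WeaviateBM25Retriever

    retriever = WeaviateBM25Retriever(document_store=document_store, top_k=top_k)
    result = retriever.run(query=query)
    return result["documents"]
```

With the document_store from the Initialization section, search_by_keyword(document_store, "first") would run a BM25 query against the two documents written there.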