Version: 3.0

WeaviateEmbeddingRetriever

This is an embedding Retriever compatible with the Weaviate Document Store.


Most common position in a pipeline	1. After a Text Embedder and before a `PromptBuilder` in a RAG pipeline 2. As the last component in a semantic search pipeline 3. After a Text Embedder and before a `TransformersExtractiveReader` in an extractive QA pipeline
Mandatory init variables	`document_store`: An instance of a WeaviateDocumentStore
Mandatory run variables	`query_embedding`: A list of floats
Output variables	`documents`: A list of documents
API reference	Weaviate
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/weaviate
Package name	`weaviate-haystack`

Overview

The WeaviateEmbeddingRetriever is an embedding-based Retriever compatible with the WeaviateDocumentStore. It compares the query embedding with the Document embeddings and returns the Documents from the WeaviateDocumentStore that are most similar to the query.

Parameters

When using the WeaviateEmbeddingRetriever in your NLP system, ensure the query and Document embeddings are available. You can do so by adding a Document Embedder to your indexing Pipeline and a Text Embedder to your query Pipeline.

In addition to the query_embedding, the WeaviateEmbeddingRetriever accepts other optional parameters, including top_k (the maximum number of Documents to retrieve) and filters to narrow down the search space.
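As a rough sketch of how these optional parameters fit together, the snippet below builds a filter in Haystack's dictionary filter syntax and bundles it with top_k and a query embedding. The metadata field names (`meta.language`, `meta.year`) and the `retrieve` helper are hypothetical and exist only for illustration; the helper assumes the Retriever returns a dictionary with a `documents` key, as listed in the output variables above.

```python
# Haystack-style filter: keep only Documents that satisfy both conditions.
# "meta.language" and "meta.year" are hypothetical metadata fields.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}

# Keyword arguments for the Retriever's run() method.
run_kwargs = {
    "query_embedding": [0.1] * 768,  # placeholder vector, not a real embedding
    "filters": filters,
    "top_k": 3,  # return at most three Documents
}


def retrieve(retriever, kwargs):
    """Hypothetical helper: call run(**kwargs) and return the Documents."""
    return retriever.run(**kwargs)["documents"]
```

With a configured Retriever, `retrieve(retriever, run_kwargs)` would return at most three Documents that match both filter conditions.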

You can also specify distance, the maximum allowed distance between embeddings, and certainty, the normalized distance between the result items and the search embedding. How distance behaves depends on the distance metric configured for the Collection. See the official Weaviate documentation for more information.
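To make the relationship between the two thresholds more concrete, here is a minimal sketch that assumes the Collection uses cosine distance. For that metric, the Weaviate documentation describes certainty as 1 - distance / 2, which maps the distance range [0, 2] onto [0, 1]. The function names are illustrative and are not part of the Weaviate or Haystack API.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def certainty_from_cosine_distance(distance):
    """Assumed mapping for cosine distance: certainty = 1 - distance / 2."""
    return 1.0 - distance / 2.0


query = [1.0, 0.0]
for candidate in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    d = cosine_distance(query, candidate)
    print(candidate, d, certainty_from_cosine_distance(d))
```

Under this assumption, a lower distance threshold or a higher certainty threshold both make retrieval stricter.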

The embedding similarity function depends on the vectorizer used in the WeaviateDocumentStore collection. Check out the official Weaviate documentation to see all the supported vectorizers.

Usage

Installation

To start using Weaviate with Haystack, install the package with:

shell

pip install weaviate-haystack

On its own

This Retriever needs an instance of WeaviateDocumentStore and indexed Documents to run.

python

from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)
from haystack_integrations.components.retrievers.weaviate import (
    WeaviateEmbeddingRetriever,
)

document_store = WeaviateDocumentStore(url="http://localhost:8080")

retriever = WeaviateEmbeddingRetriever(document_store=document_store)

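# Use a placeholder vector to keep the example simple; in practice, pass the
# embedding that a Text Embedder produces for your query.
result = retriever.run(query_embedding=[0.1] * 768)
print(result["documents"])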

In a Pipeline

The examples on this page use Sentence Transformers embedders that have moved to the sentence-transformers-haystack package. Install it to run the examples:

shell

pip install sentence-transformers-haystack

python

from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack_integrations.components.embedders.sentence_transformers import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)

from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)
from haystack_integrations.components.retrievers.weaviate import (
    WeaviateEmbeddingRetriever,
)

document_store = WeaviateDocumentStore(url="http://localhost:8080")

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(
    documents_with_embeddings.get("documents"),
    policy=DuplicatePolicy.OVERWRITE,
)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    WeaviateEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])
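The pipeline example above prints only the top Document. To inspect every retrieved Document, a small helper along the following lines can be used. It is a sketch that relies only on the `content` and `score` attributes of Haystack Documents; the `summarize` function is hypothetical, and the stand-in result with made-up scores is there only so the snippet runs without a Weaviate instance.

```python
from types import SimpleNamespace


def summarize(result, max_chars=60):
    """Return one 'score  content' line per Document in a pipeline result."""
    lines = []
    for doc in result["retriever"]["documents"]:
        score = "n/a" if doc.score is None else f"{doc.score:.3f}"
        lines.append(f"{score}  {doc.content[:max_chars]}")
    return lines


# Stand-in for the output of query_pipeline.run(...); the scores are invented.
fake_result = {
    "retriever": {
        "documents": [
            SimpleNamespace(content="There are over 7,000 languages.", score=0.8),
            SimpleNamespace(content="Elephants recognize themselves.", score=None),
        ]
    }
}

for line in summarize(fake_result):
    print(line)
```

With a real pipeline, pass the `result` dictionary returned by `query_pipeline.run(...)` instead of `fake_result`.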
