
AzureAISearchEmbeddingRetriever

An embedding Retriever compatible with the Azure AI Search Document Store.

This Retriever accepts the embedding of a single query as input and returns a list of matching documents.

Most common position in a pipeline:
1. After a Text Embedder and before a PromptBuilder in a RAG pipeline
2. The last component in a semantic search pipeline
3. After a Text Embedder and before an ExtractiveReader in an extractive QA pipeline

Mandatory init variables: "document_store": An instance of AzureAISearchDocumentStore

Mandatory run variables: "query_embedding": A list of floats

Output variables: "documents": A list of documents

API reference: Azure AI Search

GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search

Overview

The AzureAISearchEmbeddingRetriever is an embedding-based Retriever compatible with the AzureAISearchDocumentStore. It compares the query embedding with the embeddings of the stored documents and returns the most similar documents from the AzureAISearchDocumentStore.

The query must be embedded before it is passed to this component, for example with a Text Embedder component.

By default, the AzureAISearchDocumentStore uses the HNSW algorithm with cosine similarity to handle vector searches. The vector configuration is set during the initialization of the document store and can be customized by providing the vector_search_configuration parameter.
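To make the idea of comparing embeddings by cosine similarity concrete, here is a minimal sketch in plain Python. It is an exact, brute-force ranking over three toy vectors, not how Azure AI Search works internally (HNSW is an approximate index), and the names `cosine_similarity`, `rank_by_cosine`, and `toy_docs` are made up for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding, doc_embeddings, top_k=2):
    """Return (doc_id, score) pairs for the top_k most similar documents."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
toy_docs = {
    "languages": [0.9, 0.1, 0.0],
    "elephants": [0.1, 0.9, 0.0],
    "waves": [0.0, 0.2, 0.9],
}

ranking = rank_by_cosine([1.0, 0.0, 0.0], toy_docs, top_k=2)
print(ranking)
```

The query vector points almost entirely along the first axis, so the "languages" entry ranks first, followed by "elephants".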

In addition to the query_embedding, the AzureAISearchEmbeddingRetriever accepts other optional parameters, including top_k (the maximum number of documents to retrieve) and filters to narrow down the search space.
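As a rough sketch of how these optional parameters might be passed at run time, the helper below calls `run` with a filter and a `top_k` limit. The filter uses the general Haystack filter syntax; the helper name `retrieve_filtered` and the metadata fields `meta.category` and `meta.year` are hypothetical, and this assumes such fields exist in your index and are filterable.

```python
from typing import Any, Dict, List


def retrieve_filtered(retriever: Any, query_embedding: List[float]) -> List[Any]:
    """Run a retriever with a metadata filter and a cap on the number of results.

    `retriever` is expected to be an AzureAISearchEmbeddingRetriever (or any
    object with a compatible `run` method).
    """
    # Hypothetical metadata fields; adapt them to the fields defined in your index.
    filters: Dict[str, Any] = {
        "operator": "AND",
        "conditions": [
            {"field": "meta.category", "operator": "==", "value": "science"},
            {"field": "meta.year", "operator": ">=", "value": 2020},
        ],
    }
    # top_k limits the result list to at most three documents
    result = retriever.run(query_embedding=query_embedding, filters=filters, top_k=3)
    return result["documents"]
```

Passing the retriever in as an argument keeps the helper independent of how the document store was configured.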

Semantic Ranking

The semantic ranking capability of Azure AI Search is not available for vector retrieval. To include semantic ranking in your retrieval process, use the AzureAISearchBM25Retriever or AzureAISearchHybridRetriever. For more details, see the Azure AI Search documentation.

Usage

Installation

This integration requires you to have an active Azure subscription with a deployed Azure AI Search service.

To start using Azure AI Search with Haystack, install the package with:

pip install azure-ai-search-haystack

On its own

This Retriever needs an AzureAISearchDocumentStore instance with indexed documents to run.

from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore
from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchEmbeddingRetriever

document_store = AzureAISearchDocumentStore()

retriever = AzureAISearchEmbeddingRetriever(document_store=document_store)

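# Example query with a dummy embedding. The vector length (384 here) must match
# the embedding dimension configured for the document store's index.
retriever.run(query_embedding=[0.1]*384)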
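
The store above is created without arguments, so it has to pick up its connection settings from the environment. A small pre-flight check can make a missing setting obvious before the store is constructed. This sketch assumes the variable names `AZURE_AI_SEARCH_ENDPOINT` and `AZURE_AI_SEARCH_API_KEY`, which is my understanding of the integration's defaults; verify them against the API reference. The helper `missing_azure_settings` is hypothetical.

```python
import os

# Assumed environment variable names; check the integration's API reference.
REQUIRED_ENV_VARS = ("AZURE_AI_SEARCH_ENDPOINT", "AZURE_AI_SEARCH_API_KEY")


def missing_azure_settings(env=os.environ):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_azure_settings()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```

If you authenticate through a different Azure credential mechanism, the API key check may not apply to your setup.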

In a pipeline

Here is how you could use the AzureAISearchEmbeddingRetriever in a pipeline. In this example, you would create two pipelines: an indexing one and a querying one.

In the indexing pipeline, the documents are passed to the Document Embedder and then written into the Document Store.

Then, in the querying pipeline, a Text Embedder produces the vector representation of the input query, which is passed to the AzureAISearchEmbeddingRetriever to get the results.

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.writers import DocumentWriter

from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchEmbeddingRetriever
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

document_store = AzureAISearchDocumentStore(index_name="retrieval-example")

model = "sentence-transformers/all-mpnet-base-v2"

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="""Elephants have been observed to behave in a way that indicates a
         high level of self-awareness, such as recognizing themselves in mirrors."""
    ),
    Document(
        content="""In certain parts of the world, like the Maldives, Puerto Rico, and
          San Diego, you can witness the phenomenon of bioluminescent waves."""
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder(model=model)
document_embedder.warm_up()

# Indexing Pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=document_embedder, name="doc_embedder")
indexing_pipeline.add_component(instance=DocumentWriter(document_store=document_store), name="doc_writer")
indexing_pipeline.connect("doc_embedder", "doc_writer")

indexing_pipeline.run({"doc_embedder": {"documents": documents}})

# Query Pipeline
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model))
query_pipeline.add_component("retriever", AzureAISearchEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])