Azure AI Search
haystack_integrations.components.retrievers.azure_ai_search.embedding_retriever
AzureAISearchEmbeddingRetriever
Retrieves documents from the AzureAISearchDocumentStore using a vector similarity metric. Must be connected to the AzureAISearchDocumentStore to run.
__init__
__init__(
*,
document_store: AzureAISearchDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
**kwargs: Any
)
Create the AzureAISearchEmbeddingRetriever component.
Parameters:
- document_store (AzureAISearchDocumentStore) – An instance of AzureAISearchDocumentStore to use with the Retriever.
- filters (dict[str, Any] | None) – Filters applied when fetching documents from the Document Store.
- top_k (int) – Maximum number of documents to return.
- filter_policy (str | FilterPolicy) – Policy to determine how filters are applied.
- kwargs (Any) – Additional keyword arguments to pass to the Azure AI Search endpoint. Some of the supported parameters:
  - query_type: A string indicating the type of query to perform. Possible values are 'simple', 'full', and 'semantic'.
  - semantic_configuration_name: The name of the semantic configuration to use when processing semantic queries.
  For more information on parameters, see the official Azure AI Search documentation.
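To illustrate how a filter policy might combine the filters set at initialization with those passed at run time, here is a minimal sketch. The `Policy` enum and the `effective_filters` function are hypothetical stand-ins, not the library's implementation. The sketch assumes that REPLACE lets runtime filters override the initialization filters and that a MERGE policy joins both with a logical AND.

```python
from enum import Enum
from typing import Any, Optional


class Policy(Enum):
    """Hypothetical stand-in for Haystack's FilterPolicy."""

    REPLACE = "replace"
    MERGE = "merge"


def effective_filters(
    policy: Policy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Return the filters that would be sent with the query (assumed semantics)."""
    if runtime_filters is None:
        # Nothing passed at run time: fall back to the initialization filters.
        return init_filters
    if policy is Policy.MERGE and init_filters is not None:
        # Assumption: merging means both filters must hold.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE: runtime filters take precedence.
    return runtime_filters


init = {"field": "meta.Pages", "operator": ">", "value": 10}
runtime = {"field": "meta.Title", "operator": "==", "value": "Report"}

replaced = effective_filters(Policy.REPLACE, init, runtime)
merged = effective_filters(Policy.MERGE, init, runtime)
```

Under these assumptions, `replaced` is just the runtime filter, while `merged` wraps both conditions in a single AND filter.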
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
AzureAISearchEmbeddingRetriever – Deserialized component.
run
run(
query_embedding: list[float],
filters: dict[str, Any] | None = None,
top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents from the AzureAISearchDocumentStore.
Parameters:
- query_embedding (list[float]) – A list of floats representing the query embedding.
- filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
- top_k (int | None) – The maximum number of documents to retrieve.
Returns:
dict[str, list[Document]] – Dictionary with the following keys:
- documents: A list of documents retrieved from the AzureAISearchDocumentStore.
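The sketch below shows one way the retriever could be wired to a document store and queried, using the module paths and parameters listed in this reference. The helper names `build_retriever` and `retrieve` are hypothetical. Actually calling `build_retriever` assumes the integration package is installed and that the AZURE_AI_SEARCH_API_KEY and AZURE_AI_SEARCH_ENDPOINT environment variables point at a reachable service.

```python
from typing import Any


def build_retriever(index_name: str = "default", top_k: int = 5) -> Any:
    """Create a document store and connect a retriever to it.

    Imports are deferred so that defining this function does not
    require the integration package to be installed.
    """
    from haystack_integrations.components.retrievers.azure_ai_search.embedding_retriever import (
        AzureAISearchEmbeddingRetriever,
    )
    from haystack_integrations.document_stores.azure_ai_search.document_store import (
        AzureAISearchDocumentStore,
    )

    # api_key and azure_endpoint default to the environment variables
    # AZURE_AI_SEARCH_API_KEY and AZURE_AI_SEARCH_ENDPOINT.
    store = AzureAISearchDocumentStore(index_name=index_name, embedding_dimension=768)
    return AzureAISearchEmbeddingRetriever(document_store=store, top_k=top_k)


def retrieve(retriever: Any, query_embedding: list[float]) -> list[Any]:
    """Run the retriever and unpack the "documents" key of its result."""
    result = retriever.run(query_embedding=query_embedding)
    return result["documents"]
```

The query embedding passed to `retrieve` should have the same length as the `embedding_dimension` configured on the store.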
haystack_integrations.document_stores.azure_ai_search.document_store
AzureAISearchDocumentStore
__init__
__init__(
*,
api_key: Secret = Secret.from_env_var(
"AZURE_AI_SEARCH_API_KEY", strict=False
),
azure_endpoint: Secret = Secret.from_env_var(
"AZURE_AI_SEARCH_ENDPOINT", strict=True
),
index_name: str = "default",
embedding_dimension: int = 768,
metadata_fields: dict[str, SearchField | type] | None = None,
vector_search_configuration: VectorSearch | None = None,
include_search_metadata: bool = False,
**index_creation_kwargs: Any
)
A document store using Azure AI Search as the backend.
Parameters:
- azure_endpoint (Secret) – The URL endpoint of an Azure AI Search service.
- api_key (Secret) – The API key to use for authentication.
- index_name (str) – Name of the index in Azure AI Search. If it doesn't exist, it will be created.
- embedding_dimension (int) – Dimension of the embeddings.
- metadata_fields (dict[str, SearchField | type] | None) – A dictionary mapping metadata field names to their corresponding field definitions. Each field can be defined either as:
  - A SearchField object to specify detailed field configuration like type, searchability, and filterability
  - A Python type (str, bool, int, float, or datetime) to create a simple filterable field
  These fields are automatically added when creating the search index. Example:
metadata_fields={
"Title": SearchField(
name="Title",
type="Edm.String",
searchable=True,
filterable=True
),
"Pages": int
}
- vector_search_configuration (VectorSearch | None) – Configuration option related to vector search. The default configuration uses the HNSW algorithm with cosine similarity to handle vector searches.
- include_search_metadata (bool) – Whether to include Azure AI Search metadata fields in the returned documents. When set to True, the meta field of the returned documents will contain the @search.score, @search.reranker_score, @search.highlights, @search.captions, and other fields returned by Azure AI Search.
- index_creation_kwargs (Any) – Optional keyword parameters to be passed to the SearchIndex class during index creation. Some of the supported parameters:
  - semantic_search: Defines the semantic configuration of the search index. This parameter is needed to enable semantic search capabilities in the index.
  - similarity: The type of similarity algorithm to be used when scoring and ranking the documents matching a search query. The similarity algorithm can only be defined at index creation time and cannot be modified on existing indexes.
For more information on parameters, see the official Azure AI Search documentation.
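For intuition about the cosine similarity metric used by the default vector search configuration, the following sketch ranks a few vectors by exact cosine similarity. It is a brute-force illustration only: the function names are hypothetical, and HNSW on the service side is an approximate nearest-neighbor index rather than this exhaustive scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k vectors most similar to the query."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
ranked = top_k_by_cosine([1.0, 0.1], vectors, k=2)
```

Here the query points almost along the first axis, so "a" ranks first and the diagonal vector "c" second.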
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
AzureAISearchDocumentStore – Deserialized component.
count_documents
Returns how many documents are present in the search index.
Returns:
int – The number of documents in the search index.
count_documents_by_filter
Returns the count of documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to the document list. For filter syntax, see Haystack metadata filtering.
Returns:
int – The number of documents that match the filters.
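As an illustration of the filter dictionaries this method accepts, the sketch below builds a filter in the Haystack metadata filtering style and counts matches over a small in-memory list. The `matches` and `count_by_filter` helpers are hypothetical and support only AND/OR with basic comparisons. They also assume that a missing field never matches, and they do not reflect how the store translates filters for Azure AI Search.

```python
import operator
from typing import Any

# Comparison operators supported by this sketch.
_COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(meta: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Evaluate a (simplified) filter dictionary against one metadata dict."""
    if "conditions" in flt:
        results = [matches(meta, condition) for condition in flt["conditions"]]
        if flt["operator"] == "AND":
            return all(results)
        if flt["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {flt['operator']}")
    # Field names are written as "meta.<name>"; strip the prefix for lookup.
    field = flt["field"].removeprefix("meta.")
    if field not in meta:
        return False
    return _COMPARATORS[flt["operator"]](meta[field], flt["value"])


def count_by_filter(metas: list[dict[str, Any]], flt: dict[str, Any]) -> int:
    """Count the metadata dicts that satisfy the filter."""
    return sum(1 for meta in metas if matches(meta, flt))


docs_meta = [
    {"Title": "Report", "Pages": 12},
    {"Title": "Memo", "Pages": 2},
    {"Title": "Report", "Pages": 40},
]
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.Title", "operator": "==", "value": "Report"},
        {"field": "meta.Pages", "operator": ">", "value": 10},
    ],
}
count = count_by_filter(docs_meta, filters)
```

The same `filters` dictionary shape is what would be passed to the methods in this reference that take a `filters` argument.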
count_unique_metadata_by_filter
count_unique_metadata_by_filter(
filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
Counts unique values for each specified metadata field in documents matching the filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents.
- metadata_fields (list[str]) – List of field names to count unique values for.
Returns:
dict[str, int] – Dictionary mapping field names to counts of unique values.
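The shape of the return value can be pictured with a small in-memory sketch. The `count_unique` helper is hypothetical and assumes the documents have already been narrowed down by the filters; it only shows how per-field unique counts relate to the input.

```python
from typing import Any


def count_unique(metas: list[dict[str, Any]], metadata_fields: list[str]) -> dict[str, int]:
    """Count distinct values per field, skipping documents that lack the field."""
    return {
        field: len({meta[field] for meta in metas if field in meta})
        for field in metadata_fields
    }


# Metadata of documents assumed to have matched the filters already.
matching = [
    {"Title": "Report", "Pages": 12},
    {"Title": "Memo", "Pages": 2},
    {"Title": "Report", "Pages": 40},
]
unique_counts = count_unique(matching, ["Title", "Pages"])
```

In this example "Title" has two distinct values and "Pages" has three.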
get_metadata_fields_info
Returns the information about metadata fields in the index.
Returns:
dict[str, dict[str, str]] – Dictionary mapping field names to type information.
get_metadata_field_min_max
Returns the minimum and maximum values for the given metadata field.
Parameters:
- metadata_field (str) – The metadata field to get the minimum and maximum values for.
Returns:
dict[str, Any] – A dictionary with the keys "min" and "max".
get_metadata_field_unique_values
get_metadata_field_unique_values(
metadata_field: str,
search_term: str | None = None,
from_: int = 0,
size: int = 10,
) -> tuple[list[str], int]
Retrieves unique values for a metadata field with optional search and pagination.
Parameters:
- metadata_field (str) – The metadata field to get unique values for.
- search_term (str | None) – Optional search term to filter unique values.
- from_ (int) – Starting offset for pagination.
- size (int) – Number of values to return.
Returns:
tuple[list[str], int] – Tuple of (list of unique values, total count of matching values).
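To show how `from_` and `size` interact with the returned total, here is a minimal in-memory sketch. The `unique_values_page` helper is hypothetical; sorting the values and matching `search_term` as a case-insensitive substring are assumptions made for the example, not documented behavior.

```python
from typing import Any, Optional


def unique_values_page(
    values: list[Any],
    search_term: Optional[str] = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Return one page of unique values plus the total number of matches."""
    unique = sorted({str(value) for value in values})
    if search_term:
        # Assumption: case-insensitive substring match.
        needle = search_term.lower()
        unique = [value for value in unique if needle in value.lower()]
    # The total counts all matches, not just the ones on this page.
    return unique[from_ : from_ + size], len(unique)


titles = ["Report", "Memo", "Report", "Readme", "Notes"]
page, total = unique_values_page(titles, search_term="re", from_=1, size=1)
```

The total stays the same across pages, so a caller can keep increasing `from_` by `size` until it reaches that total.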
query_sql
Executes an SQL query if supported by the document store backend.
Azure AI Search does not support SQL queries.
write_documents
write_documents(
documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
Writes the provided documents to the search index.
Parameters:
- documents (list[Document]) – Documents to write to the index.
- policy (DuplicatePolicy) – Policy to determine how duplicates are handled.
Returns:
int – The number of documents added to the index.
Raises:
- ValueError – If the documents are not of type Document.
- TypeError – If the document ids are not strings.
delete_documents
Deletes all documents with matching document_ids from the search index.
Parameters:
- document_ids (list[str]) – IDs of the documents to be deleted.
delete_all_documents
Deletes all documents in the document store.
Parameters:
- recreate_index (bool) – If True, the index will be deleted and recreated with the original schema. If False, all documents will be deleted while preserving the index.
delete_by_filter
Deletes all documents that match the provided filters.
Azure AI Search does not support server-side delete by query, so this method first searches for matching documents, then deletes them in a batch operation.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see Haystack metadata filtering.
Returns:
int – The number of documents deleted.
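The search-then-delete pattern described above can be sketched with an in-memory stand-in. `InMemoryIndex` and this `delete_by_filter` function are hypothetical. They use a plain Python predicate in place of a filter dictionary and do not call Azure AI Search.

```python
from typing import Any, Callable

Predicate = Callable[[dict[str, Any]], bool]


class InMemoryIndex:
    """Hypothetical stand-in for a search index keyed by document id."""

    def __init__(self, docs: dict[str, dict[str, Any]]) -> None:
        self.docs = dict(docs)

    def search_ids(self, predicate: Predicate) -> list[str]:
        """Step 1: find the ids of all documents that match."""
        return [doc_id for doc_id, meta in self.docs.items() if predicate(meta)]

    def delete_batch(self, doc_ids: list[str]) -> int:
        """Step 2: delete the given ids in one batch and report how many were removed."""
        deleted = 0
        for doc_id in doc_ids:
            if self.docs.pop(doc_id, None) is not None:
                deleted += 1
        return deleted


def delete_by_filter(index: InMemoryIndex, predicate: Predicate) -> int:
    """Search for matching documents first, then delete them as a batch."""
    matching_ids = index.search_ids(predicate)
    if not matching_ids:
        return 0
    return index.delete_batch(matching_ids)


index = InMemoryIndex({"1": {"Pages": 12}, "2": {"Pages": 2}, "3": {"Pages": 40}})
deleted = delete_by_filter(index, lambda meta: meta["Pages"] > 10)
```

Because the search and the delete are separate steps, documents written between the two are not affected by the call.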
update_by_filter
Updates the fields of all documents that match the provided filters.
Azure AI Search does not support server-side update by query, so this method first searches for matching documents, then updates them using merge operations.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see Haystack metadata filtering.
- meta (dict[str, Any]) – The fields to update. These fields must exist in the index schema.
Returns:
int – The number of documents updated.
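A merge-style update over matching documents can be sketched in memory as follows. This `update_by_filter` function is a hypothetical illustration: it takes a plain predicate instead of a filter dictionary, and raising ValueError for fields missing from the schema is an assumption, not documented behavior.

```python
from typing import Any, Callable


def update_by_filter(
    docs: dict[str, dict[str, Any]],
    schema_fields: set[str],
    predicate: Callable[[dict[str, Any]], bool],
    meta: dict[str, Any],
) -> int:
    """Merge `meta` into every matching document and return how many changed."""
    unknown = set(meta) - schema_fields
    if unknown:
        # Assumption: updating a field outside the schema is rejected up front.
        raise ValueError(f"Fields not in index schema: {sorted(unknown)}")
    updated = 0
    for fields in docs.values():
        if predicate(fields):
            # Merge: only the given keys change, other fields are kept.
            fields.update(meta)
            updated += 1
    return updated


docs = {
    "1": {"Title": "Report", "Pages": 12},
    "2": {"Title": "Memo", "Pages": 2},
}
updated = update_by_filter(
    docs,
    schema_fields={"Title", "Pages"},
    predicate=lambda fields: fields["Pages"] > 10,
    meta={"Title": "Annual Report"},
)
```

Only the "Title" of the matching document changes; its "Pages" value and the non-matching document are left as they were.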
search_documents
Returns all documents that match the provided search_text. If search_text is None, returns all documents.
Parameters:
- search_text (str) – The text to search for in the Document list.
- top_k (int) – Maximum number of documents to return.
Returns:
list[Document] – A list of Documents that match the given search_text.
filter_documents
Returns the documents that match the provided filters. Filters should be given as a dictionary supporting filtering by metadata. For details on filters, see the metadata filtering documentation.
Parameters:
- filters (dict[str, Any] | None) – The filters to apply to the document list.
Returns:
list[Document] – A list of Documents that match the given filters.