
OpenSearch integration for Haystack

Module haystack_integrations.components.retrievers.opensearch.bm25_retriever

OpenSearchBM25Retriever

OpenSearchBM25Retriever.__init__

def __init__(*,
             document_store: OpenSearchDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             fuzziness: str = "AUTO",
             top_k: int = 10,
             scale_score: bool = False,
             all_terms_must_match: bool = False)

Create the OpenSearchBM25Retriever component.

Arguments:

  • document_store: An instance of OpenSearchDocumentStore.
  • filters: Filters applied to the retrieved Documents. Defaults to None.
  • fuzziness: Fuzziness parameter for full-text queries. Defaults to "AUTO".
  • top_k: Maximum number of Documents to return. Defaults to 10.
  • scale_score: Whether to scale the score of retrieved documents between 0 and 1. This is useful when comparing documents across different indexes. Defaults to False.
  • all_terms_must_match: If True, all terms in the query string must be present in the retrieved documents. This is useful when searching for short text where even one term can make a difference. Defaults to False.

Raises:

  • ValueError: If document_store is not an instance of OpenSearchDocumentStore.
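
The scale_score argument above says scores are mapped into the range 0 to 1, but this reference does not say how. One common way to do this for unbounded BM25 scores is a logistic (sigmoid) function. The sketch below assumes that approach; the function name and the scaling factor of 8 are illustrative assumptions, not values taken from this reference.

```python
import math

# Assumed for illustration only; this reference does not state the factor.
ASSUMED_SCALING_FACTOR = 8.0


def scale_bm25_score(raw_score: float, factor: float = ASSUMED_SCALING_FACTOR) -> float:
    """Map an unbounded BM25 score into the open interval (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-raw_score / factor))


raw_scores = [0.0, 4.0, 12.0, 40.0]
scaled_scores = [scale_bm25_score(s) for s in raw_scores]
```

Because the mapping is monotonic, the ranking of documents is unchanged; only the score values become comparable on a fixed scale.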

OpenSearchBM25Retriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OpenSearchBM25Retriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchBM25Retriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
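
As a rough illustration of the to_dict / from_dict round trip, the sketch below uses a stand-in class rather than the real retriever. It assumes the dictionary layout with "type" and "init_parameters" keys that Haystack components commonly use; the class name SketchRetriever is hypothetical, and the real component also serializes its document store.

```python
from typing import Any, Dict


class SketchRetriever:
    """Stand-in class (not part of the integration) showing a serialization round trip."""

    def __init__(self, *, top_k: int = 10, fuzziness: str = "AUTO") -> None:
        self.top_k = top_k
        self.fuzziness = fuzziness

    def to_dict(self) -> Dict[str, Any]:
        # Record the class path and the constructor arguments needed to rebuild it.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"top_k": self.top_k, "fuzziness": self.fuzziness},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchRetriever":
        # Rebuild the instance from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchRetriever(top_k=5)
restored = SketchRetriever.from_dict(original.to_dict())
```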

OpenSearchBM25Retriever.run

@component.output_types(documents=List[Document])
def run(query: str,
        filters: Optional[Dict[str, Any]] = None,
        all_terms_must_match: Optional[bool] = None,
        top_k: Optional[int] = None,
        fuzziness: Optional[str] = None,
        scale_score: Optional[bool] = None)

Retrieve documents using BM25 retrieval.

Arguments:

  • query: The query string.
  • filters: Optional filters to narrow down the search space.
  • all_terms_must_match: If True, all terms in the query string must be present in the retrieved documents.
  • top_k: Maximum number of Documents to return.
  • fuzziness: Fuzziness parameter for full-text queries.
  • scale_score: Whether to scale the score of retrieved documents between 0 and 1. This is useful when comparing documents across different indexes.

Returns:

A dictionary containing the retrieved documents with the following structure:

  • documents: List of retrieved Documents.
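
The None defaults in the run signature suggest that each argument falls back to the value given to __init__ when it is omitted. The sketch below assumes that behavior and shows how fuzziness, top_k and all_terms_must_match could end up in an OpenSearch multi_match query body. SketchBM25Settings and build_body are hypothetical names, and the real retriever may build a different body.

```python
from typing import Any, Dict, Optional


class SketchBM25Settings:
    """Stand-in (not the real retriever) that holds init-time defaults."""

    def __init__(self, *, fuzziness: str = "AUTO", top_k: int = 10,
                 all_terms_must_match: bool = False) -> None:
        self.fuzziness = fuzziness
        self.top_k = top_k
        self.all_terms_must_match = all_terms_must_match

    def build_body(self, query: str, *, fuzziness: Optional[str] = None,
                   top_k: Optional[int] = None,
                   all_terms_must_match: Optional[bool] = None) -> Dict[str, Any]:
        # None means "use the value given at construction time".
        fuzziness = self.fuzziness if fuzziness is None else fuzziness
        top_k = self.top_k if top_k is None else top_k
        if all_terms_must_match is None:
            all_terms_must_match = self.all_terms_must_match
        return {
            "size": top_k,
            "query": {
                "multi_match": {
                    "query": query,
                    "fuzziness": fuzziness,
                    # "and" requires every term to match; "or" requires at least one.
                    "operator": "and" if all_terms_must_match else "or",
                }
            },
        }


settings = SketchBM25Settings(top_k=5)
default_body = settings.build_body("opensearch retriever")
strict_body = settings.build_body("opensearch retriever", top_k=2, all_terms_must_match=True)
```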

Module haystack_integrations.components.retrievers.opensearch.embedding_retriever

OpenSearchEmbeddingRetriever

Uses a vector similarity metric to retrieve documents from the OpenSearchDocumentStore.

Needs to be connected to the OpenSearchDocumentStore to run.

OpenSearchEmbeddingRetriever.__init__

def __init__(*,
             document_store: OpenSearchDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10)

Create the OpenSearchEmbeddingRetriever component.

Arguments:

  • document_store: An instance of OpenSearchDocumentStore.
  • filters: Filters applied to the retrieved Documents. Defaults to None. Filters are applied during the approximate kNN search to ensure that top_k matching documents are returned.
  • top_k: Maximum number of Documents to return. Defaults to 10.

Raises:

  • ValueError: If document_store is not an instance of OpenSearchDocumentStore.
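
The note on filters above matters because filtering after the nearest-neighbor search can leave fewer than top_k results. The sketch below is a brute-force, in-memory illustration with cosine similarity, not the approximate kNN search OpenSearch performs; the retrieve helper, the keep predicate and the sample documents are all hypothetical.

```python
import math
from typing import Any, Callable, Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: List[float], docs: List[Dict[str, Any]],
             keep: Callable[[Dict[str, Any]], bool], top_k: int = 10) -> List[Dict[str, Any]]:
    # Filter first, then rank, so that up to top_k matching documents survive.
    candidates = [d for d in docs if keep(d)]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


docs = [
    {"id": "a", "lang": "en", "embedding": [1.0, 0.0]},
    {"id": "b", "lang": "de", "embedding": [0.9, 0.1]},
    {"id": "c", "lang": "en", "embedding": [0.0, 1.0]},
]
hits = retrieve([1.0, 0.0], docs, keep=lambda d: d["lang"] == "en", top_k=2)
hit_ids = [d["id"] for d in hits]
```

Here both English documents are returned. Ranking first and filtering afterwards would have kept only "a", because "b" would have taken the second slot before being filtered out.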

OpenSearchEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OpenSearchEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

OpenSearchEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None)

Retrieve documents using a vector similarity metric.

Arguments:

  • query_embedding: Embedding of the query.
  • filters: Optional filters to narrow down the search space.
  • top_k: Maximum number of Documents to return.

Returns:

A dictionary with the following structure:

  • documents: List of Documents most similar to query_embedding.

Module haystack_integrations.document_stores.opensearch.document_store

OpenSearchDocumentStore

OpenSearchDocumentStore.__init__

def __init__(*,
             hosts: Optional[Hosts] = None,
             index: str = "default",
             **kwargs)

Creates a new OpenSearchDocumentStore instance.

For more information on connection parameters, see the official OpenSearch documentation.

For the full list of supported kwargs, see the official OpenSearch reference.

Arguments:

  • hosts: List of hosts running the OpenSearch client. Defaults to None.
  • index: Name of the index in OpenSearch. If it doesn't exist, it is created. Defaults to "default".
  • **kwargs: Optional arguments that OpenSearch takes.

OpenSearchDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OpenSearchDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchDocumentStore"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

OpenSearchDocumentStore.count_documents

def count_documents() -> int

Returns how many documents are present in the document store.

OpenSearchDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Writes Documents to OpenSearch. If policy is not specified or is set to DuplicatePolicy.NONE, an exception is raised when a document with the same ID already exists in the document store.
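
To make the duplicate handling concrete, the sketch below models it with a plain dictionary instead of an OpenSearch index. Only the NONE behavior is described in this reference; the SKIP and OVERWRITE members, the SketchPolicy enum, DuplicateError and write are assumptions introduced for illustration.

```python
from enum import Enum
from typing import Any, Dict, List


class SketchPolicy(Enum):
    """Stand-in for a duplicate policy; SKIP and OVERWRITE are assumed here."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"


class DuplicateError(Exception):
    """Raised by this sketch when an ID already exists and the policy is NONE."""


def write(store: Dict[str, Dict[str, Any]], docs: List[Dict[str, Any]],
          policy: SketchPolicy = SketchPolicy.NONE) -> int:
    written = 0
    for doc in docs:
        exists = doc["id"] in store
        if exists and policy is SketchPolicy.NONE:
            raise DuplicateError(f"Document {doc['id']} already exists")
        if exists and policy is SketchPolicy.SKIP:
            continue  # leave the stored document untouched
        store[doc["id"]] = doc
        written += 1
    return written  # number of documents actually written


store: Dict[str, Dict[str, Any]] = {}
first_write = write(store, [{"id": "1", "content": "a"}, {"id": "2", "content": "b"}])
skipped = write(store, [{"id": "1", "content": "changed"}], SketchPolicy.SKIP)
overwritten = write(store, [{"id": "1", "content": "changed"}], SketchPolicy.OVERWRITE)
```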

OpenSearchDocumentStore.delete_documents

def delete_documents(document_ids: List[str]) -> None

Deletes all documents whose IDs match document_ids from the document store.

Arguments:

  • document_ids: The IDs of the documents to delete.

Module haystack_integrations.document_stores.opensearch.filters

normalize_filters

def normalize_filters(filters: Dict[str, Any]) -> Dict[str, Any]

Converts Haystack filters into OpenSearch-compatible filters.
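
The idea behind such a conversion can be sketched as a small recursive function. The version below is a simplified illustration, not the output of normalize_filters: it assumes filters made of "field"/"operator"/"value" comparisons and "operator"/"conditions" groups, handles only a few operators, and the name sketch_normalize is hypothetical.

```python
from typing import Any, Dict

_RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}
_LOGICAL_OPS = {"AND": "must", "OR": "should"}


def sketch_normalize(filters: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a nested comparison/logic filter into an OpenSearch-style query."""
    op = filters["operator"]
    if op in _LOGICAL_OPS:
        # Logical groups become a bool query over the converted conditions.
        clauses = [sketch_normalize(c) for c in filters["conditions"]]
        return {"bool": {_LOGICAL_OPS[op]: clauses}}
    field, value = filters["field"], filters["value"]
    if op == "==":
        return {"term": {field: value}}
    if op in _RANGE_OPS:
        return {"range": {field: {_RANGE_OPS[op]: value}}}
    raise ValueError(f"Unsupported operator in this sketch: {op}")


converted = sketch_normalize({
    "operator": "AND",
    "conditions": [
        {"field": "lang", "operator": "==", "value": "en"},
        {"field": "year", "operator": ">=", "value": 2020},
    ],
})
```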