OpenSearch integration for Haystack
Module haystack_integrations.components.retrievers.opensearch.bm25_retriever
OpenSearchBM25Retriever
OpenSearchBM25Retriever.__init__
def __init__(*,
document_store: OpenSearchDocumentStore,
filters: Optional[Dict[str, Any]] = None,
fuzziness: str = "AUTO",
top_k: int = 10,
scale_score: bool = False,
all_terms_must_match: bool = False)
Create the OpenSearchBM25Retriever component.
Arguments:
document_store
: An instance of OpenSearchDocumentStore.
filters
: Filters applied to the retrieved Documents. Defaults to None.
fuzziness
: Fuzziness parameter for full-text queries. Defaults to "AUTO".
top_k
: Maximum number of Documents to return. Defaults to 10.
scale_score
: Whether to scale the score of retrieved documents between 0 and 1. This is useful when comparing documents across different indexes. Defaults to False.
all_terms_must_match
: If True, all terms in the query string must be present in the retrieved documents. This is useful when searching for short text where even one term can make a difference. Defaults to False.
Raises:
ValueError
: If document_store is not an instance of OpenSearchDocumentStore.
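To make the effect of fuzziness, top_k, and all_terms_must_match more concrete, here is a minimal sketch of the kind of OpenSearch query body these parameters could translate into. The helper build_bm25_body is hypothetical and is not the retriever's actual implementation. It only assumes the standard multi_match query of the OpenSearch Query DSL, where the operator "AND" requires every query term to match.

```python
from typing import Any, Dict


def build_bm25_body(
    query: str,
    *,
    fuzziness: str = "AUTO",
    top_k: int = 10,
    all_terms_must_match: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build a multi_match request body (illustration only)."""
    return {
        "size": top_k,  # upper bound on the number of hits returned
        "query": {
            "multi_match": {
                "query": query,
                "fuzziness": fuzziness,
                # "AND": every term must be present; "OR": one term is enough
                "operator": "AND" if all_terms_must_match else "OR",
            }
        },
    }


body = build_bm25_body("red shoes", top_k=5, all_terms_must_match=True)
default_body = build_bm25_body("red shoes")
```

With all_terms_must_match=True the sketch switches the operator to "AND", which is why the option matters most for short queries: a document containing only "red" would no longer match.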
OpenSearchBM25Retriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenSearchBM25Retriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchBM25Retriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
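The to_dict and from_dict pair is meant to round-trip a component through a plain dictionary. The sketch below shows that pattern with a hypothetical ToyRetriever class, which is not part of this integration. The dictionary layout (a "type" tag plus "init_parameters") is an assumption made for illustration.

```python
from typing import Any, Dict, Optional


class ToyRetriever:
    """Hypothetical stand-in for a component; not part of the integration."""

    def __init__(self, *, top_k: int = 10, filters: Optional[Dict[str, Any]] = None):
        self.top_k = top_k
        self.filters = filters

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: a type tag plus the constructor arguments.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"top_k": self.top_k, "filters": self.filters},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyRetriever":
        # Rebuild the object from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = ToyRetriever(top_k=3)
restored = ToyRetriever.from_dict(original.to_dict())
```

Because only constructor arguments are stored, the restored object is configured identically to the original without sharing any state with it.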
OpenSearchBM25Retriever.run
@component.output_types(documents=List[Document])
def run(query: str,
filters: Optional[Dict[str, Any]] = None,
all_terms_must_match: Optional[bool] = None,
top_k: Optional[int] = None,
fuzziness: Optional[str] = None,
scale_score: Optional[bool] = None)
Retrieve documents using BM25 retrieval.
Arguments:
query
: The query string.
filters
: Optional filters to narrow down the search space.
all_terms_must_match
: If True, all terms in the query string must be present in the retrieved documents.
top_k
: Maximum number of Documents to return.
fuzziness
: Fuzziness parameter for full-text queries.
scale_score
: Whether to scale the score of retrieved documents between 0 and 1. This is useful when comparing documents across different indexes.
Returns:
A dictionary containing the retrieved documents with the following structure:
- documents: List of retrieved Documents.
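The scale_score option maps unbounded BM25 scores into the range 0 to 1. This reference does not say which function is used, so the sketch below assumes a logistic (sigmoid) squashing with an illustrative scaling constant of 8. Both the function name and the constant are assumptions.

```python
import math

SCALING_FACTOR = 8.0  # assumed constant, chosen for illustration only


def scale_bm25_score(score: float, factor: float = SCALING_FACTOR) -> float:
    """Squash an unbounded score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score / factor))


raw_scores = [0.0, 4.2, 17.5]
scaled = [scale_bm25_score(s) for s in raw_scores]
```

A monotonic squashing like this preserves the ranking within a single result list while putting scores from different indexes on a common scale.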
Module haystack_integrations.components.retrievers.opensearch.embedding_retriever
OpenSearchEmbeddingRetriever
Uses a vector similarity metric to retrieve documents from the OpenSearchDocumentStore.
Needs to be connected to the OpenSearchDocumentStore to run.
OpenSearchEmbeddingRetriever.__init__
def __init__(*,
document_store: OpenSearchDocumentStore,
filters: Optional[Dict[str, Any]] = None,
top_k: int = 10)
Create the OpenSearchEmbeddingRetriever component.
Arguments:
document_store
: An instance of OpenSearchDocumentStore.
filters
: Filters applied to the retrieved Documents. Defaults to None. Filters are applied during the approximate kNN search to ensure that top_k matching documents are returned.
top_k
: Maximum number of Documents to return. Defaults to 10.
Raises:
ValueError
: If document_store is not an instance of OpenSearchDocumentStore.
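The note that filters are applied during the search, rather than afterwards, is easiest to see on a toy example. The sketch below uses exact brute-force cosine similarity over an in-memory list instead of OpenSearch's approximate kNN. All names and data in it are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Doc = Tuple[str, List[float], Dict[str, str]]  # (id, embedding, metadata)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_ids(docs: List[Doc], query: List[float], k: int) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(d[1], query), reverse=True)
    return [d[0] for d in ranked[:k]]


docs: List[Doc] = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "de"}),
    ("c", [0.8, 0.2], {"lang": "de"}),
    ("d", [0.1, 0.9], {"lang": "en"}),
]
query = [1.0, 0.0]
english_ids = {d[0] for d in docs if d[2]["lang"] == "en"}

# Filter during the search: only matching documents compete for the k slots.
during = top_k_ids([d for d in docs if d[0] in english_ids], query, k=2)

# Filter after the search: non-matching hits are dropped from the top k.
after = [doc_id for doc_id in top_k_ids(docs, query, k=2) if doc_id in english_ids]
```

In this toy data set, filtering afterwards leaves a single result because one of the two nearest neighbours fails the filter, while filtering during the search still fills both slots.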
OpenSearchEmbeddingRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenSearchEmbeddingRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchEmbeddingRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenSearchEmbeddingRetriever.run
@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
filters: Optional[Dict[str, Any]] = None,
top_k: Optional[int] = None)
Retrieve documents using a vector similarity metric.
Arguments:
query_embedding
: Embedding of the query.
filters
: Optional filters to narrow down the search space.
top_k
: Maximum number of Documents to return.
Returns:
A dictionary containing the retrieved documents with the following structure:
- documents: List of Documents similar to query_embedding.
Module haystack_integrations.document_stores.opensearch.document_store
OpenSearchDocumentStore
OpenSearchDocumentStore.__init__
def __init__(*,
hosts: Optional[Hosts] = None,
index: str = "default",
**kwargs)
Creates a new OpenSearchDocumentStore instance.
For more information on connection parameters, see the official OpenSearch documentation.
For the full list of supported kwargs, see the official OpenSearch reference.
Arguments:
hosts
: List of hosts running the OpenSearch client. Defaults to None.
index
: Name of the index in OpenSearch. If it doesn't exist, it will be created. Defaults to "default".
**kwargs
: Optional arguments that OpenSearch takes.
OpenSearchDocumentStore.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenSearchDocumentStore.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchDocumentStore"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenSearchDocumentStore.count_documents
def count_documents() -> int
Returns how many documents are present in the document store.
OpenSearchDocumentStore.write_documents
def write_documents(documents: List[Document],
policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int
Writes Documents to OpenSearch. If policy is not specified or is set to DuplicatePolicy.NONE, an exception is raised when a document with the same ID already exists in the document store.
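The duplicate handling can be illustrated with a small in-memory sketch. The Policy enum and the write function below are hypothetical stand-ins, not the document store's code. Only the NONE behaviour (raise on an existing ID) comes from the description above. The SKIP and OVERWRITE members are modelled on Haystack's DuplicatePolicy, and how they are handled here is an assumption.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Policy(Enum):
    """Hypothetical stand-in for DuplicatePolicy, used only in this sketch."""

    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"


def write(
    store: Dict[str, str],
    docs: List[Tuple[str, str]],
    policy: Policy = Policy.NONE,
) -> int:
    """Write (id, content) pairs into store and return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is Policy.NONE:
                raise ValueError(f"Document {doc_id!r} already exists")
            if policy is Policy.SKIP:
                continue  # keep the existing document untouched
        store[doc_id] = content  # new document, or OVERWRITE of an existing one
        written += 1
    return written


store: Dict[str, str] = {}
first = write(store, [("1", "hello"), ("2", "world")])
skipped = write(store, [("1", "changed")], Policy.SKIP)
content_after_skip = store["1"]
overwritten = write(store, [("1", "changed")], Policy.OVERWRITE)
```

The integer return value mirrors the `-> int` in the signature above: in this sketch it counts the documents actually written, so a skipped duplicate does not contribute to it.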
OpenSearchDocumentStore.delete_documents
def delete_documents(document_ids: List[str]) -> None
Deletes all documents whose IDs match document_ids from the document store.
Arguments:
document_ids
: The IDs of the documents to delete.
Module haystack_integrations.document_stores.opensearch.filters
normalize_filters
def normalize_filters(filters: Dict[str, Any]) -> Dict[str, Any]
Converts Haystack filters into OpenSearch-compatible filters.
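As a rough illustration of what such a conversion involves, the sketch below turns a small filter into an OpenSearch bool query. It assumes the Haystack 2.x filter syntax with "field", "operator", "value", and "conditions" keys, and it handles only "AND" groups and "==" comparisons. The function to_opensearch is hypothetical and much simpler than normalize_filters, whose exact output may differ.

```python
from typing import Any, Dict


def to_opensearch(filters: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified, hypothetical converter for "AND" groups and "==" comparisons."""
    if "conditions" in filters:
        if filters["operator"] != "AND":
            raise ValueError("this sketch supports only the AND operator")
        # Every condition of an AND group becomes a "must" clause.
        return {"bool": {"must": [to_opensearch(c) for c in filters["conditions"]]}}
    if filters["operator"] != "==":
        raise ValueError("this sketch supports only the == operator")
    # An equality check becomes an exact-match term query.
    return {"term": {filters["field"]: filters["value"]}}


haystack_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "lang", "operator": "==", "value": "en"},
        {"field": "year", "operator": "==", "value": 2023},
    ],
}
opensearch_filter = to_opensearch(haystack_filter)
```

The recursion mirrors the nested shape of the input: logical groups map to bool clauses, and leaf comparisons map to leaf queries.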