Module haystack_experimental.document_stores.in_memory.document_store

InMemoryDocumentStore

Asynchronous version of the in-memory document store.

InMemoryDocumentStore.__init__

def __init__(bm25_tokenization_regex: str = r"(?u)\b\w\w+\b",
             bm25_algorithm: Literal["BM25Okapi", "BM25L",
                                     "BM25Plus"] = "BM25L",
             bm25_parameters: Optional[Dict] = None,
             embedding_similarity_function: Literal["dot_product",
                                                    "cosine"] = "dot_product",
             index: Optional[str] = None,
             async_executor: Optional[ThreadPoolExecutor] = None)

Initializes the DocumentStore.

Arguments:

bm25_tokenization_regex: The regular expression used to tokenize the text for BM25 retrieval.
bm25_algorithm: The BM25 algorithm to use. One of "BM25Okapi", "BM25L", or "BM25Plus".
bm25_parameters: Parameters for the BM25 implementation, as a dictionary. For example: {'k1': 1.5, 'b': 0.75, 'epsilon': 0.25}. To learn more about these parameters, see https://github.com/dorianbrown/rank_bm25.
embedding_similarity_function: The similarity function used to compare Document embeddings. One of "dot_product" (default) or "cosine". To choose the most appropriate function, look for information about your embedding model.
index: A specific index to store the documents. If not specified, a random UUID is used. Using the same index allows you to store documents across multiple InMemoryDocumentStore instances.
async_executor: Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be initialized and used.
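
To see what the default bm25_tokenization_regex matches, the following minimal sketch applies the pattern with Python's re module. The tokenize helper is hypothetical and only illustrates the regex; it does not reproduce any other preprocessing the store may apply.

```python
import re
from typing import List

# Default pattern from the __init__ signature: runs of two or more word characters.
BM25_TOKENIZATION_REGEX = r"(?u)\b\w\w+\b"


def tokenize(text: str) -> List[str]:
    """Hypothetical helper (not part of the library): extract tokens with the regex."""
    return re.findall(BM25_TOKENIZATION_REGEX, text)


tokens = tokenize("A cat sat on a mat")
print(tokens)  # single-character words such as "A" and "a" are not matched
```

Because the pattern requires at least two word characters, one-letter words never become tokens. A custom regex passed as bm25_tokenization_regex changes that behaviour.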

InMemoryDocumentStore.count_documents_async

async def count_documents_async() -> int

Returns the number of documents present in the DocumentStore.
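
The async methods rely on the async_executor passed to the constructor. One common way to expose a blocking method as a coroutine is to run it on a ThreadPoolExecutor, sketched below. ToyStore is a hypothetical stand-in, and the delegation shown is an assumption about the general pattern, not the library's actual implementation.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional


class ToyStore:
    """Hypothetical stand-in for a document store; not the library class."""

    def __init__(self, async_executor: Optional[ThreadPoolExecutor] = None):
        self.storage: Dict[str, str] = {}
        # Fall back to a single worker thread when no executor is supplied.
        self.executor = async_executor or ThreadPoolExecutor(max_workers=1)

    def count_documents(self) -> int:
        return len(self.storage)

    async def count_documents_async(self) -> int:
        loop = asyncio.get_running_loop()
        # Run the blocking call on the executor so the event loop stays responsive.
        return await loop.run_in_executor(self.executor, self.count_documents)


async def main() -> int:
    store = ToyStore()
    store.storage["doc-1"] = "hello"
    result = await store.count_documents_async()
    store.executor.shutdown()
    return result


count = asyncio.run(main())
print(count)
```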

InMemoryDocumentStore.filter_documents_async

async def filter_documents_async(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol documentation.

Arguments:

filters: The filters to apply to the document list.

Returns:

A list of Documents that match the given filters.

InMemoryDocumentStore.write_documents_async

async def write_documents_async(
        documents: List[Document],
        policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Refer to the DocumentStore.write_documents() protocol documentation.

If policy is set to DuplicatePolicy.NONE, it defaults to DuplicatePolicy.FAIL.
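
The effect of each duplicate policy can be illustrated with a toy, dictionary-backed writer. Everything in this sketch (the local DuplicatePolicy enum, DuplicateError, and the write_documents function over plain dicts) is a hypothetical re-creation for illustration, assuming NONE falls back to FAIL as stated above; it is not the library's code.

```python
from enum import Enum
from typing import Any, Dict, List


class DuplicatePolicy(Enum):  # toy copy of the policy names, for illustration only
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateError(Exception):
    """Toy error raised when a duplicate id is written under FAIL."""


def write_documents(storage: Dict[str, Dict[str, Any]],
                    documents: List[Dict[str, Any]],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int:
    if policy == DuplicatePolicy.NONE:
        policy = DuplicatePolicy.FAIL  # assumed fallback, as described for the in-memory store
    written = 0
    for doc in documents:
        if doc["id"] in storage:
            if policy == DuplicatePolicy.SKIP:
                continue  # keep the stored document, do not count this one
            if policy == DuplicatePolicy.FAIL:
                raise DuplicateError(f"ID '{doc['id']}' already exists.")
        storage[doc["id"]] = doc  # new document, or OVERWRITE of an existing one
        written += 1
    return written


storage: Dict[str, Dict[str, Any]] = {}
doc_a, doc_b, doc_c = {"id": "a"}, {"id": "b"}, {"id": "c"}

first = write_documents(storage, [doc_a, doc_b])                                  # both are new
skipped = write_documents(storage, [doc_a, doc_c], DuplicatePolicy.SKIP)          # only "c" is written
overwritten = write_documents(storage, [doc_a, doc_b], DuplicatePolicy.OVERWRITE)

try:
    write_documents(storage, [doc_a])  # NONE behaves like FAIL in this sketch
    raised = False
except DuplicateError:
    raised = True
```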

InMemoryDocumentStore.delete_documents_async

async def delete_documents_async(document_ids: List[str]) -> None

Deletes all documents with matching document_ids from the DocumentStore.

Arguments:

document_ids: The IDs of the documents to delete.

InMemoryDocumentStore.bm25_retrieval_async

async def bm25_retrieval_async(query: str,
                               filters: Optional[Dict[str, Any]] = None,
                               top_k: int = 10,
                               scale_score: bool = False) -> List[Document]

Retrieves the documents that are most relevant to the query using the BM25 algorithm.

Arguments:

query: The query string.
filters: A dictionary with filters to narrow down the search space.
top_k: The number of top documents to retrieve. Default is 10.
scale_score: Whether to scale the scores of the retrieved documents. Default is False.

Returns:

A list of the top_k documents most relevant to the query.

InMemoryDocumentStore.embedding_retrieval_async

async def embedding_retrieval_async(
        query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: int = 10,
        scale_score: bool = False,
        return_embedding: bool = False) -> List[Document]

Retrieves documents that are most similar to the query embedding using a vector similarity metric.

Arguments:

query_embedding: Embedding of the query.
filters: A dictionary with filters to narrow down the search space.
top_k: The number of top documents to retrieve. Default is 10.
scale_score: Whether to scale the scores of the retrieved Documents. Default is False.
return_embedding: Whether to return the embedding of the retrieved Documents. Default is False.

Returns:

A list of the top_k documents most similar to the query embedding.
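
The choice between "dot_product" and "cosine" can change the ranking, because the dot product rewards vector length while cosine similarity only compares direction. The sketch below ranks two made-up vectors both ways. The function names and data are hypothetical, and no score scaling is applied.

```python
import math
from typing import Callable, Dict, List


def dot_product(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    # Dot product of the two vectors after normalising each to unit length.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def rank(query: List[float],
         embeddings: Dict[str, List[float]],
         similarity: Callable[[List[float], List[float]], float],
         top_k: int = 10) -> List[str]:
    """Return document names ordered by descending similarity to the query."""
    ordered = sorted(embeddings, key=lambda name: similarity(query, embeddings[name]), reverse=True)
    return ordered[:top_k]


query_embedding = [1.0, 0.0]
embeddings = {
    "long": [3.0, 3.0],     # large magnitude, 45 degrees away from the query
    "aligned": [1.0, 0.1],  # small magnitude, nearly parallel to the query
}

by_dot = rank(query_embedding, embeddings, dot_product)
by_cosine = rank(query_embedding, embeddings, cosine)
print(by_dot, by_cosine)
```

With these vectors the dot product favours "long" while cosine favours "aligned", which is why the similarity function should match how the embedding model was trained.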

Module haystack_experimental.document_stores.types.protocol

DocumentStore

Stores Documents to be used by the components of a Pipeline.

Classes implementing this protocol often store the documents permanently and allow specialized components to perform retrieval on them by embedding, by keyword, by a hybrid of the two, and so on, depending on the backend used.

In order to retrieve documents, consider using a Retriever that supports the DocumentStore implementation that you're using.

DocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes this store to a dictionary.

DocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "DocumentStore"

Deserializes the store from a dictionary.

DocumentStore.count_documents

def count_documents() -> int

Returns the number of documents stored.

DocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns the documents that match the filters provided.

Filters are defined as nested dictionaries that can be of two types:

Comparison
Logic

Comparison dictionaries must contain the keys:

field
operator
value

Logic dictionaries must contain the keys:

operator
conditions

The conditions key must be a list of dictionaries, either of type Comparison or Logic.

The operator value in Comparison dictionaries must be one of:

==
!=
>
>=
<
<=
in
not in

The operator values in Logic dictionaries must be one of:

NOT
OR
AND

A simple filter:

filters = {"field": "meta.type", "operator": "==", "value": "article"}

A more complex filter:

filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.date", "operator": ">=", "value": 1420066800},
        {"field": "meta.date", "operator": "<", "value": 1609455600},
        {"field": "meta.rating", "operator": ">=", "value": 3},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                {"field": "meta.publisher", "operator": "==", "value": "nytimes"},
            ],
        },
    ],
}

Arguments:

filters: The filters to apply to the document list.

Returns:

A list of Documents that match the given filters.
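
To make the filter structure concrete, here is a minimal evaluator for Comparison and Logic dictionaries over plain Python dicts. It is a sketch under stated assumptions: documents are dicts rather than Document objects, every referenced field exists, and NOT is treated as the negation of AND over its conditions. The matches and get_field helpers are hypothetical, and real DocumentStore implementations may behave differently.

```python
import operator
from typing import Any, Callable, Dict, List

# Comparison operators listed in the specification above.
COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def get_field(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as "meta.type" (assumes the field exists)."""
    current: Any = doc
    for key in path.split("."):
        current = current[key]
    return current


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    op = flt["operator"]
    if "conditions" in flt:  # Logic dictionary
        results = [matches(doc, condition) for condition in flt["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            return not all(results)  # assumption: NOT negates the AND of its conditions
        raise ValueError(f"Unknown logic operator: {op}")
    # Comparison dictionary
    return COMPARISONS[op](get_field(doc, flt["field"]), flt["value"])


docs: List[Dict[str, Any]] = [
    {"id": "1", "meta": {"type": "article", "rating": 4}},
    {"id": "2", "meta": {"type": "article", "rating": 2}},
    {"id": "3", "meta": {"type": "blog", "rating": 5}},
]
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.rating", "operator": ">=", "value": 3},
    ],
}
matched_ids = [doc["id"] for doc in docs if matches(doc, filters)]
print(matched_ids)
```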

DocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Writes Documents into the DocumentStore.

Arguments:

documents: A list of Document objects.
policy: The policy to apply when a Document with the same id already exists in the DocumentStore.
DuplicatePolicy.NONE: Default policy, behaviour depends on the Document Store.
DuplicatePolicy.SKIP: If a Document with the same id already exists, it is skipped and not written.
DuplicatePolicy.OVERWRITE: If a Document with the same id already exists, it is overwritten.
DuplicatePolicy.FAIL: If a Document with the same id already exists, an error is raised.

Raises:

DuplicateError: If policy is set to DuplicatePolicy.FAIL and a Document with the same id already exists.

Returns:

The number of Documents written. If DuplicatePolicy.OVERWRITE is used, this number is always equal to the number of documents in the input list. If DuplicatePolicy.SKIP is used, this number can be lower than the number of documents in the input list.

DocumentStore.delete_documents

def delete_documents(document_ids: List[str]) -> None

Deletes all documents with matching document_ids from the DocumentStore.

Fails with MissingDocumentError if no document with this id is present in the DocumentStore.

Arguments:

document_ids: The IDs of the documents to delete.
