Module haystack_integrations.document_stores.mongodb_atlas.document_store

MongoDBAtlasDocumentStore

A document store implementation backed by MongoDB Atlas, a managed service that is easy to deploy, operate, and scale.

To connect to MongoDB Atlas, you need to provide a connection string in the format: "mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}".

This connection string can be obtained on the MongoDB Atlas Dashboard by clicking on the CONNECT button, selecting Python as the driver, and copying the connection string. The connection string can be provided as an environment variable MONGO_CONNECTION_STRING or directly as a parameter to the MongoDBAtlasDocumentStore constructor.
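As a minimal sketch of the format above, the snippet below assembles a connection string and exports it through the MONGO_CONNECTION_STRING environment variable. The helper name build_connection_string and all credential and host values are placeholders invented for illustration. Percent-encoding the username and password with quote_plus is the usual precaution when they contain characters such as "@" or "/".

```python
import os
from urllib.parse import quote_plus


def build_connection_string(username: str, password: str, host: str, params: str = "") -> str:
    """Assemble a mongodb+srv URI, percent-encoding the credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?{params}"


# Placeholder values; replace them with the ones shown in the Atlas dashboard.
uri = build_connection_string(
    "app_user",
    "p@ss/word",
    "cluster0.example.mongodb.net",
    "retryWrites=true&w=majority",
)

# The document store reads this environment variable by default.
os.environ["MONGO_CONNECTION_STRING"] = uri
```

In practice the variable is usually set outside the program (shell profile, secrets manager) so that the credentials never appear in source code.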

After providing the connection string, you need to specify the database_name and collection_name to use. You will most likely create these through the MongoDB Atlas web UI, but you can also create them with the MongoDB Python driver. Creating databases and collections is beyond the scope of MongoDBAtlasDocumentStore. The primary purpose of this document store is to read and write documents to an existing collection.

Users must provide both a vector_search_index for vector search operations and a full_text_search_index for full-text search operations. The vector_search_index supports a chosen metric (e.g., cosine, dot product, or Euclidean), while the full_text_search_index enables efficient text-based searches. Both indexes can be created through the Atlas web UI.
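For orientation, the sketch below builds index definitions of the kind that can be pasted into the JSON editor of the Atlas web UI. The shapes follow the Atlas Vector Search and Atlas Search index formats as commonly documented, so verify them against the current Atlas documentation before relying on them. The helper names, the 768 dimensions, and the filter path "meta.category" are assumptions chosen for illustration.

```python
import json
from typing import Any, Dict, Iterable


def vector_index_definition(
    embedding_field: str = "embedding",
    dimensions: int = 768,
    similarity: str = "cosine",
    filter_fields: Iterable[str] = (),
) -> Dict[str, Any]:
    """Build a vector search index definition (assumed Atlas JSON shape)."""
    if similarity not in {"cosine", "dotProduct", "euclidean"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    fields = [
        {
            "type": "vector",
            "path": embedding_field,
            "numDimensions": dimensions,
            "similarity": similarity,
        }
    ]
    # Fields used in retriever filters must be declared in the index as well.
    fields += [{"type": "filter", "path": path} for path in filter_fields]
    return {"fields": fields}


def full_text_index_definition() -> Dict[str, Any]:
    """Build a full-text index definition that indexes all fields dynamically."""
    return {"mappings": {"dynamic": True}}


print(json.dumps(vector_index_definition(filter_fields=["meta.category"]), indent=2))
print(json.dumps(full_text_index_definition(), indent=2))
```

The embedding_field default matches the default of the same name in the MongoDBAtlasDocumentStore constructor; if you change one, change the other.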

For more details on MongoDB Atlas, see the official MongoDB Atlas documentation.

Usage example:

from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore

store = MongoDBAtlasDocumentStore(database_name="your_existing_db",
                                  collection_name="your_existing_collection",
                                  vector_search_index="your_existing_index",
                                  full_text_search_index="your_existing_index")
print(store.count_documents())

MongoDBAtlasDocumentStore.__init__

def __init__(*,
             mongo_connection_string: Secret = Secret.from_env_var(
                 "MONGO_CONNECTION_STRING"),
             database_name: str,
             collection_name: str,
             vector_search_index: str,
             full_text_search_index: str,
             embedding_field: str = "embedding",
             content_field: str = "content")

Creates a new MongoDBAtlasDocumentStore instance.

Arguments:

mongo_connection_string: MongoDB Atlas connection string in the format: "mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}". This can be obtained on the MongoDB Atlas Dashboard by clicking on the CONNECT button. This value will be read automatically from the env var "MONGO_CONNECTION_STRING".
database_name: Name of the database to use.
collection_name: Name of the collection to use. To use this document store for embedding retrieval, this collection needs to have a vector search index set up on the embedding field.
vector_search_index: The name of the vector search index to use for vector search operations. Create the index in the Atlas web UI and pass its name as an init parameter of MongoDBAtlasDocumentStore. For more details, refer to the MongoDB Atlas documentation.
full_text_search_index: The name of the search index to use for full-text search operations. Create the index in the Atlas web UI and pass its name as an init parameter of MongoDBAtlasDocumentStore. For more details, refer to the MongoDB Atlas documentation.
embedding_field: The name of the field containing document embeddings. Default is "embedding".
content_field: The name of the field containing the document content. Default is "content". This parameter defines which field is loaded into the Haystack Document object as content. It can be particularly useful when integrating with an existing collection for retrieval. We discourage using this parameter when working with collections created by Haystack.

Raises:

ValueError: If the collection name contains invalid characters.

MongoDBAtlasDocumentStore.__del__

def __del__() -> None

Destructor method to close MongoDB connections when the instance is destroyed.

MongoDBAtlasDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

MongoDBAtlasDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MongoDBAtlasDocumentStore"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

MongoDBAtlasDocumentStore.count_documents

def count_documents() -> int

Returns how many documents are present in the document store.

Returns:

The number of documents in the document store.

MongoDBAtlasDocumentStore.count_documents_async

async def count_documents_async() -> int

Asynchronously returns how many documents are present in the document store.

Returns:

The number of documents in the document store.

MongoDBAtlasDocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the Haystack documentation.

Arguments:

filters: The filters to apply. It returns only the documents that match the filters.

Returns:

A list of Documents that match the given filters.
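As an illustration of the filter format, the sketch below composes a filter dictionary in the Haystack 2.x syntax (logical operators wrapping comparison conditions). The helpers comparison and all_of, and the metadata fields "meta.category" and "meta.year", are hypothetical and exist only for this example.

```python
from typing import Any, Dict


def comparison(field: str, operator: str, value: Any) -> Dict[str, Any]:
    """Build a single comparison condition."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Combine conditions so that every one of them must match."""
    return {"operator": "AND", "conditions": list(conditions)}


filters = all_of(
    comparison("meta.category", "==", "news"),
    comparison("meta.year", ">=", 2023),
)

# With a configured store, the dictionary is passed as-is:
# documents = store.filter_documents(filters=filters)
```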

MongoDBAtlasDocumentStore.filter_documents_async

async def filter_documents_async(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Asynchronously returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the Haystack documentation.

Arguments:

filters: The filters to apply. It returns only the documents that match the filters.

Returns:

A list of Documents that match the given filters.

MongoDBAtlasDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Writes documents into the MongoDB Atlas collection.

Arguments:

documents: A list of Documents to write to the document store.
policy: The duplicate policy to use when writing documents.

Raises:

DuplicateDocumentError: If a document with the same ID already exists in the document store and the policy is set to DuplicatePolicy.FAIL (or not specified).
ValueError: If the documents are not of type Document.

Returns:

The number of documents written to the document store.
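When writing a large number of documents, it can help to split them into batches and sum the counts returned by write_documents. The sketch below shows one way to do that. The helper write_in_batches is not part of the integration, and RecordingStore is a hypothetical stand-in used only so the example runs without an Atlas cluster.

```python
from typing import Any, List, Sequence


def write_in_batches(store: Any, documents: Sequence[Any], policy: Any, batch_size: int = 100) -> int:
    """Write documents in fixed-size batches and return the total count reported by the store."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    written = 0
    for start in range(0, len(documents), batch_size):
        batch = list(documents[start:start + batch_size])
        written += store.write_documents(batch, policy=policy)
    return written


class RecordingStore:
    """Hypothetical stand-in that only records batch sizes; not part of the integration."""

    def __init__(self) -> None:
        self.batch_sizes: List[int] = []

    def write_documents(self, documents: List[Any], policy: Any = None) -> int:
        self.batch_sizes.append(len(documents))
        return len(documents)


demo_store = RecordingStore()
total = write_in_batches(demo_store, ["a", "b", "c", "d", "e"], policy="skip", batch_size=2)
```

With a real MongoDBAtlasDocumentStore, pass a list of Document objects and a DuplicatePolicy member (for example DuplicatePolicy.SKIP from haystack.document_stores.types) instead of the placeholder strings.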

MongoDBAtlasDocumentStore.write_documents_async

async def write_documents_async(
        documents: List[Document],
        policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Asynchronously writes documents into the MongoDB Atlas collection.

Arguments:

documents: A list of Documents to write to the document store.
policy: The duplicate policy to use when writing documents.

Raises:

DuplicateDocumentError: If a document with the same ID already exists in the document store and the policy is set to DuplicatePolicy.FAIL (or not specified).
ValueError: If the documents are not of type Document.

Returns:

The number of documents written to the document store.

MongoDBAtlasDocumentStore.delete_documents

def delete_documents(document_ids: List[str]) -> None

Deletes all documents with matching document_ids from the document store.

Arguments:

document_ids: The IDs of the documents to delete.

MongoDBAtlasDocumentStore.delete_documents_async

async def delete_documents_async(document_ids: List[str]) -> None

Asynchronously deletes all documents with matching document_ids from the document store.

Arguments:

document_ids: The IDs of the documents to delete.

Module haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever

MongoDBAtlasEmbeddingRetriever

Retrieves documents from the MongoDBAtlasDocumentStore by embedding similarity.

The similarity depends on the vector_search_index used in the MongoDBAtlasDocumentStore and on the metric chosen when the index was created (cosine, dot product, or Euclidean). See MongoDBAtlasDocumentStore for more information.
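To make the three metrics concrete, the sketch below computes them for a pair of small vectors in plain Python. It only illustrates the raw measures; Atlas computes scores server-side and may normalize them differently, so these numbers are not expected to match the scores returned by the retriever.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after scaling each to unit length."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc = [3.0, 4.0]

print(dot_product(query, doc))         # 3.0
print(cosine_similarity(query, doc))   # 0.6
print(euclidean_distance(query, doc))  # square root of 20, about 4.47
```

Cosine ignores vector length, while dot product and Euclidean distance do not, which is why the metric chosen for the index should match how the embedding model was trained.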

Usage example:

import numpy as np
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

store = MongoDBAtlasDocumentStore(database_name="haystack_integration_test",
                                  collection_name="test_embeddings_collection",
                                  vector_search_index="cosine_index",
                                  full_text_search_index="full_text_index")
retriever = MongoDBAtlasEmbeddingRetriever(document_store=store)

results = retriever.run(query_embedding=np.random.random(768).tolist())
print(results["documents"])

The example above retrieves the 10 most similar documents to a random query embedding from the MongoDBAtlasDocumentStore. Note that the dimension of query_embedding must match the dimension of the embeddings stored in the MongoDBAtlasDocumentStore.

MongoDBAtlasEmbeddingRetriever.__init__

def __init__(*,
             document_store: MongoDBAtlasDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)

Create the MongoDBAtlasEmbeddingRetriever component.

Arguments:

document_store: An instance of MongoDBAtlasDocumentStore.
filters: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are included in the configuration of the vector_search_index. The configuration must be done manually in the Web UI of MongoDB Atlas.
top_k: Maximum number of Documents to return.
filter_policy: Policy to determine how filters are applied.

Raises:

ValueError: If document_store is not an instance of MongoDBAtlasDocumentStore.

MongoDBAtlasEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

MongoDBAtlasEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

MongoDBAtlasEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None) -> Dict[str, List[Document]]

Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity.

Arguments:

query_embedding: Embedding of the query.
filters: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
top_k: Maximum number of Documents to return. Overrides the value specified at initialization.

Returns:

A dictionary with the following keys:

documents: List of Documents most similar to the given query_embedding

MongoDBAtlasEmbeddingRetriever.run_async

@component.output_types(documents=List[Document])
async def run_async(query_embedding: List[float],
                    filters: Optional[Dict[str, Any]] = None,
                    top_k: Optional[int] = None) -> Dict[str, List[Document]]

Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity.

Arguments:

query_embedding: Embedding of the query.
filters: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
top_k: Maximum number of Documents to return. Overrides the value specified at initialization.

Returns:

A dictionary with the following keys:

documents: List of Documents most similar to the given query_embedding
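The asynchronous variant can be awaited from an event loop, which makes it possible to issue several queries concurrently. The sketch below shows the pattern with asyncio.gather. StandInRetriever is a hypothetical class that mimics the run_async signature so the example runs without a cluster, and retrieve_many is an illustrative helper, not part of the integration. Whether a single real retriever instance handles concurrent calls efficiently is an assumption to verify in your environment.

```python
import asyncio
from typing import Any, Dict, List, Optional


class StandInRetriever:
    """Hypothetical stand-in exposing the same run_async signature as the retriever."""

    async def run_async(
        self,
        query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
    ) -> Dict[str, List[Any]]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"documents": [f"doc-for-{len(query_embedding)}-dims"]}


async def retrieve_many(retriever: Any, embeddings: List[List[float]], top_k: int = 3) -> List[List[Any]]:
    """Run one query per embedding concurrently and keep the input order."""
    tasks = [retriever.run_async(query_embedding=e, top_k=top_k) for e in embeddings]
    results = await asyncio.gather(*tasks)
    return [result["documents"] for result in results]


batches = asyncio.run(retrieve_many(StandInRetriever(), [[0.1, 0.2], [0.3, 0.4, 0.5]]))
print(batches)
```

asyncio.gather returns results in the order the tasks were passed in, so batches[i] corresponds to embeddings[i].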

Module haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever

MongoDBAtlasFullTextRetriever

Retrieves documents from the MongoDBAtlasDocumentStore by full-text search.

The full-text search is dependent on the full_text_search_index used in the MongoDBAtlasDocumentStore. See MongoDBAtlasDocumentStore for more information.

Usage example:

from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever

store = MongoDBAtlasDocumentStore(database_name="your_existing_db",
                                  collection_name="your_existing_collection",
                                  vector_search_index="your_existing_index",
                                  full_text_search_index="your_existing_index")
retriever = MongoDBAtlasFullTextRetriever(document_store=store)

results = retriever.run(query="Lorem ipsum")
print(results["documents"])

The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore.

MongoDBAtlasFullTextRetriever.__init__

def __init__(*,
             document_store: MongoDBAtlasDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)

Arguments:

document_store: An instance of MongoDBAtlasDocumentStore.
filters: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are included in the configuration of the full_text_search_index. The configuration must be done manually in the Web UI of MongoDB Atlas.
top_k: Maximum number of Documents to return.
filter_policy: Policy to determine how filters are applied.

Raises:

ValueError: If document_store is not an instance of MongoDBAtlasDocumentStore.

MongoDBAtlasFullTextRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

MongoDBAtlasFullTextRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MongoDBAtlasFullTextRetriever"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

MongoDBAtlasFullTextRetriever.run

@component.output_types(documents=List[Document])
def run(query: Union[str, List[str]],
        fuzzy: Optional[Dict[str, int]] = None,
        match_criteria: Optional[Literal["any", "all"]] = None,
        score: Optional[Dict[str, Dict]] = None,
        synonyms: Optional[str] = None,
        filters: Optional[Dict[str, Any]] = None,
        top_k: int = 10) -> Dict[str, List[Document]]

Retrieve documents from the MongoDBAtlasDocumentStore by full-text search.

Arguments:

query: The query string or a list of query strings to search for. If the query contains multiple terms, Atlas Search evaluates each term separately for matches.
fuzzy: Enables finding strings similar to the search term(s). Note, fuzzy cannot be used with synonyms. Configurable options include maxEdits, prefixLength, and maxExpansions. For more details refer to MongoDB Atlas documentation.
match_criteria: Defines how terms in the query are matched. Supported options are "any" and "all". For more details refer to MongoDB Atlas documentation.
score: Specifies the scoring method for matching results. Supported options include boost, constant, and function. For more details refer to MongoDB Atlas documentation.
synonyms: The name of the synonym mapping definition in the index. This value cannot be an empty string. Note, synonyms cannot be used with fuzzy.
filters: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
top_k: Maximum number of Documents to return. Overrides the value specified at initialization.

Returns:

A dictionary with the following keys:

documents: List of Documents most similar to the given query
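The constraints described above (fuzzy and synonyms are mutually exclusive, synonyms cannot be empty, match_criteria is "any" or "all") can be checked before calling run. The sketch below is one way to assemble the keyword arguments. The helper full_text_run_kwargs is hypothetical, and the fuzzy option values are placeholders.

```python
from typing import Any, Dict, List, Optional, Union


def full_text_run_kwargs(
    query: Union[str, List[str]],
    fuzzy: Optional[Dict[str, int]] = None,
    synonyms: Optional[str] = None,
    match_criteria: Optional[str] = None,
    score: Optional[Dict[str, Dict]] = None,
    top_k: int = 10,
) -> Dict[str, Any]:
    """Validate the documented constraints and return keyword arguments for run()."""
    if fuzzy is not None and synonyms is not None:
        raise ValueError("fuzzy cannot be used together with synonyms")
    if synonyms == "":
        raise ValueError("synonyms cannot be an empty string")
    if match_criteria not in (None, "any", "all"):
        raise ValueError('match_criteria must be "any" or "all"')
    kwargs: Dict[str, Any] = {"query": query, "top_k": top_k}
    optional = {"fuzzy": fuzzy, "synonyms": synonyms, "match_criteria": match_criteria, "score": score}
    # Only forward the options that were actually set.
    kwargs.update({name: value for name, value in optional.items() if value is not None})
    return kwargs


# Tolerate one typo per term, but require the first two characters to match exactly.
kwargs = full_text_run_kwargs(
    "lorem ipsm",
    fuzzy={"maxEdits": 1, "prefixLength": 2},
    match_criteria="any",
)

# With a configured retriever:
# results = retriever.run(**kwargs)
```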

MongoDBAtlasFullTextRetriever.run_async

@component.output_types(documents=List[Document])
async def run_async(query: Union[str, List[str]],
                    fuzzy: Optional[Dict[str, int]] = None,
                    match_criteria: Optional[Literal["any", "all"]] = None,
                    score: Optional[Dict[str, Dict]] = None,
                    synonyms: Optional[str] = None,
                    filters: Optional[Dict[str, Any]] = None,
                    top_k: int = 10) -> Dict[str, List[Document]]

Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search.

Arguments:

query: The query string or a list of query strings to search for. If the query contains multiple terms, Atlas Search evaluates each term separately for matches.
fuzzy: Enables finding strings similar to the search term(s). Note, fuzzy cannot be used with synonyms. Configurable options include maxEdits, prefixLength, and maxExpansions. For more details refer to MongoDB Atlas documentation.
match_criteria: Defines how terms in the query are matched. Supported options are "any" and "all". For more details refer to MongoDB Atlas documentation.
score: Specifies the scoring method for matching results. Supported options include boost, constant, and function. For more details refer to MongoDB Atlas documentation.
synonyms: The name of the synonym mapping definition in the index. This value cannot be an empty string. Note, synonyms cannot be used with fuzzy.
filters: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
top_k: Maximum number of Documents to return. Overrides the value specified at initialization.

Returns:

A dictionary with the following keys:

documents: List of Documents most similar to the given query
