
MongoDB Atlas integration for Haystack

Module haystack_integrations.document_stores.mongodb_atlas.document_store

MongoDBAtlasDocumentStore

MongoDBAtlasDocumentStore is a DocumentStore implementation that uses the MongoDB Atlas service, which is easy to deploy, operate, and scale.

To connect to MongoDB Atlas, you need to provide a connection string in the format: "mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}".

This connection string can be obtained on the MongoDB Atlas Dashboard by clicking on the CONNECT button, selecting Python as the driver, and copying the connection string. The connection string can be provided as an environment variable MONGO_CONNECTION_STRING or directly as a parameter to the MongoDBAtlasDocumentStore constructor.
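
As a minimal sketch of the format above, the snippet below assembles a connection string from its parts and exports it through the MONGO_CONNECTION_STRING environment variable. The helper name build_connection_string and all credential and host values are hypothetical placeholders. Percent-encoding the username and password is an assumption based on general URI rules, so confirm it against the string the Atlas dashboard generates for you.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(username: str, password: str, host: str, params: str = "") -> str:
    """Assemble a mongodb+srv URI, percent-encoding the credentials."""
    # Characters such as "@", ":" or "/" inside credentials would otherwise
    # be mistaken for URI delimiters, so they are escaped here.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?{params}"


# Hypothetical values, for illustration only.
uri = build_connection_string(
    "app_user",
    "p@ss/word",
    "cluster0.example.mongodb.net",
    "retryWrites=true&w=majority",
)

# MongoDBAtlasDocumentStore reads this variable when no connection string
# is passed to the constructor.
os.environ["MONGO_CONNECTION_STRING"] = uri
```

In a real deployment the variable would normally be set outside the program (for example in the shell or a secrets manager) rather than in code.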

After providing the connection string, specify the database_name and collection_name to use. You will most likely create these in the MongoDB Atlas web UI, but you can also create them with the MongoDB Python driver. Creating databases and collections is beyond the scope of MongoDBAtlasDocumentStore; its primary purpose is to read and write documents in an existing collection.

The last required parameter is vector_search_index, the name of the index used for vector search operations. The index uses one chosen metric (cosine, dot product, or euclidean) and can be created in the Atlas web UI.
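
For orientation, here is a sketch of what a vector search index definition might look like when pasted into the JSON editor of the Atlas web UI. The field layout ("fields", "type", "path", "numDimensions", "similarity") follows the Atlas Vector Search index format as best recalled here, so check it against the MongoDB Atlas documentation before relying on it. The helper vector_index_definition, the 768 dimensions, and the "meta.category" filter path are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def vector_index_definition(
    num_dimensions: int,
    similarity: str = "cosine",
    filter_paths: Iterable[str] = (),
) -> Dict[str, Any]:
    """Build an index definition dict for the embedding field (a sketch)."""
    # Assumed spellings of the three metrics in the Atlas index format.
    allowed = {"cosine", "dotProduct", "euclidean"}
    if similarity not in allowed:
        raise ValueError(f"similarity must be one of {sorted(allowed)}")

    fields = [
        {
            "type": "vector",
            "path": "embedding",  # the field holding the document embedding
            "numDimensions": num_dimensions,
            "similarity": similarity,
        }
    ]
    # Fields that should be usable in retrieval filters are declared as well.
    fields += [{"type": "filter", "path": path} for path in filter_paths]
    return {"fields": fields}


definition = vector_index_definition(768, "cosine", ["meta.category"])
print(json.dumps(definition, indent=2))
```

The printed JSON is only the index definition. The index name you give it in Atlas is what gets passed as vector_search_index.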

For more details on MongoDB Atlas, see the official MongoDB Atlas documentation.

Usage example:

from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore

store = MongoDBAtlasDocumentStore(database_name="your_existing_db",
                                  collection_name="your_existing_collection",
                                  vector_search_index="your_existing_index")
print(store.count_documents())

MongoDBAtlasDocumentStore.__init__

def __init__(*,
             mongo_connection_string: Secret = Secret.from_env_var(
                 "MONGO_CONNECTION_STRING"),
             database_name: str,
             collection_name: str,
             vector_search_index: str)

Creates a new MongoDBAtlasDocumentStore instance.

Arguments:

  • mongo_connection_string: MongoDB Atlas connection string in the format: "mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}". This can be obtained on the MongoDB Atlas Dashboard by clicking on the CONNECT button. By default, this value is read from the MONGO_CONNECTION_STRING environment variable.
  • database_name: Name of the database to use.
  • collection_name: Name of the collection to use. To use this document store for embedding retrieval, this collection needs to have a vector search index set up on the embedding field.
  • vector_search_index: The name of the vector search index to use for vector search operations. Create the index in the Atlas web UI and pass its name as an init parameter of MongoDBAtlasDocumentStore. For more details, refer to the MongoDB Atlas documentation.

Raises:

  • ValueError: If the collection name contains invalid characters.

MongoDBAtlasDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

MongoDBAtlasDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MongoDBAtlasDocumentStore"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

MongoDBAtlasDocumentStore.count_documents

def count_documents() -> int

Returns how many documents are present in the document store.

Returns:

The number of documents in the document store.

MongoDBAtlasDocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the Haystack documentation.

Arguments:

  • filters: The filters to apply. It returns only the documents that match the filters.

Returns:

A list of Documents that match the given filters.
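
To make the filter format more concrete, the sketch below shows a filter dictionary in the nested "operator" / "conditions" style described in the Haystack documentation, together with a small local evaluator. The evaluator (matches) is a hypothetical stand-in written for illustration. It is not the logic the document store uses, it supports only AND, OR, and six comparison operators, and it treats a missing field as a non-match. The field names and values are invented.

```python
import operator as op
from typing import Any, Dict

# A filter selecting English documents from 2020 onwards (hypothetical fields).
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}

_COMPARATORS = {"==": op.eq, "!=": op.ne, ">": op.gt, ">=": op.ge, "<": op.lt, "<=": op.le}


def matches(doc: Dict[str, Any], node: Dict[str, Any]) -> bool:
    """Evaluate a filter node against a plain dict (illustrative subset only)."""
    if "conditions" in node:
        # Logical node: combine the results of the nested conditions.
        results = [matches(doc, child) for child in node["conditions"]]
        if node["operator"] == "AND":
            return all(results)
        if node["operator"] == "OR":
            return any(results)
        raise ValueError(f"unsupported logical operator: {node['operator']}")

    # Comparison node: walk the dotted field path, e.g. "meta.year".
    value: Any = doc
    for key in node["field"].split("."):
        value = value.get(key) if isinstance(value, dict) else None
    if value is None:
        return False  # simplification: a missing field never matches
    return _COMPARATORS[node["operator"]](value, node["value"])


docs = [
    {"id": "1", "meta": {"language": "en", "year": 2021}},
    {"id": "2", "meta": {"language": "de", "year": 2022}},
    {"id": "3", "meta": {"language": "en", "year": 2019}},
]
matched_ids = [d["id"] for d in docs if matches(d, filters)]
print(matched_ids)
```

With a real store, the same filters dictionary would be passed as store.filter_documents(filters=filters).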

MongoDBAtlasDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Writes documents into the MongoDB Atlas collection.

Arguments:

  • documents: A list of Documents to write to the document store.
  • policy: The duplicate policy to use when writing documents.

Raises:

  • DuplicateDocumentError: If a document with the same ID already exists in the document store and the policy is set to DuplicatePolicy.FAIL or left at its default, DuplicatePolicy.NONE.
  • ValueError: If the documents are not of type Document.

Returns:

The number of documents written to the document store.
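
The effect of the duplicate policy can be sketched with an in-memory dictionary standing in for the collection. Everything here (Policy, DuplicateError, write) is a hypothetical stand-in that mirrors the names of Haystack's DuplicatePolicy members. It is not the integration's implementation, which may, for example, write in bulk and behave differently when a failure occurs part-way through a batch.

```python
from enum import Enum
from typing import Dict, List


class Policy(Enum):
    """Local stand-in mirroring the DuplicatePolicy member names."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateError(Exception):
    """Local stand-in for DuplicateDocumentError."""


def write(store: Dict[str, dict], docs: List[dict], policy: Policy = Policy.NONE) -> int:
    """Write docs into the dict-based store and return how many were written."""
    # As described above, leaving the policy unspecified behaves like FAIL.
    if policy is Policy.NONE:
        policy = Policy.FAIL

    written = 0
    for doc in docs:
        exists = doc["id"] in store
        if exists and policy is Policy.FAIL:
            raise DuplicateError(f"document {doc['id']!r} already exists")
        if exists and policy is Policy.SKIP:
            continue  # keep the stored version and do not count it
        store[doc["id"]] = doc  # new document, or OVERWRITE of an existing one
        written += 1
    return written


store: Dict[str, dict] = {}
first = write(store, [{"id": "a", "content": "v1"}, {"id": "b", "content": "v1"}])
skipped = write(store, [{"id": "a", "content": "v2"}, {"id": "c", "content": "v1"}], Policy.SKIP)
overwritten = write(store, [{"id": "a", "content": "v3"}], Policy.OVERWRITE)
print(first, skipped, overwritten)
```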

MongoDBAtlasDocumentStore.delete_documents

def delete_documents(document_ids: List[str]) -> None

Deletes all documents with matching document_ids from the document store.

Arguments:

  • document_ids: The IDs of the documents to delete.

Module haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever

MongoDBAtlasEmbeddingRetriever

Retrieves documents from the MongoDBAtlasDocumentStore by embedding similarity.

The similarity depends on the vector_search_index used in the MongoDBAtlasDocumentStore and on the metric chosen when the index was created (cosine, dot product, or euclidean). See MongoDBAtlasDocumentStore for more information.
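
To illustrate why the choice of metric matters, the sketch below computes the three measures in plain Python for a pair of toy vectors. These are the raw mathematical quantities, not the scores that Atlas reports, and the vectors are invented for the example.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc_a = [2.0, 0.0]  # same direction as the query, but twice as long
doc_b = [0.6, 0.8]  # unit length, pointing in a different direction

for name, vec in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, cosine(query, vec), dot(query, vec), euclidean(query, vec))
```

Here doc_a ranks first under cosine and dot product, while doc_b is the nearer vector under euclidean distance, so the same query can return a different ordering depending on how the index was configured.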

Usage example:

import numpy as np
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

store = MongoDBAtlasDocumentStore(database_name="haystack_integration_test",
                                  collection_name="test_embeddings_collection",
                                  vector_search_index="cosine_index")
retriever = MongoDBAtlasEmbeddingRetriever(document_store=store)

results = retriever.run(query_embedding=np.random.random(768).tolist())
print(results["documents"])

The example above retrieves up to 10 documents (the default top_k) that are most similar to a random query embedding from the MongoDBAtlasDocumentStore. Note that the dimension of the query_embedding must match the dimension of the embeddings stored in the MongoDBAtlasDocumentStore.

MongoDBAtlasEmbeddingRetriever.__init__

def __init__(*,
             document_store: MongoDBAtlasDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10)

Creates the MongoDBAtlasEmbeddingRetriever component.

Arguments:

  • document_store: An instance of MongoDBAtlasDocumentStore.
  • filters: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are included in the configuration of the vector_search_index. The configuration must be done manually in the Web UI of MongoDB Atlas.
  • top_k: Maximum number of Documents to return.

Raises:

  • ValueError: If document_store is not an instance of MongoDBAtlasDocumentStore.

MongoDBAtlasEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

MongoDBAtlasEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

MongoDBAtlasEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None) -> Dict[str, List[Document]]

Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity.

Arguments:

  • query_embedding: Embedding of the query.
  • filters: Filters applied to the retrieved Documents. Overrides the value specified at initialization.
  • top_k: Maximum number of Documents to return. Overrides the value specified at initialization.

Returns:

A dictionary with the following keys:

  • documents: List of Documents most similar to the given query_embedding
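
The override behaviour of filters and top_k can be pictured with the sketch below. RetrieverSettings and resolve are hypothetical names used only for illustration. The sketch treats None as "not provided", and the library's handling of other falsy values (such as an empty dictionary or 0) may differ.

```python
from typing import Any, Dict, Optional, Tuple


class RetrieverSettings:
    """Hypothetical stand-in holding the values given at initialization."""

    def __init__(self, filters: Optional[Dict[str, Any]] = None, top_k: int = 10) -> None:
        self.filters = filters
        self.top_k = top_k

    def resolve(
        self,
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
    ) -> Tuple[Optional[Dict[str, Any]], int]:
        """Return the effective (filters, top_k) for a single run."""
        # A value passed at run time wins; otherwise fall back to the init value.
        effective_filters = filters if filters is not None else self.filters
        effective_top_k = top_k if top_k is not None else self.top_k
        return effective_filters, effective_top_k


init_filters = {"field": "meta.language", "operator": "==", "value": "en"}
settings = RetrieverSettings(filters=init_filters, top_k=10)

defaults = settings.resolve()
overridden = settings.resolve(top_k=3)
print(defaults, overridden)
```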