MongoDB Atlas integration for Haystack
Module haystack_integrations.document_stores.mongodb_atlas.document_store
MongoDBAtlasDocumentStore
MongoDBAtlasDocumentStore is a DocumentStore implementation that uses MongoDB Atlas, a service that is easy to deploy, operate, and scale.
To connect to MongoDB Atlas, you need to provide a connection string in the format:
"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}".
This connection string can be obtained on the MongoDB Atlas Dashboard by clicking on the CONNECT button, selecting Python as the driver, and copying the connection string. The connection string can be provided as an environment variable MONGO_CONNECTION_STRING or directly as a parameter to the MongoDBAtlasDocumentStore constructor.
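Usernames and passwords that contain reserved characters (such as @, : or /) need to be percent-encoded before they are placed in the URI. The following is a minimal sketch of assembling a string in the format above and exposing it through the environment variable; the helper name build_connection_string and all values are hypothetical.

```python
import os
from urllib.parse import quote_plus, urlencode


def build_connection_string(username: str, password: str, host: str, params: dict) -> str:
    """Assemble a mongodb+srv URI, percent-encoding the credentials."""
    # Reserved characters in credentials would otherwise break URI parsing.
    user = quote_plus(username)
    pwd = quote_plus(password)
    query = urlencode(params)
    return f"mongodb+srv://{user}:{pwd}@{host}/?{query}"


# Hypothetical credentials and host, for illustration only.
uri = build_connection_string(
    "app_user",
    "p@ss:word/1",
    "cluster0.example.mongodb.net",
    {"retryWrites": "true", "w": "majority"},
)

# The document store reads this variable by default.
os.environ["MONGO_CONNECTION_STRING"] = uri
print(uri)
```

Keeping the string in an environment variable avoids hard-coding credentials in source files.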
After providing the connection string, you'll need to specify the database_name and collection_name to use.
Most likely you'll create these via the MongoDB Atlas web UI, but you can also create them via the MongoDB
Python driver. Creating databases and collections is beyond the scope of MongoDBAtlasDocumentStore. The primary
purpose of this document store is to read and write documents to an existing collection.
The last parameter you need to provide is vector_search_index, the name of the index used for vector search operations. This index
supports one chosen metric (cosine, dot product, or euclidean) and can be created in the Atlas web UI.
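An index of this kind can also be described as a JSON definition. The sketch below builds one as a plain dictionary, assuming the field layout used by Atlas Vector Search index definitions at the time of writing (a "vector" field with path, numDimensions, and similarity, plus optional "filter" fields). The helper name, the 768 dimensions, and the filter path are illustrative assumptions, so verify the layout against the MongoDB Atlas documentation before relying on it.

```python
import json
from typing import Any, Dict, Sequence

# Similarity names as Atlas Vector Search is assumed to spell them.
ALLOWED_SIMILARITIES = ("cosine", "dotProduct", "euclidean")


def vector_index_definition(num_dimensions: int,
                            similarity: str = "cosine",
                            path: str = "embedding",
                            filter_paths: Sequence[str] = ()) -> Dict[str, Any]:
    """Build a vector search index definition as a plain dictionary."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"similarity must be one of {ALLOWED_SIMILARITIES}")
    fields = [{
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    # Fields used in retriever filters are listed as extra "filter" entries.
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"fields": fields}


# "meta.category" is a hypothetical metadata field.
definition = vector_index_definition(768, "cosine", filter_paths=["meta.category"])
print(json.dumps(definition, indent=2))
```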
For more details on MongoDB Atlas, see the official MongoDB Atlas documentation.
Usage example:
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore

# The connection string is read from the MONGO_CONNECTION_STRING environment variable.
store = MongoDBAtlasDocumentStore(database_name="your_existing_db",
                                  collection_name="your_existing_collection",
                                  vector_search_index="your_existing_index")
print(store.count_documents())
MongoDBAtlasDocumentStore.__init__
def __init__(*,
             mongo_connection_string: Secret = Secret.from_env_var(
                 "MONGO_CONNECTION_STRING"),
             database_name: str,
             collection_name: str,
             vector_search_index: str)
Creates a new MongoDBAtlasDocumentStore instance.
Arguments:
mongo_connection_string
: MongoDB Atlas connection string in the format: "mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}". This can be obtained on the MongoDB Atlas Dashboard by clicking on the CONNECT button. By default, this value is read from the environment variable MONGO_CONNECTION_STRING.
database_name
: Name of the database to use.
collection_name
: Name of the collection to use. To use this document store for embedding retrieval, this collection needs to have a vector search index set up on the embedding field.
vector_search_index
: The name of the vector search index to use for vector search operations. Create the index in the Atlas web UI and pass its name when initializing MongoDBAtlasDocumentStore. For more details, refer to the MongoDB Atlas documentation.
Raises:
ValueError
: If the collection name contains invalid characters.
MongoDBAtlasDocumentStore.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
MongoDBAtlasDocumentStore.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MongoDBAtlasDocumentStore"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
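A round trip through to_dict and from_dict can be pictured as follows. This is a sketch on a hypothetical stand-in class (SketchStore), not the real component. The "type" / "init_parameters" layout, and the idea that only the name of the environment variable is serialized rather than the connection string itself, are assumptions about how Haystack components are typically serialized.

```python
import json
from typing import Any, Dict


class SketchStore:
    """Hypothetical stand-in used to illustrate the round trip; not the real class."""

    def __init__(self, *, database_name: str, collection_name: str,
                 vector_search_index: str,
                 connection_env_var: str = "MONGO_CONNECTION_STRING"):
        self.database_name = database_name
        self.collection_name = collection_name
        self.vector_search_index = vector_search_index
        self.connection_env_var = connection_env_var

    def to_dict(self) -> Dict[str, Any]:
        # Only the environment variable name is stored, never the secret value.
        return {
            "type": type(self).__name__,
            "init_parameters": {
                "database_name": self.database_name,
                "collection_name": self.collection_name,
                "vector_search_index": self.vector_search_index,
                "connection_env_var": self.connection_env_var,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchStore":
        # Rebuild the object from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchStore(database_name="your_existing_db",
                       collection_name="your_existing_collection",
                       vector_search_index="your_existing_index")
serialized = json.dumps(original.to_dict())
restored = SketchStore.from_dict(json.loads(serialized))
print(restored.to_dict() == original.to_dict())
```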
MongoDBAtlasDocumentStore.count_documents
def count_documents() -> int
Returns how many documents are present in the document store.
Returns:
The number of documents in the document store.
MongoDBAtlasDocumentStore.filter_documents
def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]
Returns the documents that match the filters provided.
For a detailed specification of the filters, refer to the Haystack documentation.
Arguments:
filters
: The filters to apply. Only the documents that match the filters are returned.
Returns:
A list of Documents that match the given filters.
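As a rough illustration of the filter format, the sketch below evaluates a filter dictionary against plain Python dictionaries. It assumes the Haystack 2.x filter syntax (comparison filters with field, operator, and value; logical filters with operator and conditions). The records, field names, and the matches helper are hypothetical, and the real store is expected to apply filters on the database side rather than in Python.

```python
import operator
from typing import Any, Dict

COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}
ORDERING = {">", ">=", "<", "<="}


def get_field(record: Dict[str, Any], dotted_path: str) -> Any:
    """Resolve a dotted path such as "meta.year"; return None if it is missing."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(record: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Simplified evaluation of a filter dictionary against one record."""
    op = filters["operator"]
    if "conditions" in filters:  # logical filter
        results = [matches(record, cond) for cond in filters["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            return not all(results)
        raise ValueError(f"Unknown logical operator: {op}")
    value = get_field(record, filters["field"])
    if value is None and op in ORDERING:
        return False  # a missing field cannot satisfy an ordering comparison
    return COMPARATORS[op](value, filters["value"])


records = [
    {"id": "1", "meta": {"type": "article", "year": 2021}},
    {"id": "2", "meta": {"type": "blog", "year": 2023}},
    {"id": "3", "meta": {"type": "article", "year": 2024}},
]
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.year", "operator": ">=", "value": 2022},
    ],
}
matching_ids = [r["id"] for r in records if matches(r, filters)]
print(matching_ids)
```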
MongoDBAtlasDocumentStore.write_documents
def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int
Writes documents into the MongoDB Atlas collection.
Arguments:
documents
: A list of Documents to write to the document store.
policy
: The duplicate policy to use when writing documents.
Raises:
DuplicateDocumentError
: If a document with the same ID already exists in the document store and the policy is set to DuplicatePolicy.FAIL (or not specified).
ValueError
: If the documents are not of type Document.
Returns:
The number of documents written to the document store.
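The effect of the duplicate policy on the returned count can be sketched with an in-memory dictionary. The enum, exception, and function below are hypothetical stand-ins, not the real DuplicatePolicy or DuplicateDocumentError. The behaviour of SKIP and OVERWRITE is an assumption based on how these policies usually work in Haystack document stores, while treating an unspecified policy as FAIL follows the description above.

```python
from enum import Enum
from typing import Dict, List


class SketchPolicy(Enum):
    """Hypothetical mirror of the duplicate policies."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class SketchDuplicateError(Exception):
    """Hypothetical stand-in for DuplicateDocumentError."""


def write_documents(store: Dict[str, str], documents: List[dict],
                    policy: SketchPolicy = SketchPolicy.NONE) -> int:
    """Write documents into an in-memory dict and return how many were written."""
    if policy is SketchPolicy.NONE:
        policy = SketchPolicy.FAIL  # an unspecified policy behaves like FAIL
    written = 0
    for doc in documents:
        exists = doc["id"] in store
        if exists and policy is SketchPolicy.FAIL:
            raise SketchDuplicateError(f"Document {doc['id']} already exists")
        if exists and policy is SketchPolicy.SKIP:
            continue  # keep the stored version and do not count this document
        store[doc["id"]] = doc["content"]
        written += 1
    return written


store: Dict[str, str] = {}
first = write_documents(store, [{"id": "a", "content": "v1"}, {"id": "b", "content": "v1"}])
skipped = write_documents(store, [{"id": "a", "content": "v2"}, {"id": "c", "content": "v1"}],
                          SketchPolicy.SKIP)
overwritten = write_documents(store, [{"id": "a", "content": "v2"}], SketchPolicy.OVERWRITE)
print(first, skipped, overwritten, store["a"])
```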
MongoDBAtlasDocumentStore.delete_documents
def delete_documents(document_ids: List[str]) -> None
Deletes all documents with matching document_ids from the document store.
Arguments:
document_ids
: The document IDs to delete.
Module haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever
MongoDBAtlasEmbeddingRetriever
Retrieves documents from the MongoDBAtlasDocumentStore by embedding similarity.
The similarity depends on the vector_search_index used in the MongoDBAtlasDocumentStore and on the metric chosen when the index was created (cosine, dot product, or euclidean). See MongoDBAtlasDocumentStore for more information.
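To see why the choice of metric matters, the sketch below computes the three raw measures in plain Python for two hypothetical vectors. It only illustrates the mathematical definitions; how Atlas normalizes or reports scores is not covered here and may differ.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)


# Hypothetical vectors: doc_a points almost the same way as the query,
# doc_b is much longer but points in a different direction.
query = [1.0, 0.0]
doc_a = [0.9, 0.1]
doc_b = [5.0, 5.0]

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name,
          round(cosine_similarity(query, doc), 4),
          round(dot_product(query, doc), 4),
          round(euclidean_distance(query, doc), 4))
```

With these vectors, cosine similarity and euclidean distance favour doc_a, while the dot product favours doc_b because of its larger magnitude.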
Usage example:
import numpy as np
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

# Assumes MONGO_CONNECTION_STRING is set and the collection holds 768-dimensional embeddings.
store = MongoDBAtlasDocumentStore(database_name="haystack_integration_test",
                                  collection_name="test_embeddings_collection",
                                  vector_search_index="cosine_index")
retriever = MongoDBAtlasEmbeddingRetriever(document_store=store)

# A random vector stands in for a real query embedding.
results = retriever.run(query_embedding=np.random.random(768).tolist())
print(results["documents"])
The example above retrieves the 10 most similar documents (the default top_k) to a random query embedding from the MongoDBAtlasDocumentStore. Note that the dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore.
MongoDBAtlasEmbeddingRetriever.__init__
def __init__(*,
             document_store: MongoDBAtlasDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)
Create the MongoDBAtlasEmbeddingRetriever component.
Arguments:
document_store
: An instance of MongoDBAtlasDocumentStore.
filters
: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are included in the configuration of the vector_search_index. The configuration must be done manually in the Web UI of MongoDB Atlas.
top_k
: Maximum number of Documents to return.
filter_policy
: Policy to determine how filters are applied.
Raises:
ValueError
: If document_store is not an instance of MongoDBAtlasDocumentStore.
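The interaction between filters set at initialization and filters passed at run time can be sketched as follows. This assumes two policies, REPLACE and MERGE, as in Haystack's FilterPolicy. The enum and helper below are hypothetical, and the MERGE branch is deliberately simplified to a logical AND of both filters; the real merge logic may differ, for example when both filters target the same field.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Optional[Dict[str, Any]]


class SketchFilterPolicy(Enum):
    """Hypothetical mirror of the filter policies."""
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(policy: SketchFilterPolicy, init_filters: Filters,
                    runtime_filters: Filters) -> Filters:
    """Decide which filters a run uses (simplified)."""
    if runtime_filters is None:
        return init_filters  # nothing passed at run time: keep the init filters
    if policy is SketchFilterPolicy.REPLACE or init_filters is None:
        return runtime_filters  # run-time filters take over completely
    # Simplified MERGE: both filters must match.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


# Hypothetical filters on metadata fields.
init_filters = {"field": "meta.type", "operator": "==", "value": "article"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2022}

replaced = resolve_filters(SketchFilterPolicy.REPLACE, init_filters, runtime_filters)
merged = resolve_filters(SketchFilterPolicy.MERGE, init_filters, runtime_filters)
print(replaced)
print(merged)
```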
MongoDBAtlasEmbeddingRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
MongoDBAtlasEmbeddingRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
MongoDBAtlasEmbeddingRetriever.run
@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None) -> Dict[str, List[Document]]
Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity.
Arguments:
query_embedding
: Embedding of the query.
filters
: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_k
: Maximum number of Documents to return. Overrides the value specified at initialization.
Returns:
A dictionary with the following keys:
documents
: List of Documents most similar to the given query_embedding.
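The run contract described above (a top_k that overrides the initialization value, and a result dictionary with a documents key) can be sketched with a tiny in-memory stand-in. TinyRetriever is hypothetical: it scores by brute-force dot product and returns document IDs instead of Document objects, whereas the real retriever delegates the search to the Atlas vector search index. Raising ValueError on a dimension mismatch is likewise an assumption made for the sketch.

```python
import heapq
from typing import Dict, List, Optional


class TinyRetriever:
    """Hypothetical in-memory stand-in; not the real retriever."""

    def __init__(self, embeddings: Dict[str, List[float]], top_k: int = 10):
        self.embeddings = embeddings
        self.top_k = top_k

    def run(self, query_embedding: List[float],
            top_k: Optional[int] = None) -> Dict[str, List[str]]:
        # A run-time top_k overrides the value given at initialization.
        limit = self.top_k if top_k is None else top_k
        for doc_id, vector in self.embeddings.items():
            if len(vector) != len(query_embedding):
                raise ValueError(
                    f"Query has {len(query_embedding)} dimensions, "
                    f"document {doc_id} has {len(vector)}")

        def score(doc_id: str) -> float:
            return sum(q * v for q, v in zip(query_embedding, self.embeddings[doc_id]))

        # Highest-scoring document IDs first.
        return {"documents": heapq.nlargest(limit, self.embeddings, key=score)}


embeddings = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0]}
retriever = TinyRetriever(embeddings, top_k=2)
default_run = retriever.run([1.0, 0.0])
override_run = retriever.run([1.0, 0.0], top_k=1)
print(default_run["documents"], override_run["documents"])
```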