ChromaDocumentStore
Chroma is an open-source vector database that stores collections of documents along with their metadata, creates embeddings for documents and queries, and searches collections with filters on document metadata or content. Additionally, Chroma supports multi-modal embedding functions.
Chroma can be used in-memory, as an embedded database, or in a client-server fashion. When running in-memory, Chroma can still keep its contents on disk across different sessions. This allows users to quickly put together prototypes using the in-memory version and later move to production, where the client-server version is deployed.
At the moment Haystack only supports using Chroma in-memory, without storing data across different sessions.
Initialization
First, install the Chroma integration, which will install Haystack and Chroma if they are not already present. The following command is all you need to start:
pip install chroma-haystack
To store data in Chroma, create a ChromaDocumentStore instance and write Documents with:
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack import Document

# Create the document store with its default settings
document_store = ChromaDocumentStore()

# Write two Documents without precomputed embeddings
document_store.write_documents([
    Document(content="This is the first document."),
    Document(content="This is the second document."),
])

# Print how many Documents the store now holds
print(document_store.count_documents())
In this case, since we didn’t pass any embeddings along with our documents, Chroma will create them for us using its default embedding function.
Supported Retrievers
The Haystack Chroma integration comes with two Retriever components. Both rely on the Chroma query API, but they have different inputs so that you can pick the one that best fits your Pipeline:
ChromaQueryTextRetriever: This Retriever takes a plain-text query string as input and returns a list of matching Documents. Chroma creates the embedding for the query using its default embedding function.
ChromaEmbeddingRetriever: This Retriever takes the embedding of a single query as input and returns a list of matching Documents. The query must be embedded before it is passed to this component, for example by an embedder component.