ChromaQueryTextRetriever

This is a Retriever compatible with the Chroma Document Store.

Name: ChromaQueryTextRetriever
Source: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/chroma
Most common position in a pipeline:
1. After a Text Embedder and before a PromptBuilder in a RAG pipeline
2. The last component in the semantic search pipeline
3. After a Text Embedder and before an ExtractiveReader in an extractive QA pipeline
Mandatory input variables: "query": A single query in plain-text format to be processed by the Retriever
Output variables: "documents": A list of documents

Overview

The ChromaQueryTextRetriever is an embedding-based Retriever compatible with the ChromaDocumentStore that uses the Chroma query API.
This component takes a plain-text query string as input and returns the matching documents.
Chroma creates the embedding for the query using its embedding function. If you do not want to use the default embedding function, specify a different one when you initialize the ChromaDocumentStore.
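
To make this division of labor concrete, here is a minimal, self-contained sketch in plain Python. It does not use Chroma or Haystack: `ToyStore`, `toy_embedding`, and `query_text` are hypothetical names, and the hashed bag-of-words embedding is only a stand-in for a real embedding function. It illustrates the idea described above, under those assumptions: the store owns the embedding function chosen at initialization and applies it both to the documents it stores and to the plain-text query.

```python
import math
import re
from typing import Callable, List, Tuple

Vector = List[float]


def toy_embedding(text: str, dims: int = 64) -> Vector:
    """Hypothetical stand-in for an embedding function (hashed bag of words)."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Sum of character codes gives a bucket that is stable across runs.
        vec[sum(ord(c) for c in token) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyStore:
    """Illustrative store that owns its embedding function (not the Chroma API)."""

    def __init__(self, embedding_function: Callable[[str], Vector] = toy_embedding):
        # The embedding function is fixed when the store is created.
        self.embedding_function = embedding_function
        self._rows: List[Tuple[str, Vector]] = []

    def write(self, texts: List[str]) -> None:
        # Documents are embedded by the store at write time.
        for text in texts:
            self._rows.append((text, self.embedding_function(text)))

    def query_text(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        # The plain-text query is embedded with the same function,
        # then documents are ranked by cosine similarity (vectors are unit length).
        q = self.embedding_function(query)
        scored = [
            (text, sum(a * b for a, b in zip(q, vec))) for text, vec in self._rows
        ]
        scored.sort(key=lambda row: row[1], reverse=True)
        return scored[:top_k]


store = ToyStore()
store.write([
    "This contains variable declarations",
    "This contains another sort of variable declarations",
    "This has nothing to do with variable declarations",
    "A random doc",
])
hits = store.query_text("Variable declarations", top_k=3)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```

Because the caller only ever passes text, swapping the embedding function is a decision made once, when the store is constructed, rather than something each query has to handle.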

Usage

On its own

This Retriever needs the ChromaDocumentStore and indexed documents to run.

from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

document_store = ChromaDocumentStore()

retriever = ChromaQueryTextRetriever(document_store=document_store)

# Run an example query (the Document Store must contain indexed documents to return results)
retriever.run(query="How does Chroma Retriever work?")

In a pipeline

Here is how you could use the ChromaQueryTextRetriever in a Pipeline. In this example, you would create two pipelines: an indexing one and a querying one.

In the indexing pipeline, the documents are written to the Document Store.

Then, in the querying pipeline, ChromaQueryTextRetriever retrieves the matching documents from the Document Store based on the provided query.

from haystack import Pipeline
from haystack.dataclasses import Document
from haystack.components.writers import DocumentWriter

from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

# Chroma is used in-memory here, so both pipelines below share the same document store instance
document_store = ChromaDocumentStore()

documents = [
    Document(content="This contains variable declarations", meta={"title": "one"}),
    Document(content="This contains another sort of variable declarations", meta={"title": "two"}),
    Document(content="This has nothing to do with variable declarations", meta={"title": "three"}),
    Document(content="A random doc", meta={"title": "four"}),
]

indexing = Pipeline()
indexing.add_component("writer", DocumentWriter(document_store))
indexing.run({"writer": {"documents": documents}})

querying = Pipeline()
querying.add_component("retriever", ChromaQueryTextRetriever(document_store))
results = querying.run({"retriever": {"query": "Variable declarations", "top_k": 3}})

for d in results["retriever"]["documents"]:
    print(d.meta, d.score)

Related Links

Check out the API reference in the GitHub repo or in our docs: