Name	ChromaQueryTextRetriever
Path	<https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/chroma>
Most common position in a Pipeline	Before a `PromptBuilder` in a RAG Pipeline; the last component in a semantic search Pipeline; before an `ExtractiveReader` in an extractive QA Pipeline
Mandatory input variables	`query`: a single query in plain-text format to be processed by the Retriever
Output variables	`documents`: a list of Documents

Overview

The ChromaQueryTextRetriever is an embedding-based Retriever compatible with the ChromaDocumentStore. It uses the Chroma query API.
This component takes a plain-text query string as input and returns the matching Documents.
Chroma creates the embedding for the query with its embedding function. If you don't want to use the default embedding function, specify a different one when you initialize the ChromaDocumentStore.
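
The small, self-contained sketch below illustrates the idea behind an embedding-based query-text retriever. It is not Chroma's implementation. `toy_embed` is a hypothetical bag-of-words embedding function that stands in for Chroma's embedding function, which produces dense learned vectors. `query_text` is a hypothetical helper, and the choice of L2 distance is an assumption made for illustration. The point it shows is that the same function embeds both the stored Documents and the query, and results are then ranked by vector distance.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary. A real embedding function (such as the one
# Chroma uses) produces dense learned vectors instead of word counts.
VOCAB = ["variable", "declarations", "another", "random", "doc", "nothing"]


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding function: word counts over VOCAB."""
    counts = Counter(word.strip(".,?!").lower() for word in text.split())
    return [float(counts[term]) for term in VOCAB]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_text(query: str, docs: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Hypothetical retriever: embed docs and query with the SAME function, rank by distance."""
    # Documents and query must share one embedding function; otherwise
    # their vectors would not be comparable.
    doc_vectors = [(doc, toy_embed(doc)) for doc in docs]
    query_vector = toy_embed(query)
    scored = [(doc, l2_distance(query_vector, vec)) for doc, vec in doc_vectors]
    # A smaller L2 distance means a closer match, so sort in ascending order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


DOCS = [
    "This contains variable declarations",
    "This contains another sort of variable declarations",
    "This has nothing to do with variable declarations",
    "A random doc",
]

for doc, distance in query_text("Variable declarations", DOCS, top_k=3):
    print(f"{distance:.2f}  {doc}")
```

Stored Documents are embedded when they are written, so every later query must use the same function for the distances to be meaningful. This is one reason the embedding function is chosen once, when the ChromaDocumentStore is created.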

Usage

On its own

This Retriever needs the ChromaDocumentStore and indexed Documents to run.

```python
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

document_store = ChromaDocumentStore()

retriever = ChromaQueryTextRetriever(document_store=document_store)

# Example query: the Document Store must already contain indexed Documents
# for the Retriever to return any results
retriever.run(query="How does Chroma Retriever work?")
```

In a Pipeline

Here is how you can use the ChromaQueryTextRetriever in a Pipeline. This example creates two Pipelines: an indexing Pipeline and a querying Pipeline.

In the indexing Pipeline, the Documents are written into the Document Store.

Then, in the querying Pipeline, the ChromaQueryTextRetriever retrieves the Documents that best match the provided query from the Document Store.

```python
from haystack import Pipeline
from haystack.dataclasses import Document
from haystack.components.writers import DocumentWriter

from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

# Chroma runs in memory here, so both Pipelines below must share
# the same ChromaDocumentStore instance
document_store = ChromaDocumentStore()

documents = [
    Document(content="This contains variable declarations", meta={"title": "one"}),
    Document(content="This contains another sort of variable declarations", meta={"title": "two"}),
    Document(content="This has nothing to do with variable declarations", meta={"title": "three"}),
    Document(content="A random doc", meta={"title": "four"}),
]

# Indexing Pipeline: write the Documents into the Document Store
indexing = Pipeline()
indexing.add_component("writer", DocumentWriter(document_store))
indexing.run({"writer": {"documents": documents}})

# Querying Pipeline: retrieve up to 3 Documents that match the query
querying = Pipeline()
querying.add_component("retriever", ChromaQueryTextRetriever(document_store))
results = querying.run({"retriever": {"query": "Variable declarations", "top_k": 3}})

# Print the metadata and score of each retrieved Document
for d in results["retriever"]["documents"]:
    print(d.meta, d.score)
```