ChromaQueryTextRetriever
This is a Retriever compatible with the Chroma Document Store.
Name | ChromaQueryTextRetriever |
Path | <https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/chroma> |
Most common position in a Pipeline | 1. Before a PromptBuilder in a RAG Pipeline 2. The last component in a semantic search Pipeline 3. Before an ExtractiveReader in an ExtractiveQA Pipeline |
Mandatory Input variables | “query”: a single query in plain-text format to be processed by the Retriever |
Output variables | “documents”: a list of Documents |
Overview
The ChromaQueryTextRetriever is an embedding-based Retriever compatible with the ChromaDocumentStore that uses the Chroma query API.
This component takes a plain-text query string as input and returns the matching Documents.
Chroma creates the embedding for the query using its embedding function. If you do not want to use the default embedding function, specify a different one when you initialize the ChromaDocumentStore.
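The behaviour described above can be illustrated with a minimal, library-free sketch. It is not Chroma's implementation: `ToyStore`, `bag_of_words`, and `cosine` are hypothetical names, and the word-count embedding is only a stand-in for a real embedding function. The sketch shows why the embedding function belongs to the store: the same function must embed both the indexed texts and the incoming query, otherwise the vectors are not comparable.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Toy embedding function: lowercase word counts (a stand-in, not Chroma's default)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either one is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyStore:
    """Hypothetical in-memory store whose embedding function is fixed at initialization."""

    def __init__(self, embedding_function: Callable[[str], Vector] = bag_of_words) -> None:
        self.embedding_function = embedding_function
        self._rows: List[Tuple[str, Vector]] = []

    def write(self, texts: List[str]) -> None:
        # Index time: embed every text with the store's embedding function.
        for text in texts:
            self._rows.append((text, self.embedding_function(text)))

    def query(self, text: str, top_k: int = 2) -> List[Tuple[str, float]]:
        # Query time: embed the plain-text query with the same function, then rank.
        query_vector = self.embedding_function(text)
        scored = [(stored, cosine(query_vector, vector)) for stored, vector in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyStore()
store.write(["variable declarations in C", "a random doc", "function declarations"])
hits = store.query("variable declarations", top_k=2)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```

Swapping the embedding function means constructing a new store with a different callable, which mirrors the requirement to choose the embedding function when the ChromaDocumentStore is created rather than on the Retriever.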
Usage
On its own
This Retriever needs the ChromaDocumentStore and indexed Documents to run.
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever
document_store = ChromaDocumentStore()
retriever = ChromaQueryTextRetriever(document_store=document_store)
# Run the Retriever on its own with an example query
retriever.run(query="How does Chroma Retriever work?")
In a Pipeline
Here is how you could use the ChromaQueryTextRetriever in a Pipeline. In this example, you create two Pipelines: an indexing one and a querying one.
In the indexing Pipeline, the Documents are written into the Document Store.
Then, in the querying Pipeline, ChromaQueryTextRetriever retrieves the matching Documents from the Document Store based on the provided query.
from haystack import Pipeline
from haystack.dataclasses import Document
from haystack.components.writers import DocumentWriter
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever
# Chroma runs in-memory here, so both pipelines below must share the same Document Store instance
document_store = ChromaDocumentStore()
documents = [
    Document(content="This contains variable declarations", meta={"title": "one"}),
    Document(content="This contains another sort of variable declarations", meta={"title": "two"}),
    Document(content="This has nothing to do with variable declarations", meta={"title": "three"}),
    Document(content="A random doc", meta={"title": "four"}),
]
indexing = Pipeline()
indexing.add_component("writer", DocumentWriter(document_store))
indexing.run({"writer": {"documents": documents}})
querying = Pipeline()
querying.add_component("retriever", ChromaQueryTextRetriever(document_store))
results = querying.run({"retriever": {"query": "Variable declarations", "top_k": 3}})
for d in results["retriever"]["documents"]:
    print(d.meta, d.score)
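If you want to post-process the retrieved Documents rather than print them directly, a small helper along the following lines may be useful. This is only a sketch: `Doc` is a hypothetical stand-in that mimics the `content`, `meta`, and `score` fields used above so the snippet runs without Haystack installed, and `summarize` is an illustrative name, not part of the integration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document, limited to the fields used here."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = None


def summarize(results: Dict[str, Dict[str, List[Doc]]], component: str = "retriever") -> List[str]:
    """Return one 'title: score' line per Document in a pipeline-style result dict."""
    lines = []
    for doc in results[component]["documents"]:
        title = doc.meta.get("title", "<untitled>")
        score = "n/a" if doc.score is None else f"{doc.score:.3f}"
        lines.append(f"{title}: {score}")
    return lines


# Hand-written data shaped like the querying Pipeline output; the scores are made up.
fake_results = {
    "retriever": {
        "documents": [
            Doc("This contains variable declarations", {"title": "one"}, 0.25),
            Doc("A random doc", {}, None),
        ]
    }
}
print("\n".join(summarize(fake_results)))
```

The helper only relies on the `results["retriever"]["documents"]` shape shown in the example above, so the same loop should apply to the real Pipeline output.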