Version: 2.31

SupabaseGroongaBM25Retriever

A full-text Retriever that fetches documents from the SupabaseGroongaDocumentStore using PGroonga search.


| Property | Value |
| --- | --- |
| Most common position in a pipeline | 1. Before a `PromptBuilder` in a RAG pipeline; 2. The last component in a full-text search pipeline |
| Mandatory init variables | `document_store`: An instance of a SupabaseGroongaDocumentStore |
| Mandatory run variables | `query`: A string |
| Output variables | `documents`: A list of documents (matching the query) |
| API reference | Supabase |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/supabase |
| Package name | `supabase-haystack` |

Overview

SupabaseGroongaBM25Retriever retrieves Documents from the SupabaseGroongaDocumentStore using PGroonga, a PostgreSQL extension for fast, multilingual full-text search.

Unlike embedding-based retrievers, this Retriever works with plain text queries and requires no embeddings. It supports a wide range of languages out of the box through PGroonga's multilingual indexing capabilities.
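Plain-text matching can work across languages partly because text does not have to be split into dictionary words first. Groonga offers N-gram tokenization, where text is cut into overlapping character pairs, so languages written without spaces between words (such as Japanese or Chinese) can still be matched. The following is a simplified sketch of that idea only: the helper names `char_bigrams` and `bigram_overlap` are hypothetical, and PGroonga's real tokenizers, normalizers, and scoring are configurable and considerably more sophisticated.

```python
def char_bigrams(text: str) -> list[str]:
    """Split text into overlapping two-character tokens (whitespace still separates words)."""
    tokens = []
    for word in text.lower().split():
        if len(word) == 1:
            tokens.append(word)  # a single character is kept as its own token
        else:
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


def bigram_overlap(query: str, document: str) -> int:
    """Count query bigrams that also occur in the document (a crude match signal)."""
    doc_tokens = set(char_bigrams(document))
    return sum(1 for token in char_bigrams(query) if token in doc_tokens)
```

For example, `char_bigrams("東京都")` yields `["東京", "京都"]`, so the query 東京 matches the text 東京都に住む even though no whitespace separates the words.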

The Retriever can be combined with SupabasePgvectorEmbeddingRetriever and a DocumentJoiner to build hybrid search pipelines that use both keyword and semantic retrieval. If you want to combine the results of both Retrievers in a RAG pipeline, you can also rely on Smart Pipeline Connections and skip the DocumentJoiner.
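In a hybrid pipeline, the keyword ranking from this Retriever and the embedding ranking from SupabasePgvectorEmbeddingRetriever have to be merged into a single list. DocumentJoiner offers several join modes for this, including reciprocal rank fusion (RRF). The sketch below applies the textbook RRF formula to plain document IDs; it is an illustration of the technique, not DocumentJoiner's code, whose weighting and score normalization may differ. The constant `k=60` is the value commonly used in the RRF literature, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in (rank is 1-based),
    so documents returned by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["doc-languages", "doc-waves"]        # e.g. from the keyword Retriever
semantic_hits = ["doc-languages", "doc-elephants"]   # e.g. from the embedding Retriever
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Documents found by both Retrievers (here `doc-languages`) outrank documents found by only one, which is what makes fusion useful for combining keyword and semantic results.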

In addition to `query`, the Retriever accepts optional parameters such as `top_k` (the maximum number of Documents to retrieve) and `filters` (to narrow the search space).
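The sketch below models how `query`, `top_k`, and `filters` interact, entirely in memory: filters restrict the candidate set, keyword scores rank the remaining Documents, and at most `top_k` are returned. It rests on several assumptions. The scoring is the classic Okapi BM25 formula that the Retriever's name refers to, whereas PGroonga computes its own scores inside the database. The filter evaluator supports only a small subset of Haystack's filter syntax (`==`, `!=`, `in`, `AND`, `OR`). The names `DOCS`, `matches`, `bm25_scores`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical in-memory corpus; each entry mimics a Document's content and meta.
DOCS = [
    {"content": "There are over 7,000 languages spoken around the world today.",
     "meta": {"topic": "linguistics"}},
    {"content": "Elephants recognize themselves in mirrors.",
     "meta": {"topic": "animals"}},
    {"content": "Bioluminescent waves glow in the Maldives and around Puerto Rico.",
     "meta": {"topic": "nature"}},
]


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def matches(filters: dict, meta: dict) -> bool:
    """Evaluate a subset of Haystack-style filters (==, !=, in, AND, OR) on a meta dict."""
    if "conditions" in filters:  # logical node combining nested filters
        results = [matches(condition, meta) for condition in filters["conditions"]]
        if filters["operator"] == "AND":
            return all(results)
        if filters["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {filters['operator']}")
    value = meta.get(filters["field"].removeprefix("meta."))
    operator = filters["operator"]
    if operator == "==":
        return value == filters["value"]
    if operator == "!=":
        return value != filters["value"]
    if operator == "in":
        return value in filters["value"]
    raise ValueError(f"Unsupported comparison operator: {operator}")


def bm25_scores(query: str, texts: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each text for the query (illustrative, not PGroonga's scorer)."""
    docs = [tokenize(text) for text in texts]
    if not docs:
        return []
    avgdl = sum(len(tokens) for tokens in docs) / len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query: str, docs: list[dict], top_k: int = 10,
             filters: Optional[dict] = None) -> list[dict]:
    """Filter first, then rank by BM25, then keep at most top_k documents with a positive score."""
    candidates = [doc for doc in docs if filters is None or matches(filters, doc["meta"])]
    scores = bm25_scores(query, [doc["content"] for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in ranked if score > 0][:top_k]
```

For example, `retrieve("languages spoken around the world", DOCS, top_k=1)` returns only the linguistics Document, while adding `filters={"field": "meta.topic", "operator": "==", "value": "nature"}` limits the results to the nature Document.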

Prerequisites

PGroonga must be enabled in your Supabase project. Run the following SQL in the Supabase SQL editor:

```sql
CREATE EXTENSION IF NOT EXISTS pgroonga;
```

You also need to create a SQL function that PGroonga uses for search. See the integration README for the required function definition.

Installation

```shell
pip install supabase-haystack
```

Usage

On its own

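This Retriever needs a SupabaseGroongaDocumentStore containing indexed Documents to run.

Set the SUPABASE_URL and SUPABASE_SERVICE_KEY environment variables for your Supabase project. In the example below, the project URL is passed directly, so replace `<project-ref>` with your own project reference.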

```python
from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore
from haystack_integrations.components.retrievers.supabase import (
    SupabaseGroongaBM25Retriever,
)
from haystack.utils import Secret

# Connect to the table that holds the indexed Documents.
document_store = SupabaseGroongaDocumentStore(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_groonga_documents",
)

retriever = SupabaseGroongaBM25Retriever(document_store=document_store)

# run() returns a dictionary whose "documents" key holds the matching Documents.
result = retriever.run(query="my nice query")
for doc in result["documents"]:
    print(doc.score, doc.content)
```

In a RAG pipeline

The prerequisites for running this code are:

- Set an environment variable OPENAI_API_KEY with your OpenAI API key.
- Set an environment variable SUPABASE_SERVICE_KEY with your Supabase service role key.
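To fail early when either variable is missing, you can check the environment before building the pipeline. This is a small, hypothetical helper (`missing_env_vars` is not part of Haystack); it only checks that the variables are set and non-empty, not that the keys are valid.

```python
import os

# The two variables this example reads: OpenAI for the generator, Supabase for the store.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "SUPABASE_SERVICE_KEY")


def missing_env_vars(names: tuple[str, ...] = REQUIRED_ENV_VARS) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


# Example: stop before building the pipeline if anything is missing.
# if missing := missing_env_vars():
#     raise SystemExit(f"Set these environment variables first: {', '.join(missing)}")
```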

```python
from haystack import Document, Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils import Secret

from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore
from haystack_integrations.components.retrievers.supabase import (
    SupabaseGroongaBM25Retriever,
)

document_store = SupabaseGroongaDocumentStore(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_groonga_documents",
)

# Index a few sample Documents; SKIP leaves Documents that already exist untouched.
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

document_store.write_documents(documents=documents, policy=DuplicatePolicy.SKIP)

# Each retrieved Document is rendered on its own line of the prompt.
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\nDocuments:\n"
        "{% for doc in documents %}{{ doc.content }}\n{% endfor %}"
        "Question: {{question}}\nAnswer:",
    ),
]

retriever = SupabaseGroongaBM25Retriever(document_store=document_store)
rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(
    instance=ChatPromptBuilder(
        template=prompt_template,
        required_variables=["question", "documents"],
    ),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIChatGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")

# The retrieved Documents feed both the prompt and the final answers.
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("retriever", "answer_builder.documents")

question = "languages spoken around the world today"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    },
)
print(result["answer_builder"])
```
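`result["answer_builder"]` is a dictionary whose `answers` entry holds `GeneratedAnswer` objects. The helper below is a hypothetical convenience function, not part of Haystack; it assumes each answer exposes `data` (the generated text) and `documents` (the Documents passed to `AnswerBuilder`), and turns them into readable lines.

```python
def summarize_answers(answer_builder_output: dict, snippet_chars: int = 60) -> list[str]:
    """Format AnswerBuilder output as lines: each answer's text, then a snippet of each source."""
    lines = []
    for answer in answer_builder_output.get("answers", []):
        lines.append(f"Answer: {answer.data}")
        for doc in answer.documents or []:
            # Document.content can be None, so fall back to an empty string.
            lines.append(f"  source: {(doc.content or '')[:snippet_chars]}")
    return lines
```

For example: `for line in summarize_answers(result["answer_builder"]): print(line)`.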
