SupabaseDocumentStore
| API reference | Supabase |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/supabase/ |
Supabase is an open-source backend platform built on PostgreSQL. The Supabase integration for Haystack provides two document stores:
- SupabasePgvectorDocumentStore: vector similarity search using the pgvector PostgreSQL extension, which comes pre-installed on Supabase.
- SupabaseGroongaDocumentStore: multilingual full-text search using the PGroonga PostgreSQL extension. No embeddings required.
Installation
SupabasePgvectorDocumentStore
SupabasePgvectorDocumentStore is a thin wrapper around PgvectorDocumentStore with Supabase-specific defaults:
- Reads the connection string from the SUPABASE_DB_URL environment variable.
- Defaults create_extension to False, since pgvector is pre-installed on Supabase.
Connection
Set the SUPABASE_DB_URL environment variable with your Supabase database connection string.
Supabase offers two pooler ports: transaction mode (port 6543) and session mode (port 5432). For best compatibility with pgvector operations, use session mode or a direct connection.
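As a minimal sketch of the setup described above, the snippet below builds a connection string and exports it as SUPABASE_DB_URL before the document store is created. The helper names build_db_url and uses_transaction_pooler, as well as the user, password, and host, are hypothetical placeholders and not part of the integration; in practice, copy the real connection string from your Supabase project settings.

```python
import os
from urllib.parse import quote, urlsplit


def build_db_url(user: str, password: str, host: str, port: int = 5432, database: str = "postgres") -> str:
    """Assemble a PostgreSQL connection URL, percent-encoding the credentials."""
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{database}"


def uses_transaction_pooler(db_url: str) -> bool:
    """Return True if the URL points at port 6543 (transaction mode)."""
    return urlsplit(db_url).port == 6543


# Placeholder values for illustration only (hypothetical host and credentials).
db_url = build_db_url(
    user="postgres",
    password="my p@ssword",
    host="db.example.supabase.co",
)

# Fail early if the URL targets the transaction-mode pooler.
if uses_transaction_pooler(db_url):
    raise ValueError("Port 6543 is transaction mode; prefer session mode (5432) or a direct connection.")

# The document store reads the connection string from this variable.
os.environ["SUPABASE_DB_URL"] = db_url
```

Percent-encoding the credentials avoids broken URLs when a password contains characters such as `@` or spaces.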
Initialization
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)
To learn more about the initialization parameters, see the API docs.
Supported Retrievers
- SupabasePgvectorEmbeddingRetriever: Fetches documents from the store based on a query embedding.
- SupabasePgvectorKeywordRetriever: Fetches documents matching a keyword query using PostgreSQL's ts_rank_cd ranking.
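To make the embedding retriever's behavior concrete, here is a small, library-free sketch of ranking by cosine similarity, the metric selected by vector_function="cosine_similarity" in the initialization above. It is an illustration only: in the actual store the comparison runs inside PostgreSQL via pgvector, and the function names and toy three-dimensional vectors here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, docs, top_k=2):
    """Return the top_k (score, content) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_embedding, emb), content) for content, emb in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy (content, embedding) pairs; real embeddings would have 768 dimensions.
docs = [
    ("languages", [1.0, 0.0, 0.0]),
    ("elephants", [0.0, 1.0, 0.0]),
    ("bioluminescence", [0.6, 0.8, 0.0]),
]

top = rank_by_similarity([1.0, 0.1, 0.0], docs)
```

Here the query vector points mostly along the first axis, so "languages" ranks first and "bioluminescence" second.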
Example: RAG pipeline
from haystack import Document, Pipeline
from haystack.document_stores.types.policy import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import (
    SupabasePgvectorEmbeddingRetriever,
)

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

# Index documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness.",
    ),
    Document(
        content="In certain places, you can witness the phenomenon of bioluminescent waves.",
    ),
]
embedder = SentenceTransformersDocumentEmbedder()
# Load the embedding model before running the embedder outside a pipeline
embedder.warm_up()
documents_with_embeddings = embedder.run(documents)
document_store.write_documents(
    documents_with_embeddings["documents"],
    policy=DuplicatePolicy.OVERWRITE,
)

# Query pipeline
prompt_template = [
    ChatMessage.from_system("Answer the question based on the provided context."),
    ChatMessage.from_user(
        "Query: {{query}}\nDocuments:\n{% for doc in documents %}{{ doc.content }}\n{% endfor %}\nAnswer:",
    ),
]

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    SupabasePgvectorEmbeddingRetriever(document_store=document_store),
)
query_pipeline.add_component(
    "prompt_builder",
    ChatPromptBuilder(
        template=prompt_template,
        required_variables=["query", "documents"],
    ),
)
query_pipeline.add_component("generator", OpenAIChatGenerator(model="gpt-4o"))

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder.prompt", "generator.messages")

result = query_pipeline.run(
    {
        "text_embedder": {"text": "How many languages are there?"},
        "prompt_builder": {"query": "How many languages are there?"},
    },
)
SupabaseGroongaDocumentStore
SupabaseGroongaDocumentStore uses PGroonga, a PostgreSQL extension for fast, multilingual full-text search. Unlike the pgvector store, it works with plain text queries and requires no embeddings.
Prerequisites
PGroonga must be enabled in your Supabase project. Run the following SQL in the Supabase SQL editor:
CREATE EXTENSION IF NOT EXISTS pgroonga;
You also need to create a SQL function that PGroonga uses for search. See the integration README for the required function definition.
Initialization
from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore
from haystack.utils import Secret

document_store = SupabaseGroongaDocumentStore(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_groonga_documents",
)
document_store.warm_up()

warm_up() must be called before using the store. It initializes the Supabase client and creates the table and PGroonga index if they don't exist.
To learn more about the initialization parameters, see the API docs.
Supported Retrievers
- SupabaseGroongaBM25Retriever: Retrieves documents using PGroonga full-text search. Works without embeddings and can be combined with SupabasePgvectorEmbeddingRetriever for hybrid search pipelines.
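One common way to merge keyword and embedding results in a hybrid setup is reciprocal rank fusion (RRF). The sketch below is a plain-Python illustration of that idea, not the integration's code: the function name, the document IDs, and the constant k=60 are assumptions made for the example. In a Haystack pipeline, the built-in DocumentJoiner component with join_mode="reciprocal_rank_fusion" would typically play this role.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1);
    documents are returned by descending total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from a keyword retriever and an embedding retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, embedding_hits])
```

Documents that appear near the top of both lists (here doc_b and doc_a) end up ahead of documents found by only one retriever.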