Version: 2.26

Supabase

haystack_integrations.components.downloaders.supabase.supabase_bucket_downloader

SupabaseBucketDownloader

Downloads files from a Supabase Storage bucket and returns them as ByteStream objects.

Files are downloaded in memory, so the resulting ByteStream objects can be passed straight to later steps of an indexing pipeline (e.g. a DocumentConverter).

Example usage:

python

from haystack_integrations.components.downloaders.supabase import SupabaseBucketDownloader
from haystack.utils import Secret

downloader = SupabaseBucketDownloader(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name="my-documents",
)
result = downloader.run(sources=["reports/report.pdf", "data/notes.txt"])
streams = result["streams"]

__init__

python

__init__(
    *,
    supabase_url: str,
    supabase_key: Secret = Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name: str,
    file_extensions: list[str] | None = None
) -> None

Creates a new SupabaseBucketDownloader instance.

Parameters:

supabase_url (str) – The URL of your Supabase project, e.g. https://<project-ref>.supabase.co.
supabase_key (Secret) – The Supabase API key used to authenticate requests. Defaults to the SUPABASE_SERVICE_KEY environment variable. Use the service role key for private buckets.
bucket_name (str) – The name of the Supabase Storage bucket to download files from.
file_extensions (list[str] | None) – Optional list of file extensions to filter downloads (e.g. [".pdf", ".txt"]). If None, all files are downloaded. Extensions are matched case-insensitively.
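The extension filter can be pictured with a short sketch. The helper name `matches_extension` is hypothetical and not part of the integration; it only illustrates case-insensitive matching, under the assumption that the comparison is made against the final extension of the path.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def matches_extension(path: str, file_extensions: list[str] | None) -> bool:
    """Return True if `path` passes the extension filter (hypothetical helper)."""
    if file_extensions is None:
        # No filter configured: every file is accepted.
        return True
    allowed = {ext.lower() for ext in file_extensions}
    # PurePosixPath.suffix is the last extension, including the leading dot.
    return PurePosixPath(path).suffix.lower() in allowed


sources = ["reports/Report.PDF", "data/notes.txt", "images/logo.png"]
kept = [s for s in sources if matches_extension(s, [".pdf", ".txt"])]
print(kept)  # ['reports/Report.PDF', 'data/notes.txt']
```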

warm_up

python

warm_up() -> None

Initializes the Supabase client.

This method is called automatically on the first run(). It can also be called explicitly, for example before the component is used in a pipeline.
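The warm-up behaviour described here follows a common lazy-initialization pattern. The class below is a hypothetical stand-in, not the integration's code: it assumes the client is created once and reused, and uses a plain `object()` in place of a real Supabase client.

```python
class LazyClientComponent:
    """Hypothetical stand-in that shows the warm-up pattern only."""

    def __init__(self) -> None:
        # Nothing is created at construction time.
        self._client = None

    def warm_up(self) -> None:
        # Create the client only if it does not exist yet (assumed idempotent).
        if self._client is None:
            self._client = object()  # placeholder for a real client

    def run(self) -> bool:
        # Warm up on first use, then proceed with the existing client.
        self.warm_up()
        return self._client is not None


component = LazyClientComponent()
print(component.run())  # True
```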

run

python

run(sources: list[str]) -> dict[str, list[ByteStream]]

Downloads files from the Supabase Storage bucket.

Parameters:

sources (list[str]) – List of file paths within the bucket to download, e.g. ["folder/file.pdf", "notes.txt"].

Returns:

dict[str, list[ByteStream]] – A dictionary with:
streams: list of ByteStream objects, one per successfully downloaded file. Each ByteStream has meta["file_path"] and meta["bucket_name"] set.
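Because streams holds one entry per successfully downloaded file, it can be shorter than sources. A minimal sketch for spotting the gaps follows. It assumes that meta["file_path"] contains the same path string that was passed in sources, which is not confirmed here. The helper `missing_sources` is hypothetical, and `SimpleNamespace` objects stand in for ByteStream so the snippet runs without a Supabase project.

```python
from types import SimpleNamespace


def missing_sources(sources, streams):
    """Return requested paths that have no matching stream (hypothetical helper)."""
    downloaded = {stream.meta["file_path"] for stream in streams}
    return [path for path in sources if path not in downloaded]


# Stand-ins for ByteStream objects; only the `meta` attribute is used.
streams = [
    SimpleNamespace(meta={"file_path": "reports/report.pdf", "bucket_name": "my-documents"}),
]
sources = ["reports/report.pdf", "data/notes.txt"]
print(missing_sources(sources, streams))  # ['data/notes.txt']
```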

to_dict

python

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

python

from_dict(data: dict[str, Any]) -> SupabaseBucketDownloader

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

SupabaseBucketDownloader – Deserialized component.

haystack_integrations.components.retrievers.supabase.embedding_retriever

SupabasePgvectorEmbeddingRetriever

Bases: PgvectorEmbeddingRetriever

Retrieves documents from the SupabasePgvectorDocumentStore, based on their dense embeddings.

This is a thin wrapper around PgvectorEmbeddingRetriever, adapted for use with SupabasePgvectorDocumentStore.

Example usage:

Set an environment variable `SUPABASE_DB_URL` with the connection string to your Supabase database.

bash

export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres

python

from haystack import Document, Pipeline
from haystack.document_stores.types.policy import DuplicatePolicy
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorEmbeddingRetriever

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
             Document(content="Elephants have been observed to behave in a way that indicates..."),
             Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", SupabasePgvectorEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

res = query_pipeline.run({"text_embedder": {"text": query}})
print(res['retriever']['documents'][0].content)
# >> "There are over 7,000 languages spoken around the world today."

__init__

python

__init__(
    *,
    document_store: SupabasePgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    vector_function: (
        Literal["cosine_similarity", "inner_product", "l2_distance"] | None
    ) = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None

Initialize the SupabasePgvectorEmbeddingRetriever.

Parameters:

document_store (SupabasePgvectorDocumentStore) – An instance of SupabasePgvectorDocumentStore.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents.
top_k (int) – Maximum number of Documents to return.
vector_function (Literal['cosine_similarity', 'inner_product', 'l2_distance'] | None) – The similarity function to use when searching for similar embeddings. Defaults to the one set in the document_store instance. "cosine_similarity" and "inner_product" are similarity functions and higher scores indicate greater similarity between the documents. "l2_distance" returns the straight-line distance between vectors, and the most similar documents are the ones with the smallest score. Important: if the document store is using the "hnsw" search strategy, the vector function should match the one utilized during index creation to take advantage of the index.
filter_policy (str | FilterPolicy) – Policy to determine how filters are applied.
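To make the score direction of the three vector functions concrete, here is a small pure-Python sketch. It recomputes the measures by hand on toy vectors and does not call the document store. The vectors and names are illustrative only.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
docs = {
    "close": [2.0, 0.0],  # same direction as the query
    "far": [0.0, 3.0],    # orthogonal to the query
}

# Similarity functions: sort descending, higher score means more similar.
by_cosine = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
by_inner = sorted(docs, key=lambda k: inner_product(query, docs[k]), reverse=True)
# Distance function: sort ascending, smaller score means more similar.
by_l2 = sorted(docs, key=lambda k: l2_distance(query, docs[k]))

print(by_cosine, by_inner, by_l2)
```

All three orderings put "close" first, but only because the sort direction is flipped for "l2_distance".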

Raises:

ValueError – If document_store is not an instance of SupabasePgvectorDocumentStore or if vector_function is not one of the valid options.

to_dict

python

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

python

from_dict(data: dict[str, Any]) -> SupabasePgvectorEmbeddingRetriever

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

SupabasePgvectorEmbeddingRetriever – Deserialized component.

haystack_integrations.components.retrievers.supabase.keyword_retriever

SupabasePgvectorKeywordRetriever

Bases: PgvectorKeywordRetriever

Retrieves documents from the SupabasePgvectorDocumentStore, based on keywords.

This is a thin wrapper around PgvectorKeywordRetriever, adapted for use with SupabasePgvectorDocumentStore.

To rank the documents, the ts_rank_cd function of PostgreSQL is used. It considers how often the query terms appear in the document, how close together the terms are, and how important the part of the document is where they occur.

Example usage:

Set an environment variable `SUPABASE_DB_URL` with the connection string to your Supabase database.

bash

export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres

python

from haystack import Document, Pipeline
from haystack.document_stores.types.policy import DuplicatePolicy

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorKeywordRetriever

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    recreate_table=True,
)

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
             Document(content="Elephants have been observed to behave in a way that indicates..."),
             Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]

document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
retriever = SupabasePgvectorKeywordRetriever(document_store=document_store)
result = retriever.run(query="languages")

print(result['documents'][0].content)
# >> "There are over 7,000 languages spoken around the world today."

__init__

python

__init__(
    *,
    document_store: SupabasePgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None

Initialize the SupabasePgvectorKeywordRetriever.

Parameters:

document_store (SupabasePgvectorDocumentStore) – An instance of SupabasePgvectorDocumentStore.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents.
top_k (int) – Maximum number of Documents to return.
filter_policy (str | FilterPolicy) – Policy to determine how filters are applied.
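As an illustration of the filters parameter, the dictionary below uses Haystack's metadata filter syntax (field, operator, value, combined with a logical operator). The metadata fields `language` and `year` are made up for this sketch and assume documents were written with such metadata.

```python
# Keep only English documents from 2020 onwards (hypothetical metadata fields).
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}

# With a document store at hand, the filters would be passed at construction:
# retriever = SupabasePgvectorKeywordRetriever(document_store=document_store, filters=filters)
print(filters["operator"], len(filters["conditions"]))
```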

Raises:

ValueError – If document_store is not an instance of SupabasePgvectorDocumentStore.

to_dict

python

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

python

from_dict(data: dict[str, Any]) -> SupabasePgvectorKeywordRetriever

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

SupabasePgvectorKeywordRetriever – Deserialized component.

haystack_integrations.document_stores.supabase.document_store

SupabasePgvectorDocumentStore

Bases: PgvectorDocumentStore

A Document Store for Supabase, using PostgreSQL with the pgvector extension.

It is intended for use with the PostgreSQL database of a Supabase project.

This is a thin wrapper around PgvectorDocumentStore with Supabase-specific defaults:

Reads the connection string from the SUPABASE_DB_URL environment variable.
Defaults create_extension to False since pgvector is pre-installed on Supabase.

Connection notes: Supabase offers two pooler ports: transaction mode (6543) and session mode (5432). For best compatibility with pgvector operations, use session mode (port 5432) or a direct connection.
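A quick way to check which port a connection string targets is to parse it with the standard library. The helper `uses_transaction_pooler` is a hypothetical name for this sketch; the port number comes from the connection notes above.

```python
from urllib.parse import urlsplit

TRANSACTION_POOLER_PORT = 6543  # transaction mode, per the connection notes


def uses_transaction_pooler(connection_string: str) -> bool:
    """Return True if the connection string targets the transaction-mode port."""
    return urlsplit(connection_string).port == TRANSACTION_POOLER_PORT


print(uses_transaction_pooler("postgresql://postgres:postgres@localhost:5432/postgres"))  # False
print(uses_transaction_pooler("postgresql://postgres:postgres@localhost:6543/postgres"))  # True
```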

Example usage:

Set an environment variable `SUPABASE_DB_URL` with the connection string to your Supabase database.

bash

export SUPABASE_DB_URL=postgresql://postgres:postgres@localhost:5432/postgres

python

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

__init__

python

__init__(
    *,
    connection_string: Secret = Secret.from_env_var("SUPABASE_DB_URL"),
    create_extension: bool = False,
    schema_name: str = "public",
    table_name: str = "haystack_documents",
    language: str = "english",
    embedding_dimension: int = 768,
    vector_type: Literal["vector", "halfvec"] = "vector",
    vector_function: Literal[
        "cosine_similarity", "inner_product", "l2_distance"
    ] = "cosine_similarity",
    recreate_table: bool = False,
    search_strategy: Literal[
        "exact_nearest_neighbor", "hnsw"
    ] = "exact_nearest_neighbor",
    hnsw_recreate_index_if_exists: bool = False,
    hnsw_index_creation_kwargs: dict[str, int] | None = None,
    hnsw_index_name: str = "haystack_hnsw_index",
    hnsw_ef_search: int | None = None,
    keyword_index_name: str = "haystack_keyword_index"
) -> None

Creates a new SupabasePgvectorDocumentStore instance.

Parameters:

connection_string (Secret) – The connection string for the Supabase PostgreSQL database. By default, it is read from the SUPABASE_DB_URL environment variable. Format: postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:5432/postgres
create_extension (bool) – Whether to create the pgvector extension if it doesn't exist. Defaults to False since Supabase has pgvector pre-installed.
schema_name (str) – The name of the schema the table is created in.
table_name (str) – The name of the table to use to store Haystack documents.
language (str) – The language to be used to parse query and document content in keyword retrieval.
embedding_dimension (int) – The dimension of the embedding.
vector_type (Literal['vector', 'halfvec']) – The type of vector used for embedding storage. "vector" or "halfvec".
vector_function (Literal['cosine_similarity', 'inner_product', 'l2_distance']) – The similarity function to use when searching for similar embeddings.
recreate_table (bool) – Whether to recreate the table if it already exists.
search_strategy (Literal['exact_nearest_neighbor', 'hnsw']) – The search strategy to use: "exact_nearest_neighbor" or "hnsw".
hnsw_recreate_index_if_exists (bool) – Whether to recreate the HNSW index if it already exists.
hnsw_index_creation_kwargs (dict[str, int] | None) – Additional keyword arguments for HNSW index creation.
hnsw_index_name (str) – Index name for the HNSW index.
hnsw_ef_search (int | None) – The ef_search parameter to use at query time for HNSW.
keyword_index_name (str) – Index name for the Keyword index.
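The HNSW-related parameters can be combined as in the sketch below. The keys "m" and "ef_construction" are pgvector's HNSW index build options, and the values shown are illustrative rather than recommendations. The store is created inside `build_store` (a hypothetical helper) because instantiation needs SUPABASE_DB_URL and a reachable database.

```python
# Keyword arguments for an HNSW-backed store; values are illustrative only.
store_kwargs = {
    "embedding_dimension": 768,
    "vector_function": "cosine_similarity",
    "search_strategy": "hnsw",
    "hnsw_index_creation_kwargs": {"m": 16, "ef_construction": 64},
    "hnsw_ef_search": 40,
}


def build_store():
    """Create the store; requires SUPABASE_DB_URL and a reachable database."""
    # Imported here so the configuration above can be inspected without the package.
    from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore

    return SupabasePgvectorDocumentStore(**store_kwargs)


print(store_kwargs["search_strategy"])  # hnsw
```

When "hnsw" is used, a retriever's vector_function should match the one set here so that queries can take advantage of the index.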

to_dict

python

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

python

from_dict(data: dict[str, Any]) -> SupabasePgvectorDocumentStore

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

SupabasePgvectorDocumentStore – Deserialized component.
