Version: 2.30-unstable

Vespa

haystack_integrations.components.retrievers.vespa.embedding_retriever

VespaEmbeddingRetriever

Retrieve documents from Vespa using dense vector similarity.

__init__

```python
__init__(
    *,
    document_store: VespaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    ranking: str | None = DEFAULT_SEMANTIC_RANKING,
    query_tensor_name: str = "query_embedding",
    target_hits: int | None = None
) -> None
```

Create a Vespa embedding retriever.

Parameters:

Raises:

  • ValueError – If document_store is not an instance of VespaDocumentStore.

run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
```

Retrieve documents from Vespa.

Parameters:

  • query_embedding (list[float]) – Dense query embedding.
  • filters (dict[str, Any] | None) – Filters applied when fetching documents from the Document Store.
  • top_k (int | None) – Maximum number of documents to return.

Returns:

  • dict[str, list[Document]] – Retrieved documents.
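The sketch below shows roughly what a dense-retrieval request to Vespa looks like. It is an illustration under stated assumptions, not the integration's code. The helper name build_nearest_neighbor_body, the schema name doc, the rank profile name "semantic", and the fallback of targetHits to top_k are all hypothetical. The parameter names query_tensor_name, target_hits, ranking, and top_k mirror the constructor above. The sketch uses Vespa's YQL nearestNeighbor operator with a targetHits annotation and passes the embedding as the input.query(...) tensor.

```python
from __future__ import annotations

from typing import Any


def build_nearest_neighbor_body(
    query_embedding: list[float],
    *,
    schema: str = "doc",
    embedding_field: str = "embedding",
    query_tensor_name: str = "query_embedding",
    top_k: int = 10,
    target_hits: int | None = None,
    ranking: str | None = "semantic",  # hypothetical rank profile name
) -> dict[str, Any]:
    """Build a hypothetical Vespa query body for dense vector retrieval."""
    # Assumption: when target_hits is not given, ask the ANN search for top_k candidates.
    hits_target = target_hits if target_hits is not None else top_k
    yql = (
        f"select * from {schema} where "
        f"{{targetHits: {hits_target}}}nearestNeighbor({embedding_field}, {query_tensor_name})"
    )
    body: dict[str, Any] = {
        "yql": yql,
        "hits": top_k,
        # The rank profile reads this tensor as query(<query_tensor_name>).
        f"input.query({query_tensor_name})": query_embedding,
    }
    if ranking is not None:
        body["ranking"] = ranking
    return body


body = build_nearest_neighbor_body([0.1, 0.2, 0.3], top_k=5, target_hits=50)
print(body["yql"])
```

In this sketch, targetHits controls how many candidates the nearest-neighbor search produces, while hits caps how many results are returned. The query tensor name has to match the input declared in the rank profile, which is presumably why the retriever exposes query_tensor_name.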

haystack_integrations.components.retrievers.vespa.keyword_retriever

VespaKeywordRetriever

Retrieve documents from Vespa using lexical search.

__init__

```python
__init__(
    *,
    document_store: VespaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    ranking: str | None = DEFAULT_BM25_RANKING
) -> None
```

Create a Vespa keyword retriever.

Parameters:

  • document_store (VespaDocumentStore) – A VespaDocumentStore configured to match your deployed schema and endpoint, for example VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc"). See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
  • filters (dict[str, Any] | None) – Optional static Haystack metadata filters applied to every retrieval unless overridden in run(), for example {"field": "meta.category", "operator": "==", "value": "news"}. See https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
  • top_k (int) – Default maximum number of documents to return per query. Defaults to 10.
  • ranking (str | None) – Vespa rank profile used to score lexical matches, for example a profile named bm25 that ranks with bm25(content). Defaults to bm25. Pass None to use the schema's default rank profile. See https://docs.vespa.ai/en/basics/ranking.html.
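Haystack filters are nested dictionaries. A comparison uses field, operator, and value, and logical groups use operator plus conditions. The hypothetical helper below translates a small subset of them into a YQL condition, to show how such a filter can map onto Vespa's query language. It is not the integration's filter converter. The helper name, the mapping of meta.<name> to a Vespa field called <name>, and the choice of YQL contains for string equality are assumptions. Booleans and any other operators are rejected.

```python
from __future__ import annotations

import json
from typing import Any

_RANGE_OPS = {">", ">=", "<", "<="}


def filter_to_yql(filters: dict[str, Any]) -> str:
    """Translate a small subset of Haystack filters into a YQL condition (hypothetical helper)."""
    op = filters["operator"]
    if op in ("AND", "OR"):
        # Logical filter: translate each condition and join them with and/or.
        parts = [filter_to_yql(condition) for condition in filters["conditions"]]
        return "(" + f" {op.lower()} ".join(parts) + ")"
    # Assumption: "meta.<name>" maps to a Vespa field named <name>.
    field = filters["field"]
    if field.startswith("meta."):
        field = field[len("meta."):]
    value = filters["value"]
    if isinstance(value, bool):
        raise ValueError("Boolean values are not covered by this sketch")
    if op == "==":
        if isinstance(value, str):
            # String attributes are matched with YQL "contains"; json.dumps quotes and escapes.
            return f"{field} contains {json.dumps(value)}"
        return f"{field} = {value}"
    if op in _RANGE_OPS and isinstance(value, (int, float)):
        return f"{field} {op} {value}"
    raise ValueError(f"Operator {op!r} with value {value!r} is not covered by this sketch")


example = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
print(filter_to_yql(example))
```

Keeping the recursion on the logical operators means nested AND/OR groups produce correctly parenthesized YQL without any extra bookkeeping.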

Raises:

  • ValueError – If document_store is not an instance of VespaDocumentStore.

run

```python
run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
```

Retrieve documents from Vespa.

Parameters:

  • query (str) – Query text.
  • filters (dict[str, Any] | None) – Filters applied when fetching documents from the Document Store.
  • top_k (int | None) – Maximum number of documents to return.

Returns:

  • dict[str, list[Document]] – Retrieved documents.
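For comparison with the dense case, the sketch below builds a lexical Vespa request. It uses YQL userQuery(), which tells Vespa to parse the separate "query" request parameter, together with a BM25 rank profile. This is an assumption-based illustration, not the retriever's implementation. The helper name build_keyword_body, the schema name doc, and the where_extra argument (a pre-built YQL condition, for example from a filter translation) are hypothetical.

```python
from __future__ import annotations

from typing import Any


def build_keyword_body(
    query: str,
    *,
    schema: str = "doc",
    top_k: int = 10,
    ranking: str | None = "bm25",
    where_extra: str | None = None,
) -> dict[str, Any]:
    """Build a hypothetical Vespa query body for lexical (BM25) retrieval."""
    # userQuery() makes Vespa parse the "query" request parameter as the text match.
    condition = "userQuery()"
    if where_extra:
        # Assumption: metadata filters are ANDed with the text match.
        condition = f"{condition} and {where_extra}"
    body: dict[str, Any] = {
        "yql": f"select * from {schema} where {condition}",
        "query": query,
        "hits": top_k,
    }
    if ranking is not None:
        body["ranking"] = ranking
    return body


print(build_keyword_body("vespa haystack", top_k=3))
```

By default, userQuery() searches the index named default. A request like this therefore assumes the schema defines a default fieldset covering the content field, or that model.defaultIndex is set on the query.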

haystack_integrations.document_stores.vespa.document_store

VespaDocumentStore

Document store backed by an existing Vespa application.

__init__

```python
__init__(
    *,
    url: str | None = None,
    port: int = 8080,
    cert: Secret | None = None,
    key: Secret | None = None,
    vespa_cloud_secret_token: Secret | None = None,
    additional_headers: dict[str, str] | None = None,
    content_cluster_name: str = "content",
    schema: str = "doc",
    namespace: str | None = None,
    groupname: str | None = None,
    content_field: str = "content",
    embedding_field: str = "embedding",
    id_field: str = "id",
    metadata_fields: list[str] | None = None,
    query_limit: int = DEFAULT_QUERY_LIMIT
) -> None
```

Create a new Vespa document store.

Parameters:

  • url (str | None) – Vespa endpoint base URL. If omitted, the VESPA_URL environment variable is used.
  • port (int) – Vespa HTTP port.
  • cert (Secret | None) – Secret resolving to the data plane certificate file path for mTLS authentication.
  • key (Secret | None) – Secret resolving to the data plane key file path for mTLS authentication.
  • vespa_cloud_secret_token (Secret | None) – Vespa Cloud data plane secret token for token authentication. If omitted, the VESPA_CLOUD_SECRET_TOKEN environment variable is used when set, matching pyvespa's behavior.
  • additional_headers (dict[str, str] | None) – Additional headers to send to the Vespa application.
  • content_cluster_name (str) – Vespa content cluster name.
  • schema (str) – Vespa schema name to read from and write to.
  • namespace (str | None) – Vespa namespace. Defaults to the schema name when omitted.
  • groupname (str | None) – Optional Vespa group name.
  • content_field (str) – Vespa field containing the document text.
  • embedding_field (str) – Vespa field containing the dense embedding.
  • id_field (str) – Optional Vespa field that holds the document ID in query responses. Vespa document IDs are always written through data_id. If this field is missing from the schema or the document summaries, the integration falls back to parsing the Vespa document path.
  • metadata_fields (list[str] | None) – Optional allowlist of metadata fields to feed and return.
  • query_limit (int) – Maximum number of documents returned by bulk queries. Defaults to 400, which stays within Vespa's default per-query hit limit.
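Two of the documented defaults are easy to misread: the URL falls back to VESPA_URL, and the namespace falls back to the schema name. The sketch below restates both as small hypothetical functions. It also shows one way to recover a local ID from a full Vespa document ID of the form id:<namespace>:<document-type>:<key/value>:<user-specified>. Whether the integration's fallback parses exactly this string is an assumption. None of these function names belong to the integration.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_url(url: str | None, env: Mapping[str, str] | None = None) -> str:
    """Prefer the explicit url, otherwise fall back to the VESPA_URL environment variable."""
    source = os.environ if env is None else env
    resolved = url or source.get("VESPA_URL")
    if not resolved:
        # Assumption: the real store's error type and message may differ.
        raise ValueError("Pass url or set the VESPA_URL environment variable")
    return resolved


def resolve_namespace(schema: str, namespace: str | None) -> str:
    """The namespace defaults to the schema name when omitted."""
    return schema if namespace is None else namespace


def local_id_from_vespa_id(vespa_id: str) -> str:
    """Return the user-specified part of id:<namespace>:<document-type>:<key/value>:<user-specified>."""
    # maxsplit=4 keeps any colons that belong to the user-specified part.
    parts = vespa_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "id":
        raise ValueError(f"Not a Vespa document ID: {vespa_id!r}")
    return parts[4]


print(resolve_url(None, env={"VESPA_URL": "http://localhost"}))
print(resolve_namespace("doc", None))
print(local_id_from_vespa_id("id:doc:doc::chunk:42"))
```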

app

```python
app: Any
```

Return the underlying pyvespa Vespa HTTP client.

It is built from this store's url, port, and authentication settings (cert, key, vespa_cloud_secret_token, additional_headers) so mTLS, bearer token, and custom headers from the constructor (or environment) are applied.
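The sketch below shows how the authentication settings plausibly combine into request options. It assumes the Secret values have already been resolved to plain strings, for example with Secret.resolve_value(). Token authentication is sent as an Authorization bearer header merged with additional_headers, and mTLS uses a (cert, key) file pair in the form requests-style clients accept. The helper names are hypothetical. How pyvespa applies these settings internally, and which header wins if both are set, are assumptions.

```python
from __future__ import annotations


def build_request_headers(
    secret_token: str | None, additional_headers: dict[str, str] | None = None
) -> dict[str, str]:
    """Merge custom headers with a Vespa Cloud bearer token (hypothetical helper)."""
    # Copy so the caller's dict is never mutated.
    headers = dict(additional_headers or {})
    if secret_token:
        # Assumption: the token overrides any Authorization header passed in additional_headers.
        headers["Authorization"] = f"Bearer {secret_token}"
    return headers


def client_cert(cert_path: str | None, key_path: str | None) -> tuple[str, str] | None:
    """mTLS needs both files; requests-style clients accept them as a (cert, key) tuple."""
    if cert_path and key_path:
        return (cert_path, key_path)
    return None


print(build_request_headers("s3cr3t", {"X-Team": "search"}))
print(client_cert("cert.pem", "key.pem"))
```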

to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the document store to a dictionary.

Uses the same init parameter names as __init__() and default_to_dict, so nested serialization stays aligned with Haystack's default component serialization.

Returns:

  • dict[str, Any] – Serialized document store data.
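Haystack's default_to_dict wraps a component's init parameters in a {"type": ..., "init_parameters": ...} envelope. The sketch below mimics that shape with plain values so you can see what a serialized store roughly looks like. The exact type string and the helper name sketch_to_dict are assumptions, not output captured from the integration.

```python
from __future__ import annotations

import json
from typing import Any

# Assumption: the type string follows Haystack's "<module>.<ClassName>" convention.
STORE_TYPE = "haystack_integrations.document_stores.vespa.document_store.VespaDocumentStore"


def sketch_to_dict(**init_parameters: Any) -> dict[str, Any]:
    """Mimic the {"type", "init_parameters"} envelope that default_to_dict produces."""
    return {"type": STORE_TYPE, "init_parameters": init_parameters}


serialized = sketch_to_dict(url="http://localhost", port=8080, schema="doc", namespace="doc")
# The envelope is plain JSON, so it can be stored alongside a pipeline definition.
print(json.dumps(serialized, indent=2))
```

Secret parameters such as cert, key, and vespa_cloud_secret_token are serialized through Haystack's Secret handling. Token-based secrets (Secret.from_token) cannot be serialized, so Secret.from_env_var is the safer choice if you plan to save pipelines.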

count_documents

```python
count_documents() -> int
```

Return the total number of documents in Vespa.

Returns:

  • int – Document count.

count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Return the number of documents matching the provided filters.

Parameters:

  • filters (dict[str, Any]) – Haystack metadata filters.

Returns:

  • int – Count of matching documents.
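One way to count matches in Vespa without fetching documents is to send a query that requests zero hits and read root.fields.totalCount from the JSON response. The sketch below shows that pattern. It is an assumption about how counting can work, not a description of this method's internals. The helper names are hypothetical, and the sample response is a hand-written fragment, not real output.

```python
from __future__ import annotations

from typing import Any


def build_count_body(schema: str, where: str = "true") -> dict[str, Any]:
    """Hypothetical count query: request zero hits and rely on totalCount."""
    return {"yql": f"select * from {schema} where {where}", "hits": 0}


def total_count(response_json: dict[str, Any]) -> int:
    # Vespa reports the number of matching documents in root.fields.totalCount.
    return int(response_json["root"]["fields"]["totalCount"])


# A hand-written response fragment for illustration only.
sample = {"root": {"id": "toplevel", "fields": {"totalCount": 42}}}
print(build_count_body("doc", 'category contains "news"'))
print(total_count(sample))
```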

write_documents

```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
```

Write documents to Vespa.

Parameters:

  • documents (list[Document]) – Documents to store.
  • policy (DuplicatePolicy) – Duplicate handling policy.

Returns:

  • int – Number of documents written.
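Haystack's DuplicatePolicy has four members: NONE, SKIP, OVERWRITE, and FAIL. The sketch below shows how each could decide which documents get fed. It uses a local stand-in enum so it runs without Haystack installed. The function plan_writes is hypothetical. Treating NONE like OVERWRITE is an assumption, chosen because a plain Vespa put replaces an existing document with the same ID. The integration's actual default may differ.

```python
from __future__ import annotations

from enum import Enum


class Policy(Enum):
    """Local stand-in mirroring Haystack's DuplicatePolicy members."""

    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def plan_writes(new_ids: list[str], existing_ids: set[str], policy: Policy) -> list[str]:
    """Return the ids that would be fed under the given policy (hypothetical helper)."""
    # Assumption: NONE behaves like OVERWRITE, since a Vespa put replaces same-ID documents.
    if policy in (Policy.NONE, Policy.OVERWRITE):
        return list(new_ids)
    duplicates = [doc_id for doc_id in new_ids if doc_id in existing_ids]
    if policy is Policy.FAIL and duplicates:
        raise ValueError(f"Duplicate document ids: {duplicates}")
    # SKIP (or FAIL with no duplicates): feed only ids that are not stored yet.
    return [doc_id for doc_id in new_ids if doc_id not in existing_ids]


print(plan_writes(["a", "b"], {"a"}, Policy.SKIP))
```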

delete_documents

```python
delete_documents(document_ids: list[str]) -> None
```

Delete documents by id.

Parameters:

  • document_ids (list[str]) – Document ids to delete.

delete_all_documents

```python
delete_all_documents() -> None
```

Delete all documents for this store's schema, namespace, and content cluster.

Implemented with pyvespa's Vespa.delete_all_docs, which performs a Document V1 bulk delete.
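A Document V1 bulk delete is an HTTP DELETE against the document type's docid path. The selection expression chooses which documents to remove (true matches all of them), and the cluster parameter names the content cluster. That is why the store's schema, namespace, and content_cluster_name all matter here. The sketch below only builds the URL. The helper name is hypothetical, and it assumes base_url carries no port or trailing slash. Vespa may process large deletions over several requests using a returned continuation token, which pyvespa's helper is expected to handle.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode


def bulk_delete_url(base_url: str, port: int, namespace: str, schema: str, cluster: str) -> str:
    """Build a Document V1 selection-delete URL; selection=true matches every document."""
    query = urlencode({"selection": "true", "cluster": cluster})
    # Assumption: base_url has no port or trailing slash of its own.
    return f"{base_url}:{port}/document/v1/{quote(namespace)}/{quote(schema)}/docid?{query}"


print(bulk_delete_url("http://localhost", 8080, "doc", "doc", "content"))
```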

delete_by_filter

```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Delete all documents matching the provided filters.

Parameters:

  • filters (dict[str, Any]) – Haystack metadata filters.

Returns:

  • int – Number of deleted documents.

update_by_filter

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Update metadata fields for documents matching the provided filters.

Parameters:

  • filters (dict[str, Any]) – Haystack metadata filters.
  • meta (dict[str, Any]) – Metadata values to merge into the matched documents.

Returns:

  • int – Number of updated documents.
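In Vespa, changing selected fields without rewriting the whole document is a partial update. Each field gets an operation such as assign, which replaces that field's value and leaves other fields untouched. The sketch below builds such a Document V1 update body from a meta dict. It is one plausible way to merge metadata into matched documents. Whether update_by_filter uses exactly this form is an assumption, and the helper name is hypothetical.

```python
from __future__ import annotations

from typing import Any


def partial_update_body(meta: dict[str, Any]) -> dict[str, Any]:
    """Build a Document V1 partial-update body that assigns each metadata value."""
    # "assign" replaces the listed fields only; fields not listed keep their values.
    return {"fields": {name: {"assign": value} for name, value in meta.items()}}


print(partial_update_body({"category": "news", "year": 2024}))
```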

get_documents_by_id

```python
get_documents_by_id(document_ids: list[str]) -> list[Document]
```

Retrieve documents by their ids.

Parameters:

  • document_ids (list[str]) – Document ids to fetch.

Returns:

  • list[Document] – Matching documents.

filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Retrieve documents matching the provided filters.

Parameters:

  • filters (dict[str, Any] | None) – Haystack metadata filters.

Returns:

  • list[Document] – Matching documents.

get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Return best-effort metadata field information based on configured fields.

Returns:

  • dict[str, dict[str, str]] – Field metadata information.

haystack_integrations.document_stores.vespa.filters