Vespa
haystack_integrations.components.retrievers.vespa.embedding_retriever
VespaEmbeddingRetriever
Retrieve documents from Vespa using dense vector similarity.
init
__init__(
*,
document_store: VespaDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
ranking: str | None = DEFAULT_SEMANTIC_RANKING,
query_tensor_name: str = "query_embedding",
target_hits: int | None = None
) -> None
Create a Vespa embedding retriever.
Parameters:
- document_store (VespaDocumentStore) – Configured VespaDocumentStore for your application, for example VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc"), aligned with your Vespa schema. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- filters (dict[str, Any] | None) – Optional static Haystack metadata filters applied on each retrieval unless overridden in run, for example {"field": "meta.category", "operator": "==", "value": "news"}. See https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- top_k (int) – Default maximum number of documents to return per query (for example 10).
- ranking (str | None) – Vespa rank profile used after nearest-neighbor retrieval, for example semantic for a profile that scores with closeness(field, embedding). Defaults to semantic. Pass None to use the schema default profile. See https://docs.vespa.ai/en/basics/ranking.html.
- query_tensor_name (str) – Name of the query tensor in YQL and in input.query(...) in your rank profile. For example, query_embedding matches the default semantic profile. See https://docs.vespa.ai/en/nearest-neighbor-search.html.
- target_hits (int | None) – Optional nearest-neighbor targetHits value, for example 10 or 100: how many neighbors are considered per content node before first-phase ranking. See https://docs.vespa.ai/en/nearest-neighbor-search.html.
Raises:
ValueError – If document_store is not an instance of VespaDocumentStore.
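To make the roles of ranking, query_tensor_name, and target_hits concrete, the following is a minimal sketch of how such settings could map onto a Vespa query body. It is not taken from the integration's source: the helper name build_nearest_neighbor_query is hypothetical, and falling back to top_k when target_hits is None is an assumption. The YQL nearestNeighbor operator, the targetHits annotation, and the input.query(...) request parameter are standard Vespa query features.

```python
from __future__ import annotations


def build_nearest_neighbor_query(
    query_embedding: list[float],
    *,
    schema: str = "doc",
    embedding_field: str = "embedding",
    query_tensor_name: str = "query_embedding",
    ranking: str | None = "semantic",
    top_k: int = 10,
    target_hits: int | None = None,
) -> dict:
    """Assemble a Vespa query body for nearest-neighbor search (illustrative only)."""
    # Assumption: use top_k as targetHits when no explicit value is given.
    hits_per_node = target_hits if target_hits is not None else top_k
    yql = (
        f"select * from {schema} where "
        f"{{targetHits:{hits_per_node}}}"
        f"nearestNeighbor({embedding_field}, {query_tensor_name})"
    )
    body = {
        "yql": yql,
        "hits": top_k,
        # The tensor name here must match input.query(...) in the rank profile.
        f"input.query({query_tensor_name})": query_embedding,
    }
    # Omitting the rank profile lets Vespa use the schema default.
    if ranking is not None:
        body["ranking"] = ranking
    return body


body = build_nearest_neighbor_query([0.1, 0.2], top_k=5, target_hits=100)
print(body["yql"])
```

A larger target_hits widens the candidate set per content node, while top_k only limits how many hits are returned.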
run
run(
query_embedding: list[float],
filters: dict[str, Any] | None = None,
top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents from Vespa.
Parameters:
- query_embedding (list[float]) – Dense query embedding.
- filters (dict[str, Any] | None) – Filters applied when fetching documents from the Document Store.
- top_k (int | None) – Maximum number of documents to return.
Returns:
dict[str, list[Document]] – Retrieved documents.
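The constructor documents filters and top_k as defaults that run can override. A minimal sketch of that resolution follows. The helper resolve_run_arguments is hypothetical, and it assumes a run-time value replaces the constructor value outright rather than being merged with it.

```python
from __future__ import annotations

from typing import Any


def resolve_run_arguments(
    init_filters: dict[str, Any] | None,
    init_top_k: int,
    run_filters: dict[str, Any] | None = None,
    run_top_k: int | None = None,
) -> tuple[dict[str, Any] | None, int]:
    """Pick the effective filters and top_k for a single run call (illustrative only)."""
    # Assumption: a value passed to run replaces the constructor value; None means "not passed".
    filters = run_filters if run_filters is not None else init_filters
    top_k = run_top_k if run_top_k is not None else init_top_k
    return filters, top_k


static_filters = {"field": "meta.category", "operator": "==", "value": "news"}
print(resolve_run_arguments(static_filters, 10, run_top_k=3))
```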
haystack_integrations.components.retrievers.vespa.keyword_retriever
VespaKeywordRetriever
Retrieve documents from Vespa using lexical search.
init
__init__(
*,
document_store: VespaDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
ranking: str | None = DEFAULT_BM25_RANKING
) -> None
Create a Vespa keyword retriever.
Parameters:
- document_store (VespaDocumentStore) – Configured VespaDocumentStore for your application, for example VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc"), so it matches the deployed schema and endpoint. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- filters (dict[str, Any] | None) – Optional static Haystack metadata filters applied on each retrieval unless overridden in run, for example {"field": "meta.category", "operator": "==", "value": "news"}. See https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- top_k (int) – Default maximum number of documents to return per query (for example 10).
- ranking (str | None) – Vespa rank profile for lexical matches, for example bm25 for a profile that uses bm25(content). Defaults to bm25. Pass None to use the schema default. See https://docs.vespa.ai/en/basics/ranking.html.
Raises:
ValueError – If document_store is not an instance of VespaDocumentStore.
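For orientation, here is a minimal sketch of a lexical Vespa query body that a bm25 rank profile could score. It is not the integration's actual request. The helper name build_keyword_query is hypothetical, and using userQuery() to bind the query text is an assumption about one common way to issue lexical queries in Vespa.

```python
from __future__ import annotations


def build_keyword_query(
    query: str,
    *,
    schema: str = "doc",
    ranking: str | None = "bm25",
    top_k: int = 10,
) -> dict:
    """Assemble a Vespa query body for lexical search (illustrative only)."""
    body = {
        # userQuery() pulls the text from the "query" request parameter.
        "yql": f"select * from {schema} where userQuery()",
        "query": query,
        "hits": top_k,
    }
    # Omitting the rank profile lets Vespa use the schema default.
    if ranking is not None:
        body["ranking"] = ranking
    return body


print(build_keyword_query("vector databases", top_k=3))
```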
run
run(
query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
Retrieve documents from Vespa.
Parameters:
- query (str) – Query text.
- filters (dict[str, Any] | None) – Filters applied when fetching documents from the Document Store.
- top_k (int | None) – Maximum number of documents to return.
Returns:
dict[str, list[Document]] – Retrieved documents.
haystack_integrations.document_stores.vespa.document_store
VespaDocumentStore
Document store backed by an existing Vespa application.
init
__init__(
*,
url: str | None = None,
port: int = 8080,
cert: Secret | None = None,
key: Secret | None = None,
vespa_cloud_secret_token: Secret | None = None,
additional_headers: dict[str, str] | None = None,
content_cluster_name: str = "content",
schema: str = "doc",
namespace: str | None = None,
groupname: str | None = None,
content_field: str = "content",
embedding_field: str = "embedding",
id_field: str = "id",
metadata_fields: list[str] | None = None,
query_limit: int = DEFAULT_QUERY_LIMIT
) -> None
Create a new Vespa document store.
Parameters:
- url (str | None) – Vespa endpoint base URL. If omitted, the VESPA_URL environment variable is used.
- port (int) – Vespa HTTP port.
- cert (Secret | None) – Secret resolving to the data plane certificate file path for mTLS authentication.
- key (Secret | None) – Secret resolving to the data plane key file path for mTLS authentication.
- vespa_cloud_secret_token (Secret | None) – Vespa Cloud data plane secret token for token authentication. If omitted, the VESPA_CLOUD_SECRET_TOKEN environment variable is used when set, matching pyvespa.
- additional_headers (dict[str, str] | None) – Additional headers to send to the Vespa application.
- content_cluster_name (str) – Vespa content cluster name.
- schema (str) – Vespa schema name to read from and write to.
- namespace (str | None) – Vespa namespace. Defaults to the schema name when omitted.
- groupname (str | None) – Optional Vespa group name.
- content_field (str) – Vespa field containing the document text.
- embedding_field (str) – Vespa field containing the dense embedding.
- id_field (str) – Optional Vespa field containing the document id in query responses. Vespa document IDs are always written via data_id. If this field is missing in the schema or summaries, the integration falls back to parsing the Vespa document path.
- metadata_fields (list[str] | None) – Optional allowlist of metadata fields to feed and return.
- query_limit (int) – Maximum number of documents returned by bulk queries. Defaults to 400 to stay within Vespa's common query hit limit unless explicitly overridden.
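The namespace, schema, and groupname settings all appear in Vespa's document ID string, which has the form id:<namespace>:<document-type>:<key/value-pair>:<user-specified>. The sketch below shows how such an ID could be built and how the user-specified part could be recovered when id_field is not available. Both helper names are hypothetical, and the parsing is only an assumption about what the fallback might look like, not the integration's code.

```python
from __future__ import annotations


def build_document_id(
    user_id: str,
    *,
    schema: str = "doc",
    namespace: str | None = None,
    groupname: str | None = None,
) -> str:
    """Build a Vespa document ID string (illustrative only)."""
    # The namespace defaults to the schema name, as documented above.
    ns = namespace if namespace is not None else schema
    # The key/value part is empty unless a group is used.
    key_value = f"g={groupname}" if groupname else ""
    return f"id:{ns}:{schema}:{key_value}:{user_id}"


def parse_user_id(document_id: str) -> str:
    """Recover the user-specified part of a Vespa document ID (illustrative only)."""
    # Split at most four times so colons inside the user-specified part survive.
    parts = document_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "id":
        raise ValueError(f"Not a Vespa document ID: {document_id!r}")
    return parts[4]


doc_id = build_document_id("abc")
print(doc_id, parse_user_id(doc_id))
```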
app
Return the underlying pyvespa Vespa HTTP client.
It is built from this store's url, port, and authentication settings
(cert, key, vespa_cloud_secret_token, additional_headers) so mTLS, bearer token,
and custom headers from the constructor (or environment) are applied.
to_dict
Serialize the document store to a dictionary.
Uses the same init-parameter names as __init__ and default_to_dict so nested serialization stays
aligned with Haystack's default component serialization.
Returns:
dict[str, Any] – Serialized document store data.
count_documents
Return the total number of documents in Vespa.
Returns:
int – Document count.
count_documents_by_filter
Return the number of documents matching the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack metadata filters.
Returns:
int – Count of matching documents.
write_documents
write_documents(
documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
Write documents to Vespa.
Parameters:
- documents (list[Document]) – Documents to store.
- policy (DuplicatePolicy) – Duplicate handling policy.
Returns:
int – Number of documents written.
delete_documents
Delete documents by id.
Parameters:
- document_ids (list[str]) – Document ids to delete.
delete_all_documents
Delete all documents for this store's schema, namespace, and content cluster.
Implemented with pyvespa Vespa.delete_all_docs (Document V1 bulk delete).
delete_by_filter
Delete all documents matching the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack metadata filters.
Returns:
int – Number of deleted documents.
update_by_filter
Update metadata fields for documents matching the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack metadata filters.
- meta (dict[str, Any]) – Metadata values to merge into the matched documents.
Returns:
int – Number of updated documents.
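Merging metadata into existing documents corresponds naturally to Vespa partial updates, where each field carries an assign operation and untouched fields keep their values. The sketch below builds such a payload from a meta dictionary. The helper name build_partial_update is hypothetical, and whether the integration issues updates in exactly this shape is an assumption.

```python
from __future__ import annotations

from typing import Any


def build_partial_update(meta: dict[str, Any]) -> dict[str, Any]:
    """Build a Vespa partial-update payload that assigns each metadata field (illustrative only)."""
    # Fields not listed here are left unchanged by a partial update.
    return {"fields": {name: {"assign": value} for name, value in meta.items()}}


print(build_partial_update({"category": "news", "reviewed": True}))
```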
get_documents_by_id
Retrieve documents by their ids.
Parameters:
- document_ids (list[str]) – Document ids to fetch.
Returns:
list[Document] – Matching documents.
filter_documents
Retrieve documents matching the provided filters.
Parameters:
- filters (dict[str, Any] | None) – Haystack metadata filters.
Returns:
list[Document] – Matching documents.
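Haystack metadata filters are plain dictionaries: a comparison has field, operator, and value keys, and a logical group has operator and conditions keys. The sketch below composes a two-condition filter. The helpers comparison and all_of, and the meta.year field, are hypothetical names used only for illustration.

```python
from __future__ import annotations

from typing import Any


def comparison(field: str, operator: str, value: Any) -> dict[str, Any]:
    """Build a single Haystack comparison filter."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: dict[str, Any]) -> dict[str, Any]:
    """Combine filters so that every condition must match."""
    return {"operator": "AND", "conditions": list(conditions)}


# meta.year is a made-up field; use one that exists in your schema.
filters = all_of(
    comparison("meta.category", "==", "news"),
    comparison("meta.year", ">=", 2023),
)
print(filters)
```

A dictionary like this can be passed wherever a filters argument is accepted, provided the referenced fields exist in the Vespa schema.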
get_metadata_fields_info
Return best-effort metadata field information based on configured fields.
Returns:
dict[str, dict[str, str]] – Field metadata information.