Version: 2.30-unstable

Pinecone

haystack_integrations.components.retrievers.pinecone.embedding_retriever

PineconeEmbeddingRetriever

Retrieves documents from the PineconeDocumentStore, based on their dense embeddings.

Usage example:

python

import os
from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.pinecone import PineconeEmbeddingRetriever
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

os.environ["PINECONE_API_KEY"] = "YOUR_PINECONE_API_KEY"
document_store = PineconeDocumentStore(index="my_index", namespace="my_namespace", dimension=768)

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
             Document(content="Elephants have been observed to behave in a way that indicates..."),
             Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", PineconeEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

res = query_pipeline.run({"text_embedder": {"text": query}})
assert res['retriever']['documents'][0].content == "There are over 7,000 languages spoken around the world today."

__init__

python

__init__(
    *,
    document_store: PineconeDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None

Initialize the PineconeEmbeddingRetriever.

Parameters:

document_store (PineconeDocumentStore) – The Pinecone Document Store.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents.
top_k (int) – Maximum number of Documents to return.
filter_policy (str | FilterPolicy) – Policy that determines how filters passed at run time are combined with the filters set at initialization.

Raises:

ValueError – If document_store is not an instance of PineconeDocumentStore.

to_dict

python

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

python

from_dict(data: dict[str, Any]) -> PineconeEmbeddingRetriever

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

PineconeEmbeddingRetriever – Deserialized component.

run

python

run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]

Retrieve documents from the PineconeDocumentStore, based on their dense embeddings.

Parameters:

query_embedding (list[float]) – Embedding of the query.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. How runtime filters are applied depends on the filter_policy chosen at retriever initialization; see the __init__ parameters for details.
top_k (int | None) – Maximum number of Documents to return.

Returns:

dict[str, list[Document]] – Dictionary whose documents key holds the list of Documents most similar to query_embedding.
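
The interplay between initialization-time and runtime filters can be sketched as follows. This is a minimal illustration, assuming that the default FilterPolicy.REPLACE means runtime filters, when provided, are used instead of the filters given to the constructor. The helper `effective_filters` and the field names are hypothetical and not part of the integration.

```python
from typing import Any, Optional


def effective_filters(
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: pick the filters used for a query under a REPLACE policy."""
    # Runtime filters, when given, take the place of the init-time filters.
    return runtime_filters if runtime_filters else init_filters


init_filters = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2020}

print(effective_filters(init_filters, None))             # init-time filters are used
print(effective_filters(init_filters, runtime_filters))  # runtime filters replace them
```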

run_async

python

run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]

Asynchronously retrieve documents from the PineconeDocumentStore, based on their dense embeddings.

Parameters:

query_embedding (list[float]) – Embedding of the query.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. How runtime filters are applied depends on the filter_policy chosen at retriever initialization; see the __init__ parameters for details.
top_k (int | None) – Maximum number of Documents to return.

Returns:

dict[str, list[Document]] – Dictionary whose documents key holds the list of Documents most similar to query_embedding.

haystack_integrations.document_stores.pinecone.document_store

PineconeDocumentStore

A Document Store that uses the Pinecone vector database.

__init__

python

__init__(
    *,
    api_key: Secret = Secret.from_env_var("PINECONE_API_KEY"),
    index: str = "default",
    namespace: str = "default",
    batch_size: int = 100,
    dimension: int = 768,
    spec: dict[str, Any] | None = None,
    metric: Literal["cosine", "euclidean", "dotproduct"] = "cosine",
    show_progress: bool = True
) -> None

Creates a new PineconeDocumentStore instance.

It is meant to be connected to a Pinecone index and namespace.

Parameters:

api_key (Secret) – The Pinecone API key.
index (str) – The Pinecone index to connect to. If the index does not exist, it will be created.
namespace (str) – The Pinecone namespace to connect to. If the namespace does not exist, it will be created at the first write.
batch_size (int) – The number of documents to write in a single batch. When setting this parameter, consider documented Pinecone limits.
dimension (int) – The dimension of the embeddings. This parameter is only used when creating a new index.
spec (dict[str, Any] | None) – The Pinecone spec to use when creating a new index. Allows choosing between serverless and pod deployment options and setting additional parameters. Refer to the Pinecone documentation for more details. If not provided, a default spec with serverless deployment in the us-east-1 region will be used (compatible with the free tier).
metric (Literal['cosine', 'euclidean', 'dotproduct']) – The metric to use for similarity search. This parameter is only used when creating a new index.
show_progress (bool) – Whether to show a progress bar when upserting documents. Set to False to disable (e.g. in tests or scripts where quiet output is preferred).
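
As an illustration of the spec parameter, the dictionary below requests a serverless index. The layout shown (a top-level `serverless` key with `cloud` and `region` entries) and the `aws` cloud value are assumptions based on Pinecone's serverless options, so check the Pinecone documentation before relying on them. The constructor call is left as a comment because it needs a valid API key.

```python
# Assumed spec layout: deployment type as the top-level key, its options nested below.
serverless_spec = {"serverless": {"cloud": "aws", "region": "us-east-1"}}

# Passed when the store is created; it is only used if the index does not exist yet:
# document_store = PineconeDocumentStore(index="my_index", dimension=768, spec=serverless_spec)
print(serverless_spec["serverless"]["region"])
```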

close

python

close() -> None

Close the associated synchronous resources.

close_async

python

close_async() -> None

Close the associated asynchronous resources. To be invoked manually when the Document Store is no longer needed.

from_dict

python

from_dict(data: dict[str, Any]) -> PineconeDocumentStore

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

PineconeDocumentStore – Deserialized component.

to_dict

python

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

count_documents

python

count_documents() -> int

Returns how many documents are present in the document store.

count_documents_async

python

count_documents_async() -> int

Asynchronously returns how many documents are present in the document store.

write_documents

python

write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Writes Documents to Pinecone.

Parameters:

documents (list[Document]) – A list of Documents to write to the document store.
policy (DuplicatePolicy) – The duplicate policy to use when writing documents. PineconeDocumentStore only supports DuplicatePolicy.OVERWRITE.

Returns:

int – The number of documents written to the document store.
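
The batch_size setting of the constructor controls how many documents go into a single write. A rough sketch of that batching, using plain strings in place of Documents; `batched` is a hypothetical helper, and the store's internal implementation may differ.

```python
from typing import Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


doc_ids = [f"doc-{i}" for i in range(250)]
batch_lengths = [len(batch) for batch in batched(doc_ids, batch_size=100)]
print(batch_lengths)  # [100, 100, 50]
```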

write_documents_async

python

write_documents_async(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Asynchronously writes Documents to Pinecone.

Parameters:

documents (list[Document]) – A list of Documents to write to the document store.
policy (DuplicatePolicy) – The duplicate policy to use when writing documents. PineconeDocumentStore only supports DuplicatePolicy.OVERWRITE.

Returns:

int – The number of documents written to the document store.

filter_documents

python

filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the Haystack metadata filtering documentation.

Parameters:

filters (dict[str, Any] | None) – The filters to apply to the document list.

Returns:

list[Document] – A list of Documents that match the given filters.
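
For orientation, a filter in Haystack's metadata filtering syntax is a nested dictionary of logical operators and comparison conditions. The sketch below builds one and checks it against sample metadata with a deliberately simplified local evaluator. The `matches` function and the field names are hypothetical; it covers only AND, == and >=, and it is not how the document store evaluates filters.

```python
from typing import Any

# A logical operator applied to a list of comparison conditions.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.priority", "operator": ">=", "value": 2},
    ],
}


def matches(meta: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Hypothetical local evaluator covering only AND, == and >=."""
    if flt["operator"] == "AND":
        return all(matches(meta, cond) for cond in flt["conditions"])
    key = flt["field"].removeprefix("meta.")
    if key not in meta:
        return False
    if flt["operator"] == "==":
        return meta[key] == flt["value"]
    if flt["operator"] == ">=":
        return meta[key] >= flt["value"]
    raise ValueError(f"Unsupported operator: {flt['operator']}")


print(matches({"category": "news", "priority": 3}, filters))  # True
print(matches({"category": "news", "priority": 1}, filters))  # False
```

With a real store, the same dictionary would be passed as `filter_documents(filters=filters)`.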

filter_documents_async

python

filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]

Asynchronously returns the documents that match the filters provided.

Parameters:

filters (dict[str, Any] | None) – The filters to apply to the document list.

Returns:

list[Document] – A list of Documents that match the given filters.

delete_documents

python

delete_documents(document_ids: list[str]) -> None

Deletes documents that match the provided document_ids from the document store.

Parameters:

document_ids (list[str]) – The IDs of the documents to delete.

delete_documents_async

python

delete_documents_async(document_ids: list[str]) -> None

Asynchronously deletes documents that match the provided document_ids from the document store.

Parameters:

document_ids (list[str]) – The IDs of the documents to delete.

delete_all_documents

python

delete_all_documents() -> None

Deletes all documents in the document store.

delete_all_documents_async

python

delete_all_documents_async() -> None

Asynchronously deletes all documents in the document store.

delete_by_filter

python

delete_by_filter(filters: dict[str, Any]) -> int

Deletes all documents that match the provided filters.

Pinecone does not support server-side delete by filter, so this method first searches for matching documents, then deletes them by ID.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see the Haystack metadata filtering documentation.

Returns:

int – The number of documents deleted.
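
The search-then-delete approach described above can be sketched with an in-memory dictionary standing in for the index. `delete_by_predicate` and the sample data are hypothetical, and a plain Python predicate replaces the filter dictionary to keep the example short.

```python
from typing import Any, Callable


def delete_by_predicate(
    store: dict[str, dict[str, Any]],
    predicate: Callable[[dict[str, Any]], bool],
) -> int:
    """Hypothetical two-step delete: find matching IDs, then delete by ID."""
    # Step 1: search for the documents that match.
    matching_ids = [doc_id for doc_id, meta in store.items() if predicate(meta)]
    # Step 2: delete each match by its ID.
    for doc_id in matching_ids:
        del store[doc_id]
    return len(matching_ids)


store = {
    "a": {"category": "news"},
    "b": {"category": "blog"},
    "c": {"category": "news"},
}
deleted = delete_by_predicate(store, lambda meta: meta["category"] == "news")
print(deleted, sorted(store))  # 2 ['b']
```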

delete_by_filter_async

python

delete_by_filter_async(filters: dict[str, Any]) -> int

Asynchronously deletes all documents that match the provided filters.

Pinecone does not support server-side delete by filter, so this method first searches for matching documents, then deletes them by ID.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see the Haystack metadata filtering documentation.

Returns:

int – The number of documents deleted.

update_by_filter

python

update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int

Updates the metadata of all documents that match the provided filters.

Pinecone does not support server-side update by filter, so this method first searches for matching documents, then updates their metadata and re-writes them.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see the Haystack metadata filtering documentation.
meta (dict[str, Any]) – The metadata fields to update. This will be merged with existing metadata.

Returns:

int – The number of documents updated.
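
What merging with existing metadata could look like is sketched below. It assumes a shallow merge in which keys from meta overwrite existing keys and new keys are added; `merge_meta` is a hypothetical helper, and the store's actual merge behavior is not confirmed here.

```python
from typing import Any


def merge_meta(existing: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical shallow merge: keys in update overwrite or extend existing metadata."""
    return {**existing, **update}


existing = {"category": "news", "status": "draft"}
merged = merge_meta(existing, {"status": "published", "reviewed": True})
print(merged)
```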

update_by_filter_async

python

update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int

Asynchronously updates the metadata of all documents that match the provided filters.

Pinecone does not support server-side update by filter, so this method first searches for matching documents, then updates their metadata and re-writes them.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see the Haystack metadata filtering documentation.
meta (dict[str, Any]) – The metadata fields to update. This will be merged with existing metadata.

Returns:

int – The number of documents updated.

count_documents_by_filter

python

count_documents_by_filter(filters: dict[str, Any]) -> int

Returns the count of documents that match the provided filters.

Note: Due to Pinecone's limitations, this method fetches documents and counts them. For large result sets, this is subject to Pinecone's TOP_K_LIMIT of 1000 documents.

Parameters:

filters (dict[str, Any]) – The filters to apply to the document list. For filter syntax, see the Haystack metadata filtering documentation.

Returns:

int – The number of documents that match the filters.
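
One practical reading of the note above is that a count obtained by fetching documents cannot exceed the fetch limit. The sketch below models that under the assumption that the result simply saturates at the limit; `capped_count` is a hypothetical helper, not part of the integration.

```python
TOP_K_LIMIT = 1000  # limit quoted in the note above


def capped_count(total_matching: int, limit: int = TOP_K_LIMIT) -> int:
    """Hypothetical model of a count obtained by fetching at most `limit` documents."""
    return min(total_matching, limit)


print(capped_count(250))   # 250
print(capped_count(5000))  # 1000
```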

count_documents_by_filter_async

python

count_documents_by_filter_async(filters: dict[str, Any]) -> int

Asynchronously returns the count of documents that match the provided filters.

Note: Due to Pinecone's limitations, this method fetches documents and counts them. For large result sets, this is subject to Pinecone's TOP_K_LIMIT of 1000 documents.

Parameters:

filters (dict[str, Any]) – The filters to apply to the document list.

Returns:

int – The number of documents that match the filters.

count_unique_metadata_by_filter

python

count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]

Counts unique values for each specified metadata field in documents matching the filters.

Note: Due to Pinecone's limitations, this method fetches documents and aggregates in Python. Subject to Pinecone's TOP_K_LIMIT of 1000 documents.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents.
metadata_fields (list[str]) – List of metadata field names to count unique values for.

Returns:

dict[str, int] – Dictionary mapping field names to counts of unique values.
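
Aggregating in Python, as the note describes, might look roughly like the sketch below. `count_unique` and the sample metadata are hypothetical, and the sketch assumes every metadata value is hashable (for example strings or numbers, not lists).

```python
from typing import Any


def count_unique(docs_meta: list[dict[str, Any]], fields: list[str]) -> dict[str, int]:
    """Hypothetical aggregation: number of distinct values per metadata field."""
    counts = {}
    for field in fields:
        # Collect the distinct values of this field; documents without it are skipped.
        values = {meta[field] for meta in docs_meta if field in meta}
        counts[field] = len(values)
    return counts


docs_meta = [
    {"category": "news", "lang": "en"},
    {"category": "blog", "lang": "en"},
    {"category": "news"},
]
result = count_unique(docs_meta, ["category", "lang"])
print(result)  # {'category': 2, 'lang': 1}
```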

count_unique_metadata_by_filter_async

python

count_unique_metadata_by_filter_async(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]

Asynchronously counts unique values for each specified metadata field in documents matching the filters.

Note: Due to Pinecone's limitations, this method fetches documents and aggregates in Python. Subject to Pinecone's TOP_K_LIMIT of 1000 documents.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents.
metadata_fields (list[str]) – List of metadata field names to count unique values for.

Returns:

dict[str, int] – Dictionary mapping field names to counts of unique values.

get_metadata_fields_info

python

get_metadata_fields_info() -> dict[str, dict[str, str]]

Returns information about metadata fields and their types by sampling documents.

Note: Pinecone doesn't provide a schema introspection API, so this method infers field types by examining the metadata of documents stored in the index (up to 1000 documents).

Type mappings:

'text': Document content field
'keyword': String metadata values
'long': Numeric metadata values (int or float)
'boolean': Boolean metadata values

Returns:

dict[str, dict[str, str]] – Dictionary mapping field names to type information. Example:

python

{
    'content': {'type': 'text'},
    'category': {'type': 'keyword'},
    'priority': {'type': 'long'},
}
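
The type mappings listed above can be approximated by inspecting sampled metadata values. The sketch below is one possible reading of them; `infer_type` and the sample metadata are hypothetical, and value types outside the listed mappings are simply skipped.

```python
from typing import Any, Optional


def infer_type(value: Any) -> Optional[str]:
    """Hypothetical mapping from a Python metadata value to the type names above."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "long"
    if isinstance(value, str):
        return "keyword"
    return None  # other value types are left out of this sketch


sample_meta = {"category": "news", "priority": 2, "score": 0.5, "archived": False}
fields_info = {"content": {"type": "text"}}
for name, value in sample_meta.items():
    inferred = infer_type(value)
    if inferred is not None:
        fields_info[name] = {"type": inferred}
print(fields_info)
```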

get_metadata_fields_info_async

python

get_metadata_fields_info_async() -> dict[str, dict[str, str]]

Asynchronously returns information about metadata fields and their types by sampling documents.

Note: Pinecone doesn't provide a schema introspection API, so this method infers field types by examining the metadata of documents stored in the index (up to 1000 documents).

Type mappings:

'text': Document content field
'keyword': String metadata values
'long': Numeric metadata values (int or float)
'boolean': Boolean metadata values

Returns:

dict[str, dict[str, str]] – Dictionary mapping field names to type information. Example:

python

{
    'content': {'type': 'text'},
    'category': {'type': 'keyword'},
    'priority': {'type': 'long'},
}

get_metadata_field_min_max

python

get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]

Returns the minimum and maximum values for a metadata field.

Supports numeric (int, float), boolean, and string (keyword) types:

Numeric: Returns min/max based on numeric value
Boolean: Returns False as min, True as max
String: Returns min/max based on alphabetical ordering

Note: This method fetches documents (up to Pinecone's TOP_K_LIMIT of 1000) and computes min/max in Python.

Parameters:

metadata_field (str) – The metadata field name to analyze.

Returns:

dict[str, Any] – Dictionary with 'min' and 'max' keys. Both values are None if the field has no values (empty store, field absent, or unsupported field type).
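
Computing min/max in Python can lean on the language's built-in ordering. The sketch below is a simplified illustration that assumes all values of a field share one comparable type; `field_min_max` and the sample metadata are hypothetical, and unsupported types are not handled.

```python
from typing import Any


def field_min_max(docs_meta: list[dict[str, Any]], field: str) -> dict[str, Any]:
    """Hypothetical min/max over the values of one metadata field."""
    values = [meta[field] for meta in docs_meta if field in meta]
    if not values:
        return {"min": None, "max": None}
    # Python orders numbers numerically, False before True, and strings by code point.
    return {"min": min(values), "max": max(values)}


docs_meta = [
    {"priority": 3, "author": "bob"},
    {"priority": 1, "author": "alice"},
    {"priority": 2},
]
print(field_min_max(docs_meta, "priority"))  # {'min': 1, 'max': 3}
print(field_min_max(docs_meta, "author"))    # {'min': 'alice', 'max': 'bob'}
print(field_min_max(docs_meta, "missing"))   # {'min': None, 'max': None}
```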

get_metadata_field_min_max_async

python

get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any]

Asynchronously returns the minimum and maximum values for a metadata field.

Supports numeric (int, float), boolean, and string (keyword) types:

Numeric: Returns min/max based on numeric value
Boolean: Returns False as min, True as max
String: Returns min/max based on alphabetical ordering

Note: This method fetches documents (up to Pinecone's TOP_K_LIMIT of 1000) and computes min/max in Python.

Parameters:

metadata_field (str) – The metadata field name to analyze.

Returns:

dict[str, Any] – Dictionary with 'min' and 'max' keys. Both values are None if the field has no values (empty store, field absent, or unsupported field type).

get_metadata_field_unique_values

python

get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]

Retrieves unique values for a metadata field with optional search and pagination.

Note: This method fetches documents and extracts unique values in Python. Subject to Pinecone's TOP_K_LIMIT of 1000 documents.

Parameters:

metadata_field (str) – The metadata field name to get unique values for.
search_term (str | None) – Optional search term to filter values (case-insensitive substring match).
from_ (int) – Starting offset for pagination (default: 0).
size (int) – Number of values to return (default: 10).

Returns:

tuple[list[str], int] – Tuple of (list of unique values, total count of matching values).
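
The combination of search and pagination can be sketched as deduplicate, filter, then slice. `unique_values_page` and the sample values are hypothetical; sorting the values is an assumption made here so that pages are stable, and the store's own ordering may differ.

```python
from typing import Optional


def unique_values_page(
    values: list[str],
    search_term: Optional[str] = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Hypothetical helper: deduplicate, filter, then slice one page."""
    unique = sorted(set(values))  # sorting is an assumption made for stable pages
    if search_term is not None:
        # Case-insensitive substring match.
        needle = search_term.lower()
        unique = [v for v in unique if needle in v.lower()]
    # Return the requested page plus the total number of matching values.
    return unique[from_:from_ + size], len(unique)


values = ["News", "blog", "newsletter", "docs", "News"]
page, total = unique_values_page(values, search_term="news", from_=0, size=1)
print(page, total)  # ['News'] 2
```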

get_metadata_field_unique_values_async

python

get_metadata_field_unique_values_async(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]

Asynchronously retrieves unique values for a metadata field with optional search and pagination.

Note: This method fetches documents and extracts unique values in Python. Subject to Pinecone's TOP_K_LIMIT of 1000 documents.

Parameters:

metadata_field (str) – The metadata field name to get unique values for.
search_term (str | None) – Optional search term to filter values (case-insensitive substring match).
from_ (int) – Starting offset for pagination (default: 0).
size (int) – Number of values to return (default: 10).

Returns:

tuple[list[str], int] – Tuple of (list of unique values, total count of matching values).
