Pinecone integration for Haystack

Module haystack_integrations.components.retrievers.pinecone.embedding_retriever

PineconeEmbeddingRetriever

Retrieves documents from the PineconeDocumentStore, based on their dense embeddings.

Usage example:

import os
from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.pinecone import PineconeEmbeddingRetriever
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

os.environ["PINECONE_API_KEY"] = "YOUR_PINECONE_API_KEY"
document_store = PineconeDocumentStore(index="my_index", namespace="my_namespace", dimension=768)

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
             Document(content="Elephants have been observed to behave in a way that indicates..."),
             Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", PineconeEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

res = query_pipeline.run({"text_embedder": {"text": query}})
assert res['retriever']['documents'][0].content == "There are over 7,000 languages spoken around the world today."

PineconeEmbeddingRetriever.__init__

def __init__(*,
             document_store: PineconeDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10)

Arguments:

  • document_store: The Pinecone Document Store.
  • filters: Filters applied to the retrieved Documents.
  • top_k: Maximum number of Documents to return.

Raises:

  • ValueError: If document_store is not an instance of PineconeDocumentStore.

PineconeEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

PineconeEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "PineconeEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
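A minimal sketch of a serialization round trip, relying only on the to_dict and from_dict methods documented above. The helper name round_trip and the class StubComponent are hypothetical; the stub exists only so the snippet runs without Pinecone access. With a real retriever, rebuilding the component presumably also rebuilds its document store, so a valid PINECONE_API_KEY would be needed.

```python
from typing import Any, Dict, Tuple


def round_trip(component: Any) -> Tuple[Dict[str, Any], Any]:
    """Serialize a component, then rebuild a new instance from the result.

    Hypothetical helper: it only assumes `to_dict()` on the instance and a
    `from_dict()` classmethod on its class.
    """
    data = component.to_dict()
    restored = type(component).from_dict(data)
    return data, restored


class StubComponent:
    """Stand-in with the same method pair; NOT part of the integration."""

    def __init__(self, top_k: int = 10) -> None:
        self.top_k = top_k

    def to_dict(self) -> Dict[str, Any]:
        return {"top_k": self.top_k}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "StubComponent":
        return cls(**data)


data, restored = round_trip(StubComponent(top_k=3))

# With a real component the call would look the same:
#   data, restored = round_trip(PineconeEmbeddingRetriever(document_store=document_store))
```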

PineconeEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float])

Retrieves documents from the PineconeDocumentStore, based on their dense embeddings.

Arguments:

  • query_embedding: Embedding of the query.

Returns:

A dictionary with a documents key holding the list of Documents most similar to query_embedding.
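The sketch below shows one way to read the output of run, assuming it is a dictionary with a documents entry (as the output_types decorator and the usage example suggest) and that each returned Document carries content and score attributes. The helper name summarize is hypothetical, and the result here is a hand-built stand-in rather than a real Pinecone response.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def summarize(result: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (content, score) pairs from a retriever result dictionary."""
    return [(doc.content, doc.score) for doc in result["documents"]]


# Stand-in for `retriever.run(query_embedding=[...])`; a real result would
# contain haystack Document objects instead of SimpleNamespace instances.
fake_result = {
    "documents": [
        SimpleNamespace(content="There are over 7,000 languages.", score=0.82),
        SimpleNamespace(content="Bioluminescent waves.", score=0.11),
    ]
}

pairs = summarize(fake_result)
```

When the retriever runs inside a pipeline, the same dictionary is found under the component name, for example res["retriever"] in the usage example at the top of this page.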

Module haystack_integrations.document_stores.pinecone.document_store

PineconeDocumentStore

A Document Store that uses the Pinecone vector database.

PineconeDocumentStore.__init__

def __init__(*,
             api_key: Secret = Secret.from_env_var("PINECONE_API_KEY"),
             environment: str = "us-west1-gcp",
             index: str = "default",
             namespace: str = "default",
             batch_size: int = 100,
             dimension: int = 768,
             **index_creation_kwargs)

Creates a new PineconeDocumentStore instance.

It is meant to be connected to a Pinecone index and namespace.

Arguments:

  • api_key: The Pinecone API key.
  • environment: The Pinecone environment to connect to.
  • index: The Pinecone index to connect to. If the index does not exist, it will be created.
  • namespace: The Pinecone namespace to connect to. If the namespace does not exist, it will be created at the first write.
  • batch_size: The number of documents to write in a single batch. When setting this parameter, consider the documented Pinecone limits.
  • dimension: The dimension of the embeddings. This parameter is only used when creating a new index.
  • index_creation_kwargs: Additional keyword arguments to pass to the index creation method. You can find the full list of supported arguments in the API reference.
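To illustrate what batch_size controls, here is a plain-Python sketch of splitting a list of items into batches. It is an illustration of the idea only, not the integration's internal code, and the helper name chunked is hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 250 items with batch_size=100 are split into three batches.
batch_lengths = [len(batch) for batch in chunked(range(250), 100)]
```

Under this model, a larger batch_size means fewer requests per write, each carrying more data, which is why the request size limits documented by Pinecone matter when choosing a value.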

PineconeDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "PineconeDocumentStore"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

PineconeDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

PineconeDocumentStore.count_documents

def count_documents() -> int

Returns how many documents are present in the document store.

PineconeDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Writes Documents to Pinecone.

Arguments:

  • documents: A list of Documents to write to the document store.
  • policy: The duplicate policy to use when writing documents. PineconeDocumentStore only supports DuplicatePolicy.OVERWRITE.

Returns:

The number of documents written to the document store.
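Before calling write_documents, it can be useful to check that every document carries an embedding whose length matches the dimension of the index. The following is a sketch of such a pre-flight check, assuming documents expose id and embedding attributes; the helper name missing_or_mismatched is hypothetical and the documents are stand-ins.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def missing_or_mismatched(documents: Iterable[Any], dimension: int) -> List[str]:
    """Return ids of documents whose embedding is absent or has the wrong length."""
    bad_ids = []
    for doc in documents:
        embedding = getattr(doc, "embedding", None)
        if embedding is None or len(embedding) != dimension:
            bad_ids.append(doc.id)
    return bad_ids


# Stand-ins for haystack Document objects.
docs = [
    SimpleNamespace(id="a", embedding=[0.0] * 768),
    SimpleNamespace(id="b", embedding=None),
    SimpleNamespace(id="c", embedding=[0.0] * 384),
]

bad_ids = missing_or_mismatched(docs, dimension=768)
```

If the returned list is empty, the documents can be written as in the usage example, with document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE).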

PineconeDocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the documentation.

Arguments:

  • filters: The filters to apply to the document list.

Returns:

A list of Documents that match the given filters.
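As a sketch of what a filters dictionary can look like, the helpers below build a nested structure in the Haystack 2.x filter style (field, operator, value, combined by a logical operator). The helper names condition and all_of, and the metadata fields language and year, are hypothetical; check the filter specification for the operators supported by your version.

```python
from typing import Any, Dict


def condition(field: str, operator: str, value: Any) -> Dict[str, Any]:
    """Build a single comparison on one document field."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Combine several conditions so that all of them must hold."""
    return {"operator": "AND", "conditions": list(conditions)}


# Hypothetical metadata fields, used for illustration only.
filters = all_of(
    condition("meta.language", "==", "en"),
    condition("meta.year", ">=", 2020),
)

# The resulting dictionary would then be passed along, for example:
#   document_store.filter_documents(filters=filters)
```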

PineconeDocumentStore.delete_documents

def delete_documents(document_ids: List[str]) -> None

Deletes documents that match the provided document_ids from the document store.

Arguments:

  • document_ids: The IDs of the documents to delete.
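The methods above can be combined to delete every document that matches a filter: look up the matches with filter_documents, then pass their ids to delete_documents. The sketch below assumes only the three documented method signatures; the helper name delete_matching and the StubStore class are hypothetical, and the stub understands a single equality condition on a metadata field so the snippet runs without Pinecone access.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


def delete_matching(store: Any, filters: Dict[str, Any]) -> int:
    """Delete all documents matching `filters`; return how many were deleted."""
    ids = [doc.id for doc in store.filter_documents(filters=filters)]
    if ids:
        store.delete_documents(document_ids=ids)
    return len(ids)


class StubStore:
    """In-memory stand-in exposing the same three methods; NOT the real store."""

    def __init__(self, docs: List[Any]) -> None:
        self._docs = {doc.id: doc for doc in docs}

    def count_documents(self) -> int:
        return len(self._docs)

    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[Any]:
        if filters is None:
            return list(self._docs.values())
        # Only handles {"field": "meta.<key>", "operator": "==", "value": ...}.
        key = filters["field"].split(".", 1)[1]
        return [d for d in self._docs.values() if d.meta.get(key) == filters["value"]]

    def delete_documents(self, document_ids: List[str]) -> None:
        for doc_id in document_ids:
            self._docs.pop(doc_id, None)


store = StubStore([
    SimpleNamespace(id="1", meta={"topic": "languages"}),
    SimpleNamespace(id="2", meta={"topic": "animals"}),
    SimpleNamespace(id="3", meta={"topic": "languages"}),
])

deleted = delete_matching(
    store, {"field": "meta.topic", "operator": "==", "value": "languages"}
)
remaining = store.count_documents()
```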