Pinecone integration for Haystack
Module haystack_integrations.components.retrievers.pinecone.embedding_retriever
PineconeEmbeddingRetriever
Retrieves documents from the PineconeDocumentStore, based on their dense embeddings.
Usage example:
import os
from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.pinecone import PineconeEmbeddingRetriever
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore
os.environ["PINECONE_API_KEY"] = "YOUR_PINECONE_API_KEY"
document_store = PineconeDocumentStore(index="my_index", namespace="my_namespace", dimension=768)
documents = [Document(content="There are over 7,000 languages spoken around the world today."),
             Document(content="Elephants have been observed to behave in a way that indicates..."),
             Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", PineconeEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query = "How many languages are there?"
res = query_pipeline.run({"text_embedder": {"text": query}})
assert res['retriever']['documents'][0].content == "There are over 7,000 languages spoken around the world today."
PineconeEmbeddingRetriever.__init__
def __init__(*,
document_store: PineconeDocumentStore,
filters: Optional[Dict[str, Any]] = None,
top_k: int = 10,
filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE)
Arguments:
document_store
: The Pinecone Document Store.
filters
: Filters applied to the retrieved Documents.
top_k
: Maximum number of Documents to return.
filter_policy
: Policy to determine how filters are applied.
Raises:
ValueError
: If document_store is not an instance of PineconeDocumentStore.
PineconeEmbeddingRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
PineconeEmbeddingRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "PineconeEmbeddingRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
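The to_dict / from_dict pair can be pictured with a small stand-alone sketch. It does not import Haystack: SketchRetriever is a hypothetical class, and the "type" / "init_parameters" layout is an assumption modelled on Haystack's usual serialization format, not the exact output of PineconeEmbeddingRetriever.

```python
from typing import Any, Dict, Optional


class SketchRetriever:
    """Hypothetical stand-in for a component; not part of the integration."""

    def __init__(self, *, filters: Optional[Dict[str, Any]] = None, top_k: int = 10):
        self.filters = filters or {}
        self.top_k = top_k

    def to_dict(self) -> Dict[str, Any]:
        # Store the class path plus the arguments needed to rebuild the object.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"filters": self.filters, "top_k": self.top_k},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchRetriever":
        # Rebuild the object from the stored init parameters.
        return cls(**data["init_parameters"])


original = SketchRetriever(top_k=3)
restored = SketchRetriever.from_dict(original.to_dict())
```

A round trip through the dictionary should yield an object with the same parameters, which is what makes it possible to save a pipeline to disk and load it again.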
PineconeEmbeddingRetriever.run
@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
filters: Optional[Dict[str, Any]] = None,
top_k: Optional[int] = None)
Retrieve documents from the PineconeDocumentStore, based on their dense embeddings.
Arguments:
query_embedding
: Embedding of the query.
filters
: Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
top_k
: Maximum number of Documents to return.
Returns:
List of Documents similar to query_embedding.
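As a rough illustration of how a filter policy could combine init-time and runtime filters, here is a minimal stand-alone sketch. SketchFilterPolicy and resolve_filters are hypothetical names, the REPLACE and MERGE behaviours are assumptions based on Haystack's general FilterPolicy semantics, and the real merge logic may combine conditions differently.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Optional[Dict[str, Any]]


class SketchFilterPolicy(Enum):
    """Hypothetical mirror of the two policies; not the Haystack class."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(policy: SketchFilterPolicy, init_filters: Filters, runtime_filters: Filters) -> Filters:
    # MERGE: keep both sets of conditions by AND-ing them together.
    if policy is SketchFilterPolicy.MERGE and init_filters and runtime_filters:
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE (or only one side given): runtime filters win when present.
    return runtime_filters or init_filters


init_f = {"field": "meta.lang", "operator": "==", "value": "en"}
run_f = {"field": "meta.year", "operator": ">=", "value": 2020}

replaced = resolve_filters(SketchFilterPolicy.REPLACE, init_f, run_f)
merged = resolve_filters(SketchFilterPolicy.MERGE, init_f, run_f)
```

Under these assumptions, REPLACE lets a single run() call override the retriever's default filters, while MERGE narrows the defaults further.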
Module haystack_integrations.document_stores.pinecone.document_store
METADATA_SUPPORTED_TYPES
List[str] is supported and checked separately
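The note above suggests a two-step check: scalar values are matched against a tuple of types, and lists of strings are handled on their own. The sketch below illustrates that idea only. The scalar types in SKETCH_SUPPORTED_TYPES are an assumption, the helper names are hypothetical, and what the store actually does with unsupported values is not stated here; the sketch simply filters them out.

```python
from typing import Any, Dict

# Assumption: scalar types accepted as metadata values.
SKETCH_SUPPORTED_TYPES = (str, int, bool, float)


def is_supported(value: Any) -> bool:
    # Scalars are checked against the tuple of supported types.
    if isinstance(value, SKETCH_SUPPORTED_TYPES):
        return True
    # Lists are checked separately: every item must be a string.
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def keep_supported(meta: Dict[str, Any]) -> Dict[str, Any]:
    # Keep only the metadata entries whose values pass the check.
    return {key: value for key, value in meta.items() if is_supported(value)}


cleaned = keep_supported(
    {"title": "Whales", "tags": ["sea", "mammal"], "nested": {"a": 1}, "scores": [1, 2]}
)
```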
PineconeDocumentStore
A Document Store using Pinecone vector database.
PineconeDocumentStore.__init__
def __init__(*,
api_key: Secret = Secret.from_env_var("PINECONE_API_KEY"),
index: str = "default",
namespace: str = "default",
batch_size: int = 100,
dimension: int = 768,
spec: Optional[Dict[str, Any]] = None,
metric: Literal["cosine", "euclidean", "dotproduct"] = "cosine")
Creates a new PineconeDocumentStore instance.
It is meant to be connected to a Pinecone index and namespace.
Arguments:
api_key
: The Pinecone API key.
index
: The Pinecone index to connect to. If the index does not exist, it will be created.
namespace
: The Pinecone namespace to connect to. If the namespace does not exist, it will be created at the first write.
batch_size
: The number of documents to write in a single batch. When setting this parameter, consider documented Pinecone limits.
dimension
: The dimension of the embeddings. This parameter is only used when creating a new index.
spec
: The Pinecone spec to use when creating a new index. Allows choosing between serverless and pod deployment options and setting additional parameters. Refer to the Pinecone documentation for more details. If not provided, a default spec with serverless deployment in the us-east-1 region will be used (compatible with the free tier).
metric
: The metric to use for similarity search. This parameter is only used when creating a new index.
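To make the three metric options more concrete, the following stand-alone sketch computes the textbook versions of each on plain Python lists. The function names are illustrative, and Pinecone's own scoring may scale or transform these values, so treat the numbers as intuition rather than what the index returns.

```python
import math
from typing import Sequence


def dotproduct(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of element-wise products; sensitive to vector length.
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product of the normalized vectors; ignores vector length.
    return dotproduct(a, b) / (math.sqrt(dotproduct(a, a)) * math.sqrt(dotproduct(b, b)))


query = [1.0, 0.0]
doc = [2.0, 0.0]  # same direction as the query, twice the length

scores = {
    "cosine": cosine(query, doc),
    "dotproduct": dotproduct(query, doc),
    "euclidean": euclidean(query, doc),
}
```

The two vectors point in the same direction, so cosine treats them as identical, while the dot product and Euclidean distance both react to the difference in length.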
PineconeDocumentStore.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "PineconeDocumentStore"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
PineconeDocumentStore.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
PineconeDocumentStore.count_documents
def count_documents() -> int
Returns how many documents are present in the document store.
PineconeDocumentStore.write_documents
def write_documents(documents: List[Document],
policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int
Writes Documents to Pinecone.
Arguments:
documents
: A list of Documents to write to the document store.
policy
: The duplicate policy to use when writing documents. PineconeDocumentStore only supports DuplicatePolicy.OVERWRITE.
Returns:
The number of documents written to the document store.
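The batch_size parameter set at initialization implies that writes are split into chunks. A minimal sketch of that splitting follows, assuming plain fixed-size slicing. The batched helper and the doc_ids list are hypothetical and do not reflect how the store talks to Pinecone.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    # Yield consecutive slices of at most batch_size items.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


doc_ids = [f"doc-{i}" for i in range(250)]
batch_lengths = [len(batch) for batch in batched(doc_ids, batch_size=100)]
```

With 250 items and a batch size of 100, the last batch is smaller than the others; a batch size that is too large for the service's request limits would need to be lowered.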
PineconeDocumentStore.filter_documents
def filter_documents(
filters: Optional[Dict[str, Any]] = None) -> List[Document]
Returns the documents that match the filters provided.
For a detailed specification of the filters, refer to the documentation.
Arguments:
filters
: The filters to apply to the document list.
Returns:
A list of Documents that match the given filters.
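For intuition about what a filter dictionary expresses, here is a small in-memory evaluator. The "field" / "operator" / "value" and "conditions" layout is an assumption taken from Haystack's general filter syntax, matches and COMPARATORS are hypothetical helpers, and the real store evaluates filters in Pinecone rather than in Python.

```python
import operator
from typing import Any, Dict

# Comparison operators covered by this sketch (not an exhaustive list).
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
}


def matches(meta: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    op = flt["operator"]
    # Logical operators recurse into their nested conditions.
    if op in ("AND", "OR"):
        results = [matches(meta, cond) for cond in flt["conditions"]]
        return all(results) if op == "AND" else any(results)
    # Comparison operators look up a metadata field, e.g. "meta.year" -> "year".
    field = flt["field"]
    key = field[len("meta."):] if field.startswith("meta.") else field
    if key not in meta:
        return False
    return COMPARATORS[op](meta[key], flt["value"])


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
metas = [
    {"lang": "en", "year": 2021},
    {"lang": "en", "year": 2019},
    {"lang": "de", "year": 2022},
]
selected = [m for m in metas if matches(m, filters)]
```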
PineconeDocumentStore.delete_documents
def delete_documents(document_ids: List[str]) -> None
Deletes documents that match the provided document_ids from the document store.
Arguments:
document_ids
: The IDs of the documents to delete.