PerplexityDocumentEmbedder
PerplexityDocumentEmbedder computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses Perplexity embedding models.
The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector representing the query is compared with those of the documents to find the most similar or relevant documents.
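To make the comparison step concrete, here is a minimal, dependency-free sketch of ranking documents by cosine similarity to a query vector. The three-dimensional vectors and the names `cosine_similarity`, `query_vec`, and `doc_vecs` are made up for illustration; real embeddings have many more dimensions, and a document store or retriever normally performs this comparison for you.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}

# Rank document names from most to least similar to the query.
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
```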
| Most common position in a pipeline | Before a DocumentWriter in an indexing pipeline |
| Mandatory init variables | api_key: A Perplexity API key. Can be set with the PERPLEXITY_API_KEY environment variable. |
| Mandatory run variables | documents: A list of documents |
| Output variables | documents: A list of documents (enriched with embeddings); meta: A dictionary of metadata |
| API reference | Integrations |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/perplexity/src/haystack_integrations/components/embedders/perplexity/document_embedder.py |
| Package name | perplexity-haystack |
Overview
PerplexityDocumentEmbedder supports the following embedding models:
- pplx-embed-v1-0.6b (default)
- pplx-embed-v1-4b
Use this component to embed a list of documents. To embed a single string (such as a query), use PerplexityTextEmbedder.
By default, the component reads the API key from the PERPLEXITY_API_KEY environment variable. You can also pass an API key directly at initialization:
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)
from haystack.utils import Secret

# Wrap the key in a Secret instead of passing it as a plain string.
embedder = PerplexityDocumentEmbedder(api_key=Secret.from_token("<your-api-key>"))
Embedding Metadata
If your documents have semantically meaningful metadata fields, you can embed them alongside the document text to improve retrieval quality:
from haystack import Document
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)

doc = Document(content="some text", meta={"title": "relevant title", "page_number": 18})

# Only the "title" field is embedded with the content; "page_number" is ignored.
embedder = PerplexityDocumentEmbedder(meta_fields_to_embed=["title"])

docs_with_embeddings = embedder.run(documents=[doc])["documents"]
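As a rough illustration of what embedding metadata alongside the text means, the sketch below builds the string that would be sent to the model. It assumes the component follows the pattern common to other Haystack document embedders, where the selected metadata values are joined with the content using a separator. The helper `build_text_to_embed` and its `separator` parameter are hypothetical and are not part of the Perplexity integration's API.

```python
def build_text_to_embed(content, meta, meta_fields_to_embed, separator="\n"):
    """Join the selected metadata values and the content into one string.

    Hypothetical helper: it mirrors a common embedder pattern and is not
    taken from the PerplexityDocumentEmbedder source.
    """
    parts = [
        str(meta[field])
        for field in meta_fields_to_embed
        if field in meta and meta[field] is not None
    ]
    parts.append(content or "")
    return separator.join(parts)


text = build_text_to_embed(
    content="some text",
    meta={"title": "relevant title", "page_number": 18},
    meta_fields_to_embed=["title"],
)
print(repr(text))  # 'relevant title\nsome text'
```

Under this assumption, fields that are not listed (here, `page_number`) do not influence the embedding, so list only fields that carry meaning for retrieval.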
Usage
On its own
from haystack import Document
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)

doc = Document(content="I love pizza!")

# Reads the API key from the PERPLEXITY_API_KEY environment variable.
document_embedder = PerplexityDocumentEmbedder()

result = document_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
We recommend setting PERPLEXITY_API_KEY as an environment variable instead of passing it as a parameter.
In a pipeline
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.perplexity import (
    PerplexityTextEmbedder,
    PerplexityDocumentEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

# Indexing: embed the documents and write them to the document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", PerplexityDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})

# Querying: embed the query text and retrieve the most similar documents.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", PerplexityTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "Who lives in Berlin?"}})

print(result["retriever"]["documents"][0])