Version: 2.30-unstable

PerplexityDocumentEmbedder

PerplexityDocumentEmbedder computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses Perplexity embedding models.

The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector representing the query is compared with those of the documents to find the most similar or relevant documents.
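
The comparison step can be illustrated without any API call. The following is a minimal sketch that assumes cosine similarity as the comparison function and uses made-up three-dimensional vectors in place of real embeddings; the `cosine_similarity` helper and the vector values are illustrative and not part of the component.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for document embeddings (values are invented).
doc_vectors = {
    "berlin": [0.9, 0.1, 0.0],
    "horse": [0.0, 0.2, 0.9],
    "cities": [0.7, 0.6, 0.1],
}

# Toy vector standing in for the embedded query.
query_vector = [1.0, 0.2, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    doc_vectors,
    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    reverse=True,
)
print(ranked)
```

In a real setup the retriever performs this ranking for you; the sketch only shows why documents and queries need vectors from the same embedding model.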

Most common position in a pipeline: Before a DocumentWriter in an indexing pipeline
Mandatory init variables: api_key: A Perplexity API key. Can be set with PERPLEXITY_API_KEY env var.
Mandatory run variables: documents: A list of documents
Output variables: documents: A list of documents (enriched with embeddings); meta: A dictionary of metadata
API reference: Integrations
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/perplexity/src/haystack_integrations/components/embedders/perplexity/document_embedder.py
Package name: perplexity-haystack

Overview

PerplexityDocumentEmbedder supports the following embedding models:

  • pplx-embed-v1-0.6b (default)
  • pplx-embed-v1-4b

Use this component to embed a list of documents. To embed a single string (such as a query), use PerplexityTextEmbedder.

By default, the component reads the API key from the PERPLEXITY_API_KEY environment variable. You can also pass an API key directly at initialization:

python
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)
from haystack.utils import Secret

embedder = PerplexityDocumentEmbedder(api_key=Secret.from_token("<your-api-key>"))
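
If you rely on the environment variable, it can help to fail early with a clear message when it is missing. The following is a minimal standard-library sketch; `require_api_key` is a hypothetical helper written for this example, not part of the integration, and it says nothing about how the component itself reports a missing key.

```python
import os


def require_api_key(env_var="PERPLEXITY_API_KEY", environ=None):
    """Return the key stored in env_var, or raise if it is missing or blank.

    Hypothetical helper: `environ` defaults to os.environ and can be replaced
    with a plain dict, which makes the check easy to test.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or pass api_key explicitly."
        )
    return key


# Example with an explicit mapping instead of the real environment.
print(require_api_key(environ={"PERPLEXITY_API_KEY": "example-key"}))
```

Calling `require_api_key()` with no arguments before building the pipeline checks the real environment.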

Embedding Metadata

If your documents have semantically meaningful metadata fields, you can embed them alongside the document text to improve retrieval quality:

python
from haystack import Document
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)

doc = Document(content="some text", meta={"title": "relevant title", "page_number": 18})

embedder = PerplexityDocumentEmbedder(meta_fields_to_embed=["title"])
docs_with_embeddings = embedder.run(documents=[doc])["documents"]
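
To make the effect of meta_fields_to_embed concrete, the sketch below shows one plausible way the text sent to the model could be assembled: the selected metadata values are placed before the document content and joined with a separator. This mirrors the pattern used by other Haystack document embedders, but the newline separator, the ordering, and the `build_text_to_embed` helper are assumptions for illustration, not the confirmed behavior of PerplexityDocumentEmbedder.

```python
def build_text_to_embed(content, meta, meta_fields_to_embed, separator="\n"):
    """Join selected metadata values and the content into one string.

    Hypothetical helper: the separator and the ordering are assumptions.
    """
    # Keep only requested fields that exist and are not None.
    meta_values = [
        str(meta[key]) for key in meta_fields_to_embed if meta.get(key) is not None
    ]
    return separator.join(meta_values + [content or ""])


text = build_text_to_embed(
    content="some text",
    meta={"title": "relevant title", "page_number": 18},
    meta_fields_to_embed=["title"],
)
print(repr(text))
```

Fields that are not listed, such as page_number here, stay in the document's metadata but do not influence the vector.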

Usage

On its own

python
from haystack import Document
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)

doc = Document(content="I love pizza!")

document_embedder = PerplexityDocumentEmbedder()
result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

Info: We recommend setting PERPLEXITY_API_KEY as an environment variable instead of passing it as a parameter.

In a pipeline

python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.perplexity import (
    PerplexityTextEmbedder,
    PerplexityDocumentEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", PerplexityDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", PerplexityTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "Who lives in Berlin?"}})
print(result["retriever"]["documents"][0])