Version: 2.32-unstable

PerplexityDocumentEmbedder

PerplexityDocumentEmbedder computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses Perplexity embedding models.

The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector representing the query is compared with those of the documents to find the most similar or relevant documents.
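The retrieval step described above boils down to scoring every document vector against the query vector and sorting by score. Below is a minimal, standard-library sketch of that idea using cosine similarity. The helper names (`cosine_similarity`, `rank_by_similarity`) and the 3-dimensional vectors are made up for illustration. Real Perplexity embeddings have many more dimensions, and in Haystack the retriever and document store do this scoring for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for this example only (not real model output).
doc_vectors = {
    "berlin": [0.9, 0.1, 0.0],
    "horse": [0.0, 0.2, 0.9],
    "cities": [0.7, 0.3, 0.1],
}
query_vector = [0.8, 0.2, 0.0]

for doc_id, score in rank_by_similarity(query_vector, doc_vectors):
    print(f"{doc_id}: {score:.3f}")
```

The document whose vector points in nearly the same direction as the query ("berlin") ranks first, while the unrelated one ("horse") ranks last.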


Most common position in a pipeline	Before a `DocumentWriter` in an indexing pipeline
Mandatory init variables	`api_key`: A Perplexity API key. Can be set with the `PERPLEXITY_API_KEY` environment variable.
Mandatory run variables	`documents`: A list of documents
Output variables	`documents`: A list of documents (enriched with embeddings); `meta`: A dictionary of metadata
API reference	Integrations
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/perplexity/src/haystack_integrations/components/embedders/perplexity/document_embedder.py
Package name	`perplexity-haystack`

Overview

PerplexityDocumentEmbedder supports the following embedding models:

pplx-embed-v1-0.6b (default)
pplx-embed-v1-4b

Use this component to embed a list of documents. To embed a single string (such as a query), use PerplexityTextEmbedder.

By default, the component reads the API key from the PERPLEXITY_API_KEY environment variable. You can also pass an API key directly at initialization:

python

from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)
from haystack.utils import Secret

embedder = PerplexityDocumentEmbedder(api_key=Secret.from_token("<your-api-key>"))
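The two options above amount to a simple precedence rule: an explicitly passed key is used if present, otherwise the key comes from the environment. The sketch below models that rule with the standard library only. `resolve_api_key` is a hypothetical helper written for illustration; Haystack itself handles this through its `Secret` class (for example `Secret.from_token` and `Secret.from_env_var`), whose exact behavior and error messages differ from this simplified version.

```python
import os
from typing import Optional


def resolve_api_key(
    explicit_key: Optional[str] = None, env_var: str = "PERPLEXITY_API_KEY"
) -> str:
    """Hypothetical helper: prefer an explicitly passed key, else read env_var."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key was passed and {env_var} is not set.")
    return key


# An explicit key wins regardless of what the environment contains.
print(resolve_api_key("<your-api-key>"))
```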

Embedding Metadata

If your documents have semantically meaningful metadata fields, you can embed them alongside the document text to improve retrieval quality:

python

from haystack import Document
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)

doc = Document(content="some text", meta={"title": "relevant title", "page_number": 18})

embedder = PerplexityDocumentEmbedder(meta_fields_to_embed=["title"])
docs_with_embeddings = embedder.run(documents=[doc])["documents"]
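Conceptually, embedding metadata means building the text that is sent to the model from the selected metadata values plus the document content. The sketch below shows that preparation step, assuming the component follows the convention of other Haystack document embedders: values of the fields listed in `meta_fields_to_embed` are joined with the content using a newline separator, and missing or `None` fields are skipped. `prepare_text_to_embed` is a hypothetical helper, not part of the `perplexity-haystack` API.

```python
def prepare_text_to_embed(content, meta, meta_fields_to_embed, separator="\n"):
    """Hypothetical helper: prepend selected metadata values to the content."""
    meta_values = [
        str(meta[field])
        for field in meta_fields_to_embed
        if meta.get(field) is not None  # skip absent or None fields
    ]
    return separator.join(meta_values + [content or ""])


text = prepare_text_to_embed(
    content="some text",
    meta={"title": "relevant title", "page_number": 18},
    meta_fields_to_embed=["title"],
)
print(repr(text))  # 'relevant title\nsome text'
```

Only `title` is included because it is the only field listed; `page_number` stays in `meta` but does not influence the embedding.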

Usage

On its own

python

from haystack import Document
from haystack_integrations.components.embedders.perplexity import (
    PerplexityDocumentEmbedder,
)

doc = Document(content="I love pizza!")

document_embedder = PerplexityDocumentEmbedder()
result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

Info: We recommend setting PERPLEXITY_API_KEY as an environment variable instead of passing it as a parameter.

In a pipeline

python

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.perplexity import (
    PerplexityTextEmbedder,
    PerplexityDocumentEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", PerplexityDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", PerplexityTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "Who lives in Berlin?"}})
print(result["retriever"]["documents"][0])
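The pipeline above creates the document store with `embedding_similarity_function="cosine"`. The small standard-library illustration below shows why that setting can matter: a dot product grows with vector length, while cosine similarity depends only on direction. If embeddings are not unit-normalized, the two functions can therefore rank documents differently; for unit-length vectors they give the same result. The vectors here are invented for the example.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Dot product divided by the product of the vector lengths.
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


query = [1.0, 1.0]
doc = [2.0, 1.0]
doc_scaled = [4.0, 2.0]  # same direction as doc, twice the length

print(dot_product(query, doc), dot_product(query, doc_scaled))  # 3.0 6.0
print(math.isclose(cosine(query, doc), cosine(query, doc_scaled)))  # True
```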
