
GradientDocumentEmbedder

GradientDocumentEmbedder computes embeddings for documents using models deployed through the Gradient AI platform.

| | |
| --- | --- |
| Name | GradientDocumentEmbedder |
| Source | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/gradient |
| Most common position in a pipeline | Before a DocumentWriter in an indexing pipeline |
| Mandatory input variables | "documents": A list of documents |
| Output variables | "documents": A list of documents enriched with embeddings |

Check out the Gradient documentation for the full list of embedding models available on Gradient. Currently, the component supports the bge-large model.

📘

For an example showcasing this component, check out this article and the related 🧑‍🍳 Cookbook.

Parameters Overview

GradientDocumentEmbedder needs an access_token and a workspace_id. You can provide these in one of the following ways:

  • Pass the access_token and workspace_id init parameters.
  • Set the GRADIENT_ACCESS_TOKEN and GRADIENT_WORKSPACE_ID environment variables.

As more models become available, you can change the model in the component by setting the model parameter at initialization.
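As a rough illustration of how such credential handling typically behaves, the sketch below shows an explicit init parameter taking precedence over the corresponding environment variable. This is a minimal stand-in written in plain Python; the resolve_credential helper is hypothetical and not part of the integration's API.

```python
import os

def resolve_credential(init_value, env_var):
    """Hypothetical helper: prefer the explicit init parameter,
    otherwise fall back to the environment variable."""
    value = init_value if init_value is not None else os.environ.get(env_var)
    if value is None:
        raise ValueError(f"Pass the init parameter or set {env_var}.")
    return value

os.environ["GRADIENT_ACCESS_TOKEN"] = "token-from-env"

# No init parameter given: the environment variable is used.
print(resolve_credential(None, "GRADIENT_ACCESS_TOKEN"))

# An explicit init parameter wins over the environment variable.
print(resolve_credential("explicit-token", "GRADIENT_ACCESS_TOKEN"))
```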

Usage

You need to install the gradient-haystack package to use GradientDocumentEmbedder:

pip install gradient-haystack

On its own

Here is how you can use the component on its own:

import os
from haystack.dataclasses import Document
from gradient_haystack.embedders.gradient_document_embedder import GradientDocumentEmbedder

documents = [Document(content="Pizza is made with dough and cheese"),
             Document(content="Cake is made with flour and sugar"),
             Document(content="Omelette is made with eggs")]

os.environ["GRADIENT_ACCESS_TOKEN"]="YOUR_GRADIENT_ACCESS_TOKEN"
os.environ["GRADIENT_WORKSPACE_ID"]="YOUR_GRADIENT_WORKSPACE_ID"

document_embedder = GradientDocumentEmbedder()
document_embedder.warm_up()
document_embedder.run(documents=documents)

In a pipeline

Document embedders are most commonly used in indexing pipelines to index documents, along with their embeddings, into a Document Store. Here is an example of this component used in an indexing pipeline with the InMemoryDocumentStore.

import os
from haystack import Pipeline
from haystack.dataclasses import Document
from gradient_haystack.embedders.gradient_document_embedder import GradientDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

documents = [Document(content="Pizza is made with dough and cheese"),
             Document(content="Cake is made with flour and sugar"),
             Document(content="Omelette is made with eggs")]

document_store = InMemoryDocumentStore()

os.environ["GRADIENT_ACCESS_TOKEN"]="YOUR_GRADIENT_ACCESS_TOKEN"
os.environ["GRADIENT_WORKSPACE_ID"]="YOUR_GRADIENT_WORKSPACE_ID"

document_embedder = GradientDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=document_embedder, name="document_embedder")
indexing_pipeline.add_component(instance=writer, name="writer")

indexing_pipeline.connect("document_embedder.documents", "writer.documents")
indexing_pipeline.run(data={"document_embedder":{"documents": documents}})