GradientDocumentEmbedder
GradientDocumentEmbedder computes embeddings for documents using models deployed through the Gradient AI platform.
| Name | GradientDocumentEmbedder |
| Source | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/gradient |
| Most common position in a pipeline | Before a DocumentWriter in an indexing pipeline |
| Mandatory input variables | “documents”: A list of documents |
| Output variables | “documents”: A list of documents (enriched with embeddings) |
GradientDocumentEmbedder allows you to compute embeddings for documents using embedding models deployed on the Gradient AI platform. Check out the Gradient documentation for the full list of available embedding models. Currently, the component supports the bge-large model.
For an example showcasing this component, check out this article and the related 🧑🍳 Cookbook.
Parameters Overview
GradientDocumentEmbedder needs an access_token and a workspace_id. You can provide these in one of the following ways:
- Pass the access_token and workspace_id init parameters.
- Set the GRADIENT_ACCESS_TOKEN and GRADIENT_WORKSPACE_ID environment variables.
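The two options above can be sketched as a small lookup. This is a minimal illustration, not the component's actual implementation: the helper name resolve_gradient_credentials is hypothetical, and the precedence it applies (explicit arguments first, environment variables second) is an assumption. Only the environment variable names come from this page.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_gradient_credentials(
    access_token: Optional[str] = None,
    workspace_id: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return (access_token, workspace_id).

    Explicit arguments are used when given; otherwise the values are read
    from GRADIENT_ACCESS_TOKEN and GRADIENT_WORKSPACE_ID. This precedence
    is an assumption made for illustration.
    """
    env = os.environ if env is None else env
    token = access_token or env.get("GRADIENT_ACCESS_TOKEN")
    workspace = workspace_id or env.get("GRADIENT_WORKSPACE_ID")

    # Collect the name of every credential that could not be resolved.
    missing = [
        name
        for name, value in (
            ("GRADIENT_ACCESS_TOKEN", token),
            ("GRADIENT_WORKSPACE_ID", workspace),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing Gradient credentials: " + ", ".join(missing))
    return token, workspace
```

A check like this fails early with a readable message when neither option has been configured. The resolved values could then be passed as the access_token and workspace_id init parameters.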
As more models become available, you can change the model in the component by setting the model parameter at initialization.
Usage
You need to install the gradient-haystack package to use GradientDocumentEmbedder:
pip install gradient-haystack
On its own
Here is how you can use the component on its own:
import os

from haystack.dataclasses import Document
from gradient_haystack.embedders.gradient_document_embedder import GradientDocumentEmbedder

documents = [
    Document(content="Pizza is made with dough and cheese"),
    Document(content="Cake is made with flour and sugar"),
    Document(content="Omelette is made with eggs"),
]

# Replace the placeholders with your own Gradient credentials.
os.environ["GRADIENT_ACCESS_TOKEN"] = "YOUR_GRADIENT_ACCESS_TOKEN"
os.environ["GRADIENT_WORKSPACE_ID"] = "YOUR_GRADIENT_WORKSPACE_ID"

document_embedder = GradientDocumentEmbedder()
document_embedder.warm_up()  # load the embedding model before the first run
document_embedder.run(documents=documents)
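To see what the run produces, you can inspect the returned "documents" list. The sketch below is a hedged illustration: summarize_embeddings is a hypothetical helper, and it assumes each returned document exposes content and embedding attributes. It uses SimpleNamespace stand-ins instead of real Documents so that it runs without Gradient credentials. The vectors are made-up placeholders, not real model output.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def summarize_embeddings(result: Dict[str, List[Any]]) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (content, embedding length) per document.

    Documents without an embedding are reported with length 0.
    """
    summary = []
    for doc in result["documents"]:
        embedding = getattr(doc, "embedding", None)
        summary.append((doc.content, len(embedding) if embedding is not None else 0))
    return summary


# Stand-in for the embedder's output; placeholder vectors, not real embeddings.
fake_result = {
    "documents": [
        SimpleNamespace(content="Pizza is made with dough and cheese", embedding=[0.1, 0.2, 0.3]),
        SimpleNamespace(content="Cake is made with flour and sugar", embedding=None),
    ]
}

for content, dim in summarize_embeddings(fake_result):
    print(f"{dim:>4} dims | {content}")
```

With the real component, you would pass the dictionary returned by document_embedder.run(documents=documents) instead of fake_result.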
In a pipeline
Document embedders are most commonly used in indexing pipelines, where they index documents alongside their embeddings into a Document Store. Here is an example of this component being used in an indexing pipeline with the InMemoryDocumentStore.
import os

from haystack import Pipeline
from haystack.dataclasses import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from gradient_haystack.embedders.gradient_document_embedder import GradientDocumentEmbedder

documents = [
    Document(content="Pizza is made with dough and cheese"),
    Document(content="Cake is made with flour and sugar"),
    Document(content="Omelette is made with eggs"),
]

document_store = InMemoryDocumentStore()

# Replace the placeholders with your own Gradient credentials.
os.environ["GRADIENT_ACCESS_TOKEN"] = "YOUR_GRADIENT_ACCESS_TOKEN"
os.environ["GRADIENT_WORKSPACE_ID"] = "YOUR_GRADIENT_WORKSPACE_ID"

document_embedder = GradientDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)

# Embed the documents, then write them to the document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=document_embedder, name="document_embedder")
indexing_pipeline.add_component(instance=writer, name="writer")
indexing_pipeline.connect("document_embedder.documents", "writer.documents")

indexing_pipeline.run(data={"document_embedder": {"documents": documents}})