GradientTextEmbedder
This component computes embeddings for text (such as a query) using models deployed through the Gradient AI platform.
Name | GradientTextEmbedder |
Source | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/gradient |
Most common position in a pipeline | Before an embedding Retriever in a query/RAG pipeline |
Mandatory input variables | "text": A string |
Output variables | "embedding": A list of float numbers representing the embedding of the text |
GradientTextEmbedder allows you to compute the embedding of a string using embedding models deployed on the Gradient AI platform. Use this component to embed a simple string (such as a query) into a vector.
For embedding lists of documents, use one of the Document Embedders, which enrich each document with its computed embedding (also known as a vector).
Check out the Gradient documentation for the full list of available embedding models on Gradient. Currently, the component allows you to use the bge-large model.
For an example showcasing this component, check out this article and the related Colab notebook.
Parameters Overview
GradientTextEmbedder needs an access_token and a workspace_id. You can provide these in one of the following ways:
- Provide the access_token and workspace_id init parameters.
- Set the GRADIENT_ACCESS_TOKEN and GRADIENT_WORKSPACE_ID environment variables.
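The two options above can be pictured with a small sketch. The helper below, resolve_gradient_credentials, is hypothetical and not part of gradient-haystack; it only illustrates one plausible lookup order (explicit arguments first, then the environment variables named above). The precedence shown is an assumption, not a documented guarantee of the component.

```python
import os
from typing import Optional, Tuple


def resolve_gradient_credentials(
    access_token: Optional[str] = None,
    workspace_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return (access_token, workspace_id).

    Assumption: explicit arguments win over environment variables.
    """
    token = access_token or os.environ.get("GRADIENT_ACCESS_TOKEN")
    workspace = workspace_id or os.environ.get("GRADIENT_WORKSPACE_ID")
    if not token or not workspace:
        # Fail early with a message that names both supported options
        raise ValueError(
            "Provide access_token and workspace_id, or set "
            "GRADIENT_ACCESS_TOKEN and GRADIENT_WORKSPACE_ID."
        )
    return token, workspace
```

The returned pair could then be passed to the component as its access_token and workspace_id init parameters.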
As more models become available, you can change the model in the component by setting the model parameter at initialization.
Usage
You need to install the gradient-haystack package to use the GradientTextEmbedder:
pip install gradient-haystack
On its own
Here is how you can use the component on its own:
import os
from gradient_haystack.embedders.gradient_text_embedder import GradientTextEmbedder

# Credentials are picked up from these environment variables
os.environ["GRADIENT_ACCESS_TOKEN"] = "YOUR_GRADIENT_ACCESS_TOKEN"
os.environ["GRADIENT_WORKSPACE_ID"] = "YOUR_GRADIENT_WORKSPACE_ID"

text_embedder = GradientTextEmbedder()
text_embedder.warm_up()
# The output contains the "embedding" key with a list of floats
result = text_embedder.run(text="Pizza is made with dough and cheese")
In a pipeline
Text embedders are commonly used to embed queries before an embedding retriever in query/RAG pipelines. Here is an example of this component in a RAG pipeline that performs question answering based on documents in an InMemoryDocumentStore:
import os
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceTGIGenerator
from gradient_haystack.embedders.gradient_text_embedder import GradientTextEmbedder
document_store = InMemoryDocumentStore()
prompt = """ Answer the query, based on the
content in the documents.
Documents:
{% for doc in documents %}
{{doc.content}}
{% endfor %}
Query: {{query}}
"""
os.environ["GRADIENT_ACCESS_TOKEN"] = "YOUR_GRADIENT_ACCESS_TOKEN"
os.environ["GRADIENT_WORKSPACE_ID"] = "YOUR_GRADIENT_WORKSPACE_ID"
text_embedder = GradientTextEmbedder()
retriever = InMemoryEmbeddingRetriever(document_store=document_store)
prompt_builder = PromptBuilder(template=prompt)
generator = HuggingFaceTGIGenerator(model="mistralai/Mistral-7B-v0.1",
token="YOUR_HUGGINGFACE_TOKEN")
generator.warm_up()
rag_pipeline = Pipeline()
rag_pipeline.add_component(instance=text_embedder, name="text_embedder")
rag_pipeline.add_component(instance=retriever, name="retriever")
rag_pipeline.add_component(instance=prompt_builder, name="prompt_builder")
rag_pipeline.add_component(instance=generator, name="generator")
rag_pipeline.connect("text_embedder", "retriever")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "generator")
question = "What are the steps for creating a custom component?"
result = rag_pipeline.run(data={"text_embedder":{"text": question},
"prompt_builder":{"query": question}})
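To make the role of the query embedding in the pipeline above more concrete, here is a minimal, library-free sketch of what an embedding retriever conceptually does with it: score each stored document embedding against the query embedding and keep the best matches. The functions cosine_similarity and rank_documents and the toy three-dimensional vectors are illustrative assumptions; this is not the implementation of InMemoryEmbeddingRetriever, whose scoring may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_documents(
    query_embedding: List[float],
    document_embeddings: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (document id, score) pairs, best match first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, emb))
        for doc_id, emb in document_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy vectors standing in for real embeddings (illustrative values only)
query = [1.0, 0.0, 0.0]
docs = {
    "pizza": [0.9, 0.1, 0.0],
    "weather": [0.0, 1.0, 0.0],
    "pasta": [0.5, 0.5, 0.0],
}
top = rank_documents(query, docs, top_k=2)
```

With these toy vectors, "pizza" ranks first and "pasta" second, because their vectors point in directions closest to the query vector.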