MistralTextEmbedder
This component transforms a string into a vector using the Mistral API and models. Use it for embedding retrieval to transform your query into an embedding.
Name | MistralTextEmbedder |
Source | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mistral |
Most common position in a pipeline | Before an embedding Retriever in a query/RAG pipeline |
Mandatory input variables | “text”: A string |
Output variables | “embedding”: A list of float numbers (vectors); “meta”: A dictionary of metadata strings |
Use MistralTextEmbedder to embed a simple string (such as a query) into a vector. For embedding lists of documents, use the MistralDocumentEmbedder, which enriches each document with its computed embedding (also known as a vector).
Overview
MistralTextEmbedder transforms a string into a vector that captures its semantics using a Mistral embedding model.
The component currently supports the mistral-embed embedding model. The list of all supported models can be found in Mistral’s embedding models documentation.
To start using this integration with Haystack, install it with:
pip install mistral-haystack
MistralTextEmbedder needs a Mistral API key to work. It uses a MISTRAL_API_KEY environment variable by default. Otherwise, you can pass an API key at initialization with api_key:
embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")
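If you rely on the environment variable, a missing key only surfaces when the component is created or first called. As a minimal sketch, the check below verifies that MISTRAL_API_KEY is set up front so a misconfiguration fails early with a clear message. The helper name require_mistral_api_key is hypothetical and is not part of Haystack or the Mistral integration; it uses only the Python standard library.

```python
import os


def require_mistral_api_key(env_var: str = "MISTRAL_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    Hypothetical helper for illustration only; not part of Haystack.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or pass "
            "api_key explicitly when creating the embedder."
        )
    return api_key
```

Calling require_mistral_api_key() at application startup is one way to catch a configuration problem before the first embedding request is made.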
Usage
On its own
Remember to set the MISTRAL_API_KEY environment variable first, or pass the API key in directly.
Here is how you can use the component on its own:
from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder
embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")
result = embedder.run(text="How can I use the Mistral embedding models with Haystack?")
print(result['embedding'])
# [-0.0015687942504882812, 0.052154541015625, 0.037109375...]
In a pipeline
Below is an example of the MistralTextEmbedder in a document search pipeline. We are building this pipeline on top of an InMemoryDocumentStore where we index the contents of two URLs.
from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
embedder = MistralDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)
indexing = Pipeline()
indexing.add_component(name="fetcher", instance=fetcher)
indexing.add_component(name="converter", instance=converter)
indexing.add_component(name="embedder", instance=embedder)
indexing.add_component(name="writer", instance=writer)
indexing.connect("fetcher", "converter")
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")
indexing.run(data={"fetcher": {"urls": ["https://docs.mistral.ai/self-deployment/cloudflare/",
                                        "https://docs.mistral.ai/platform/endpoints/"]}})
text_embedder = MistralTextEmbedder()
retriever = InMemoryEmbeddingRetriever(document_store=document_store)
doc_search = Pipeline()
doc_search.add_component("text_embedder", text_embedder)
doc_search.add_component("retriever", retriever)
doc_search.connect("text_embedder.embedding", "retriever.query_embedding")
result = doc_search.run(
    {
        "text_embedder": {"text": "How can I deploy Mistral models with Cloudflare?"},
        "retriever": {"top_k": 1}
    }
)
# Print the retrieved documents (top_k=1, so a single best match)
print(result["retriever"]["documents"])
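The document store above is configured with embedding_similarity_function="cosine". As a rough illustration of what that ranking means, the sketch below scores toy vectors against a query vector with cosine similarity and keeps the top_k best matches. It is plain Python written for this page, not the InMemoryEmbeddingRetriever implementation; the function names and the three-dimensional vectors are made up for the example, and real embeddings are much longer.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as not similar.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query: list[float],
    candidates: dict[str, list[float]],
    top_k: int = 1,
) -> list[tuple[str, float]]:
    """Rank candidates by cosine similarity to the query, highest first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for document embeddings (hypothetical values).
toy_docs = {
    "cloudflare": [0.9, 0.1, 0.0],
    "endpoints": [0.1, 0.9, 0.2],
}
query_vec = [1.0, 0.0, 0.0]  # toy stand-in for the query embedding

best = top_k_by_cosine(query_vec, toy_docs, top_k=1)
print(best[0][0])  # prints "cloudflare"
```

Because cosine similarity depends only on the direction of the vectors, not their length, the query and document embeddings must come from the same model for the scores to be meaningful, which is why the pipeline pairs MistralDocumentEmbedder with MistralTextEmbedder.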