
MistralTextEmbedder

This component transforms a string into a vector using the Mistral API and models. Use it for embedding retrieval to transform your query into an embedding.

Name: MistralTextEmbedder
Type: Text Embedder
Path: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mistral
Position in a Pipeline: Before an embedding Retriever in a Query/RAG pipeline
Inputs: "text": a string
Outputs: "embedding": a list of float numbers (vectors); "meta": a dictionary of metadata strings

Use MistralTextEmbedder to embed a simple string (such as a query) into a vector. To embed lists of Documents, use the MistralDocumentEmbedder, which enriches each Document with its computed embedding (also called a vector).

Overview

MistralTextEmbedder transforms a string into a vector that captures its semantics using a Mistral embedding model.

The component currently supports the mistral-embed embedding model. The list of all supported models can be found in Mistral's embedding models documentation.

To start using this integration with Haystack, install it with:

pip install mistral-haystack

MistralTextEmbedder needs a Mistral API key to work. It uses a MISTRAL_API_KEY environment variable by default. Otherwise, you can pass an API key at initialization with api_key:

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")
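Because the component reads MISTRAL_API_KEY by default, it can help to check that the variable is set before building anything that depends on it. The following is a minimal sketch using only the Python standard library. The helper name require_mistral_api_key is hypothetical and is not part of Haystack or the Mistral integration.

```python
import os


def require_mistral_api_key(env_var: str = "MISTRAL_API_KEY") -> str:
    """Return the key stored in env_var, or raise a clear error if it is missing.

    Hypothetical helper for illustration only; not part of Haystack.
    """
    # Treat an unset variable and a whitespace-only value the same way.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable or pass api_key explicitly."
        )
    return key
```

Calling require_mistral_api_key() at the start of a script surfaces a missing key immediately, rather than at the first request to the API.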

Usage

On its own

Remember to set the MISTRAL_API_KEY as an environment variable first, or pass the key in directly.

Here is how you can use the component on its own:


from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")

result = embedder.run(text="How can I use the Mistral embedding models with Haystack?")

print(result['embedding'])
# [-0.0015687942504882812, 0.052154541015625, 0.037109375...]

In a Pipeline

Below is an example of the MistralTextEmbedder in a document search Pipeline. We are building this pipeline on top of an InMemoryDocumentStore where we index the contents of two URLs.

from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

fetcher = LinkContentFetcher()
converter = HTMLToDocument()
embedder = MistralDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)

indexing = Pipeline()
indexing.add_component(name="fetcher", instance=fetcher)
indexing.add_component(name="converter", instance=converter)
indexing.add_component(name="embedder", instance=embedder)
indexing.add_component(name="writer", instance=writer)

indexing.connect("fetcher", "converter")
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")

indexing.run(data={"fetcher": {"urls": ["https://docs.mistral.ai/self-deployment/cloudflare/", 
                                        "https://docs.mistral.ai/platform/endpoints/"]}})

text_embedder = MistralTextEmbedder()
retriever = InMemoryEmbeddingRetriever(document_store=document_store)

doc_search = Pipeline()
doc_search.add_component("text_embedder", text_embedder)
doc_search.add_component("retriever", retriever)

doc_search.connect("text_embedder.embedding", "retriever.query_embedding")

result = doc_search.run(
    {
        "text_embedder": {"text": "How can I deploy Mistral models with Cloudflare?"}, 
        "retriever": {"top_k": 1}
    }
)
print(result["retriever"]["documents"])
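To make the retrieval step above more concrete, here is a simplified sketch of what ranking by cosine similarity with a top_k cutoff amounts to. It is an illustration in plain Python, not the InMemoryEmbeddingRetriever implementation. The function names and the three-dimensional toy vectors are assumptions made up for the example; real embeddings from the API are much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_embedding, doc_embeddings, top_k=1):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for document and query embeddings (invented values).
toy_docs = {
    "cloudflare": [0.9, 0.1, 0.0],
    "endpoints": [0.1, 0.9, 0.2],
}
toy_query = [0.8, 0.2, 0.1]

best = top_k_by_cosine(toy_query, toy_docs, top_k=1)
print(best[0][0])  # id of the document whose vector points closest to the query
```

The query and the documents must be embedded with the same model for these scores to be meaningful, which is why the pipeline pairs MistralTextEmbedder with MistralDocumentEmbedder.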

Related Links

Check out the API reference in the GitHub repo or in our docs: