Most common position in a pipeline	Before an embedding Retriever in a query/RAG pipeline
Mandatory init variables	"api_key": The Mistral API key. Can be set with `MISTRAL_API_KEY` env var.
Mandatory run variables	"text": A string
Output variables	"embedding": A list of floats (the embedding vector); "meta": A dictionary of metadata
API reference	Mistral
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mistral

Use MistralTextEmbedder to embed a simple string (such as a query) into a vector. To embed lists of documents, use MistralDocumentEmbedder, which enriches each document with its computed embedding (its vector).
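To make the split between the two embedders concrete, here is a minimal, framework-free sketch. The `Doc` dataclass and the `fake_embed`, `embed_text`, and `embed_documents` functions are hypothetical stand-ins, not part of Haystack or the Mistral integration: the text-embedder path returns a bare vector for one string, while the document-embedder path returns copies of the documents with a vector attached to each.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Doc:
    # Hypothetical stand-in for a document: content plus an optional vector.
    content: str
    embedding: Optional[List[float]] = None


def fake_embed(text: str) -> List[float]:
    # Hypothetical toy "embedding" (character count, word count).
    # A real embedder would call the Mistral API instead.
    return [float(len(text)), float(text.count(" ") + 1)]


def embed_text(text: str) -> dict:
    # Text-embedder style: one string in, one vector out.
    return {"embedding": fake_embed(text)}


def embed_documents(docs: List[Doc]) -> dict:
    # Document-embedder style: return copies of the documents with vectors attached.
    return {"documents": [replace(d, embedding=fake_embed(d.content)) for d in docs]}


query_result = embed_text("deploy on Cloudflare")
docs_result = embed_documents([Doc("Mistral on Cloudflare"), Doc("API endpoints")])
print(query_result["embedding"])
print([d.embedding for d in docs_result["documents"]])
```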

Overview

MistralTextEmbedder transforms a string into a vector that captures its semantics using a Mistral embedding model.
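Capturing semantics means that texts with similar meaning map to vectors pointing in similar directions, which is commonly measured with cosine similarity (the InMemoryDocumentStore in the pipeline example is configured with `embedding_similarity_function="cosine"`). The sketch below computes it in plain Python; the three-dimensional vectors are made up for illustration, while real mistral-embed vectors are much longer and come from the API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]
close_doc = [0.8, 0.2, 0.1]
far_doc = [0.0, 0.1, 0.9]

print(round(cosine_similarity(query, close_doc), 3))
print(round(cosine_similarity(query, far_doc), 3))
```

A score near 1 means the vectors point the same way; a score near 0 means they are unrelated in this toy space.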

The component currently supports the mistral-embed embedding model. The list of all supported models can be found in Mistral’s embedding models documentation.

To start using this integration with Haystack, install it with:

pip install mistral-haystack

MistralTextEmbedder needs a Mistral API key to work. It uses a MISTRAL_API_KEY environment variable by default. Otherwise, you can pass an API key at initialization with api_key:

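from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")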
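The two options amount to a simple precedence rule: use an explicitly passed key if there is one, otherwise read `MISTRAL_API_KEY` from the environment. The sketch below illustrates that rule with a hypothetical `resolve_api_key` helper; it is not how Haystack's `Secret` class is implemented, and in real code you pass a `Secret` to the component rather than a plain string.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ,
                    var_name: str = "MISTRAL_API_KEY") -> str:
    # Hypothetical helper: an explicitly passed key wins over the environment.
    if explicit:
        return explicit
    value = env.get(var_name)
    if not value:
        raise ValueError(f"No API key passed and {var_name} is not set")
    return value


# Env-only: the environment variable is used.
print(resolve_api_key(env={"MISTRAL_API_KEY": "key-from-env"}))
# Explicit key: it overrides the environment variable.
print(resolve_api_key("key-from-code", env={"MISTRAL_API_KEY": "key-from-env"}))
```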

Usage

On its own

Remember to set the MISTRAL_API_KEY environment variable first, or pass the API key in directly.

Here is how you can use the component on its own:


from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")

result = embedder.run(text="How can I use the Mistral embedding models with Haystack?")

print(result['embedding'])
# [-0.0015687942504882812, 0.052154541015625, 0.037109375...]

In a pipeline

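Below is an example of the MistralTextEmbedder in a retrieval-augmented generation (RAG) pipeline: it embeds the query, an InMemoryEmbeddingRetriever fetches the most similar document, and an OpenAI chat model answers the question. We build this pipeline on top of an InMemoryDocumentStore into which we first index the contents of two URLs with MistralDocumentEmbedder. Besides the Mistral API key, this example needs an OpenAI API key for OpenAIChatGenerator.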

from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Initialize document store
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

# Indexing components
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
embedder = MistralDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)

indexing = Pipeline()
indexing.add_component(name="fetcher", instance=fetcher)
indexing.add_component(name="converter", instance=converter)
indexing.add_component(name="embedder", instance=embedder)
indexing.add_component(name="writer", instance=writer)

indexing.connect("fetcher", "converter")
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")

indexing.run(data={"fetcher": {"urls": ["https://docs.mistral.ai/self-deployment/cloudflare/", 
                                        "https://docs.mistral.ai/platform/endpoints/"]}})

# Retrieval components
text_embedder = MistralTextEmbedder()
retriever = InMemoryEmbeddingRetriever(document_store=document_store)

# Define prompt template
prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given the retrieved documents, answer the question.\nDocuments:\n"
        "{% for document in documents %}{{ document.content }}{% endfor %}\n"
        "Question: {{ query }}\nAnswer:"
    )
]

prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables=["query", "documents"])
llm = OpenAIChatGenerator(model="gpt-4o-mini", api_key=Secret.from_token("<your-api-key>"))

doc_search = Pipeline()
doc_search.add_component("text_embedder", text_embedder)
doc_search.add_component("retriever", retriever)
doc_search.add_component("prompt_builder", prompt_builder)
doc_search.add_component("llm", llm)

doc_search.connect("text_embedder.embedding", "retriever.query_embedding")
doc_search.connect("retriever.documents", "prompt_builder.documents")
doc_search.connect("prompt_builder.messages", "llm.messages")

query = "How can I deploy Mistral models with Cloudflare?"

result = doc_search.run(
    {
        "text_embedder": {"text": query},
        "retriever": {"top_k": 1},
        "prompt_builder": {"query": query}
    }
)

print(result["llm"]["replies"])
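Conceptually, the retriever in this pipeline scores every stored document embedding against the query embedding and keeps the `top_k` best matches (here `top_k=1`). The following plain-Python sketch shows that ranking step with made-up vectors and a hypothetical `top_k_by_cosine` function; Haystack's InMemoryEmbeddingRetriever may differ in details such as score scaling and tie-breaking.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; assumes equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_by_cosine(query_embedding: Sequence[float],
                    store: Dict[str, List[float]],
                    top_k: int = 1) -> List[Tuple[str, float]]:
    # Score every stored vector, then keep the top_k highest-scoring ids.
    scored = [(doc_id, cosine(query_embedding, emb)) for doc_id, emb in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up vectors standing in for the two indexed pages.
store = {
    "cloudflare-page": [0.7, 0.6, 0.1],
    "endpoints-page": [0.1, 0.2, 0.9],
}
query_embedding = [0.8, 0.5, 0.0]
print(top_k_by_cosine(query_embedding, store, top_k=1))
```

This brute-force scan is fine for a small in-memory store; larger deployments typically switch to a document store with an approximate nearest-neighbor index.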