API Reference

Ollama integration for Haystack

Module haystack_integrations.components.generators.ollama.generator


Provides an interface to generate text using an LLM running on Ollama.

Usage example:

from haystack_integrations.components.generators.ollama import OllamaGenerator

# Sampling options are passed through generation_kwargs.
generator = OllamaGenerator(model="zephyr",
                            url="http://localhost:11434",
                            generation_kwargs={
                                "num_predict": 100,
                                "temperature": 0.9,
                            })

print(generator.run("Who is the best American actor?"))


def __init__(model: str = "orca-mini",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             system_prompt: Optional[str] = None,
             template: Optional[str] = None,
             raw: bool = False,
             timeout: int = 120,
             streaming_callback: Optional[Callable[[StreamingChunk],
                                                   None]] = None)


  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • system_prompt: Optional system message (overrides what is defined in the Ollama Modelfile).
  • template: The full prompt template (overrides what is defined in the Ollama Modelfile).
  • raw: If True, no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your API request.
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.
  • streaming_callback: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
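The streaming_callback parameter can be illustrated with a small collector. This is a minimal sketch, assuming each chunk exposes its text in a content attribute, as Haystack's StreamingChunk does. TokenCollector is a hypothetical helper, and SimpleNamespace objects stand in for real chunks so the snippet runs without an Ollama server.

```python
from types import SimpleNamespace
from typing import Any, List


class TokenCollector:
    """Hypothetical helper: stores streamed text pieces as they arrive."""

    def __init__(self) -> None:
        self.pieces: List[str] = []

    def __call__(self, chunk: Any) -> None:
        # Assumption: the chunk carries its text in `chunk.content`.
        self.pieces.append(chunk.content)

    @property
    def text(self) -> str:
        # Join everything received so far into one string.
        return "".join(self.pieces)


collector = TokenCollector()

# Stand-in chunks; a real run would deliver StreamingChunk objects instead.
for piece in ["No ", "prob", "llama!"]:
    collector(SimpleNamespace(content=piece))

print(collector.text)
```

An instance like this could then be passed as OllamaGenerator(streaming_callback=collector), since the parameter accepts any callable that takes a single chunk.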


def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.


Dictionary with serialized data.


@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OllamaGenerator"

Deserializes the component from a dictionary.


  • data: Dictionary to deserialize from.


Deserialized component.
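The to_dict and from_dict pair is meant to round-trip a component through a plain dictionary. The sketch below shows that pattern on a toy class so it runs without Haystack installed. ToyGenerator is hypothetical, and the "type" / "init_parameters" layout is an assumption about how such a dictionary may be organized, not a statement of what OllamaGenerator emits.

```python
from typing import Any, Dict


class ToyGenerator:
    """Hypothetical stand-in for a component; not part of the integration."""

    def __init__(self, model: str = "orca-mini", timeout: int = 120) -> None:
        self.model = model
        self.timeout = timeout

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: a type name plus the constructor arguments.
        return {
            "type": type(self).__name__,
            "init_parameters": {"model": self.model, "timeout": self.timeout},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyGenerator":
        # Rebuild the object by feeding the stored arguments to __init__.
        return cls(**data["init_parameters"])


original = ToyGenerator(model="zephyr", timeout=30)
restored = ToyGenerator.from_dict(original.to_dict())
print(restored.model, restored.timeout)
```

With the real component, the equivalent round trip would be OllamaGenerator.from_dict(generator.to_dict()).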

@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model on the given prompt.


  • prompt: The prompt to generate a response for.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.


A dictionary with the following keys:

  • replies: The responses from the model
  • meta: The metadata collected during the run
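Since run returns replies and meta as two lists, it can be convenient to read them together. The following is a minimal sketch, assuming the two lists are aligned by index. pair_replies_with_meta is a hypothetical helper, and the sample dictionary is made up to match the documented shape; the metadata keys returned by a real run may differ.

```python
from typing import Any, Dict, List, Tuple


def pair_replies_with_meta(result: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical helper: match each reply with its metadata entry."""
    # Assumption: replies[i] and meta[i] describe the same generation.
    return list(zip(result["replies"], result["meta"]))


# Made-up result in the documented shape; real metadata keys may differ.
sample_result = {
    "replies": ["Natural Language Processing is ..."],
    "meta": [{"model": "zephyr"}],
}

for reply, meta in pair_replies_with_meta(sample_result):
    print(meta["model"], "->", reply)
```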



OllamaChatGenerator supports models running on Ollama, such as llama2 and mixtral. The full list of supported models is in the Ollama library.

Usage example:

from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage

# Sampling options are passed through generation_kwargs.
generator = OllamaChatGenerator(model="zephyr",
                                url="http://localhost:11434",
                                generation_kwargs={
                                    "num_predict": 100,
                                    "temperature": 0.9,
                                })

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

print(generator.run(messages=messages))



def __init__(model: str = "orca-mini",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120,
             streaming_callback: Optional[Callable[[StreamingChunk],
                                                   None]] = None)


  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.
  • streaming_callback: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
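Both the constructor and run accept generation_kwargs, so one natural question is how the two sets combine. The sketch below assumes that values given at run time take precedence over those given at construction; this reference does not state the component's actual merging behaviour, and merge_generation_kwargs is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    defaults: Optional[Dict[str, Any]],
    overrides: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: combine constructor-level and run-level options."""
    merged = dict(defaults or {})
    # Assumption: run-time values win when a key appears in both places.
    merged.update(overrides or {})
    return merged


init_kwargs = {"num_predict": 100, "temperature": 0.9}
run_kwargs = {"temperature": 0.2}

print(merge_generation_kwargs(init_kwargs, run_kwargs))
```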

def run(messages: List[ChatMessage],
        generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model on a given chat history.


  • messages: A list of ChatMessage instances representing the input messages.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs.
  • streaming_callback: A callback function that will be called with each response chunk in streaming mode.


A dictionary with the following keys:

  • replies: The responses from the model
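To make the chat flow more concrete, here is a sketch of how a chat history might be turned into a request body for Ollama's chat endpoint. The field names (model, messages, options, stream) follow Ollama's REST API; whether the component builds exactly this payload is an assumption. build_chat_payload is a hypothetical helper, and plain (role, content) tuples stand in for ChatMessage objects.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_chat_payload(
    model: str,
    history: List[Tuple[str, str]],
    generation_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: shape a chat history as a JSON-ready request body."""
    return {
        "model": model,
        # One entry per message, in conversation order.
        "messages": [{"role": role, "content": content} for role, content in history],
        "stream": False,
        # Sampling options such as temperature travel under "options".
        "options": dict(generation_kwargs or {}),
    }


history = [
    ("system", "You are a helpful, respectful and honest assistant"),
    ("user", "What's Natural Language Processing?"),
]
payload = build_chat_payload("zephyr", history, {"temperature": 0.9})
print(json.dumps(payload, indent=2))
```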

Module haystack_integrations.components.embedders.ollama.document_embedder


Computes the embeddings of a list of Documents and stores the obtained vectors in the embedding field of each Document. It uses embedding models compatible with the Ollama Library.

Usage example:

from haystack import Document
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder

doc = Document(content="What do llamas say once you have thanked them? No probllama!")
document_embedder = OllamaDocumentEmbedder()

result = document_embedder.run([doc])
print(result["documents"][0].embedding)


def __init__(model: str = "nomic-embed-text",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120,
             prefix: str = "",
             suffix: str = "",
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")


  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • progress_bar: If True, shows a progress bar while the documents are embedded.
  • meta_fields_to_embed: A list of metadata fields to embed along with the Document content.
  • embedding_separator: The separator used to join the metadata fields to the Document content.

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document],
        generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model to compute embeddings of the provided documents.


  • documents: Documents to be converted to an embedding.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs.


A dictionary with the following keys:

  • documents: Documents with embedding information attached
  • meta: The metadata collected during the embedding process
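The constructor's prefix, suffix, meta_fields_to_embed, and embedding_separator parameters all shape the text that is sent for embedding. The sketch below shows one plausible way they combine, assuming selected metadata values are placed before the content and joined with the separator. ToyDocument and prepare_text are hypothetical stand-ins, not the integration's own code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document with content and metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def prepare_text(
    doc: ToyDocument,
    prefix: str = "",
    suffix: str = "",
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Hypothetical helper: build the string that would be embedded."""
    # Keep only the requested metadata fields that are present and not None.
    meta_values = [
        str(doc.meta[key])
        for key in (meta_fields_to_embed or [])
        if doc.meta.get(key) is not None
    ]
    # Assumption: metadata comes first, then the content, joined by the separator.
    return prefix + embedding_separator.join(meta_values + [doc.content]) + suffix


doc = ToyDocument(content="No probllama!", meta={"title": "Llama jokes"})
print(prepare_text(doc, prefix="passage: ", meta_fields_to_embed=["title"]))
```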

Module haystack_integrations.components.embedders.ollama.text_embedder


Computes the embedding of a string. It uses embedding models compatible with the Ollama Library.

Usage example:

from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

embedder = OllamaTextEmbedder()
result = embedder.run(text="What do llamas say once you have thanked them? No probllama!")
print(result["embedding"])


def __init__(model: str = "nomic-embed-text",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120)


  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str, generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model to compute embeddings of the provided text.


  • text: Text to be converted to an embedding.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs.


A dictionary with the following keys:

  • embedding: The computed embeddings
  • meta: The metadata collected during the embedding process
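The embedding value is a plain list of floats, so it can be compared with other embeddings directly. As an illustration only, the sketch below computes cosine similarity between two vectors with the standard library. cosine_similarity is a hypothetical helper and the vectors are made up; real embeddings from a model are much longer.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Hypothetical helper: cosine of the angle between two vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat similarity as 0.
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for result["embedding"] from two runs.
query_embedding = [1.0, 0.0, 1.0]
other_embedding = [1.0, 1.0, 0.0]

print(cosine_similarity(query_embedding, other_embedding))
```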