
Ollama integration for Haystack

Module haystack_integrations.components.generators.ollama.generator

OllamaGenerator

Provides an interface to generate text using an LLM running on Ollama.

Usage example:

```python
from haystack_integrations.components.generators.ollama import OllamaGenerator

generator = OllamaGenerator(model="zephyr",
                            url="http://localhost:11434/api/generate",
                            generation_kwargs={
                                "num_predict": 100,
                                "temperature": 0.9,
                            })

print(generator.run("Who is the best American actor?"))
```

OllamaGenerator.__init__

def __init__(model: str = "orca-mini",
             url: str = "http://localhost:11434/api/generate",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             system_prompt: Optional[str] = None,
             template: Optional[str] = None,
             raw: bool = False,
             timeout: int = 120,
             streaming_callback: Optional[Callable[[StreamingChunk],
                                                   None]] = None)

Arguments:

  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of the generation endpoint of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • system_prompt: Optional system message (overrides what is defined in the Ollama Modelfile).
  • template: The full prompt template (overrides what is defined in the Ollama Modelfile).
  • raw: If True, no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your API request.
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.
  • streaming_callback: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
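A streaming callback can be any callable, including a class instance that accumulates tokens. The sketch below illustrates the contract with a simplified `StreamingChunk` stand-in (the real dataclass lives in `haystack.dataclasses` and may carry more fields); it is an assumption-laden sketch, not the library's implementation:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StreamingChunk:
    """Simplified stand-in for haystack.dataclasses.StreamingChunk."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class TokenCollector:
    """Collects streamed tokens; an instance can be passed as streaming_callback."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def __call__(self, chunk: StreamingChunk) -> None:
        # Called once per token received from the stream.
        self.tokens.append(chunk.content)


collector = TokenCollector()
for piece in ["Hel", "lo", "!"]:
    collector(StreamingChunk(content=piece))
print("".join(collector.tokens))  # Hello!
```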

OllamaGenerator.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OllamaGenerator.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OllamaGenerator"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
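Haystack components round-trip through a plain dictionary. The stand-in class below mimics that pattern; the `type`/`init_parameters` layout follows the usual Haystack serialization convention, but it is a minimal sketch, not the exact `OllamaGenerator` output:

```python
from typing import Any, Dict


class TinyComponent:
    """Minimal stand-in illustrating the to_dict / from_dict round-trip."""

    def __init__(self, model: str = "orca-mini", timeout: int = 120) -> None:
        self.model = model
        self.timeout = timeout

    def to_dict(self) -> Dict[str, Any]:
        # A component is serialized as its import path plus its init parameters.
        return {
            "type": "example.TinyComponent",
            "init_parameters": {"model": self.model, "timeout": self.timeout},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TinyComponent":
        # Rebuild the component from the stored init parameters.
        return cls(**data["init_parameters"])


data = TinyComponent(model="zephyr").to_dict()
restored = TinyComponent.from_dict(data)
print(restored.model, restored.timeout)  # zephyr 120
```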

OllamaGenerator.run

@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model on the given prompt.

Arguments:

  • prompt: The prompt to generate a response for.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.

Returns:

A dictionary with the following keys:

  • replies: The responses from the model
  • meta: The metadata collected during the run
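The `replies` and `meta` lists are parallel, so a post-processing step can zip them together. The helper and the result dictionary below are a hypothetical sketch (simulated output, not real model responses):

```python
from typing import Any, Dict, List, Tuple


def pair_replies_with_meta(result: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each reply with the metadata collected for it."""
    return list(zip(result["replies"], result["meta"]))


# Simulated shape of an OllamaGenerator.run() result
result = {
    "replies": ["Paris is the capital of France."],
    "meta": [{"model": "orca-mini", "done": True}],
}

for reply, meta in pair_replies_with_meta(result):
    print(f"{meta['model']}: {reply}")
```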

Module haystack_integrations.components.generators.ollama.chat.chat_generator

OllamaChatGenerator

Supports models running on Ollama, such as llama2 and mixtral. Find the full list of supported models in the Ollama library.

Usage example:
```python
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage

generator = OllamaChatGenerator(model="zephyr",
                            url="http://localhost:11434/api/chat",
                            generation_kwargs={
                                "num_predict": 100,
                                "temperature": 0.9,
                            })

messages = [ChatMessage.from_system("You are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

print(generator.run(messages=messages))
```

OllamaChatGenerator.__init__

def __init__(model: str = "orca-mini",
             url: str = "http://localhost:11434/api/chat",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             template: Optional[str] = None,
             timeout: int = 120)

Arguments:

  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of the chat endpoint of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • template: The full prompt template (overrides what is defined in the Ollama Modelfile).
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.

OllamaChatGenerator.run

@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
        generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model on a given chat history.

Arguments:

  • messages: A list of ChatMessage instances representing the input messages.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs.

Returns:

A dictionary with the following keys:

  • replies: The responses from the model
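Each reply is a `ChatMessage`, so downstream code usually extracts the assistant text. The sketch below uses a simplified `ChatMessage` stand-in (the real class is `haystack.dataclasses.ChatMessage`, whose accessors may differ), and a simulated result dictionary:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChatMessage:
    """Simplified stand-in for haystack.dataclasses.ChatMessage."""
    role: str
    content: str


def last_assistant_text(result: Dict[str, List[ChatMessage]]) -> str:
    """Return the content of the last assistant reply, or an empty string."""
    assistant = [m for m in result["replies"] if m.role == "assistant"]
    return assistant[-1].content if assistant else ""


# Simulated shape of an OllamaChatGenerator.run() result
result = {"replies": [ChatMessage(role="assistant", content="NLP is ...")]}
print(last_assistant_text(result))  # NLP is ...
```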

Module haystack_integrations.components.embedders.ollama.document_embedder

OllamaDocumentEmbedder

Computes the embeddings of a list of Documents and stores the obtained vectors in the embedding field of each Document. It uses embedding models compatible with the Ollama Library.

Usage example:

```python
from haystack import Document
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder

doc = Document(content="What do llamas say once you have thanked them? No probllama!")
document_embedder = OllamaDocumentEmbedder()

result = document_embedder.run([doc])
print(result["documents"][0].embedding)
```

OllamaDocumentEmbedder.__init__

def __init__(model: str = "nomic-embed-text",
             url: str = "http://localhost:11434/api/embeddings",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120,
             prefix: str = "",
             suffix: str = "",
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Arguments:

  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of the embeddings endpoint of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama embeddings endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.
  • prefix: A string to add at the beginning of each text to embed.
  • suffix: A string to add at the end of each text to embed.
  • progress_bar: If True, shows a progress bar while embedding documents.
  • meta_fields_to_embed: A list of metadata fields that should be embedded along with the document content.
  • embedding_separator: The separator used to join the metadata fields to the document content.
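The interaction between meta_fields_to_embed, embedding_separator, prefix, and suffix can be sketched as a text-preparation step. This is a simplified reimplementation for illustration only; the library's exact ordering and handling of missing fields may differ:

```python
from typing import Dict, List, Optional


def prepare_text_to_embed(content: str,
                          meta: Dict[str, str],
                          meta_fields_to_embed: Optional[List[str]] = None,
                          embedding_separator: str = "\n",
                          prefix: str = "",
                          suffix: str = "") -> str:
    """Join selected metadata fields with the content before embedding."""
    fields = [meta[key] for key in (meta_fields_to_embed or []) if key in meta]
    joined = embedding_separator.join(fields + [content])
    return prefix + joined + suffix


text = prepare_text_to_embed(
    content="Llamas are vegetarian.",
    meta={"title": "Llama facts", "author": "anon"},
    meta_fields_to_embed=["title"],
)
print(repr(text))
```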

OllamaDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document],
        generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model to compute embeddings of the provided documents.

Arguments:

  • documents: Documents to be converted to an embedding.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs.

Returns:

A dictionary with the following keys:

  • documents: Documents with embedding information attached
  • meta: The metadata collected during the embedding process

Module haystack_integrations.components.embedders.ollama.text_embedder

OllamaTextEmbedder

Computes the embedding of a string using embedding models compatible with the Ollama Library.

Usage example:

```python
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

embedder = OllamaTextEmbedder()
result = embedder.run(text="What do llamas say once you have thanked them? No probllama!")
print(result["embedding"])
```

OllamaTextEmbedder.__init__

def __init__(model: str = "nomic-embed-text",
             url: str = "http://localhost:11434/api/embeddings",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120)

Arguments:

  • model: The name of the model to use. The model should be available in the running Ollama instance.
  • url: The URL of the embeddings endpoint of a running Ollama instance.
  • generation_kwargs: Optional arguments to pass to the Ollama embeddings endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
  • timeout: The number of seconds before throwing a timeout error from the Ollama API.

OllamaTextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str, generation_kwargs: Optional[Dict[str, Any]] = None)

Runs an Ollama Model to compute embeddings of the provided text.

Arguments:

  • text: Text to be converted to an embedding.
  • generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs.

Returns:

A dictionary with the following keys:

  • embedding: The computed embeddings
  • meta: The metadata collected during the embedding process
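A typical follow-up is comparing a query embedding against document embeddings. Below is a self-contained cosine-similarity sketch with made-up three-dimensional vectors standing in for real Ollama embeddings (which are much longer):

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings standing in for OllamaTextEmbedder / OllamaDocumentEmbedder output
query_embedding = [0.1, 0.3, 0.5]
doc_embeddings = {"doc_a": [0.1, 0.3, 0.5], "doc_b": [0.9, -0.2, 0.0]}

best = max(doc_embeddings,
           key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]))
print(best)  # doc_a
```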