OllamaChatGenerator
This component enables chat completion using an LLM running on Ollama.
Name | OllamaChatGenerator |
Source | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ollama |
Most common position in a pipeline | After a ChatPromptBuilder |
Mandatory input variables | “messages”: A list of ChatMessage objects representing the chat |
Output variables | “replies”: A list of alternative replies from the LLM |
Overview
Ollama is a project focused on running LLMs locally. Internally, it uses the quantized GGUF format by default. This means it is possible to run LLMs on standard machines (even without GPUs) without having to handle complex installation procedures.
OllamaChatGenerator supports models running on Ollama, such as llama2 and mixtral. The full list of supported models is in the Ollama model library.
OllamaChatGenerator needs a model name and a url to work. By default, it uses the "orca-mini" model and the "http://localhost:11434/api/chat" url.
OllamaChatGenerator operates on ChatMessage objects. ChatMessage is a data class that contains a message, a role (who generated the message, such as user, assistant, system, or function), and optional metadata. See the usage section for an example.
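To make the role-tagged message structure concrete, here is a minimal sketch of the kind of JSON body a chat request to Ollama's /api/chat endpoint carries. It uses plain tuples and dictionaries rather than Haystack's ChatMessage class, so it runs with the standard library alone. The helper name build_chat_payload is hypothetical, and the request that OllamaChatGenerator actually builds internally may differ in its details.

```python
import json

# Defaults mirror the ones named above for OllamaChatGenerator.
DEFAULT_MODEL = "orca-mini"
DEFAULT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(messages, model=DEFAULT_MODEL, options=None):
    """Build a request body in the shape of Ollama's /api/chat endpoint.

    `messages` is a list of (role, content) tuples; this helper is an
    illustration only and is not part of Haystack or Ollama.
    """
    chat = [{"role": role, "content": content} for role, content in messages]
    payload = {"model": model, "messages": chat, "stream": False}
    if options:
        # Generation settings such as temperature travel under "options".
        payload["options"] = dict(options)
    return payload


payload = build_chat_payload(
    [
        ("system", "You are a helpful assistant."),
        ("user", "What's Natural Language Processing?"),
    ],
    model="zephyr",
    options={"temperature": 0.9, "num_predict": 100},
)
print(json.dumps(payload, indent=2))
```

Each message keeps its role next to its content, which is the same pairing a ChatMessage object holds.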
Usage
- You need a running instance of Ollama. The installation instructions are in the Ollama GitHub repository.
A fast way to run Ollama is using Docker:
docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
- You need to download or pull the desired LLM. The model library is available on the Ollama website.
If you are using Docker, you can, for example, pull the Zephyr model:
docker exec ollama ollama pull zephyr
If you have already installed Ollama on your system, you can run:
ollama pull zephyr
Choose a specific version of a model
You can also specify a tag to choose a specific (quantized) version of your model. The available tags are shown in the model card of the Ollama models library. This is an example for Zephyr.
In this case, simply run:
# ollama pull model:tag
ollama pull zephyr:7b-alpha-q3_K_S
- You also need to install the ollama-haystack package:
pip install ollama-haystack
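Before creating the generator, it can help to confirm that the Ollama server is reachable and that the model you pulled is present. The sketch below queries Ollama's /api/tags endpoint, which lists locally available models, using only the standard library. The function names are hypothetical, and the sketch assumes that a model name without a tag resolves to the "latest" tag.

```python
import json
import urllib.request


def normalize_model_name(name):
    """Append the default tag when none is given (assumed to be 'latest')."""
    return name if ":" in name else f"{name}:latest"


def is_model_available(tags_response, model):
    """Check a parsed /api/tags response for the requested model."""
    wanted = normalize_model_name(model)
    local = {normalize_model_name(m["name"]) for m in tags_response.get("models", [])}
    return wanted in local


def fetch_local_models(base_url="http://localhost:11434", timeout=5.0):
    """Ask a running Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Hand-written sample response, so the example runs without a server.
sample = {"models": [{"name": "zephyr:latest"}, {"name": "zephyr:7b-alpha-q3_K_S"}]}
print(is_model_available(sample, "zephyr"))
print(is_model_available(sample, "zephyr:7b-alpha-q3_K_S"))
print(is_model_available(sample, "mixtral"))
```

With a server running, is_model_available(fetch_local_models(), "zephyr") performs the same check against the live model list.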
On its own
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage
generator = OllamaChatGenerator(
    model="zephyr",
    url="http://localhost:11434/api/chat",
    generation_kwargs={
        "num_predict": 100,
        "temperature": 0.9,
    },
)
messages = [
    ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
    ChatMessage.from_user("What's Natural Language Processing?"),
]
print(generator.run(messages=messages))
>> {'replies': [ChatMessage(content='Natural Language Processing (NLP) is a
>> subfield of Artificial Intelligence that deals with ...',
>> role=<ChatRole.ASSISTANT: 'assistant'>,
>> meta={'model': 'zephyr', ...})]}
In a Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
# No init parameters: the template and its variables are supplied at run time
prompt_builder = ChatPromptBuilder()
generator = OllamaChatGenerator(
    model="zephyr",
    url="http://localhost:11434/api/chat",
    generation_kwargs={
        "temperature": 0.9,
    },
)
pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", generator)
pipe.connect("prompt_builder.prompt", "llm.messages")
location = "Berlin"
messages = [
    ChatMessage.from_system("Always respond in Spanish even if some input data is in other languages."),
    ChatMessage.from_user("Tell me about {{location}}"),
]
print(pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "template": messages}}))
>> {'llm': {'replies': [ChatMessage(content='Berlín es la capital y la mayor ciudad
>> de Alemania. Está ubicada en el estado federado de Berlín, y tiene más...',
>> role=<ChatRole.ASSISTANT: 'assistant'>, meta={'model': 'zephyr', ...})]}}
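In the pipeline above, the prompt builder fills the {{location}} placeholder before the messages reach the generator. The sketch below imitates that substitution step with a small regular expression so the effect is visible without running a pipeline. It is a simplified stand-in, not the templating engine Haystack uses; the render function is hypothetical and handles only bare {{ name }} placeholders.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace {{ name }} placeholders with values from `variables`.

    Missing variables become empty strings in this sketch.
    """
    def substitute(match):
        return str(variables.get(match.group(1), ""))

    return _PLACEHOLDER.sub(substitute, template)


messages = [
    ("system", "Always respond in Spanish even if some input data is in other languages."),
    ("user", "Tell me about {{location}}"),
]
rendered = [(role, render(text, {"location": "Berlin"})) for role, text in messages]
print(rendered[1])  # ('user', 'Tell me about Berlin')
```

Messages without placeholders, such as the system message here, pass through unchanged.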