Llama.cpp integration for Haystack

Module haystack_integrations.components.generators.llama_cpp.generator

LlamaCppGenerator

Provides an interface to generate text using an LLM running on llama.cpp.

llama.cpp is a project written in C/C++ for efficient inference of LLMs. It uses the quantized GGUF format, which makes it possible to run these models on standard machines, even without GPUs.

Usage example:

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

generator = LlamaCppGenerator(model="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048, n_batch=512)
generator.warm_up()  # loads the model; run() raises an error if the model has not been loaded

print(generator.run("Who is the best American actor?", generation_kwargs={"max_tokens": 128}))
# {'replies': ['John Cusack'], 'meta': [{"object": "text_completion", ...}]}
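
The generator can also run inside a Haystack Pipeline. A minimal sketch, assuming a local zephyr-7b-beta.Q4_0.gguf file; the PromptBuilder template, component names, and question are illustrative:

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

pipeline = Pipeline()
pipeline.add_component("prompt_builder", PromptBuilder(template="Answer briefly: {{ question }}"))
pipeline.add_component("llm", LlamaCppGenerator(model="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048))
pipeline.connect("prompt_builder", "llm")  # PromptBuilder's "prompt" output feeds the generator's "prompt" input

# Pipeline.run() warms up components, so no explicit warm_up() call is needed here.
result = pipeline.run({"prompt_builder": {"question": "What is the GGUF format?"}})
print(result["llm"]["replies"][0])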

LlamaCppGenerator.__init__

def __init__(model: str,
             n_ctx: Optional[int] = 0,
             n_batch: Optional[int] = 512,
             model_kwargs: Optional[Dict[str, Any]] = None,
             generation_kwargs: Optional[Dict[str, Any]] = None)

Arguments:

  • model: The path to a quantized model file for text generation, for example, "zephyr-7b-beta.Q4_0.gguf". If the model path is also specified in model_kwargs, this parameter is ignored.
  • n_ctx: The number of tokens in the context. When set to 0, the context size is taken from the model.
  • n_batch: The maximum batch size for prompt processing.
  • model_kwargs: A dictionary of keyword arguments used to initialize the LLM for text generation. These keyword arguments provide fine-grained control over model loading. In case of duplication, they override the model, n_ctx, and n_batch init parameters (see the sketch after this list). For more information on the available kwargs, see the llama.cpp documentation.
  • generation_kwargs: A dictionary of keyword arguments to customize text generation. For more information on the available kwargs, see the llama.cpp documentation.
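
A minimal sketch of how these parameters interact. The kwarg names n_gpu_layers and seed are loading options from the underlying llama-cpp-python bindings, and max_tokens and temperature are generation options; all values here are illustrative:

generator = LlamaCppGenerator(
    model="zephyr-7b-beta.Q4_0.gguf",
    n_ctx=2048,                                     # ignored if model_kwargs also sets "n_ctx"
    model_kwargs={"n_gpu_layers": -1, "seed": 42},  # passed to the model at load time
    generation_kwargs={"max_tokens": 128, "temperature": 0.1},  # defaults applied to every run() call
)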

LlamaCppGenerator.run

@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)

Run the text generation model on the given prompt.

Arguments:

  • prompt: The prompt to be sent to the generative model.
  • generation_kwargs: A dictionary of keyword arguments to customize text generation. For more information on the available kwargs, see the llama.cpp documentation.

Returns:

A dictionary with the following keys:

  • replies: The list of replies generated by the model.
  • meta: Metadata about the request, one entry per reply.
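
For illustration, a sketch of consuming this dictionary. The "usage" key is an assumption about the shape of the raw completion payload shown in the usage example above, so it is read defensively:

result = generator.run("Explain the GGUF format in one sentence.")
for reply, meta in zip(result["replies"], result["meta"]):
    print(reply)
    print(meta.get("usage"))  # token counts, if present in the completion payload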