# Llama.cpp integration for Haystack

Module: `haystack_integrations.components.generators.llama_cpp.generator`

## `LlamaCppGenerator`
Provides an interface to generate text using an LLM via llama.cpp.

llama.cpp is a project written in C/C++ for efficient inference of LLMs. It uses the quantized GGUF format, which makes it possible to run these models on standard machines, even without GPUs.
Usage example:

```python
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

generator = LlamaCppGenerator(model="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048, n_batch=512)
print(generator.run("Who is the best American actor?", generation_kwargs={"max_tokens": 128}))
# {'replies': ['John Cusack'], 'meta': [{"object": "text_completion", ...}]}
```
### `LlamaCppGenerator.__init__`

```python
def __init__(model: str,
             n_ctx: Optional[int] = 0,
             n_batch: Optional[int] = 512,
             model_kwargs: Optional[Dict[str, Any]] = None,
             generation_kwargs: Optional[Dict[str, Any]] = None)
```
**Arguments**:

- `model`: The path of a quantized model for text generation, for example, `"zephyr-7b-beta.Q4_0.gguf"`. If the model path is also specified in `model_kwargs`, this parameter is ignored.
- `n_ctx`: The number of tokens in the context. When set to 0, the context is taken from the model.
- `n_batch`: Maximum batch size for prompt processing.
- `model_kwargs`: Dictionary of keyword arguments used to initialize the LLM for text generation. These keyword arguments provide fine-grained control over model loading. In case of duplication, these kwargs override the `model`, `n_ctx`, and `n_batch` init parameters. For more information on the available kwargs, see the llama.cpp documentation.
- `generation_kwargs`: A dictionary of keyword arguments to customize text generation. For more information on the available kwargs, see the llama.cpp documentation.
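The override behavior between init parameters and `model_kwargs` can be sketched as a plain dict merge. This is an illustration of the documented precedence, not the component's actual internals; `resolve_model_kwargs` is a hypothetical helper, while `model_path`, `n_ctx`, `n_batch`, and `n_gpu_layers` are real llama.cpp loading kwargs:

```python
def resolve_model_kwargs(model, n_ctx, n_batch, model_kwargs=None):
    """Hypothetical sketch of how init params and model_kwargs combine."""
    # Start from the explicit init parameters.
    resolved = {"model_path": model, "n_ctx": n_ctx, "n_batch": n_batch}
    # Per the docs, duplicated keys in model_kwargs take precedence.
    resolved.update(model_kwargs or {})
    return resolved

# model_path in model_kwargs wins over the model init parameter:
kwargs = resolve_model_kwargs(
    "zephyr-7b-beta.Q4_0.gguf", n_ctx=2048, n_batch=512,
    model_kwargs={"model_path": "other.gguf", "n_gpu_layers": -1},
)
# → {'model_path': 'other.gguf', 'n_ctx': 2048, 'n_batch': 512, 'n_gpu_layers': -1}
```

In other words, `model_kwargs` is the authoritative place to configure loading; the `model`, `n_ctx`, and `n_batch` parameters are convenience shortcuts.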
### `LlamaCppGenerator.run`

```python
@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)
```
Run the text generation model on the given prompt.
**Arguments**:

- `prompt`: The prompt to send to the generative model.
- `generation_kwargs`: A dictionary of keyword arguments to customize text generation. For more information on the available kwargs, see the llama.cpp documentation.
**Returns**:

A dictionary with the following keys:

- `replies`: The list of replies generated by the model.
- `meta`: Metadata about the request.
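To illustrate the return shape, here is a hand-written result dict mirroring the documented keys (not live model output), with `replies` and `meta` treated as parallel lists:

```python
# Hypothetical output in the documented shape; real values come from the model.
result = {
    "replies": ["John Cusack"],
    "meta": [{"object": "text_completion", "model": "zephyr-7b-beta.Q4_0.gguf"}],
}

# Each reply has a matching metadata dict at the same index.
for reply, meta in zip(result["replies"], result["meta"]):
    summary = f"{reply} ({meta['object']})"
# summary == "John Cusack (text_completion)"
```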