
Cohere integration for Haystack

Module haystack_integrations.components.embedders.cohere.document_embedder

CohereDocumentEmbedder

A component for computing Document embeddings using Cohere models.

The embedding of each Document is stored in the embedding field of the Document.

Usage example:

from haystack import Document
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = CohereDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [-0.453125, 1.2236328, 2.0058594, ...]

CohereDocumentEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var(
    ["COHERE_API_KEY", "CO_API_KEY"]),
             model: str = "embed-english-v2.0",
             input_type: str = "search_document",
             api_base_url: str = COHERE_API_URL,
             truncate: str = "END",
             use_async_client: bool = False,
             max_retries: int = 3,
             timeout: int = 120,
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Arguments:

  • api_key: the Cohere API key.
  • model: the name of the model to use. Supported models are: "embed-english-v3.0", "embed-english-light-v3.0", "embed-multilingual-v3.0", "embed-multilingual-light-v3.0", "embed-english-v2.0", "embed-english-light-v2.0", "embed-multilingual-v2.0". The full list of supported models can be found in the model documentation.
  • input_type: specifies the type of input you're giving to the model. Supported values are "search_document", "search_query", "classification" and "clustering". Not required for older versions of the embedding models (v2 and earlier), but required for newer versions (v3 and later).
  • api_base_url: the Cohere API base URL.
  • truncate: how to handle inputs that are too long, one of "NONE", "START" or "END". Passing "START" discards the start of the input and "END" discards the end; in both cases, input is discarded until the remaining input is exactly the maximum input token length for the model. With "NONE", an error is returned when the input exceeds the maximum input token length.
  • use_async_client: flag to select the AsyncClient. It is recommended to use AsyncClient for applications with many concurrent calls.
  • max_retries: maximum number of retries for requests.
  • timeout: request timeout in seconds.
  • batch_size: number of Documents to encode at once.
  • progress_bar: whether to show a progress bar or not. Can be helpful to disable in production deployments to keep the logs clean.
  • meta_fields_to_embed: list of meta fields that should be embedded along with the Document text.
  • embedding_separator: separator used to concatenate the meta fields to the Document text.
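The interaction between meta_fields_to_embed and embedding_separator can be sketched in plain Python. This is an illustration of the documented behavior (meta values joined to the content with the separator), not the component's exact implementation:

```python
# Sketch of the text the embedder builds when meta fields are embedded
# alongside the Document content (assumed behavior based on the
# parameter descriptions above).
doc_meta = {"title": "Pizza history", "url": "https://example.com"}
doc_content = "I love pizza!"

meta_fields_to_embed = ["title"]
embedding_separator = "\n"

# Collect the requested meta values, then join them with the content.
meta_values = [str(doc_meta[field]) for field in meta_fields_to_embed
               if doc_meta.get(field) is not None]
text_to_embed = embedding_separator.join(meta_values + [doc_content])

print(text_to_embed)
# Pizza history
# I love pizza!
```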

CohereDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

CohereDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "CohereDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

CohereDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])

Embed a list of Documents.

Arguments:

  • documents: documents to embed.

Raises:

  • TypeError: if the input is not a list of Documents.

Returns:

A dictionary with the following keys:

  • documents: documents with the embedding field set.
  • meta: metadata about the embedding process.

Module haystack_integrations.components.embedders.cohere.text_embedder

CohereTextEmbedder

A component for embedding strings using Cohere models.

Usage example:

from haystack_integrations.components.embedders.cohere import CohereTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = CohereTextEmbedder()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.453125, 1.2236328, 2.0058594, ...],
# 'meta': {'api_version': {'version': '1'}, 'billed_units': {'input_tokens': 4}}}

CohereTextEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var(
    ["COHERE_API_KEY", "CO_API_KEY"]),
             model: str = "embed-english-v2.0",
             input_type: str = "search_query",
             api_base_url: str = COHERE_API_URL,
             truncate: str = "END",
             use_async_client: bool = False,
             max_retries: int = 3,
             timeout: int = 120)

Arguments:

  • api_key: the Cohere API key.
  • model: the name of the model to use. Supported models are: "embed-english-v3.0", "embed-english-light-v3.0", "embed-multilingual-v3.0", "embed-multilingual-light-v3.0", "embed-english-v2.0", "embed-english-light-v2.0", "embed-multilingual-v2.0". The full list of supported models can be found in the model documentation.
  • input_type: specifies the type of input you're giving to the model. Supported values are "search_document", "search_query", "classification" and "clustering". Not required for older versions of the embedding models (v2 and earlier), but required for newer versions (v3 and later).
  • api_base_url: the Cohere API base URL.
  • truncate: how to handle inputs that are too long, one of "NONE", "START" or "END". Passing "START" discards the start of the input and "END" discards the end; in both cases, input is discarded until the remaining input is exactly the maximum input token length for the model. With "NONE", an error is returned when the input exceeds the maximum input token length.
  • use_async_client: flag to select the AsyncClient. It is recommended to use AsyncClient for applications with many concurrent calls.
  • max_retries: maximum number of retries for requests.
  • timeout: request timeout in seconds.
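The three truncate modes can be illustrated with a small sketch. This shows the documented semantics on a token list; it is not the API's actual tokenizer:

```python
# Conceptual sketch of the truncate modes ("NONE" | "START" | "END"),
# illustrating the documented semantics, not the API implementation.
def truncate_tokens(tokens, max_len, mode="END"):
    if len(tokens) <= max_len:
        return tokens
    if mode == "START":
        return tokens[-max_len:]  # discard the start of the input
    if mode == "END":
        return tokens[:max_len]   # discard the end of the input
    # mode == "NONE": over-length input is an error
    raise ValueError("input exceeds the maximum token length")

tokens = ["tok1", "tok2", "tok3", "tok4", "tok5"]
print(truncate_tokens(tokens, 3, "END"))    # ['tok1', 'tok2', 'tok3']
print(truncate_tokens(tokens, 3, "START"))  # ['tok3', 'tok4', 'tok5']
```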

CohereTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

CohereTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "CohereTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

CohereTextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed text.

Arguments:

  • text: the text to embed.

Raises:

  • TypeError: If the input is not a string.

Returns:

A dictionary with the following keys:

  • embedding: the embedding of the text.
  • meta: metadata about the request.

Module haystack_integrations.components.embedders.cohere.utils

get_async_response

async def get_async_response(cohere_async_client: AsyncClient,
                             texts: List[str], model_name, input_type,
                             truncate)

Embeds a list of texts asynchronously using the Cohere API.

Arguments:

  • cohere_async_client: the Cohere AsyncClient
  • texts: the texts to embed
  • model_name: the name of the model to use
  • input_type: one of "classification", "clustering", "search_document", "search_query". The type of input text provided to embed.
  • truncate: one of "NONE", "START", "END". How the API handles text longer than the maximum token length.

Raises:

  • ValueError: If an error occurs while querying the Cohere API.

Returns:

A tuple of the embeddings and metadata.

get_response

def get_response(
        cohere_client: Client,
        texts: List[str],
        model_name,
        input_type,
        truncate,
        batch_size=32,
        progress_bar=False) -> Tuple[List[List[float]], Dict[str, Any]]

Embeds a list of texts using the Cohere API.

Arguments:

  • cohere_client: the Cohere Client
  • texts: the texts to embed
  • model_name: the name of the model to use
  • input_type: one of "classification", "clustering", "search_document", "search_query". The type of input text provided to embed.
  • truncate: one of "NONE", "START", "END". How the API handles text longer than the maximum token length.
  • batch_size: the batch size to use
  • progress_bar: if True, show a progress bar

Raises:

  • ValueError: If an error occurs while querying the Cohere API.

Returns:

A tuple of the embeddings and metadata.
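The batching that get_response performs over the input texts can be sketched as follows. This is illustrative only; the real function also calls the API per batch and collects per-batch metadata:

```python
# Sketch of splitting texts into batches of batch_size before embedding,
# as get_response does (illustration, not the actual implementation).
from typing import Iterator, List

def batched(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

batches = list(batched(["a", "b", "c", "d", "e"], batch_size=2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```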

Module haystack_integrations.components.generators.cohere.generator

CohereGenerator

LLM Generator compatible with Cohere's generate endpoint.

Queries the LLM using Cohere's API. Invocations are made using the 'cohere' package. See the Cohere API documentation for more details.

Example usage:

from haystack.utils import Secret
from haystack_integrations.components.generators.cohere import CohereGenerator

generator = CohereGenerator(api_key=Secret.from_token("test-api-key"))
generator.run(prompt="What's the capital of France?")

CohereGenerator.__init__

def __init__(api_key: Secret = Secret.from_env_var(
    ["COHERE_API_KEY", "CO_API_KEY"]),
             model: str = "command",
             streaming_callback: Optional[Callable] = None,
             api_base_url: Optional[str] = None,
             **kwargs)

Instantiates a CohereGenerator component.

Arguments:

  • api_key: the API key for the Cohere API.
  • model: the name of the model to use. Available models are: [command, command-light, command-nightly, command-light-nightly].
  • streaming_callback: A callback function to be called with the streaming response.
  • api_base_url: the base URL of the Cohere API.
  • kwargs: additional model parameters. These will be used during generation. Refer to https://docs.cohere.com/reference/generate for more details. Some of the parameters are:
  • 'max_tokens': The maximum number of tokens to be generated. Defaults to 1024.
  • 'truncate': One of NONE|START|END to specify how the API will handle inputs longer than the maximum token length. Defaults to END.
  • 'temperature': A non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations.
  • 'preset': Identifier of a custom preset. A preset is a combination of parameters, such as prompt, temperature etc. You can create presets in the playground.
  • 'end_sequences': The generated text will be cut at the beginning of the earliest occurrence of an end sequence. The sequence will be excluded from the text.
  • 'stop_sequences': The generated text will be cut at the end of the earliest occurrence of a stop sequence. The sequence will be included in the text.
  • 'k': Ensures that only the top k most likely tokens are considered for generation at each step. Defaults to 0 (disabled).
  • 'p': Ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. If both k and p are enabled, p acts after k.
  • 'frequency_penalty': Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation.
  • 'presence_penalty': Defaults to 0.0, min value of 0.0, max value of 1.0. Can be used to reduce repetitiveness of generated tokens. Similar to frequency_penalty, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies.
  • 'return_likelihoods': One of GENERATION|ALL|NONE to specify how and if the token likelihoods are returned with the response. Defaults to NONE.
  • 'logit_bias': Used to prevent the model from generating unwanted tokens or to incentivize it to include desired tokens. The format is {token_id: bias} where bias is a float between -10 and 10.
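The difference between end_sequences and stop_sequences described above can be illustrated with a small sketch. This shows the documented cut-before vs. cut-after semantics, not the API's implementation:

```python
# Illustration of end_sequences vs stop_sequences as described above
# (a sketch of the documented semantics, not the API implementation).
def cut_at(text, end_sequences=(), stop_sequences=()):
    for seq in end_sequences:   # cut at the start of the sequence, exclude it
        idx = text.find(seq)
        if idx != -1:
            text = text[:idx]
    for seq in stop_sequences:  # cut at the end of the sequence, include it
        idx = text.find(seq)
        if idx != -1:
            text = text[:idx + len(seq)]
    return text

print(cut_at("Paris is the capital.STOP extra", stop_sequences=["STOP"]))
# Paris is the capital.STOP
```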

CohereGenerator.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

CohereGenerator.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "CohereGenerator"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

CohereGenerator.run

@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str)

Queries the LLM with the prompts to produce replies.

Arguments:

  • prompt: the prompt to be sent to the generative model.

Returns:

A dictionary with the following keys:

  • replies: the list of replies generated by the model.
  • meta: metadata about the request.

Module haystack_integrations.components.generators.cohere.chat.chat_generator

CohereChatGenerator

Enables text generation using Cohere's chat endpoint.

This component runs inference with Cohere's chat models.

Users can pass any text generation parameters valid for the cohere.Client.chat method directly to this component via the generation_kwargs parameter in __init__ or the generation_kwargs parameter in the run method.

Invocations are made using the 'cohere' package. See the Cohere API documentation for more details.

Example usage:

from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack_integrations.components.generators.cohere import CohereChatGenerator

component = CohereChatGenerator(api_key=Secret.from_token("test-api-key"))
chat_messages = [ChatMessage.from_user("What's Natural Language Processing?")]
response = component.run(chat_messages)

assert response["replies"]

CohereChatGenerator.__init__

def __init__(api_key: Secret = Secret.from_env_var(
    ["COHERE_API_KEY", "CO_API_KEY"]),
             model: str = "command",
             streaming_callback: Optional[Callable[[StreamingChunk],
                                                   None]] = None,
             api_base_url: Optional[str] = None,
             generation_kwargs: Optional[Dict[str, Any]] = None,
             **kwargs)

Initialize the CohereChatGenerator instance.

Arguments:

  • api_key: the API key for the Cohere API.
  • model: the name of the model to use. Available models are: [command, command-light, command-nightly, command-light-nightly].
  • streaming_callback: a callback function to be called with the streaming response.
  • api_base_url: the base URL of the Cohere API.
  • generation_kwargs: additional model parameters. These will be used during generation. Refer to https://docs.cohere.com/reference/chat for more details. Some of the parameters are:
  • 'chat_history': A list of previous messages between the user and the model, meant to give the model conversational context for responding to the user's message.
  • 'preamble_override': When specified, the default Cohere preamble will be replaced with the provided one.
  • 'conversation_id': An alternative to chat_history. Previous conversations can be resumed by providing the conversation's identifier. The contents of message and the model's response will be stored as part of this conversation. If a conversation with this id does not already exist, a new conversation will be created.
  • 'prompt_truncation': Defaults to AUTO when connectors are specified and OFF in all other cases. Dictates how the prompt will be constructed.
  • 'connectors': Accepts {"id": "web-search"}, and/or the "id" for a custom connector, if you've created one. When specified, the model's reply will be enriched with information found by querying each of the connectors (RAG).
  • 'documents': A list of relevant documents that the model can use to enrich its reply.
  • 'search_queries_only': Defaults to false. When true, the response will only contain a list of generated search queries, but no search will take place, and no reply from the model to the user's message will be generated.
  • 'citation_quality': Defaults to "accurate". Dictates the approach taken to generating citations as part of the RAG flow by allowing the user to specify whether they want "accurate" results or "fast" results.
  • 'temperature': A non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations.
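For illustration, a generation_kwargs dictionary combining several of the parameters above might look like this. The keys come from the parameter list; the values are arbitrary examples, not recommended settings:

```python
# Example generation_kwargs for CohereChatGenerator. Keys are from the
# parameter list above; values are arbitrary illustrations.
generation_kwargs = {
    "temperature": 0.3,                    # lower = less random generations
    "connectors": [{"id": "web-search"}],  # enrich replies via RAG
    "citation_quality": "fast",            # trade citation accuracy for speed
    "prompt_truncation": "AUTO",
}
```

The same dictionary shape can be passed either at init time or per run() call, where the run-time values take precedence.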

CohereChatGenerator.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

CohereChatGenerator.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "CohereChatGenerator"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

CohereChatGenerator.run

@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
        generation_kwargs: Optional[Dict[str, Any]] = None)

Invoke the text generation inference based on the provided messages and generation parameters.

Arguments:

  • messages: list of ChatMessage instances representing the input messages.
  • generation_kwargs: additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the init method. For more details on the parameters supported by the Cohere API, refer to the Cohere documentation.

Returns:

A dictionary with the following keys:

  • replies: a list of ChatMessage instances representing the generated responses.

Module haystack_integrations.components.rankers.cohere.ranker

CohereRanker

Ranks Documents based on their similarity to the query using Cohere models.

Documents are indexed from most to least semantically relevant to the query.

Usage example:

from haystack import Document
from haystack_integrations.components.rankers.cohere import CohereRanker

ranker = CohereRanker(model="rerank-english-v2.0", top_k=2)

docs = [Document(content="Paris"), Document(content="Berlin")]
query = "What is the capital of Germany?"
output = ranker.run(query=query, documents=docs)
docs = output["documents"]

CohereRanker.__init__

def __init__(model: str = "rerank-english-v2.0",
             top_k: int = 10,
             api_key: Secret = Secret.from_env_var(
                 ["COHERE_API_KEY", "CO_API_KEY"]),
             api_base_url: str = cohere.COHERE_API_URL,
             max_chunks_per_doc: Optional[int] = None,
             meta_fields_to_embed: Optional[List[str]] = None,
             meta_data_separator: str = "\n")

Creates an instance of the 'CohereRanker'.

Arguments:

  • model: Cohere model name. Check the list of supported models in the Cohere documentation.
  • top_k: The maximum number of documents to return.
  • api_key: Cohere API key.
  • api_base_url: the base URL of the Cohere API.
  • max_chunks_per_doc: If your document exceeds 512 tokens, this determines the maximum number of chunks a document can be split into. If None, the default of 10 is used. For example, if your document is 6000 tokens, with the default of 10, the document will be split into 10 chunks each of 512 tokens and the last 880 tokens will be disregarded. Check Cohere docs for more information.
  • meta_fields_to_embed: List of meta fields that should be concatenated with the document content for reranking.
  • meta_data_separator: Separator used to concatenate the meta fields to the Document content.
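The worked example in the max_chunks_per_doc description can be checked directly:

```python
# Reproduces the arithmetic from the max_chunks_per_doc description:
# a 6000-token document, 512-token chunks, at most 10 chunks per document.
import math

doc_tokens, chunk_len, max_chunks_per_doc = 6000, 512, 10

chunks_needed = math.ceil(doc_tokens / chunk_len)     # 12 chunks needed
chunks_used = min(chunks_needed, max_chunks_per_doc)  # 10 chunks kept
tokens_covered = chunks_used * chunk_len              # 5120 tokens ranked
tokens_disregarded = doc_tokens - tokens_covered      # 880 tokens dropped

print(tokens_disregarded)  # 880
```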

CohereRanker.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

CohereRanker.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "CohereRanker"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

CohereRanker.run

@component.output_types(documents=List[Document])
def run(query: str, documents: List[Document], top_k: Optional[int] = None)

Use the Cohere Reranker to re-rank the list of documents based on the query.

Arguments:

  • query: Query string.
  • documents: List of Documents.
  • top_k: The maximum number of Documents you want the Ranker to return.

Raises:

  • ValueError: If top_k is not > 0.

Returns:

A dictionary with the following keys:

  • documents: List of Documents most similar to the given query in descending order of similarity.
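The output contract of run can be sketched as a sort-and-truncate over relevance scores. The scores here are invented for illustration; the real scores come from the Cohere rerank endpoint:

```python
# Sketch of the ranker's output contract: documents sorted by relevance
# score (descending) and cut to top_k. Scores are invented here.
scored_docs = [("Paris", 0.12), ("Berlin", 0.95), ("Madrid", 0.31)]
top_k = 2

ranked = sorted(scored_docs, key=lambda d: d[1], reverse=True)[:top_k]
print(ranked)  # [('Berlin', 0.95), ('Madrid', 0.31)]
```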