API Reference

A helper node with a variety of functions.

Module shaper


def rename(value: Any) -> Any

An identity function. You can use it to rename values in the invocation context without changing them.


assert rename(1) == 1


def current_datetime(format: str = "%H:%M:%S %d/%m/%y") -> str

Function that outputs the current time and/or date formatted according to the parameters.


assert current_datetime("%d.%m.%y %H:%M:%S") == "01.01.23 12:30:10"
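
The format string appears to use Python's standard strftime directives. The sketch below is an illustration built on the standard library only, not the Shaper implementation; format_moment is a hypothetical helper that takes a fixed point in time so that the output is reproducible.

```python
from datetime import datetime


def format_moment(moment: datetime, format: str = "%H:%M:%S %d/%m/%y") -> str:
    """Hypothetical stand-in: format a given datetime with strftime directives."""
    return moment.strftime(format)


moment = datetime(2023, 1, 1, 12, 30, 10)
default_text = format_moment(moment)                      # time first, two-digit year (%y)
custom_text = format_moment(moment, "%d.%m.%Y %H:%M:%S")  # %Y gives the four-digit year
```

Note the difference between %y (two-digit year) and %Y (four-digit year) when you choose a format.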


def value_to_list(value: Any, target_list: List[Any]) -> List[Any]

Transforms a value into a list containing this value as many times as the length of the target list.


assert value_to_list(value=1, target_list=list(range(5))) == [1, 1, 1, 1, 1]


def join_lists(lists: List[List[Any]]) -> List[Any]

Joins the lists you pass to it into a single list.


assert join_lists(lists=[[1, 2, 3], [4, 5]]) == [1, 2, 3, 4, 5]


def join_strings(strings: List[str],
                 delimiter: str = " ",
                 str_replace: Optional[Dict[str, str]] = None) -> str

Transforms a list of strings into a single string. The content of this string is the content of all of the original strings separated by the delimiter you specify.


assert join_strings(strings=["first", "second", "third"], delimiter=" - ", str_replace={"r": "R"}) == "fiRst - second - thiRd"
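
As a rough illustration of the behavior described above, the following sketch joins the strings first and then applies the replacements. The name join_strings_sketch is hypothetical, and the order of operations (join, then replace) is an assumption, so replacements in this sketch would also affect the delimiter.

```python
from typing import Dict, List, Optional


def join_strings_sketch(strings: List[str],
                        delimiter: str = " ",
                        str_replace: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical stand-in for join_strings."""
    joined = delimiter.join(strings)
    # Assumption: replacements are applied to the joined string, one key at a time.
    for old, new in (str_replace or {}).items():
        joined = joined.replace(old, new)
    return joined


result = join_strings_sketch(["first", "second", "third"], " - ", {"r": "R"})
```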


def format_string(string: str,
                  str_replace: Optional[Dict[str, str]] = None) -> str

Replaces substrings in a string. Each key of str_replace is replaced with its corresponding value.


assert format_string(string="first", str_replace={"r": "R"}) == "fiRst"


def join_documents(
        documents: List[Document],
        delimiter: str = " ",
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> List[Document]

Transforms a list of documents into a list containing a single document. The content of this document is the joined result of all original documents, separated by the delimiter you specify. Use regex in the pattern parameter to control how each document is represented. You can use the following placeholders:

  • $content: The content of the document.
  • $idx: The index of the document in the list.
  • $id: The ID of the document.
  • $META_FIELD: The value of the metadata field called 'META_FIELD'.

All metadata is dropped.


assert join_documents(
    documents=[Document(content="first"), Document(content="second"), Document(content="third")],
    delimiter=" - ",
    pattern="[$idx] $content",
    str_replace={"r": "R"}
) == [Document(content="[1] fiRst - [2] second - [3] thiRd")]
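
The placeholder syntax can be sketched with string.Template from the standard library. This is only an approximation under stated assumptions: Doc is a hypothetical stand-in for Document, format_doc and join_docs are hypothetical helpers, the index is one-based, and the sketch returns a plain string instead of a list containing a single document.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for Document."""
    content: str
    id: str = ""
    meta: Dict[str, Any] = field(default_factory=dict)


def format_doc(doc: Doc, pattern: str, idx: int) -> str:
    # Metadata fields become placeholders; the built-in names take precedence.
    values = {**doc.meta, "content": doc.content, "idx": idx, "id": doc.id}
    return Template(pattern).safe_substitute(values)


def join_docs(docs: List[Doc],
              delimiter: str = " ",
              pattern: str = "$content",
              str_replace: Optional[Dict[str, str]] = None) -> str:
    # Assumption: documents are numbered from 1 and replacements run after joining.
    joined = delimiter.join(format_doc(d, pattern, i) for i, d in enumerate(docs, start=1))
    for old, new in (str_replace or {}).items():
        joined = joined.replace(old, new)
    return joined


docs = [Doc("first"), Doc("second"), Doc("third")]
joined = join_docs(docs, " - ", "[$idx] $content", {"r": "R"})
with_meta = format_doc(Doc("text", id="a1", meta={"name": "file.txt"}), "$name ($id): $content", 1)
```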


def join_documents_and_scores(
        documents: List[Document]) -> Tuple[List[Document]]

Transforms a list of documents with scores in their metadata into a list containing a single document. The resulting document contains the scores and the contents of all the original documents. All metadata is dropped.


assert join_documents_and_scores(
    documents=[
        Document(content="first", meta={"score": 0.9}),
        Document(content="second", meta={"score": 0.7}),
        Document(content="third", meta={"score": 0.5}),
    ]
) == ([Document(content="-[0.9] first\n-[0.7] second\n-[0.5] third")], )


def format_document(document: Document,
                    pattern: Optional[str] = None,
                    str_replace: Optional[Dict[str, str]] = None,
                    idx: Optional[int] = None) -> str

Transforms a document into a single string. Use regex in the pattern parameter to control how the document is represented. You can use the following placeholders:

  • $content: The content of the document.
  • $idx: The index of the document in the list.
  • $id: The ID of the document.
  • $META_FIELD: The value of the metadata field called 'META_FIELD'.


assert format_document(
    document=Document(content="first"),
    pattern="prefix [$idx] $content",
    str_replace={"r": "R"},
    idx=1,
) == "prefix [1] fiRst"


def format_answer(answer: Answer,
                  pattern: Optional[str] = None,
                  str_replace: Optional[Dict[str, str]] = None,
                  idx: Optional[int] = None) -> str

Transforms an answer into a single string. Use regex in the pattern parameter to control how the answer is represented. You can use the following placeholders:

  • $answer: The answer text.
  • $idx: The index of the answer in the list.
  • $META_FIELD: The value of the metadata field called 'META_FIELD'.


assert format_answer(
    answer=Answer(answer="first"),
    pattern="prefix [$idx] $answer",
    str_replace={"r": "R"},
    idx=1,
) == "prefix [1] fiRst"


def join_documents_to_string(
        documents: List[Document],
        delimiter: str = " ",
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> str

Transforms a list of documents into a single string. The content of this string is the joined result of all original documents separated by the delimiter you specify. Use regex in the pattern parameter to control how the documents are represented. You can use the following placeholders:

  • $content: The content of the document.
  • $idx: The index of the document in the list.
  • $id: The ID of the document.
  • $META_FIELD: The value of the metadata field called 'META_FIELD'.


assert join_documents_to_string(
    documents=[Document(content="first"), Document(content="second"), Document(content="third")],
    delimiter=" - ",
    pattern="[$idx] $content",
    str_replace={"r": "R"}
) == "[1] fiRst - [2] second - [3] thiRd"


def strings_to_answers(
        strings: List[str],
        prompts: Optional[List[Union[str, List[Dict[str, str]]]]] = None,
        documents: Optional[List[Document]] = None,
        pattern: Optional[str] = None,
        reference_pattern: Optional[str] = None,
        reference_mode: Literal["index", "id", "meta"] = "index",
        reference_meta_field: Optional[str] = None) -> List[Answer]

Transforms a list of strings into a list of answers.

Specify reference_pattern to populate the answer's document_ids by extracting document references from the strings.

  • strings: The list of strings to transform.
  • prompts: The prompts used to generate the answers.
  • documents: The documents used to generate the answers.
  • pattern: The regex pattern to use for parsing the answer. Examples: `[^\n]+$` will find "this is an answer" in string "this is an argument.\nthis is an answer". `Answer: (.*)` will find "this is an answer" in string "this is an argument. Answer: this is an answer". If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer.
  • reference_pattern: The regex pattern to use for parsing the document references. Example: \[(\d+)\] will find "1" in string "this is an answer[1]". If None, no parsing is done and all documents are referenced.
  • reference_mode: The mode used to reference documents. Supported modes are:
      - index: the document references are the one-based index of the document in the list of documents. Example: "this is an answer[1]" will reference the first document in the list of documents.
      - id: the document references are the document IDs. Example: "this is an answer[123]" will reference the document with id "123".
      - meta: the document references are the value of a metadata field of the document. Example: "this is an answer[123]" will reference the document with the value "123" in the metadata field specified by reference_meta_field.
  • reference_meta_field: The name of the metadata field to use for document references in reference_mode "meta".


The list of answers.


Without reference parsing:
assert strings_to_answers(strings=["first", "second", "third"], prompts=["prompt"], documents=[Document(id="123", content="content")]) == [
        Answer(answer="first", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
        Answer(answer="second", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
        Answer(answer="third", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
    ]

With reference parsing:
assert strings_to_answers(strings=["first[1]", "second[2]", "third[1][3]"], prompts=["prompt"],
        documents=[Document(id="123", content="content"), Document(id="456", content="content"), Document(id="789", content="content")],
        reference_pattern=r"\[(\d+)\]",
        reference_mode="index"
    ) == [
        Answer(answer="first", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
        Answer(answer="second", type="generative", document_ids=["456"], meta={"prompt": "prompt"}),
        Answer(answer="third", type="generative", document_ids=["123", "789"], meta={"prompt": "prompt"}),
    ]
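
The rule for the pattern parameter (first group if there is one, otherwise the whole match) can be sketched with Python's re module. extract_answer is a hypothetical helper, and returning an empty string when nothing matches is an assumption that the description above does not confirm.

```python
import re
from typing import Optional


def extract_answer(string: str, pattern: Optional[str] = None) -> str:
    """Hypothetical sketch of how the answer text is selected from a string."""
    if pattern is None:
        return string  # no pattern: the whole string is the answer
    match = re.search(pattern, string)
    if match is None:
        return ""  # assumption: no match yields an empty answer
    # lastindex is None when no group took part in the match.
    return match.group(1) if match.lastindex else match.group(0)


last_line = extract_answer("this is an argument.\nthis is an answer", r"[^\n]+$")
after_label = extract_answer("this is an argument. Answer: this is an answer", r"Answer: (.*)")
unchanged = extract_answer("this is an answer")
```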


def string_to_answer(string: str,
                     prompt: Optional[Union[str, List[Dict[str, str]]]],
                     documents: Optional[List[Document]],
                     pattern: Optional[str] = None,
                     reference_pattern: Optional[str] = None,
                     reference_mode: Literal["index", "id", "meta"] = "index",
                     reference_meta_field: Optional[str] = None) -> Answer

Transforms a string into an answer.

Specify reference_pattern to populate the answer's document_ids by extracting document references from the string.

  • string: The string to transform.
  • prompt: The prompt used to generate the answer.
  • documents: The documents used to generate the answer.
  • pattern: The regex pattern to use for parsing the answer. Examples: `[^\n]+$` will find "this is an answer" in string "this is an argument.\nthis is an answer". `Answer: (.*)` will find "this is an answer" in string "this is an argument. Answer: this is an answer". If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer.
  • reference_pattern: The regex pattern to use for parsing the document references. Example: \[(\d+)\] will find "1" in string "this is an answer[1]". If None, no parsing is done and all documents are referenced.
  • reference_mode: The mode used to reference documents. Supported modes are:
      - index: the document references are the one-based index of the document in the list of documents. Example: "this is an answer[1]" will reference the first document in the list of documents.
      - id: the document references are the document IDs. Example: "this is an answer[123]" will reference the document with id "123".
      - meta: the document references are the value of a metadata field of the document. Example: "this is an answer[123]" will reference the document with the value "123" in the metadata field specified by reference_meta_field.
  • reference_meta_field: The name of the metadata field to use for document references in reference_mode "meta".


The answer.


def parse_references(
        string: str,
        reference_pattern: Optional[str] = None,
        candidates: Optional[Dict[str, str]] = None) -> Optional[List[str]]

Parses an answer string for document references and returns the document IDs of the referenced documents.


  • string: The string to parse.
  • reference_pattern: The regex pattern to use for parsing the document references. Example: \[(\d+)\] will find "1" in string "this is an answer[1]". If None, no parsing is done and all candidate document IDs are returned.
  • candidates: A dictionary of candidates to choose from. The keys are the reference strings and the values are the document IDs. If None, no parsing is done and None is returned.


A list of document IDs.
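
The parameter descriptions above translate into a few lines of Python. The following is a minimal sketch written from those descriptions, not the library's code; parse_references_sketch is a hypothetical name, and the candidates dictionary shown corresponds to what the "index" reference mode would presumably build (one-based positions mapped to document IDs).

```python
import re
from typing import Dict, List, Optional


def parse_references_sketch(string: str,
                            reference_pattern: Optional[str] = None,
                            candidates: Optional[Dict[str, str]] = None) -> Optional[List[str]]:
    """Hypothetical sketch: map references found in the string to document IDs."""
    if not candidates:
        return None  # nothing to choose from
    if not reference_pattern:
        return list(candidates.values())  # no parsing: reference every candidate
    references = re.findall(reference_pattern, string)
    # Keep only references that point at a known candidate.
    return [candidates[ref] for ref in references if ref in candidates]


document_ids = ["123", "456", "789"]
# Assumption: in "index" mode the keys are one-based positions as strings.
by_index = {str(i): doc_id for i, doc_id in enumerate(document_ids, start=1)}
referenced = parse_references_sketch("third[1][3]", r"\[(\d+)\]", by_index)
```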


def answers_to_strings(
        answers: List[Answer],
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> List[str]

Extracts the content field of answers and returns a list of strings.


assert answers_to_strings(
        answers=[Answer(answer="first"), Answer(answer="second"), Answer(answer="third")],
        pattern="[$idx] $answer",
        str_replace={"r": "R"}
    ) == ["[1] fiRst", "[2] second", "[3] thiRd"]


def strings_to_documents(
        strings: List[str],
        meta: Union[List[Optional[Dict[str, Any]]],
                    Optional[Dict[str, Any]]] = None,
        id_hash_keys: Optional[List[str]] = None) -> List[Document]

Transforms a list of strings into a list of documents. If you pass the metadata in a single dictionary, all documents get the same metadata. If you pass the metadata as a list, the length of this list must be the same as the length of the list of strings, and each document gets its own metadata. You can specify id_hash_keys only once and it gets assigned to all documents.


assert strings_to_documents(
        strings=["first", "second", "third"],
        meta=[{"position": i} for i in range(3)],
        id_hash_keys=['content', 'meta']
    ) == [
        Document(content="first", meta={"position": 0}, id_hash_keys=['content', 'meta']),
        Document(content="second", meta={"position": 1}, id_hash_keys=['content', 'meta']),
        Document(content="third", meta={"position": 2}, id_hash_keys=['content', 'meta'])
    ]
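
The two ways of passing metadata can be illustrated without any Document class. expand_meta below is a hypothetical helper that only shows how a single dictionary or a list of dictionaries might be distributed over the strings; raising ValueError on a length mismatch is an assumption.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Optional[Dict[str, Any]]


def expand_meta(strings: List[str], meta: Union[List[Meta], Meta] = None) -> List[Meta]:
    """Hypothetical sketch: produce one metadata entry per string."""
    if isinstance(meta, list):
        # A list must provide exactly one entry per string.
        if len(meta) != len(strings):
            raise ValueError("meta list and strings must have the same length")
        return meta
    # A single dictionary (or None) is shared by all documents.
    return [meta] * len(strings)


shared = expand_meta(["first", "second"], {"name": "my_file.txt"})
per_document = expand_meta(["first", "second", "third"], [{"position": i} for i in range(3)])
```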


def documents_to_strings(
        documents: List[Document],
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> List[str]

Extracts the content field of documents and returns a list of strings. Use regex in the pattern parameter to control how the documents are represented.


assert documents_to_strings(
        documents=[Document(content="first"), Document(content="second"), Document(content="third")],
        pattern="[$idx] $content",
        str_replace={"r": "R"}
    ) == ["[1] fiRst", "[2] second", "[3] thiRd"]


class Shaper(BaseComponent)

Shaper is a component that can invoke arbitrary, registered functions on the invocation context (query, documents, and so on) of a pipeline. It then passes the new or modified variables further down the pipeline.

Using YAML configuration, the Shaper component is initialized with functions to invoke on pipeline invocation context.

For example, in the YAML snippet below:

    - name: shaper
      type: Shaper
      params:
        func: value_to_list
        inputs:
            value: query
            target_list: documents
        outputs: [questions]

the Shaper component is initialized with a directive to invoke the function value_to_list on the variables query and documents and to store the result in the invocation context variable questions. All other invocation context variables are passed down the pipeline as they are.

You can use multiple Shaper components in a pipeline to modify the invocation context as needed.

Currently, Shaper supports the following functions:

  • rename
  • value_to_list
  • join_lists
  • join_strings
  • format_string
  • join_documents
  • join_documents_and_scores
  • format_document
  • format_answer
  • join_documents_to_string
  • strings_to_answers
  • string_to_answer
  • parse_references
  • answers_to_strings
  • strings_to_documents
  • documents_to_strings

See their descriptions in the code for details about their inputs, outputs, and other parameters.


def __init__(func: str,
             outputs: List[str],
             inputs: Optional[Dict[str, Union[List[str], str]]] = None,
             params: Optional[Dict[str, Any]] = None,
             publish_outputs: Union[bool, List[str]] = True)

Initializes the Shaper component.

Some examples:

- name: shaper
  type: Shaper
  params:
    func: value_to_list
    inputs:
      value: query
      target_list: documents
    outputs:
      - questions

This node takes the content of query and creates a list that contains the value of query len(documents) times. This list is stored in the invocation context under the key questions.

- name: shaper
  type: Shaper
  params:
    func: join_documents
    inputs:
      documents: documents
    params:
      delimiter: ' - '
    outputs:
      - documents

This node overwrites the content of documents in the invocation context with a list containing a single Document whose content is the concatenation of all the original Documents. So if documents contained [Document("A"), Document("B"), Document("C")], this shaper overwrites it with [Document("A - B - C")]

- name: shaper
  type: Shaper
  params:
    func: join_strings
    params:
      strings: ['a', 'b', 'c']
      delimiter: ' . '
    outputs:
      - single_string

- name: shaper
  type: Shaper
  params:
    func: strings_to_documents
    inputs:
      strings: single_string
    params:
      meta:
        name: 'my_file.txt'
    outputs:
      - single_document

These two nodes, executed one after the other, first add a key called single_string to the invocation context that contains a . b . c, and then add another key called single_document that contains [Document(content="a . b . c", meta={'name': 'my_file.txt'})].


  • func: The function to apply.
  • inputs: Maps the function's input kwargs to the key-value pairs in the invocation context. For example, value_to_list expects the value and target_list parameters, so inputs might contain: {'value': 'query', 'target_list': 'documents'}. It doesn't need to contain all keyword args, see params.
  • params: Maps the function's input kwargs to some fixed values. For example, value_to_list expects value and target_list parameters, so params might contain {'value': 'A', 'target_list': [1, 1, 1, 1]} and the node's output is ["A", "A", "A", "A"]. It doesn't need to contain all keyword args, see inputs. You can use params to provide fallback values for arguments of run that you're not sure exist. So if you need query to exist, you can provide a fallback value in the params, which will be used only if query is not passed to this node by the pipeline.
  • outputs: The keys under which to store the outputs in the invocation context. The number of keys must match the number of outputs produced by the invoked function.
  • publish_outputs: Controls whether to publish the outputs to the pipeline's output. Set it to True (the default) to publish all outputs or to False to publish none. For example, if outputs = ["documents"], the result for publish_outputs = True looks like this:
        {
            "invocation_context": {"documents": [...]},
            "documents": [...]
        }

For publish_outputs = False, the result looks like this:

        {
            "invocation_context": {"documents": [...]}
        }

If you want to have finer-grained control, pass a list of the outputs you want to publish.
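
To make the interplay of inputs, params, outputs, and publish_outputs concrete, here is a minimal sketch of what a single Shaper invocation might do. It is written from the parameter descriptions above and is not the component's actual code: run_shaper_sketch is a hypothetical function, value_to_list is re-implemented locally, and wrapping a single return value in a tuple is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Union


def value_to_list(value: Any, target_list: List[Any]) -> List[Any]:
    """Repeat value once per element of target_list."""
    return [value] * len(target_list)


def run_shaper_sketch(func: Callable[..., Any],
                      outputs: List[str],
                      invocation_context: Dict[str, Any],
                      inputs: Optional[Dict[str, str]] = None,
                      params: Optional[Dict[str, Any]] = None,
                      publish_outputs: Union[bool, List[str]] = True) -> Dict[str, Any]:
    # Fixed values from params come first, so they act as fallbacks.
    kwargs = dict(params or {})
    # Values found in the invocation context override the fallbacks.
    for arg_name, context_key in (inputs or {}).items():
        if context_key in invocation_context:
            kwargs[arg_name] = invocation_context[context_key]

    results = func(**kwargs)
    if not isinstance(results, tuple):
        results = (results,)  # assumption: a single value counts as one output
    if len(results) != len(outputs):
        raise ValueError("outputs must name every value the function returns")

    context = {**invocation_context, **dict(zip(outputs, results))}
    if publish_outputs is True:
        to_publish = outputs
    elif publish_outputs is False:
        to_publish = []
    else:
        to_publish = publish_outputs  # explicit list of keys to publish
    return {"invocation_context": context, **{key: context[key] for key in to_publish}}


context = {"query": "What is a Shaper?", "documents": ["d1", "d2", "d3"]}
mapping = {"value": "query", "target_list": "documents"}
shown = run_shaper_sketch(value_to_list, ["questions"], context, inputs=mapping)
hidden = run_shaper_sketch(value_to_list, ["questions"], context, inputs=mapping,
                           publish_outputs=False)
fallback = run_shaper_sketch(value_to_list, ["questions"], {"documents": ["d1"]},
                             inputs=mapping, params={"value": "A"})
```

In this sketch, shown carries questions both inside invocation_context and at the top level, hidden carries it only inside invocation_context, and fallback uses the value from params because query is missing from the context.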