Module shaper

rename

def rename(value: Any) -> Any

An identity function. You can use it to rename values in the invocation context without changing them.

Example:

assert rename(1) == 1

current_datetime

def current_datetime(format: str = "%H:%M:%S %d/%m/%y") -> str

Returns the current date and/or time as a string, formatted according to the format parameter.

Example:

assert current_datetime("%d.%m.%y %H:%M:%S") == "01.01.23 12:30:10"
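The default value of format suggests that the function follows the standard strftime directives. Under that assumption, the sketch below uses a fixed timestamp (so the output is reproducible) to show how the directives map to output, including the difference between %y (two-digit year) and %Y (four-digit year). It calls the standard datetime module directly and is not the library's implementation.

```python
from datetime import datetime

# A fixed timestamp so the output is reproducible; current_datetime itself
# would use the actual current time instead.
moment = datetime(2023, 1, 1, 12, 30, 10)

default_style = moment.strftime("%H:%M:%S %d/%m/%y")  # the default format
dotted_style = moment.strftime("%d.%m.%y %H:%M:%S")   # %y: two-digit year
long_year = moment.strftime("%d.%m.%Y %H:%M:%S")      # %Y: four-digit year

print(default_style)  # 12:30:10 01/01/23
print(dotted_style)   # 01.01.23 12:30:10
print(long_year)      # 01.01.2023 12:30:10
```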

value_to_list

def value_to_list(value: Any, target_list: List[Any]) -> List[Any]

Transforms a value into a list containing this value as many times as the length of the target list.

Example:

assert value_to_list(value=1, target_list=list(range(5))) == [1, 1, 1, 1, 1]

join_lists

def join_lists(lists: List[List[Any]]) -> List[Any]

Joins the lists you pass to it into a single list.

Example:

assert join_lists(lists=[[1, 2, 3], [4, 5]]) == [1, 2, 3, 4, 5]

join_strings

def join_strings(strings: List[str],
                 delimiter: str = " ",
                 str_replace: Optional[Dict[str, str]] = None) -> str

Transforms a list of strings into a single string. The content of this string is the content of all of the original strings separated by the delimiter you specify.

Example:

assert join_strings(strings=["first", "second", "third"], delimiter=" - ", str_replace={"r": "R"}) == "fiRst - second - thiRd"
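The behaviour in this example can be approximated in a few lines of plain Python. The function below is a hypothetical stand-in, not the library code, and it assumes that the replacements in str_replace are applied after joining; with that order, a replacement can also affect the delimiter.

```python
from typing import Dict, List, Optional


def join_strings_sketch(
    strings: List[str],
    delimiter: str = " ",
    str_replace: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical stand-in: join first, then apply replacements (assumed order)."""
    joined = delimiter.join(strings)
    for old, new in (str_replace or {}).items():
        joined = joined.replace(old, new)
    return joined


result = join_strings_sketch(
    ["first", "second", "third"], delimiter=" - ", str_replace={"r": "R"}
)
print(result)  # fiRst - second - thiRd
```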

format_string

def format_string(string: str,
                  str_replace: Optional[Dict[str, str]] = None) -> str

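Replaces substrings of a string. Each key of str_replace that occurs in the string is replaced with the corresponding value.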

Example:

assert format_string(string="first", str_replace={"r": "R"}) == "fiRst"

join_documents

def join_documents(
        documents: List[Document],
        delimiter: str = " ",
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> List[Document]

Transforms a list of documents into a list containing a single document. The content of this document is the joined result of all original documents, separated by the delimiter you specify. Use regex in the pattern parameter to control how each document is represented. You can use the following placeholders:

$content: The content of the document.
$idx: The index of the document in the list.
$id: The ID of the document.
$META_FIELD: The value of the metadata field called 'META_FIELD'.

All metadata is dropped.

Example:

assert join_documents(
    documents=[
        Document(content="first"),
        Document(content="second"),
        Document(content="third")
    ],
    delimiter=" - ",
    pattern="[$idx] $content",
    str_replace={"r": "R"}
) == [Document(content="[1] fiRst - [2] second - [3] thiRd")]

join_documents_and_scores

def join_documents_and_scores(
        documents: List[Document]) -> Tuple[List[Document]]

Transforms a list of documents with scores in their metadata into a list containing a single document. The resulting document contains the scores and the contents of all the original documents. All metadata is dropped.

Example:

assert join_documents_and_scores(
    documents=[
        Document(content="first", meta={"score": 0.9}),
        Document(content="second", meta={"score": 0.7}),
        Document(content="third", meta={"score": 0.5})
    ]
) == ([Document(content="-[0.9] first\n-[0.7] second\n-[0.5] third")], )
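To make the output shape concrete, here is a minimal sketch of the joining step. Doc is a hypothetical stand-in for Document that carries only the two fields used here, and the "-[score] content" line layout is taken from the example above. Note that the return value is a one-element tuple, as the signature indicates.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for Document with only the fields used here."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def join_docs_and_scores_sketch(documents: List[Doc]) -> Tuple[List[Doc]]:
    # One "-[score] content" line per document; metadata is not carried over.
    lines = [f"-[{doc.meta['score']}] {doc.content}" for doc in documents]
    return ([Doc(content="\n".join(lines))],)


joined = join_docs_and_scores_sketch(
    [
        Doc(content="first", meta={"score": 0.9}),
        Doc(content="second", meta={"score": 0.7}),
        Doc(content="third", meta={"score": 0.5}),
    ]
)
print(joined[0][0].content)
```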

format_document

def format_document(document: Document,
                    pattern: Optional[str] = None,
                    str_replace: Optional[Dict[str, str]] = None,
                    idx: Optional[int] = None) -> str

Transforms a document into a single string. Use regex in the pattern parameter to control how the document is represented. You can use the following placeholders:

$content: The content of the document.
$idx: The index of the document in the list.
$id: The ID of the document.
$META_FIELD: The value of the metadata field called 'META_FIELD'.

Example:

assert format_document(
    document=Document(content="first"),
    pattern="prefix [$idx] $content",
    str_replace={"r": "R"},
    idx=1,
) == "prefix [1] fiRst"

format_answer

def format_answer(answer: Answer,
                  pattern: Optional[str] = None,
                  str_replace: Optional[Dict[str, str]] = None,
                  idx: Optional[int] = None) -> str

Transforms an answer into a single string. Use regex in the pattern parameter to control how the answer is represented. You can use the following placeholders:

$answer: The answer text.
$idx: The index of the answer in the list.
$META_FIELD: The value of the metadata field called 'META_FIELD'.

Example:

assert format_answer(
    answer=Answer(answer="first"),
    pattern="prefix [$idx] $answer",
    str_replace={"r": "R"},
    idx=1,
) == "prefix [1] fiRst"

join_documents_to_string

def join_documents_to_string(
        documents: List[Document],
        delimiter: str = " ",
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> str

Transforms a list of documents into a single string. The content of this string is the joined result of all original documents separated by the delimiter you specify. Use regex in the pattern parameter to control how the documents are represented. You can use the following placeholders:

$content: The content of the document.
$idx: The index of the document in the list.
$id: The ID of the document.
$META_FIELD: The value of the metadata field called 'META_FIELD'.

Example:

assert join_documents_to_string(
    documents=[
        Document(content="first"),
        Document(content="second"),
        Document(content="third")
    ],
    delimiter=" - ",
    pattern="[$idx] $content",
    str_replace={"r": "R"}
) == "[1] fiRst - [2] second - [3] thiRd"

strings_to_answers

def strings_to_answers(
        strings: List[str],
        prompts: Optional[List[Union[str, List[Dict[str, str]]]]] = None,
        documents: Optional[List[Document]] = None,
        pattern: Optional[str] = None,
        reference_pattern: Optional[str] = None,
        reference_mode: Literal["index", "id", "meta"] = "index",
        reference_meta_field: Optional[str] = None) -> List[Answer]

Transforms a list of strings into a list of answers.

Specify reference_pattern to populate the answer's document_ids by extracting document references from the strings.

Arguments:

strings: The list of strings to transform.
prompts: The prompts used to generate the answers.
documents: The documents used to generate the answers.
pattern: The regex pattern to use for parsing the answer. Examples: `[^\n]+$` will find "this is an answer" in string "this is an argument.\nthis is an answer". `Answer: (.*)` will find "this is an answer" in string "this is an argument. Answer: this is an answer". If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer.
reference_pattern: The regex pattern to use for parsing the document references. Example: `\[(\d+)\]` will find "1" in string "this is an answer[1]". If None, no parsing is done and all documents are referenced.
reference_mode: The mode used to reference documents. Supported modes are:
    index: The document references are the one-based index of the document in the list of documents. Example: "this is an answer[1]" will reference the first document in the list of documents.
    id: The document references are the document IDs. Example: "this is an answer[123]" will reference the document with id "123".
    meta: The document references are the value of a metadata field of the document. Example: "this is an answer[123]" will reference the document with the value "123" in the metadata field specified by reference_meta_field.
reference_meta_field: The name of the metadata field to use for document references in reference_mode "meta".

Returns:

The list of answers.

Examples:

Without reference parsing:
```python
assert strings_to_answers(strings=["first", "second", "third"], prompts=["prompt"], documents=[Document(id="123", content="content")]) == [
        Answer(answer="first", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
        Answer(answer="second", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
        Answer(answer="third", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
    ]
```

With reference parsing:
```python
assert strings_to_answers(strings=["first[1]", "second[2]", "third[1][3]"], prompts=["prompt"],
        documents=[Document(id="123", content="content"), Document(id="456", content="content"), Document(id="789", content="content")],
        reference_pattern=r"\[(\d+)\]",
        reference_mode="index"
    ) == [
        Answer(answer="first", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
        Answer(answer="second", type="generative", document_ids=["456"], meta={"prompt": "prompt"}),
        Answer(answer="third", type="generative", document_ids=["123", "789"], meta={"prompt": "prompt"}),
    ]
```
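The regex semantics described for pattern and reference_pattern can be checked directly with Python's re module. The snippet below only exercises the regular expressions from the parameter descriptions and the one-based index lookup used by reference_mode "index"; it does not call strings_to_answers, and the variable names are illustrative.

```python
import re

# pattern without a group: the whole match is the answer.
text = "this is an argument.\nthis is an answer"
answer_a = re.search(r"[^\n]+$", text).group(0)

# pattern with a group: the first group is the answer.
prefixed = "this is an argument. Answer: this is an answer"
answer_b = re.search(r"Answer: (.*)", prefixed).group(1)

# reference_pattern: each captured group is one document reference.
references = re.findall(r"\[(\d+)\]", "third[1][3]")

# reference_mode "index": references are one-based positions in the documents list.
document_ids = ["123", "456", "789"]
referenced_ids = [document_ids[int(ref) - 1] for ref in references]

print(answer_a)        # this is an answer
print(answer_b)        # this is an answer
print(references)      # ['1', '3']
print(referenced_ids)  # ['123', '789']
```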

string_to_answer

def string_to_answer(string: str,
                     prompt: Optional[Union[str, List[Dict[str, str]]]],
                     documents: Optional[List[Document]],
                     pattern: Optional[str] = None,
                     reference_pattern: Optional[str] = None,
                     reference_mode: Literal["index", "id", "meta"] = "index",
                     reference_meta_field: Optional[str] = None) -> Answer

Transforms a string into an answer.

Specify reference_pattern to populate the answer's document_ids by extracting document references from the string.

Arguments:

string: The string to transform.
prompt: The prompt used to generate the answer.
documents: The documents used to generate the answer.
pattern: The regex pattern to use for parsing the answer. Example: `[^\n]+$` will find "this is an answer" in string "this is an argument.\nthis is an answer".
reference_pattern, reference_mode, reference_meta_field: See the parameters of the same name in strings_to_answers.

parse_references

def parse_references(
        string: str,
        reference_pattern: Optional[str] = None,
        candidates: Optional[Dict[str, str]] = None) -> Optional[List[str]]

Parses an answer string for document references and returns the document IDs of the referenced documents.

Arguments:

string: The string to parse.
reference_pattern: The regex pattern to use for parsing the document references. Example: \[(\d+)\] will find "1" in string "this is an answer[1]". If None, no parsing is done and all candidate document IDs are returned.
candidates: A dictionary of candidates to choose from. The keys are the reference strings and the values are the document IDs. If None, no parsing is done and None is returned.

Returns:

A list of document IDs.
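The three cases described in the arguments (no candidates, no pattern, and pattern with candidates) can be sketched as follows. This is a hypothetical stand-in rather than the library code; in particular, silently skipping references that have no matching candidate is an assumption.

```python
import re
from typing import Dict, List, Optional


def parse_references_sketch(
    string: str,
    reference_pattern: Optional[str] = None,
    candidates: Optional[Dict[str, str]] = None,
) -> Optional[List[str]]:
    if candidates is None:
        # Nothing to choose from: no parsing, return None.
        return None
    if reference_pattern is None:
        # No pattern: every candidate document ID is returned.
        return list(candidates.values())
    found = re.findall(reference_pattern, string)
    # Assumption: references without a matching candidate are skipped.
    return [candidates[ref] for ref in found if ref in candidates]


candidates = {"1": "123", "2": "456", "3": "789"}
parsed = parse_references_sketch("third[1][3]", r"\[(\d+)\]", candidates)
print(parsed)  # ['123', '789']
```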

answers_to_strings

def answers_to_strings(
        answers: List[Answer],
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> List[str]

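Extracts the answer field of answers and returns a list of strings. Use the pattern parameter to control how the answers are represented; it accepts the same placeholders as format_answer.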

Example:

assert answers_to_strings(
        answers=[
            Answer(answer="first"),
            Answer(answer="second"),
            Answer(answer="third")
        ],
        pattern="[$idx] $answer",
        str_replace={"r": "R"}
    ) == ["[1] fiRst", "[2] second", "[3] thiRd"]

strings_to_documents

def strings_to_documents(
        strings: List[str],
        meta: Union[List[Optional[Dict[str, Any]]],
                    Optional[Dict[str, Any]]] = None,
        id_hash_keys: Optional[List[str]] = None) -> List[Document]

Transforms a list of strings into a list of documents. If you pass the metadata in a single dictionary, all documents get the same metadata. If you pass the metadata as a list, the length of this list must be the same as the length of the list of strings, and each document gets its own metadata. You can specify id_hash_keys only once and it gets assigned to all documents.

Example:

assert strings_to_documents(
        strings=["first", "second", "third"],
        meta=[{"position": i} for i in range(3)],
        id_hash_keys=['content', 'meta']
    ) == [
        Document(content="first", meta={"position": 0}, id_hash_keys=['content', 'meta']),
        Document(content="second", meta={"position": 1}, id_hash_keys=['content', 'meta']),
        Document(content="third", meta={"position": 2}, id_hash_keys=['content', 'meta'])
    ]
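The two ways of passing metadata (one dictionary for all documents, or one dictionary per string) can be sketched as a small helper. The helper name and the choice to raise ValueError on a length mismatch are assumptions made for illustration; the description above only states that the lengths must match.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Optional[Dict[str, Any]]


def expand_meta_sketch(strings: List[str], meta: Union[List[Meta], Meta] = None) -> List[Meta]:
    """Hypothetical helper: return one metadata entry per string."""
    if isinstance(meta, list):
        if len(meta) != len(strings):
            # Assumed error type; the text only requires equal lengths.
            raise ValueError("meta list must have the same length as strings")
        return meta
    # A single dictionary (or None) is shared by every document; copy it so
    # that the documents do not end up mutating the same dictionary.
    return [dict(meta) if meta is not None else None for _ in strings]


strings = ["first", "second", "third"]
per_document = expand_meta_sketch(strings, [{"position": i} for i in range(3)])
shared = expand_meta_sketch(strings, {"name": "my_file.txt"})
print(per_document)  # [{'position': 0}, {'position': 1}, {'position': 2}]
print(shared)
```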

documents_to_strings

def documents_to_strings(
        documents: List[Document],
        pattern: Optional[str] = None,
        str_replace: Optional[Dict[str, str]] = None) -> List[str]

Extracts the content field of documents and returns a list of strings. Use regex in the pattern parameter to control how the documents are represented.

Example:

assert documents_to_strings(
        documents=[
            Document(content="first"),
            Document(content="second"),
            Document(content="third")
        ],
        pattern="[$idx] $content",
        str_replace={"r": "R"}
    ) == ["[1] fiRst", "[2] second", "[3] thiRd"]

Shaper

class Shaper(BaseComponent)

Shaper is a component that can invoke arbitrary, registered functions on the invocation context (query, documents, and so on) of a pipeline. It then passes the new or modified variables further down the pipeline.

Using YAML configuration, the Shaper component is initialized with functions to invoke on pipeline invocation context.

For example, in the YAML snippet below:

    components:
    - name: shaper
      type: Shaper
      params:
        func: value_to_list
        inputs:
            value: query
            target_list: documents
        outputs: [questions]

the Shaper component is initialized with a directive to invoke the function value_to_list on the variables query and documents and to store the result in the invocation context variable questions. All other invocation context variables are passed down the pipeline as they are.

You can use multiple Shaper components in a pipeline to modify the invocation context as needed.

Currently, Shaper supports the following functions:

rename
value_to_list
join_lists
join_strings
format_string
join_documents
join_documents_and_scores
format_document
format_answer
join_documents_to_string
strings_to_answers
string_to_answer
parse_references
answers_to_strings
strings_to_documents
documents_to_strings

See their descriptions in the code for details about their inputs, outputs, and other parameters.

Shaper.__init__

def __init__(func: str,
             outputs: List[str],
             inputs: Optional[Dict[str, Union[List[str], str]]] = None,
             params: Optional[Dict[str, Any]] = None,
             publish_outputs: Union[bool, List[str]] = True)

Initializes the Shaper component.

Some examples:

    - name: shaper
      type: Shaper
      params:
        func: value_to_list
        inputs:
          value: query
          target_list: documents
        outputs:
          - questions

This node takes the content of query and creates a list that contains the value of query len(documents) times. This list is stored in the invocation context under the key questions.

    - name: shaper
      type: Shaper
      params:
        func: join_documents
        inputs:
          documents: documents
        params:
          delimiter: ' - '
        outputs:
          - documents

This node overwrites the content of documents in the invocation context with a list containing a single Document whose content is the concatenation of all the original Documents. So if documents contained [Document("A"), Document("B"), Document("C")], this shaper overwrites it with [Document("A - B - C")].

    - name: shaper
      type: Shaper
      params:
        func: join_strings
        params:
          strings: ['a', 'b', 'c']
          delimiter: ' . '
        outputs:
          - single_string

    - name: shaper
      type: Shaper
      params:
        func: strings_to_documents
        inputs:
          strings: single_string
        params:
          meta:
            name: 'my_file.txt'
        outputs:
          - single_document

These two nodes, executed one after the other, first add a key called single_string to the invocation context that contains a . b . c, and then create another key called single_document that contains [Document(content="a . b . c", meta={'name': 'my_file.txt'})].

Arguments:

func: The function to apply.
inputs: Maps the function's input kwargs to the key-value pairs in the invocation context. For example, value_to_list expects the value and target_list parameters, so inputs might contain: {'value': 'query', 'target_list': 'documents'}. It doesn't need to contain all keyword args, see params.
params: Maps the function's input kwargs to some fixed values. For example, value_to_list expects value and target_list parameters, so params might contain {'value': 'A', 'target_list': [1, 1, 1, 1]} and the node's output is ["A", "A", "A", "A"]. It doesn't need to contain all keyword args, see inputs. You can use params to provide fallback values for arguments of run that you're not sure exist. So if you need query to exist, you can provide a fallback value in the params, which will be used only if query is not passed to this node by the pipeline.
outputs: The key to store the outputs in the invocation context. The length of the outputs must match the number of outputs produced by the function invoked.
publish_outputs: Controls whether to publish the outputs to the pipeline's output. Set it to True (the default) to publish all outputs or to False to publish none. For example, if outputs = ["documents"], the result for publish_outputs = True looks like this:

    {
        "invocation_context": {
            "documents": [...]
        },
        "documents": [...]
    }

For publish_outputs = False, the result looks like this:

    {
        "invocation_context": {
            "documents": [...]
        },
    }

If you want to have finer-grained control, pass a list of the outputs you want to publish.
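The interplay of inputs, params, outputs, and publish_outputs described above can be sketched with a small stand-alone function. This is a hypothetical model of the behaviour, not the Shaper implementation: run_shaper_sketch and its argument handling are assumptions, it handles only string values in inputs, and it lets values from the invocation context override the fixed values in params so that params act as fallbacks.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple, Union


def value_to_list(value: Any, target_list: List[Any]) -> Tuple[List[Any]]:
    """Stand-in function; returns a one-element tuple of outputs."""
    return ([value] * len(target_list),)


def run_shaper_sketch(
    func: Callable[..., Tuple[Any, ...]],
    invocation_context: Dict[str, Any],
    outputs: List[str],
    inputs: Optional[Dict[str, str]] = None,
    params: Optional[Dict[str, Any]] = None,
    publish_outputs: Union[bool, List[str]] = True,
) -> Dict[str, Any]:
    # Start from the fixed values, then let context values override them,
    # so params serve as fallbacks for keys missing from the context.
    kwargs = dict(params or {})
    for arg_name, context_key in (inputs or {}).items():
        if context_key in invocation_context:
            kwargs[arg_name] = invocation_context[context_key]

    results = func(**kwargs)
    if len(results) != len(outputs):
        raise ValueError("outputs must match the number of values returned by func")

    # Untouched context variables are passed on as they are.
    new_context = {**invocation_context, **dict(zip(outputs, results))}

    if publish_outputs is True:
        published = list(outputs)
    elif publish_outputs is False:
        published = []
    else:
        published = [key for key in outputs if key in publish_outputs]

    result: Dict[str, Any] = {"invocation_context": new_context}
    for key in published:
        result[key] = new_context[key]
    return result


context = {"query": "q", "documents": ["d1", "d2", "d3"]}
mapping = {"value": "query", "target_list": "documents"}
shown = run_shaper_sketch(value_to_list, context, ["questions"], inputs=mapping)
hidden = run_shaper_sketch(
    value_to_list, context, ["questions"], inputs=mapping, publish_outputs=False
)
print(shown["questions"])     # ['q', 'q', 'q']
print("questions" in hidden)  # False
```

With publish_outputs=False the new variable is still stored under invocation_context, so later nodes can read it even though it is absent from the top level of the result.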
