A helper node with a variety of functions.

Module shaper

rename

def rename(value: Any) -> Tuple[Any]

Identity function. Can be used to rename values in the invocation context without changing them.

Example:

assert rename(1) == (1, )

value_to_list

def value_to_list(value: Any, target_list: List[Any]) -> Tuple[List[Any]]

Transforms a value into a list containing this value as many times as the length of the target list.

Example:

assert value_to_list(value=1, target_list=list(range(5))) == ([1, 1, 1, 1, 1], )
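The behavior described above can be reproduced in a few lines of plain Python. This is a minimal sketch, not Haystack's actual implementation; the name value_to_list_sketch is hypothetical and chosen only for illustration.

```python
from typing import Any, List, Tuple


def value_to_list_sketch(value: Any, target_list: List[Any]) -> Tuple[List[Any]]:
    # Hypothetical stand-in for value_to_list: repeat `value` once per
    # element of `target_list` and wrap the result in a one-element tuple.
    # Note that the same object is referenced in every position.
    return ([value] * len(target_list),)


# Typical use: one copy of the query per retrieved document.
result = value_to_list_sketch(
    value="What is Haystack?",
    target_list=["doc1", "doc2", "doc3"],
)
```

The one-element tuple mirrors the return type shown in the signature, where each tuple position corresponds to one output of the function.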

join_lists

def join_lists(lists: List[List[Any]]) -> Tuple[List[Any]]

Joins the passed lists into a single one.

Example:

assert join_lists(lists=[[1, 2, 3], [4, 5]]) == ([1, 2, 3, 4, 5], )

join_strings

def join_strings(strings: List[str], delimiter: str = " ") -> Tuple[str]

Transforms a list of strings into a single string. The content of this string is the content of all original strings separated by the delimiter you specify.

Example:

assert join_strings(strings=["first", "second", "third"], delimiter=" - ") == ("first - second - third", )

join_documents

def join_documents(documents: List[Document],
                   delimiter: str = " ") -> Tuple[List[Document]]

Transforms a list of Documents into a list containing a single Document. The content of this Document is the content of all original Documents, separated by the delimiter you specify.

All metadata is dropped. (TODO: fix)

Example:

assert join_documents(
    documents=[
        Document(content="first"),
        Document(content="second"),
        Document(content="third")
    ],
    delimiter=" - "
) == ([Document(content="first - second - third")], )
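To make the metadata behavior concrete, here is a minimal sketch of the joining logic. It does not use Haystack itself: DocSketch is a hypothetical stand-in for Document with only two fields, and join_documents_sketch is an assumed reimplementation, not the library's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class DocSketch:
    # Hypothetical stand-in for Haystack's Document, reduced to two fields.
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def join_documents_sketch(
    documents: List[DocSketch], delimiter: str = " "
) -> Tuple[List[DocSketch]]:
    # Concatenate the contents; the metadata of the inputs is not carried over.
    joined = delimiter.join(doc.content for doc in documents)
    return ([DocSketch(content=joined)],)


docs = [DocSketch("first", {"page": 1}), DocSketch("second", {"page": 2})]
joined_docs = join_documents_sketch(docs, delimiter=" - ")
```

In this sketch, the joined document ends up with empty metadata even though both inputs carried a page entry, which is the behavior the note above describes.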

strings_to_answers

def strings_to_answers(strings: List[str]) -> Tuple[List[Answer]]

Transforms a list of strings into a list of Answers.

Example:

assert strings_to_answers(strings=["first", "second", "third"]) == ([
        Answer(answer="first"),
        Answer(answer="second"),
        Answer(answer="third"),
    ], )

answers_to_strings

def answers_to_strings(answers: List[Answer]) -> Tuple[List[str]]

Extracts the answer field of Answers and returns a list of strings.

Example:

assert answers_to_strings(
        answers=[
            Answer(answer="first"),
            Answer(answer="second"),
            Answer(answer="third")
        ]
    ) == (["first", "second", "third"],)

strings_to_documents

def strings_to_documents(
        strings: List[str],
        meta: Union[List[Optional[Dict[str, Any]]],
                    Optional[Dict[str, Any]]] = None,
        id_hash_keys: Optional[List[str]] = None) -> Tuple[List[Document]]

Transforms a list of strings into a list of Documents. If you pass the metadata in a single dictionary, all Documents get the same metadata. If you pass the metadata as a list, the length of this list must be the same as the length of the list of strings, and each Document gets its own metadata. You can specify id_hash_keys only once and it gets assigned to all Documents.

Example:

assert strings_to_documents(
        strings=["first", "second", "third"],
        meta=[{"position": i} for i in range(3)],
        id_hash_keys=['content', 'meta']
    ) == ([
        Document(content="first", meta={"position": 0}, id_hash_keys=['content', 'meta']),
        Document(content="second", meta={"position": 1}, id_hash_keys=['content', 'meta']),
        Document(content="third", meta={"position": 2}, id_hash_keys=['content', 'meta'])
    ], )
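The two ways of passing metadata (one shared dictionary, or one dictionary per string) can be sketched as a small normalization step. The helper expand_meta below is hypothetical and only illustrates the rule stated above; the error type raised on a length mismatch is an assumption, not necessarily what Haystack raises.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Optional[Dict[str, Any]]


def expand_meta(strings: List[str], meta: Union[List[Meta], Meta] = None) -> List[Meta]:
    # Hypothetical helper: produce exactly one metadata entry per string.
    if isinstance(meta, list):
        # A list must line up one-to-one with the strings.
        if len(meta) != len(strings):
            raise ValueError("meta list must have the same length as strings")
        return meta
    # A single dict (or None) is shared by every string.
    return [meta] * len(strings)


shared = expand_meta(["first", "second"], {"source": "a.txt"})
per_item = expand_meta(["first", "second"], [{"position": 0}, {"position": 1}])
```

Each entry of the returned list would then be paired with the string at the same index when building the Documents.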

documents_to_strings

def documents_to_strings(documents: List[Document]) -> Tuple[List[str]]

Extracts the content field of Documents and returns a list of strings.

Example:

assert documents_to_strings(
        documents=[
            Document(content="first"),
            Document(content="second"),
            Document(content="third")
        ]
    ) == (["first", "second", "third"],)

Shaper

class Shaper(BaseComponent)

Shaper is a component that can invoke arbitrary, registered functions on the invocation context (query, documents, and so on) of a pipeline. It then passes the new or modified variables further down the pipeline.

In a YAML configuration, you initialize the Shaper component with the function to invoke on the pipeline's invocation context.

For example, in the YAML snippet below:

    components:
    - name: shaper
      type: Shaper
      params:
        func: value_to_list
        inputs:
            value: query
            target_list: documents
        outputs: [questions]

The Shaper component is initialized with a directive to invoke the function value_to_list on the variable query and to store the result in the invocation context variable questions. All other invocation context variables are passed down the pipeline as they are.

Shaper is especially useful for pipelines with PromptNodes, where we need to modify the invocation context to match the templates of PromptNodes.

You can use multiple Shaper components in a pipeline to modify the invocation context as needed.

Shaper supports the following functions:

  • value_to_list
  • join_strings
  • join_documents
  • join_lists
  • strings_to_documents
  • documents_to_strings

See their descriptions in the code for details about their inputs, outputs, and other parameters.
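The idea of invoking a registered function by name on the invocation context can be sketched without Haystack. Everything in the block below is an assumption made for illustration: REGISTRY, shape, and the two lambda stand-ins are hypothetical and do not reflect how Shaper is actually implemented.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry mapping function names to callables that return tuples.
REGISTRY: Dict[str, Callable[..., Tuple[Any, ...]]] = {
    "value_to_list": lambda value, target_list: ([value] * len(target_list),),
    "join_strings": lambda strings, delimiter=" ": (delimiter.join(strings),),
}


def shape(
    func: str,
    inputs: Dict[str, str],
    outputs: List[str],
    context: Dict[str, Any],
) -> Dict[str, Any]:
    # Resolve each keyword argument from the invocation context.
    kwargs = {arg: context[key] for arg, key in inputs.items()}
    results = REGISTRY[func](**kwargs)
    if len(results) != len(outputs):
        raise ValueError("number of outputs does not match the function's results")
    # Copy the context so untouched variables are passed on unchanged.
    new_context = dict(context)
    new_context.update(zip(outputs, results))
    return new_context


context = {"query": "Who?", "documents": ["d1", "d2"]}
shaped = shape(
    "value_to_list",
    inputs={"value": "query", "target_list": "documents"},
    outputs=["questions"],
    context=context,
)
```

Here shaped gains a questions entry with one copy of the query per document, while query and documents are passed through as they were.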

Shaper.__init__

def __init__(func: str,
             outputs: List[str],
             inputs: Optional[Dict[str, Union[List[str], str]]] = None,
             params: Optional[Dict[str, Any]] = None,
             publish_outputs: Union[bool, List[str]] = True)

Initializes the Shaper component.

Some examples:

- name: shaper
  type: Shaper
  params:
    func: value_to_list
    inputs:
      value: query
      target_list: documents
    outputs:
      - questions

This node takes the content of query and creates a list that contains the value of query len(documents) times. This list is stored in the invocation context under the key questions.

- name: shaper
  type: Shaper
  params:
    func: join_documents
    inputs:
      documents: documents
    params:
      delimiter: ' - '
    outputs:
      - documents

This node overwrites the content of documents in the invocation context with a list containing a single Document whose content is the concatenation of all the original Documents. So if documents contained [Document("A"), Document("B"), Document("C")], this shaper overwrites it with [Document("A - B - C")].

- name: shaper
  type: Shaper
  params:
    func: join_strings
    params:
      strings: ['a', 'b', 'c']
      delimiter: ' . '
    outputs:
      - single_string

- name: shaper
  type: Shaper
  params:
    func: strings_to_documents
    inputs:
      strings: single_string
    params:
      meta:
        name: 'my_file.txt'
    outputs:
      - single_document

These two nodes, executed one after the other, first add a key called single_string to the invocation context, containing a . b . c, and then add another key called single_document that contains [Document(content="a . b . c", meta={'name': 'my_file.txt'})].

Arguments:

  • func: The function to apply.
  • inputs: Maps the function's input kwargs to the key-value pairs in the invocation context. For example, value_to_list expects the value and target_list parameters, so inputs might contain: {'value': 'query', 'target_list': 'documents'}. It doesn't need to contain all keyword args, see params.
  • params: Maps the function's input kwargs to some fixed values. For example, value_to_list expects value and target_list parameters, so params might contain {'value': 'A', 'target_list': [1, 1, 1, 1]} and the node's output is ["A", "A", "A", "A"]. It doesn't need to contain all keyword args, see inputs. You can use params to provide fallback values for arguments of run that you're not sure exist. So if you need query to exist, you can provide a fallback value in the params, which will be used only if query is not passed to this node by the pipeline.
  • outputs: The keys under which to store the outputs in the invocation context. The length of outputs must match the number of outputs produced by the invoked function.
  • publish_outputs: Controls whether to publish the outputs to the pipeline's output. Set it to True (the default) to publish all outputs, or to False to publish none. For example, if outputs = ["documents"], the result for publish_outputs = True looks like:
    {
        "invocation_context": {
            "documents": [...]
        },
        "documents": [...]
    }

For publish_outputs = False, the result looks like:

    {
        "invocation_context": {
            "documents": [...]
        },
    }

If you want to have finer-grained control, pass a list of the outputs you want to publish.
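The three modes of publish_outputs described above can be sketched as follows. The helper publish is hypothetical and only models the result shapes shown in this section; it is not Shaper's actual code.

```python
from typing import Any, Dict, List, Union


def publish(
    invocation_context: Dict[str, Any],
    outputs: List[str],
    publish_outputs: Union[bool, List[str]] = True,
) -> Dict[str, Any]:
    # Hypothetical helper: build the node result from the updated context.
    if publish_outputs is True:
        to_publish = outputs
    elif publish_outputs is False:
        to_publish = []
    else:
        # A list selects a subset of the node's outputs.
        to_publish = [key for key in outputs if key in publish_outputs]
    result: Dict[str, Any] = {"invocation_context": invocation_context}
    for key in to_publish:
        result[key] = invocation_context[key]
    return result


ctx = {"documents": ["joined"], "questions": ["q"]}
all_published = publish(ctx, ["documents", "questions"])
none_published = publish(ctx, ["documents", "questions"], publish_outputs=False)
only_documents = publish(ctx, ["documents", "questions"], publish_outputs=["documents"])
```

In all three cases the outputs remain available under invocation_context; publish_outputs only decides which of them are also copied to the top level of the result.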