A helper node with a variety of functions.
Module shaper
rename
def rename(value: Any) -> Any
An identity function. You can use it to rename values in the invocation context without changing them.
Example:
assert rename(1) == 1
current_datetime
def current_datetime(format: str = "%H:%M:%S %d/%m/%y") -> str
Function that outputs the current time and/or date formatted according to the parameters.
Example:
assert current_datetime("%d.%m.%y %H:%M:%S") == "01.01.23 12:30:10"
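The formatting step can be sketched with the standard library. This is a minimal sketch, assuming the function delegates to `datetime.strftime`; the helper name `format_datetime` and its injectable `now` argument are hypothetical and exist only to make the output reproducible. Note that `%y` yields a two-digit year, while `%Y` yields four digits.

```python
from datetime import datetime
from typing import Optional


def format_datetime(fmt: str = "%H:%M:%S %d/%m/%y", now: Optional[datetime] = None) -> str:
    # Hypothetical stand-in: `now` can be injected so the result is deterministic.
    moment = now if now is not None else datetime.now()
    return moment.strftime(fmt)


fixed = datetime(2023, 1, 1, 12, 30, 10)
print(format_datetime("%d.%m.%y %H:%M:%S", fixed))  # 01.01.23 12:30:10
print(format_datetime("%d.%m.%Y %H:%M:%S", fixed))  # 01.01.2023 12:30:10
```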
value_to_list
def value_to_list(value: Any, target_list: List[Any]) -> List[Any]
Transforms a value into a list containing this value as many times as the length of the target list.
Example:
assert value_to_list(value=1, target_list=list(range(5))) == [1, 1, 1, 1, 1]
join_lists
def join_lists(lists: List[List[Any]]) -> List[Any]
Joins the lists you pass to it into a single list.
Example:
assert join_lists(lists=[[1, 2, 3], [4, 5]]) == [1, 2, 3, 4, 5]
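One way to picture the joining step is a one-level flatten. The sketch below is an assumption about the behavior, not the library's code; `join_lists_sketch` is a hypothetical name.

```python
from itertools import chain
from typing import Any, List


def join_lists_sketch(lists: List[List[Any]]) -> List[Any]:
    # Flatten exactly one level; lists nested deeper are kept as they are.
    return list(chain.from_iterable(lists))


print(join_lists_sketch([[1, 2, 3], [4, 5]]))  # [1, 2, 3, 4, 5]
print(join_lists_sketch([[1, [2]], []]))  # [1, [2]]
```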
join_strings
def join_strings(strings: List[str],
delimiter: str = " ",
str_replace: Optional[Dict[str, str]] = None) -> str
Transforms a list of strings into a single string. The content of this string is the content of all of the original strings separated by the delimiter you specify.
Example:
assert join_strings(strings=["first", "second", "third"], delimiter=" - ", str_replace={"r": "R"}) == "fiRst - second - thiRd"
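The interaction between delimiter and str_replace can be sketched as follows. This is a minimal sketch with a hypothetical helper name; it assumes the replacements are applied to each string before joining, which the example above does not confirm (the order would only matter if the delimiter contained a replaced substring).

```python
from typing import Dict, List, Optional


def join_strings_sketch(
    strings: List[str],
    delimiter: str = " ",
    str_replace: Optional[Dict[str, str]] = None,
) -> str:
    replaced = []
    for text in strings:
        # Assumption: every key of str_replace is replaced by its value.
        for old, new in (str_replace or {}).items():
            text = text.replace(old, new)
        replaced.append(text)
    return delimiter.join(replaced)


print(join_strings_sketch(["first", "second", "third"], " - ", {"r": "R"}))
# fiRst - second - thiRd
```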
format_string
def format_string(string: str,
str_replace: Optional[Dict[str, str]] = None) -> str
Replaces substrings in a string: every occurrence of a key in str_replace is replaced with the corresponding value.
Example:
assert format_string(string="first", str_replace={"r": "R"}) == "fiRst"
join_documents
def join_documents(
documents: List[Document],
delimiter: str = " ",
pattern: Optional[str] = None,
str_replace: Optional[Dict[str, str]] = None) -> List[Document]
Transforms a list of documents into a list containing a single document. The content of this document is the joined result of all original documents, separated by the delimiter you specify.
Use regex in the pattern parameter to control how each document is represented.
You can use the following placeholders:
- $content: The content of the document.
- $idx: The index of the document in the list.
- $id: The ID of the document.
- $META_FIELD: The value of the metadata field called 'META_FIELD'.
All metadata is dropped.
Example:
assert join_documents(
    documents=[
        Document(content="first"),
        Document(content="second"),
        Document(content="third")
    ],
    delimiter=" - ",
    pattern="[$idx] $content",
    str_replace={"r": "R"}
) == [Document(content="[1] fiRst - [2] second - [3] thiRd")]
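The placeholder mechanism can be approximated with `string.Template` from the standard library. This is a sketch under stated assumptions, not the library's implementation: `Doc` is a hypothetical stand-in for `Document`, `render` and `join_docs` are hypothetical names, and applying str_replace to the content before substitution is an assumption.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Any, Dict, List, Optional


@dataclass
class Doc:  # hypothetical stand-in for Document
    content: str
    id: str = ""
    meta: Dict[str, Any] = field(default_factory=dict)


def render(doc: Doc, idx: int, pattern: Optional[str], str_replace: Optional[Dict[str, str]]) -> str:
    content = doc.content
    for old, new in (str_replace or {}).items():
        content = content.replace(old, new)
    if pattern is None:
        return content
    # Metadata fields become placeholders; the built-in ones take precedence.
    values = {**doc.meta, "content": content, "idx": idx, "id": doc.id}
    return Template(pattern).safe_substitute(values)


def join_docs(docs: List[Doc], delimiter: str = " ", pattern: Optional[str] = None,
              str_replace: Optional[Dict[str, str]] = None) -> List[Doc]:
    # $idx is one-based, matching the example above; metadata is dropped.
    parts = [render(d, i, pattern, str_replace) for i, d in enumerate(docs, start=1)]
    return [Doc(content=delimiter.join(parts))]


joined = join_docs([Doc("first"), Doc("second"), Doc("third")], " - ", "[$idx] $content", {"r": "R"})
print(joined[0].content)  # [1] fiRst - [2] second - [3] thiRd
```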
join_documents_and_scores
def join_documents_and_scores(
documents: List[Document]) -> Tuple[List[Document]]
Transforms a list of documents with scores in their metadata into a list containing a single document. The resulting document contains the scores and the contents of all the original documents. All metadata is dropped.
Example:
assert join_documents_and_scores(
    documents=[
        Document(content="first", meta={"score": 0.9}),
        Document(content="second", meta={"score": 0.7}),
        Document(content="third", meta={"score": 0.5})
    ]
) == ([Document(content="-[0.9] first\n-[0.7] second\n-[0.5] third")], )
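The shape of the result, a one-element tuple wrapping a one-element list, can be sketched like this. `Doc` is a hypothetical stand-in for `Document`, and the line format is inferred from the example above rather than taken from the library's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Doc:  # hypothetical stand-in for Document
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def join_with_scores(docs: List[Doc]) -> Tuple[List[Doc]]:
    # One line per document: "-[score] content"; the new document has no metadata.
    lines = [f"-[{doc.meta['score']}] {doc.content}" for doc in docs]
    return ([Doc(content="\n".join(lines))],)


result = join_with_scores([
    Doc("first", {"score": 0.9}),
    Doc("second", {"score": 0.7}),
    Doc("third", {"score": 0.5}),
])
print(result[0][0].content)
```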
format_document
def format_document(document: Document,
pattern: Optional[str] = None,
str_replace: Optional[Dict[str, str]] = None,
idx: Optional[int] = None) -> str
Transforms a document into a single string.
Use regex in the pattern parameter to control how the document is represented.
You can use the following placeholders:
- $content: The content of the document.
- $idx: The index of the document in the list.
- $id: The ID of the document.
- $META_FIELD: The value of the metadata field called 'META_FIELD'.
Example:
assert format_document(
    document=Document(content="first"),
    pattern="prefix [$idx] $content",
    str_replace={"r": "R"},
    idx=1,
) == "prefix [1] fiRst"
format_answer
def format_answer(answer: Answer,
pattern: Optional[str] = None,
str_replace: Optional[Dict[str, str]] = None,
idx: Optional[int] = None) -> str
Transforms an answer into a single string.
Use regex in the pattern parameter to control how the answer is represented.
You can use the following placeholders:
- $answer: The answer text.
- $idx: The index of the answer in the list.
- $META_FIELD: The value of the metadata field called 'META_FIELD'.
Example:
assert format_answer(
    answer=Answer(answer="first"),
    pattern="prefix [$idx] $answer",
    str_replace={"r": "R"},
    idx=1,
) == "prefix [1] fiRst"
join_documents_to_string
def join_documents_to_string(
documents: List[Document],
delimiter: str = " ",
pattern: Optional[str] = None,
str_replace: Optional[Dict[str, str]] = None) -> str
Transforms a list of documents into a single string. The content of this string is the joined result of all original documents, separated by the delimiter you specify.
Use regex in the pattern parameter to control how the documents are represented.
You can use the following placeholders:
- $content: The content of the document.
- $idx: The index of the document in the list.
- $id: The ID of the document.
- $META_FIELD: The value of the metadata field called 'META_FIELD'.
Example:
assert join_documents_to_string(
    documents=[
        Document(content="first"),
        Document(content="second"),
        Document(content="third")
    ],
    delimiter=" - ",
    pattern="[$idx] $content",
    str_replace={"r": "R"}
) == "[1] fiRst - [2] second - [3] thiRd"
strings_to_answers
def strings_to_answers(
strings: List[str],
prompts: Optional[List[Union[str, List[Dict[str, str]]]]] = None,
documents: Optional[List[Document]] = None,
pattern: Optional[str] = None,
reference_pattern: Optional[str] = None,
reference_mode: Literal["index", "id", "meta"] = "index",
reference_meta_field: Optional[str] = None) -> List[Answer]
Transforms a list of strings into a list of answers.
Specify reference_pattern to populate the answer's document_ids by extracting document references from the strings.
:param strings: The list of strings to transform.
:param prompts: The prompts used to generate the answers.
:param documents: The documents used to generate the answers.
:param pattern: The regex pattern to use for parsing the answer.
Examples:
- `[^\n]+$` will find "this is an answer" in string "this is an argument.\nthis is an answer".
- `Answer: (.*)` will find "this is an answer" in string "this is an argument. Answer: this is an answer".
If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer.
:param reference_pattern: The regex pattern to use for parsing the document references.
Example: `\[(\d+)\]` will find "1" in string "this is an answer[1]".
If None, no parsing is done and all documents are referenced.
:param reference_mode: The mode used to reference documents. Supported modes are:
- index: the document references are the one-based index of the document in the list of documents.
Example: "this is an answer[1]" will reference the first document in the list of documents.
- id: the document references are the document IDs.
Example: "this is an answer[123]" will reference the document with id "123".
- meta: the document references are the value of a metadata field of the document.
Example: "this is an answer[123]" will reference the document with the value "123" in the metadata field specified by reference_meta_field.
:param reference_meta_field: The name of the metadata field to use for document references in reference_mode "meta".
:return: The list of answers.
Examples:
Without reference parsing:
```python
assert strings_to_answers(strings=["first", "second", "third"], prompts=["prompt"], documents=[Document(id="123", content="content")]) == [
Answer(answer="first", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
Answer(answer="second", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
Answer(answer="third", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
]
```
With reference parsing:
```python
assert strings_to_answers(strings=["first[1]", "second[2]", "third[1][3]"], prompts=["prompt"],
documents=[Document(id="123", content="content"), Document(id="456", content="content"), Document(id="789", content="content")],
reference_pattern=r"\[(\d+)\]",
reference_mode="index"
) == [
Answer(answer="first", type="generative", document_ids=["123"], meta={"prompt": "prompt"}),
Answer(answer="second", type="generative", document_ids=["456"], meta={"prompt": "prompt"}),
Answer(answer="third", type="generative", document_ids=["123", "789"], meta={"prompt": "prompt"}),
]
```
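The rule for the pattern parameter (first group if there is one, otherwise the whole match) can be sketched with the standard `re` module. `extract_answer` is a hypothetical helper, and returning an empty string when nothing matches is an assumption that the description above does not cover.

```python
import re
from typing import Optional


def extract_answer(string: str, pattern: Optional[str] = None) -> str:
    if pattern is None:
        return string  # no pattern: the whole string is the answer
    match = re.search(pattern, string)
    if match is None:
        return ""  # assumption: no match yields an empty answer
    if match.lastindex:  # at least one capturing group took part in the match
        return match.group(1)
    return match.group(0)  # no group: use the whole match


print(extract_answer("this is an argument.\nthis is an answer", r"[^\n]+$"))
# this is an answer
print(extract_answer("this is an argument. Answer: this is an answer", r"Answer: (.*)"))
# this is an answer
```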
string_to_answer
def string_to_answer(string: str,
prompt: Optional[Union[str, List[Dict[str, str]]]],
documents: Optional[List[Document]],
pattern: Optional[str] = None,
reference_pattern: Optional[str] = None,
reference_mode: Literal["index", "id", "meta"] = "index",
reference_meta_field: Optional[str] = None) -> Answer
Transforms a string into an answer.
Specify reference_pattern to populate the answer's document_ids by extracting document references from the string.
:param string: The string to transform.
:param prompt: The prompt used to generate the answer.
:param documents: The documents used to generate the answer.
:param pattern: The regex pattern to use for parsing the answer.
Examples:
- `[^\n]+$` will find "this is an answer" in string "this is an argument.\nthis is an answer".
- `Answer: (.*)` will find "this is an answer" in string "this is an argument. Answer: this is an answer".
If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer.
:param reference_pattern: The regex pattern to use for parsing the document references.
Example: `\[(\d+)\]` will find "1" in string "this is an answer[1]".
If None, no parsing is done and all documents are referenced.
:param reference_mode: The mode used to reference documents. Supported modes are:
- index: the document references are the one-based index of the document in the list of documents.
Example: "this is an answer[1]" will reference the first document in the list of documents.
- id: the document references are the document IDs.
Example: "this is an answer[123]" will reference the document with id "123".
- meta: the document references are the value of a metadata field of the document.
Example: "this is an answer[123]" will reference the document with the value "123" in the metadata field specified by reference_meta_field.
:param reference_meta_field: The name of the metadata field to use for document references in reference_mode "meta".
:return: The answer
parse_references
def parse_references(
string: str,
reference_pattern: Optional[str] = None,
candidates: Optional[Dict[str, str]] = None) -> Optional[List[str]]
Parses an answer string for document references and returns the document IDs of the referenced documents.
Arguments:
- string: The string to parse.
- reference_pattern: The regex pattern to use for parsing the document references. Example: `\[(\d+)\]` will find "1" in string "this is an answer[1]". If None, no parsing is done and all candidate document IDs are returned.
- candidates: A dictionary of candidates to choose from. The keys are the reference strings and the values are the document IDs. If None, no parsing is done and None is returned.
Returns:
A list of document IDs.
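The lookup described above can be sketched with `re.findall`. This is a minimal sketch, not the library's code: `parse_refs` is a hypothetical name, and silently skipping references that are not among the candidates is an assumption.

```python
import re
from typing import Dict, List, Optional


def parse_refs(
    string: str,
    reference_pattern: Optional[str] = None,
    candidates: Optional[Dict[str, str]] = None,
) -> Optional[List[str]]:
    if candidates is None:
        return None  # nothing to choose from
    if reference_pattern is None:
        return list(candidates.values())  # no parsing: all candidates are referenced
    found = re.findall(reference_pattern, string)
    # Assumption: references without a matching candidate are ignored.
    return [candidates[ref] for ref in found if ref in candidates]


# For the "index" mode, the candidate keys are one-based positions.
doc_ids = ["123", "456", "789"]
by_index = {str(i): doc_id for i, doc_id in enumerate(doc_ids, start=1)}
print(parse_refs("third[1][3]", r"\[(\d+)\]", by_index))  # ['123', '789']
```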
answers_to_strings
def answers_to_strings(
answers: List[Answer],
pattern: Optional[str] = None,
str_replace: Optional[Dict[str, str]] = None) -> List[str]
Extracts the content field of answers and returns a list of strings.
Example:
assert answers_to_strings(
    answers=[
        Answer(answer="first"),
        Answer(answer="second"),
        Answer(answer="third")
    ],
    pattern="[$idx] $answer",
    str_replace={"r": "R"}
) == ["[1] fiRst", "[2] second", "[3] thiRd"]
strings_to_documents
def strings_to_documents(
strings: List[str],
meta: Union[List[Optional[Dict[str, Any]]],
Optional[Dict[str, Any]]] = None,
id_hash_keys: Optional[List[str]] = None) -> List[Document]
Transforms a list of strings into a list of documents. If you pass the metadata in a single
dictionary, all documents get the same metadata. If you pass the metadata as a list, the length of this list
must be the same as the length of the list of strings, and each document gets its own metadata.
You can specify id_hash_keys only once and it gets assigned to all documents.
Example:
assert strings_to_documents(
    strings=["first", "second", "third"],
    meta=[{"position": i} for i in range(3)],
    id_hash_keys=['content', 'meta']
) == [
    Document(content="first", meta={"position": 0}, id_hash_keys=['content', 'meta']),
    Document(content="second", meta={"position": 1}, id_hash_keys=['content', 'meta']),
    Document(content="third", meta={"position": 2}, id_hash_keys=['content', 'meta'])
]
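The two ways of passing metadata (one dictionary for all documents, or one entry per string) can be sketched as below. `expand_meta` is a hypothetical helper, and raising `ValueError` on a length mismatch is an assumption about how the length requirement is enforced.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Optional[Dict[str, Any]]


def expand_meta(strings: List[str], meta: Union[List[Meta], Meta] = None) -> List[Meta]:
    if isinstance(meta, list):
        # One metadata entry per string; lengths must agree.
        if len(meta) != len(strings):
            raise ValueError("meta list must have the same length as strings")
        return list(meta)
    # A single dict (or None) is given to every document; copy it so the
    # documents do not share one mutable object.
    return [dict(meta) if meta is not None else None for _ in strings]


print(expand_meta(["first", "second"], {"name": "my_file.txt"}))
print(expand_meta(["first", "second"], [{"position": 0}, {"position": 1}]))
```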
documents_to_strings
def documents_to_strings(
documents: List[Document],
pattern: Optional[str] = None,
str_replace: Optional[Dict[str, str]] = None) -> List[str]
Extracts the content field of documents and returns a list of strings. Use regex in the pattern parameter to control how the documents are represented.
Example:
assert documents_to_strings(
    documents=[
        Document(content="first"),
        Document(content="second"),
        Document(content="third")
    ],
    pattern="[$idx] $content",
    str_replace={"r": "R"}
) == ["[1] fiRst", "[2] second", "[3] thiRd"]
Shaper
class Shaper(BaseComponent)
Shaper is a component that can invoke arbitrary, registered functions on the invocation context (query, documents, and so on) of a pipeline. It then passes the new or modified variables further down the pipeline.
Using YAML configuration, the Shaper component is initialized with functions to invoke on pipeline invocation context.
For example, in the YAML snippet below:
    components:
      - name: shaper
        type: Shaper
        params:
          func: value_to_list
          inputs:
            value: query
            target_list: documents
          outputs: [questions]
the Shaper component is initialized with a directive to invoke the function value_to_list on the variables query and documents and to store the result in the invocation context variable questions. All other invocation context variables are passed down the pipeline as they are.
You can use multiple Shaper components in a pipeline to modify the invocation context as needed.
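The way a Shaper node reads from and writes to the invocation context can be sketched in plain Python. This is a simplified model under stated assumptions, not the component's implementation: `run_shaper` is a hypothetical function, the invocation context is modeled as a dictionary, and only string-valued inputs are handled.

```python
from typing import Any, Callable, Dict, List, Optional


def value_to_list(value: Any, target_list: List[Any]) -> List[Any]:
    return [value] * len(target_list)


def run_shaper(
    func: Callable[..., Any],
    context: Dict[str, Any],
    outputs: List[str],
    inputs: Optional[Dict[str, str]] = None,
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Fixed values from params come first; values found in the context override them.
    kwargs = dict(params or {})
    for arg_name, context_key in (inputs or {}).items():
        if context_key in context:
            kwargs[arg_name] = context[context_key]
    result = func(**kwargs)
    results = result if isinstance(result, tuple) else (result,)
    if len(results) != len(outputs):
        raise ValueError("outputs must match the number of values the function returns")
    # All other context variables are passed on unchanged.
    new_context = dict(context)
    new_context.update(zip(outputs, results))
    return new_context


ctx = {"query": "What is X?", "documents": ["d1", "d2", "d3"]}
out = run_shaper(value_to_list, ctx, outputs=["questions"],
                 inputs={"value": "query", "target_list": "documents"})
print(out["questions"])  # ['What is X?', 'What is X?', 'What is X?']
```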
Currently, Shaper supports the following functions:
rename
value_to_list
join_lists
join_strings
format_string
join_documents
join_documents_and_scores
format_document
format_answer
join_documents_to_string
strings_to_answers
string_to_answer
parse_references
answers_to_strings
strings_to_documents
documents_to_strings
See their descriptions in the code for details about their inputs, outputs, and other parameters.
Shaper.__init__
def __init__(func: str,
outputs: List[str],
inputs: Optional[Dict[str, Union[List[str], str]]] = None,
params: Optional[Dict[str, Any]] = None,
publish_outputs: Union[bool, List[str]] = True)
Initializes the Shaper component.
Some examples:
    - name: shaper
      type: Shaper
      params:
        func: value_to_list
        inputs:
          value: query
          target_list: documents
        outputs:
          - questions
This node takes the content of query and creates a list that contains the value of query len(documents) times. This list is stored in the invocation context under the key questions.
    - name: shaper
      type: Shaper
      params:
        func: join_documents
        inputs:
          documents: documents
        params:
          delimiter: ' - '
        outputs:
          - documents
This node overwrites the content of documents in the invocation context with a list containing a single Document whose content is the concatenation of all the original Documents. So if documents contained [Document("A"), Document("B"), Document("C")], this shaper overwrites it with [Document("A - B - C")].
    - name: shaper
      type: Shaper
      params:
        func: join_strings
        params:
          strings: ['a', 'b', 'c']
          delimiter: ' . '
        outputs:
          - single_string
    - name: shaper
      type: Shaper
      params:
        func: strings_to_documents
        inputs:
          strings: single_string
        params:
          meta:
            name: 'my_file.txt'
        outputs:
          - single_document
These two nodes, executed one after the other, first add a key called single_string to the invocation context that contains a . b . c, and then create another key called single_document that contains [Document(content="a . b . c", meta={'name': 'my_file.txt'})].
Arguments:
- func: The function to apply.
- inputs: Maps the function's input kwargs to the key-value pairs in the invocation context. For example, value_to_list expects the value and target_list parameters, so inputs might contain: {'value': 'query', 'target_list': 'documents'}. It doesn't need to contain all keyword args, see params.
- params: Maps the function's input kwargs to some fixed values. For example, value_to_list expects value and target_list parameters, so params might contain {'value': 'A', 'target_list': [1, 1, 1, 1]} and the node's output is ["A", "A", "A", "A"]. It doesn't need to contain all keyword args, see inputs. You can use params to provide fallback values for arguments of run that you're not sure exist. So if you need query to exist, you can provide a fallback value in the params, which will be used only if query is not passed to this node by the pipeline.
- outputs: The keys under which to store the outputs in the invocation context. The length of outputs must match the number of outputs produced by the invoked function.
- publish_outputs: Controls whether to publish the outputs to the pipeline's output. Set it to True (the default) to publish all outputs or to False to publish none. For example, if outputs = ["documents"], the result for publish_outputs = True looks like this:
    {
        "invocation_context": {
            "documents": [...]
        },
        "documents": [...]
    }
  For publish_outputs = False, the result looks like this:
    {
        "invocation_context": {
            "documents": [...]
        }
    }
  If you want finer-grained control, pass a list of the outputs you want to publish.
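The three publish_outputs settings can be sketched as a small function that builds the node's result. This is an illustrative model only: `build_result` is a hypothetical name, and ignoring list entries that are not among the node's outputs is an assumption.

```python
from typing import Any, Dict, List, Union


def build_result(
    context: Dict[str, Any],
    outputs: List[str],
    publish_outputs: Union[bool, List[str]] = True,
) -> Dict[str, Any]:
    if publish_outputs is True:
        published = list(outputs)  # publish everything the node produced
    elif publish_outputs is False:
        published = []  # keep outputs inside the invocation context only
    else:
        # A list selects which of the node's outputs to publish.
        published = [key for key in outputs if key in publish_outputs]
    result: Dict[str, Any] = {"invocation_context": context}
    for key in published:
        result[key] = context[key]
    return result


ctx = {"documents": ["doc"], "questions": ["q"]}
print(build_result(ctx, ["documents", "questions"], True))
print(build_result(ctx, ["documents", "questions"], False))
print(build_result(ctx, ["documents", "questions"], ["documents"]))
```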