Creating Custom Components
You can define custom components and incorporate them into your search engine. Create your own components and use them standalone or in Pipelines.
You can create custom components that perform various tasks, such as preprocessing data or filtering search results. They are useful when you want to customize the behavior of your search system to better meet your needs.
Haystack custom components are implemented as Python classes that inherit from the BaseComponent class. You can use them just like regular components: in pipelines (including pipeline YAMLs) or standalone.
Creating a Custom Component
Here's how you do it:
- Create a new class that inherits from BaseComponent.
- If your component's output is routed to a fixed number of components, set outgoing_edges as a class attribute.
  Most components have one outgoing edge. Decision components have more than one outgoing edge. If your component has a variable number of outgoing edges, define CustomComponent._calculate_outgoing_edges() to return that number. See FileClassifier._calculate_outgoing_edges() for an example.
- Define a run() method that is executed when the Pipeline calls your component. Its input arguments should consist of all configuration parameters that you want to allow and the data arguments that you expect as input from a previous node. For example, data parameters can be documents, query, file_paths, and so on.
- Set run() to return a tuple. The first element of this tuple is an output dictionary of the data you want to pass to the next component. The second element is the name of the outgoing edge (usually output_1).
- Define a run_batch() method that makes it possible for query pipelines to accept more than one query as input. You can define the same input arguments for it as you did for the run() method.
- Set run_batch() to return a tuple. The first element of the tuple is a dictionary with keys such as Answers or Documents (depending on the component). The second element is the name of the outgoing edge (usually output_1).
- Optional: Add any custom debug information to output["_debug"]. You can access this information in the pipeline output if you enable the debug mode.
Template:
Here's a template for creating a custom component:
from typing import List, Optional

from haystack.nodes.base import BaseComponent


class NodeTemplate(BaseComponent):
    # If it's not a decision component, there is only one outgoing edge
    outgoing_edges = 1

    def run(self, query: str, my_arg: Optional[int] = 10):
        # Insert code here to manipulate the input and produce an output dictionary
        ...
        output = {
            "documents": ...,
            "_debug": {"anything": "you want"},
        }
        return output, "output_1"

    def run_batch(self, queries: List[str], my_arg: Optional[int] = 10):
        # Insert code here to manipulate the input and produce an output dictionary
        ...
        output = {
            "documents": ...,
        }
        return output, "output_1"
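As a concrete illustration of the template, the sketch below fills in run() and run_batch() for a hypothetical MinLengthFilter component that drops short documents. So that it runs without Haystack installed, it uses a bare stand-in for BaseComponent and treats documents as plain strings. Both are simplifications: in a real pipeline you would inherit from haystack.nodes.base.BaseComponent and work with Haystack document objects. The names MinLengthFilter and min_length are made up for this example.

```python
from typing import Dict, List, Tuple


class BaseComponent:
    """Stand-in for haystack.nodes.base.BaseComponent (assumption: keeps the sketch dependency-free)."""

    outgoing_edges = 1


class MinLengthFilter(BaseComponent):
    """Hypothetical component that keeps only documents with at least `min_length` characters."""

    outgoing_edges = 1

    def run(self, documents: List[str], min_length: int = 20) -> Tuple[Dict, str]:
        kept = [doc for doc in documents if len(doc) >= min_length]
        output = {
            "documents": kept,
            # Optional debug information for whoever inspects the pipeline output
            "_debug": {"dropped": len(documents) - len(kept)},
        }
        return output, "output_1"

    def run_batch(self, documents: List[List[str]], min_length: int = 20) -> Tuple[Dict, str]:
        # One list of documents per query: filter each list independently
        kept = [[doc for doc in docs if len(doc) >= min_length] for docs in documents]
        return {"documents": kept}, "output_1"


output, edge = MinLengthFilter().run(
    documents=["too short", "long enough to survive the filter"], min_length=20
)
print(edge, output["documents"], output["_debug"])
```

Both methods return the same kind of tuple: the output dictionary first, then the name of the outgoing edge.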
A regular component has one outgoing edge and one return value. If you're creating a decision component, it typically has more than one outgoing edge. The run() and run_batch() methods of a decision component consist of a decision function that determines the path in the graph for sending the component's input. Such a function has more than one possible return value, and each return value is named accordingly, for example: output_1 and output_2.
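A decision component along these lines could look like the following sketch. QuestionRouter is a hypothetical name, the question-mark rule is only a toy decision function, and the BaseComponent class here is a bare stand-in so that the example runs without Haystack installed. run_batch() is left out to keep the sketch short; a real component would need it as well.

```python
from typing import Dict, Tuple


class BaseComponent:
    """Stand-in for haystack.nodes.base.BaseComponent (assumption: keeps the sketch dependency-free)."""

    outgoing_edges = 1


class QuestionRouter(BaseComponent):
    """Hypothetical decision component: questions go to output_1, everything else to output_2."""

    # Two possible paths through the graph
    outgoing_edges = 2

    def run(self, query: str) -> Tuple[Dict, str]:
        # Decision function: choose the edge and pass the input through unchanged
        edge = "output_1" if query.strip().endswith("?") else "output_2"
        return {"query": query}, edge


router = QuestionRouter()
for q in ["What's the history of Quidditch?", "quidditch history"]:
    _, edge = router.run(query=q)
    print(f"{q!r} -> {edge}")
```

The component does not modify its input; it only decides which downstream branch receives it.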
Example
Let's say we want to add a custom translation module to our pipeline. Instead of translating into just one predefined language, our component should be able to return a summary in any language we want (that is, any language we have a trained model for). To that end, we define a CustomTranslator class. As there's no decision function involved, we set outgoing_edges = 1:
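from haystack.nodes import TransformersTranslator
from haystack.nodes.base import BaseComponent


class CustomTranslator(BaseComponent):
    outgoing_edges = 1

    def __init__(self):
        super().__init__()
        # Cache one translator per target language so that each model is loaded only once
        self.translators = {}

Within a pipeline component, the run() method is where all the action happens. Our run() method receives the documents from the previous node and a language argument that tells the component which translation model to load:

    def run(self, documents, language="fr"):
        # Load the English-to-<language> model the first time this language is requested
        if language not in self.translators:
            model_name = f"Helsinki-NLP/opus-mt-en-{language}"
            self.translators[language] = TransformersTranslator(model_name_or_path=model_name)
        return self.translators[language].run(documents=documents)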
We initialize this component directly when adding it to the pipeline. As usual, we specify a name and the input for this component:
pipeline.add_node(component=CustomTranslator(), name='CustomTranslator', inputs=['Summarizer'])
We can now call the pipeline with any Helsinki-NLP translation model from Hugging Face with English as a source language. Pipeline arguments are simply propagated through the pipeline. This means that if we want to pass a language value to our custom component, we can specify it in our call to the pipeline. Let's look at the French summary of a popular wizard sport:
query = "What's the history of Quidditch?"
result = pipeline.run(query=query, params={"retriever": {"top_k": 30}, "ranker": {"top_k": 20}, "language": "fr"})
result['documents'][0].text
>>> "''Quidditch'' a obtenu son nom du marais queerditch, l'emplacement du premier jeu enregistré. le jeu a été basé sur un jeu joué par une sorcière au 11ème siècle. un snitch d'or a été introduit à la suite d'un jeu 1269 joué en kent. on pense qu'une version balai du jeu peut avoir inspiré le mouvement du jeu moderne 'harlem shuffle'"
Now, how about Ukrainian?
result = pipeline.run(query=query, params={"retriever": {"top_k": 30}, "ranker": {"top_k": 30}, "language": "uk"})
result['documents'][0].text
>>> '" Quuiditch " отримала свою назву від дивного болота, місця першої в історії записаної гри. Гру було засновано на грі, яку грала відьма у XI столітті. Золотий стукач було введено у гру 1269 гри в кенті. Вважається, що версія мітла у грі, можливо, надихнула сучасну гру на " заплющування " move " гри'