DocumentationAPI ReferenceπŸ““ TutorialsπŸ§‘β€πŸ³ Cookbook🀝 IntegrationsπŸ’œ Discord

Creating Custom Components

You can define custom components and incorporate them into your search engine. Create your own components and use them standalone or in Pipelines.

You can create custom components that perform various tasks, such as preprocessing data or filtering search results. They are useful when you want to customize the behavior of your search system to better meet your needs.

Haystack custom components are implemented as Python classes that inherit from the BaseComponent class. You can use them just like the regular component - in pipelines (including the pipeline YAMLs) or stand-alone.

Creating a Custom Component

Here's how you do it:

  1. Create a new class that inherits from BaseComponent.
  2. If your component's output is routed to a fixed number of components, set outgoing_edges as a class attribute.
    Most components have one outgoing edge. Decision components have more than one outgoing edge. If your component has a variable number of outgoing edges, define CustomComponent._calculate_outgoing_edges() to return that number. See FileClassifier._calculate_outgoing_edges() for an example.
  3. Define a run() method that is executed when the Pipeline calls your component. The input arguments should consist of all configuration parameters that you want to allow and the data arguments that you expect as input from a previous node. For example, data parameters can be documents, query, file_paths, and so on.
  4. Set run() to return a tuple. The first element of this tuple is an output dictionary of the data you want to pass to the next component. The second element in the tuple is the name of the outgoing edge (usually output_1).
  5. Define a run_batch() method that makes it possible for query pipelines to accept more than one query as input. You can define the same input arguments for it as you did for the run() method.
  6. Set run_batch() to return a tuple. The first element of the tuple is a dictionary with keys such as Answers or Documents (depending on the component). The second element of the tuple is the name of the outgoing edge (usually output_)
  7. Optional: Add any custom debug information to output["_debug"]. You can access this information in the pipeline output if you enable the debug mode.

Template:

Here's a template for creating a custom component:

from haystack.nodes.base import BaseComponent

class NodeTemplate(BaseComponent):
    # If it's not a decision component, there is only one outgoing edge
    outgoing_edges = 1

    def run(self, query: str, my_arg: Optional[int] = 10):
        # Insert code here to manipulate the input and produce an output dictionary
        ...
        output={
            "documents": ...,
            "_debug": {"anything": "you want"}
        }
        return output, "output_1"

    def run_batch(self, queries: List[str], my_arg: Optional[int] = 10):
        # Insert code here to manipulate the input and produce an output dictionary
        ...
        output={
            "documents": ...,
        }
        return output, "output_1"
        

A regular component has one outgoing edge and one return value. If you're creating a decision component, it typically has more than one outgoing edge. The run() and run_batch() methods of a decision component consist of a decision function that determines the path in the graph for sending the component's input. Such a function has more than one possible return value, and each return value is named accordingly, for example: output_1 and output_2.

Example

Let's say we wanted to add a custom translation module to our pipeline. Instead of just translating into one predefined language, our component should be able to return a summary in any language we want (that is, any language we have a trained model for). To that end, we define a CustomTranslator class. As there's no decision function involved, we set outgoing_edges = 1:

class CustomTranslator(BaseComponent):
    outgoing_edges = 1

    def __init__(some_param):
        # Store all init params in component's config so that we can easily save and load it via YAML
        self.translator = TransformersTranslator(model_name_or_path=f'Helsinki-NLP/opus-mt-en-{language}')

Within a pipeline component, the run() function is where all the action happens. Our run function receives a language argument that tells the translator which translation model to initialize:

def run(self, language='fr', **kwargs):
    return self.translator.run(kwargs['documents'])

We initialize this component directly when adding it to the pipeline. As usual, we specify a name and the input for this component:

pipeline.add_node(component=CustomTranslator(), name='CustomTranslator', inputs=['Summarizer'])

We can now call the pipeline with any Helsinki-NLP translation model from Hugging Face with English as a source language. Pipeline arguments are simply propagated through the pipeline. This means that if we want to pass a language value to our custom component, we can specify it in our call to the pipeline. Let's look at the French summary of a popular wizard sport:

query = "What's the history of Quidditch?"
result = pipeline.run(query=query, params={"retriever": {"top_k": 30}, "ranker": {"top_k": 20}, "language": "fr"})
result['documents'][0].text

>>> "''Quidditch'' a obtenu son nom du marais queerditch, l'emplacement du premier jeu enregistré. le jeu a été basé sur un jeu joué par une sorcière au 11ème siècle. un snitch d'or a été introduit à la suite d'un jeu 1269 joué en kent. on pense qu'une version balai du jeu peut avoir inspiré le mouvement du jeu moderne 'harlem shuffle'"

Now, how about Ukrainian?

result = pipeline.run(query=query, params={"retriever": {"top_k": 30}, "ranker": {"top_k": 30}, "language": "uk"})
result['documents'][0].text

>>> '" Quuiditch " ΠΎΡ‚Ρ€ΠΈΠΌΠ°Π»Π° свою Π½Π°Π·Π²Ρƒ Π²Ρ–Π΄ Π΄ΠΈΠ²Π½ΠΎΠ³ΠΎ Π±ΠΎΠ»ΠΎΡ‚Π°, місця ΠΏΠ΅Ρ€ΡˆΠΎΡ— Π² історії записаної Π³Ρ€ΠΈ. Π“Ρ€Ρƒ Π±ΡƒΠ»ΠΎ засновано Π½Π° Π³Ρ€Ρ–, яку Π³Ρ€Π°Π»Π° Π²Ρ–Π΄ΡŒΠΌΠ° Ρƒ XI столітті. Π—ΠΎΠ»ΠΎΡ‚ΠΈΠΉ стукач Π±ΡƒΠ»ΠΎ Π²Π²Π΅Π΄Π΅Π½ΠΎ Ρƒ Π³Ρ€Ρƒ 1269 Π³Ρ€ΠΈ Π² ΠΊΠ΅Π½Ρ‚Ρ–. Π’Π²Π°ΠΆΠ°Ρ”Ρ‚ΡŒΡΡ, Ρ‰ΠΎ вСрсія ΠΌΡ–Ρ‚Π»Π° Ρƒ Π³Ρ€Ρ–, ΠΌΠΎΠΆΠ»ΠΈΠ²ΠΎ, Π½Π°Π΄ΠΈΡ…Π½ΡƒΠ»Π° сучасну Π³Ρ€Ρƒ Π½Π° " заплутування " move " Π³Ρ€ΠΈ'

Related Links