Pipeline Components Overview
Components are the building blocks of a pipeline. They perform tasks such as preprocessing, retrieving, or summarizing text, and some of them route queries through different branches of a pipeline. This page is a summary of all component types available in Haystack.
Components are chained together in a pipeline and can be easily swapped for one another. A component takes the output of the previous component (or components) as input.
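As a rough illustration of the chaining idea, the sketch below passes each step's output to the next step and then swaps the last step for a different one. It does not use Haystack's own classes: `ToyPipeline`, `lowercase`, `tokenize`, and `drop_short` are hypothetical names made up for this example.

```python
from typing import Any, Callable, List


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; this is not Haystack's Pipeline class."""

    def __init__(self) -> None:
        self.components: List[Callable[[Any], Any]] = []

    def add(self, component: Callable[[Any], Any]) -> "ToyPipeline":
        self.components.append(component)
        return self

    def run(self, data: Any) -> Any:
        # Each component receives the output of the previous component.
        for component in self.components:
            data = component(data)
        return data


def lowercase(text: str) -> str:
    return text.lower()


def tokenize(text: str) -> List[str]:
    return text.split()


def drop_short(tokens: List[str]) -> List[str]:
    # Keep only tokens longer than two characters.
    return [token for token in tokens if len(token) > 2]


pipeline = ToyPipeline().add(lowercase).add(tokenize).add(drop_short)
result = pipeline.run("What did Einstein work on")

# Swapping the last component changes the behavior without touching the others.
swapped = ToyPipeline().add(lowercase).add(tokenize).add(sorted)
swapped_result = swapped.run("What did Einstein work on")

print(result)
print(swapped_result)
```

The only contract between steps in this sketch is that each one accepts what the previous one returns, which is what makes them interchangeable.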
Components are also sometimes called nodes.
Available Components
We grouped the components to give you an overview of what functions they can perform in a pipeline:
- Custom Components - these are the components you can create yourself.
- Data Handling - these are all the components you can use to preprocess and handle your data.
- Semantic Search - these are the components that are best for semantic search pipelines.
- Prompts and LLMs - these are all the components that bring the power of large language models to your search systems.
- Routing - these components route data to other components in the pipeline.
- Utility Components - these are all the helper components, such as the components used to join answers or documents, summarize documents, translate them, and more.
- Extras - these components are not part of the Haystack core. They live in a separate repo called haystack-extras.
Decision Components
You can add decision components where only one "branch" is executed afterwards. You can use decision components to classify an incoming query and, depending on the result, route it to different modules. To find a ready-made example of a decision node, have a look at QueryClassifier.
You can also create a custom decision component. To do this, create a class that looks like this:
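from haystack import Pipeline
from haystack.nodes import BaseComponent, JoinDocuments

class QueryClassifier(BaseComponent):
    # Two outgoing edges: "output_1" and "output_2".
    outgoing_edges = 2

    def run(self, query):
        # Send queries that contain a question mark to output_1, all others to output_2.
        # Depending on your Haystack version, BaseComponent may also require a run_batch() method.
        if "?" in query:
            return {}, "output_1"
        else:
            return {}, "output_2"

# es_retriever, dpr_retriever, and reader are assumed to be initialized elsewhere.
pipe = Pipeline()
pipe.add_node(component=QueryClassifier(), name="QueryClassifier", inputs=["Query"])
pipe.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_1"])
pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_2"])
pipe.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults",
              inputs=["ESRetriever", "DPRRetriever"])
pipe.add_node(component=reader, name="QAReader", inputs=["JoinResults"])

res = pipe.run(query="What did Einstein work on?", params={"ESRetriever": {"top_k": 1}, "DPRRetriever": {"top_k": 3}})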
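To see the "only one branch is executed" behavior without setting up retrievers or a reader, here is a minimal plain-Python sketch of the same routing rule. It is not Haystack code: `classify`, `keyword_branch`, `dense_branch`, `branches`, and `route` are hypothetical names, and the two branch functions merely stand in for the retrievers attached to output_1 and output_2.

```python
from typing import Callable, Dict, List, Tuple

executed: List[str] = []  # records which branches actually ran


def classify(query: str) -> Tuple[dict, str]:
    # Same rule as the QueryClassifier example: queries with "?" go to output_1.
    if "?" in query:
        return {}, "output_1"
    return {}, "output_2"


def keyword_branch(query: str) -> str:
    # Stand-in for whatever component is connected to output_1.
    executed.append("keyword")
    return f"keyword results for: {query}"


def dense_branch(query: str) -> str:
    # Stand-in for whatever component is connected to output_2.
    executed.append("dense")
    return f"dense results for: {query}"


# Hypothetical routing table: outgoing edge name -> branch behind that edge.
branches: Dict[str, Callable[[str], str]] = {
    "output_1": keyword_branch,
    "output_2": dense_branch,
}


def route(query: str) -> str:
    _, edge = classify(query)
    # Only the branch behind the selected edge is called.
    return branches[edge](query)


answer = route("What did Einstein work on?")
print(answer)
print(executed)
```

After the call, `executed` contains a single entry, because the branch behind the other edge is never invoked for that query.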
Usage
All components are designed to be usable within a pipeline. When you add a component to a pipeline and call Pipeline.run(), it calls each component's run() method in the predefined sequence. The same is true for Pipeline.run_batch(), which you can use if you want to ask multiple queries. It calls each component's run_batch() method. For more information, see the pipelines page.
You can also call the components outside of a pipeline. See each component's documentation page to learn more about its available methods.
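The delegation described in this section can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: `ToyStripper`, `ToyUppercaser`, and `MiniPipeline` are hypothetical classes, and their method signatures are much simpler than those of real Haystack components.

```python
from typing import List


class ToyStripper:
    """Hypothetical component that trims whitespace; not a Haystack class."""

    def run(self, query: str) -> str:
        return query.strip()

    def run_batch(self, queries: List[str]) -> List[str]:
        return [self.run(query) for query in queries]


class ToyUppercaser:
    """Hypothetical component that uppercases text; not a Haystack class."""

    def run(self, query: str) -> str:
        return query.upper()

    def run_batch(self, queries: List[str]) -> List[str]:
        return [self.run(query) for query in queries]


class MiniPipeline:
    """Hypothetical pipeline that mirrors the run()/run_batch() delegation."""

    def __init__(self, components: list) -> None:
        self.components = components

    def run(self, query: str) -> str:
        # run() calls each component's run() in the order the components were added.
        for component in self.components:
            query = component.run(query)
        return query

    def run_batch(self, queries: List[str]) -> List[str]:
        # run_batch() calls each component's run_batch() in the same order.
        for component in self.components:
            queries = component.run_batch(queries)
        return queries


mini = MiniPipeline([ToyStripper(), ToyUppercaser()])
single = mini.run("  who was einstein?  ")
batch = mini.run_batch([" first query ", "second query"])

# The same component can also be called on its own, outside any pipeline.
standalone = ToyUppercaser().run("hello")

print(single)
print(batch)
print(standalone)
```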