Preprocessors
Module haystack_experimental.components.preprocessors.md_header_level_inferrer
MarkdownHeaderLevelInferrer
Infers and rewrites header levels in Markdown text to normalize hierarchy.
First header → Always becomes level 1 (#) Subsequent headers → Level increases if no content between headers, stays same if content exists Maximum level → Capped at 6 (######)
Usage example
python
from haystack import Document
from haystack_experimental.components.preprocessors import MarkdownHeaderLevelInferrer
# Create a document with uniform header levels
text = "## Title
## Subheader
Section
## Subheader
More Content"
doc = Document(content=text)
# Initialize the inferrer and process the document
inferrer = MarkdownHeaderLevelInferrer()
result = inferrer.run([doc])
# The headers are now normalized with proper hierarchy
print(result["documents"][0].content)
> # Title
## Subheader
Section
## Subheader
More Content
MarkdownHeaderLevelInferrer.__init__
Initializes the MarkdownHeaderLevelInferrer.
MarkdownHeaderLevelInferrer.run
Infers and rewrites the header levels in the content for documents that use uniform header levels.
Arguments:
documents: list of Document objects to process.
Returns:
dict: a dictionary with the key 'documents' containing the processed Document objects.