API Reference

Distinguishes between text, PDF, Markdown, Docx and HTML files and routes them to the appropriate File Converter in an indexing pipeline.

Module file_type

FileTypeClassifier

class FileTypeClassifier(BaseComponent)

Routes files in an indexing pipeline to the corresponding file converters.

FileTypeClassifier.__init__

def __init__(supported_types: Optional[List[str]] = None)

Node that sends out files on a different output edge depending on their extension.

Arguments:

  • supported_types: The file types that this node can distinguish between. If no value is provided, the default list is used: txt, pdf, md, docx, and html. Lists with duplicate elements are not allowed.
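The rules for supported_types described above (a default list when nothing is passed, and no duplicates) can be sketched in plain Python. This is a minimal illustration, not the node's actual implementation: the helper name resolve_supported_types, the constant DEFAULT_TYPES, and the choice of ValueError are all assumptions made for this example.

```python
from typing import List, Optional

# Default extensions listed in the argument description above.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]


def resolve_supported_types(supported_types: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: return the list of types the node would work with."""
    if supported_types is None:
        # No value provided: fall back to the documented default list.
        return list(DEFAULT_TYPES)
    if len(set(supported_types)) != len(supported_types):
        # Duplicates are not allowed; the exception type is an assumption.
        raise ValueError("supported_types must not contain duplicate elements")
    return list(supported_types)
```

A call such as resolve_supported_types(["pdf", "txt"]) returns the list unchanged, while resolve_supported_types(["pdf", "pdf"]) is rejected.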

FileTypeClassifier.run

def run(file_paths: Union[Path, List[Path], str, List[str], List[Union[Path, str]]])

Sends out files on a different output edge depending on their extension.

Arguments:

  • file_paths: A single path or a list of paths (Path or str) to route to the output edge that matches the file extension.
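The idea of routing by extension can be illustrated with a small standalone sketch. It is not the library's code: the function route_by_extension is hypothetical, and so are the 1-based edge numbering (position in the supported list), the lowercasing of extensions, and the ValueError for unsupported files. Whether the real node accepts mixed extensions in a single call is not stated here; this sketch simply groups paths per edge.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Union

SUPPORTED_TYPES = ["txt", "pdf", "md", "docx", "html"]

PathLike = Union[Path, str]


def route_by_extension(
    file_paths: Union[PathLike, List[PathLike]],
    supported_types: List[str] = SUPPORTED_TYPES,
) -> Dict[int, List[Path]]:
    """Hypothetical sketch: group paths by an edge number derived from the extension."""
    # Accept a single path as well as a list, mirroring the run() signature.
    if isinstance(file_paths, (str, Path)):
        file_paths = [file_paths]

    routed: Dict[int, List[Path]] = defaultdict(list)
    for raw in file_paths:
        path = Path(raw)
        # "report.PDF" -> "pdf"; lowercasing is an assumption of this sketch.
        extension = path.suffix.lstrip(".").lower()
        if extension not in supported_types:
            raise ValueError(f"Unsupported file type: {path}")
        # Assumed convention: edge number = 1-based position in the list.
        edge = supported_types.index(extension) + 1
        routed[edge].append(path)
    return dict(routed)


if __name__ == "__main__":
    print(route_by_extension(["notes.txt", Path("paper.pdf"), "readme.md"]))
```

Each output edge would then be connected to the converter for that file type, so that, for example, PDF files reach only the PDF converter.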