Module file_type

FileTypeClassifier

class FileTypeClassifier(BaseComponent)

Route files in an Indexing Pipeline to corresponding file converters.

FileTypeClassifier.__init__

def __init__(supported_types: Optional[List[str]] = None,
             full_analysis: bool = False,
             raise_on_error: bool = True)

Node that sends out files on a different output edge depending on their extension.

Arguments:

supported_types: The file types (extensions) this node distinguishes. Optional. If you don't provide a value, the default is txt, pdf, md, docx, and html. The list must not contain duplicate elements.
full_analysis: If True, the whole file is analyzed to determine the file type. If False, only the first 2049 bytes are analyzed.
raise_on_error: If True, the node will raise an exception if the file type is not supported.
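A minimal, stdlib-only sketch of how the constructor arguments above might be handled. `FileTypeClassifierSketch` and `read_for_analysis` are hypothetical names, not part of the library. The sketch only mirrors the documented defaults, the ban on duplicate types, and the 2049-byte read limit when `full_analysis` is False; it does not perform any actual file-type detection on the bytes it reads.

```python
from pathlib import Path
from typing import List, Optional

# Defaults as listed in the documentation above.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]


class FileTypeClassifierSketch:
    """Hypothetical stand-in that mirrors the documented constructor arguments."""

    def __init__(self, supported_types: Optional[List[str]] = None,
                 full_analysis: bool = False, raise_on_error: bool = True):
        types = DEFAULT_TYPES if supported_types is None else supported_types
        # The documentation forbids duplicate entries in supported_types.
        if len(set(types)) != len(types):
            raise ValueError(f"supported_types must not contain duplicates: {types}")
        self.supported_types = list(types)
        self.full_analysis = full_analysis
        # Stored only; a routing step would consult it for unsupported types.
        self.raise_on_error = raise_on_error

    def read_for_analysis(self, path: Path) -> bytes:
        # full_analysis=True reads the whole file; otherwise only the first 2049 bytes.
        with open(path, "rb") as f:
            return f.read() if self.full_analysis else f.read(2049)
```

Passing `supported_types=["pdf", "pdf"]` raises a `ValueError` in this sketch; the real node's exact exception type is not stated in this reference.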

FileTypeClassifier.run

def run(file_paths: Union[Path, List[Path], str, List[str],
                          List[Union[Path, str]]])

Sends out files on a different output edge depending on their extension.

Arguments:

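file_paths: The path or paths of the files to route to different output edges. Accepts a single path or a list of paths, each given as a str or a Path.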
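A minimal, stdlib-only sketch of extension-based routing as described for `run`. `normalize_paths` and `route` are hypothetical helpers, not library functions. Several details are assumptions made for illustration and are not confirmed by this reference: output edges are named `output_1`, `output_2`, and so on by position in `supported_types`; all files passed in one call must share an extension; and when `raise_on_error` is False an unsupported type yields `None` instead of an edge name. Files without an extension are simply treated as unsupported here; the sketch does no content-based detection.

```python
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Tuple, Union

PathLike = Union[Path, str]


def normalize_paths(file_paths: Union[PathLike, Sequence[PathLike]]) -> List[Path]:
    """Accept one path or a list of paths (str or Path) and return a list of Path objects."""
    if isinstance(file_paths, (str, Path)):
        return [Path(file_paths)]
    return [Path(p) for p in file_paths]


def route(file_paths: Union[PathLike, Sequence[PathLike]],
          supported_types: List[str],
          raise_on_error: bool = True) -> Tuple[Dict[str, List[Path]], Optional[str]]:
    """Pick an output edge from the files' shared extension (hypothetical edge naming)."""
    paths = normalize_paths(file_paths)
    # Lowercase and strip the leading dot so "b.PDF" and "a.pdf" both map to "pdf".
    extensions = {p.suffix.lower().lstrip(".") for p in paths}
    if len(extensions) != 1:
        raise ValueError(f"Expected files of a single type, got: {sorted(extensions)}")
    ext = extensions.pop()
    if ext not in supported_types:
        if raise_on_error:
            raise ValueError(f"Unsupported file type: {ext!r}")
        return {"file_paths": paths}, None  # assumed behavior when not raising
    # Edges are numbered from 1 in the order of supported_types (assumption).
    return {"file_paths": paths}, f"output_{supported_types.index(ext) + 1}"
```

For example, with the default types, `route(["a.pdf", "b.PDF"], ["txt", "pdf", "md", "docx", "html"])` selects `output_2` under these assumptions.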
