

Module base

BaseDocumentLanguageClassifier

class BaseDocumentLanguageClassifier(BaseComponent)

Abstract class for Document Language Classifiers.

BaseDocumentLanguageClassifier.__init__

def __init__(route_by_language: bool = True,
             languages_to_route: Optional[List[str]] = None)

Arguments:

  • route_by_language: If True, routes Documents to a different output edge depending on their language.
  • languages_to_route: A list of language codes in ISO format, each corresponding to a different output edge (see the langdetect documentation for the supported codes).
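The two arguments work together: each entry in languages_to_route gets its own output edge. The following is a minimal sketch of that relationship. It assumes Haystack-style edge names (output_1, output_2, and so on) assigned in list order; that naming, and the helper edge_mapping itself, are hypothetical and not part of the documented API. The sketch also simply requires an explicit list, since the class's actual behavior when languages_to_route is None is not described on this page.

```python
from typing import Dict, List, Optional


def edge_mapping(route_by_language: bool = True,
                 languages_to_route: Optional[List[str]] = None) -> Dict[str, str]:
    """Hypothetical helper: map each routed language code to an output edge name.

    Assumption: edges are named "output_1", "output_2", ... in list order.
    """
    if not route_by_language:
        # No routing: there is nothing to map, all Documents share one edge.
        return {}
    if not languages_to_route:
        # Sketch choice: require an explicit list instead of guessing a default.
        raise ValueError("languages_to_route must be a non-empty list when routing")
    return {lang: f"output_{i + 1}" for i, lang in enumerate(languages_to_route)}


print(edge_mapping(languages_to_route=["en", "de", "es"]))
# {'en': 'output_1', 'de': 'output_2', 'es': 'output_3'}
```

The order of languages_to_route therefore matters: reordering the list changes which edge a given language leaves through.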

BaseDocumentLanguageClassifier.run

def run(documents: List[Document]) -> Tuple[Dict[str, List[Document]], str]

Run the document language classifier on a list of documents and return the output dictionary together with the name of the outgoing edge.

Arguments:

  • documents: A list of documents whose language you want to detect.
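Because run returns a single edge name, one call can route a list of documents to only one edge. The sketch below illustrates that flow under stated assumptions: Doc is a hypothetical stand-in for a Haystack Document, toy_detect is a toy detector replacing a real library such as langdetect, the language is stored under meta["language"], and edges are named output_N. Raising an error for mixed-language input is the sketch's own choice, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document: text plus a meta dict."""
    content: str
    meta: Dict[str, Optional[str]] = field(default_factory=dict)


def toy_detect(text: str) -> Optional[str]:
    """Toy detector (assumption): looks up the first word in a tiny vocabulary."""
    vocabulary = {"the": "en", "der": "de", "die": "de", "el": "es"}
    words = text.lower().split()
    return vocabulary.get(words[0]) if words else None


def run_sketch(documents: List[Doc],
               detect: Callable[[str], Optional[str]],
               languages_to_route: List[str],
               route_by_language: bool = True) -> Tuple[Dict[str, List[Doc]], str]:
    """Hypothetical run(): tag each Doc with its language, then pick one edge."""
    if not documents:
        raise ValueError("documents must not be empty")
    for doc in documents:
        doc.meta["language"] = detect(doc.content)
    output = {"documents": documents}
    if not route_by_language:
        # Assumption: without routing, everything leaves through "output_1".
        return output, "output_1"
    languages = {doc.meta["language"] for doc in documents}
    if len(languages) != 1:
        # A single call returns a single edge, so mixed languages cannot be routed.
        raise ValueError(f"Expected one language, got {sorted(map(str, languages))}")
    language = languages.pop()
    if language not in languages_to_route:
        raise ValueError(f"Language {language!r} is not in languages_to_route")
    return output, f"output_{languages_to_route.index(language) + 1}"


docs = [Doc("Der Hund bellt"), Doc("Die Sonne scheint")]
output, edge = run_sketch(docs, toy_detect, languages_to_route=["en", "de"])
print(edge)  # output_2
print([d.meta["language"] for d in output["documents"]])  # ['de', 'de']
```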

BaseDocumentLanguageClassifier.run_batch

def run_batch(documents: List[List[Document]],
              batch_size: Optional[int] = None) -> Tuple[Dict, str]

Run the document language classifier on batches of documents.

Arguments:

  • documents: A list of lists of documents whose language you want to detect.
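With batches, each inner list can be treated as a unit that is classified and routed on its own. The sketch below shows one way to group inner lists by destination edge. It is an illustration under assumptions, not the library's implementation: plain strings stand in for Documents, toy_detect replaces a real detector, edges are named output_N, batch_size is not modelled, and how the real method packs its results into the Tuple[Dict, str] return value is not specified on this page. The helper name group_batches_by_edge is hypothetical.

```python
from typing import Callable, Dict, List, Optional


def toy_detect(text: str) -> Optional[str]:
    """Toy detector (assumption): looks up the first word in a tiny vocabulary."""
    vocabulary = {"the": "en", "der": "de", "die": "de", "el": "es"}
    words = text.lower().split()
    return vocabulary.get(words[0]) if words else None


def group_batches_by_edge(
    documents: List[List[str]],
    detect: Callable[[str], Optional[str]],
    languages_to_route: List[str],
) -> Dict[str, List[List[str]]]:
    """Hypothetical core of run_batch(): send each inner list to one edge.

    Plain strings stand in for Documents; edge names "output_N" are assumed.
    """
    grouped: Dict[str, List[List[str]]] = {
        f"output_{i + 1}": [] for i in range(len(languages_to_route))
    }
    for docs_list in documents:
        languages = {detect(text) for text in docs_list}
        if len(languages) != 1:
            # Each inner list is routed as a unit, so it must share one language.
            raise ValueError(f"Expected one language per list, got {sorted(map(str, languages))}")
        language = languages.pop()
        if language not in languages_to_route:
            raise ValueError(f"Language {language!r} is not in languages_to_route")
        grouped[f"output_{languages_to_route.index(language) + 1}"].append(docs_list)
    return grouped


batches = [["The cat sleeps", "The dog barks"], ["Der Hund bellt"], ["El gato duerme"]]
print(group_batches_by_edge(batches, toy_detect, ["en", "de", "es"]))
# {'output_1': [['The cat sleeps', 'The dog barks']], 'output_2': [['Der Hund bellt']], 'output_3': [['El gato duerme']]}
```

Grouping per inner list keeps each caller-supplied batch intact, so documents that arrived together also leave together.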