Abstract class for Document Language Classifiers.
Module base
BaseDocumentLanguageClassifier
class BaseDocumentLanguageClassifier(BaseComponent)
Abstract class for Document Language Classifiers.
BaseDocumentLanguageClassifier.__init__
def __init__(route_by_language: bool = True,
languages_to_route: Optional[List[str]] = None)
Arguments:
route_by_language
: Routes Documents to a different output edge depending on their language.
languages_to_route
: A list of languages as ISO codes, each corresponding to a different output edge (see the langdetect documentation).
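The relationship between `languages_to_route` and output edges can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper `edge_for_language` is hypothetical, and the `output_N` naming scheme (numbered from 1 in list order) is an assumption.

```python
from typing import List, Optional


def edge_for_language(language: str, languages_to_route: Optional[List[str]]) -> str:
    """Hypothetical helper: map an ISO language code to an output edge name."""
    if not languages_to_route:
        raise ValueError("languages_to_route must list at least one ISO code")
    if language not in languages_to_route:
        raise ValueError(f"'{language}' is not in languages_to_route")
    # Assumed naming scheme: edges are numbered from 1, in list order.
    return f"output_{languages_to_route.index(language) + 1}"


print(edge_for_language("de", ["en", "de", "fr"]))  # prints "output_2"
```

Under this assumption, the order of `languages_to_route` decides which edge each language uses, so reordering the list would change the wiring of any downstream nodes.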
BaseDocumentLanguageClassifier.run
def run(documents: List[Document]) -> Tuple[Dict[str, List[Document]], str]
Run the document language classifier on a list of documents.
Arguments:
documents
: A list of documents whose language you want to detect.
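To make the return type concrete, here is a rough sketch of a classifier that follows the documented `run` signature. Everything in it is a stand-in: `Document` is a minimal dataclass rather than the library's type, `ToyLanguageClassifier` and its `detect` heuristic are hypothetical, and the `"documents"` key, the `meta["language"]` field, and the `output_N` edge names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Document:
    """Minimal stand-in for the library's Document type (illustration only)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class ToyLanguageClassifier:
    """Hypothetical classifier that mimics the documented run() signature."""

    def __init__(self, route_by_language: bool = True,
                 languages_to_route: Optional[List[str]] = None):
        self.route_by_language = route_by_language
        self.languages_to_route = languages_to_route or ["en", "de"]

    def detect(self, text: str) -> str:
        # Toy heuristic, not a real detector: treat " und " as a sign of German.
        return "de" if " und " in f" {text.lower()} " else "en"

    def run(self, documents: List[Document]) -> Tuple[Dict[str, List[Document]], str]:
        # Annotate each document with its detected language.
        for doc in documents:
            doc.meta["language"] = self.detect(doc.content)
        output = {"documents": documents}
        if not self.route_by_language:
            return output, "output_1"
        # Routing a whole list to one edge only makes sense for a single language.
        languages = {doc.meta["language"] for doc in documents}
        if len(languages) != 1:
            raise ValueError("Routing needs all documents to share one language")
        (language,) = languages
        return output, f"output_{self.languages_to_route.index(language) + 1}"


docs = [Document("Hund und Katze"), Document("Brot und Butter")]
output, edge = ToyLanguageClassifier().run(docs)
print(edge, [d.meta["language"] for d in output["documents"]])
```

In this sketch the first element of the tuple carries the annotated documents and the second names the edge they leave on; with `route_by_language=False` everything goes to a single edge.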
BaseDocumentLanguageClassifier.run_batch
def run_batch(documents: List[List[Document]],
batch_size: Optional[int] = None) -> Tuple[Dict, str]
Run the document language classifier on batches of documents.
Arguments:
documents
: A list of lists of documents whose language you want to detect.
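The nested input shape and the optional `batch_size` can be illustrated with a small sketch. The reference above does not say how `batch_size` is applied, so the flatten-then-chunk approach here is only one plausible reading; `iter_batches` is a hypothetical helper and plain strings stand in for Document objects.

```python
from typing import Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iter_batches(items: List[T], batch_size: Optional[int] = None) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size items."""
    if batch_size is None:
        # No batch size given: treat the whole input as a single batch.
        yield list(items)
        return
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Nested input in the shape run_batch expects; strings stand in for Documents.
nested = [["doc a1", "doc a2"], ["doc b1"], ["doc c1", "doc c2", "doc c3"]]
flat = [doc for docs in nested for doc in docs]

for batch in iter_batches(flat, batch_size=4):
    print(batch)
```

With six documents and `batch_size=4`, this yields one batch of four followed by one batch of two. A real implementation would also need to restore the original list-of-lists grouping after classification, which this sketch leaves out.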