Position in a Pipeline	After the PreProcessor in an indexing pipeline or after a Retriever in a query pipeline.
Input	Documents
Output	Documents
Classes	LangdetectDocumentLanguageClassifier, TransformersDocumentLanguageClassifier

DocumentLanguageClassifier detects the language of the Documents you pass to it and attaches it to the Document's metadata like this:

'meta': {'name': 'document1.txt', 'language': 'en'}

This node has multiple outgoing edges, one for each language you specify. Use the languages_to_route parameter to pass the list of languages you want DocumentLanguageClassifier to detect in your Documents. By default, the languages are: en (English), de (German), es (Spanish), cs (Czech), and nl (Dutch).
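To make the relationship between the language list and the outgoing edges concrete, here is a minimal plain-Python sketch. It is not the node's actual implementation: the helper `edge_for_language` is hypothetical, and it assumes that edges follow the usual `output_1`, `output_2`, ... naming in the order the languages are listed.

```python
from typing import List

# Default languages named above; their order decides the edge number.
DEFAULT_LANGUAGES = ["en", "de", "es", "cs", "nl"]


def edge_for_language(language: str, languages_to_route: List[str]) -> str:
    """Return the outgoing edge name for a detected language.

    Assumption: edges are named output_1 ... output_N in list order.
    """
    if language not in languages_to_route:
        raise ValueError(f"Language '{language}' is not in {languages_to_route}")
    return f"output_{languages_to_route.index(language) + 1}"


print(edge_for_language("en", DEFAULT_LANGUAGES))  # output_1
print(edge_for_language("cs", DEFAULT_LANGUAGES))  # output_4
```

Under this assumption, changing the order of the list changes which edge a language is routed to, so any downstream nodes need to be wired against the same order.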

📘
Make sure all your Documents are in one of the languages you specify. If even one Document is in another language, DocumentLanguageClassifier fails with an error.
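One way to guard against this is to screen texts before they reach the node. The sketch below is an illustration only: `split_by_supported_language` and `toy_detect` are hypothetical helpers, and the detector is passed in as a plain callable so you can swap in a real one (for example, the `detect` function from the langdetect library).

```python
from typing import Callable, Dict, List, Tuple


def split_by_supported_language(
    texts: List[str],
    detect: Callable[[str], str],
    supported: List[str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group texts by detected language and collect unsupported ones separately."""
    grouped: Dict[str, List[str]] = {lang: [] for lang in supported}
    rejected: List[str] = []
    for text in texts:
        lang = detect(text)
        if lang in grouped:
            grouped[lang].append(text)
        else:
            rejected.append(text)
    return grouped, rejected


def toy_detect(text: str) -> str:
    """Toy detector for demonstration only: looks at the first word."""
    words = text.split()
    first_word = words[0].lower() if words else ""
    return {"the": "en", "der": "de", "le": "fr"}.get(first_word, "en")


grouped, rejected = split_by_supported_language(
    ["The cat sleeps.", "Der Hund schläft.", "Le chat dort."],
    toy_detect,
    ["en", "de"],
)
print(grouped)   # {'en': ['The cat sleeps.'], 'de': ['Der Hund schläft.']}
print(rejected)  # ['Le chat dort.']
```

Texts that end up in `rejected` can be logged or handled separately instead of being sent to the classifier.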

Available Classes

DocumentLanguageClassifier comes in two classes. Here's how they differ:

LangdetectDocumentLanguageClassifier - Uses the fast and lightweight langdetect library to detect the document language.
TransformersDocumentLanguageClassifier - Uses a transformer-based model for language classification. You can choose the model to use with this classifier.

Usage

You can use the node in a pipeline or on its own.

Stand-Alone

To initialize the node, run:

from haystack.nodes import LangdetectDocumentLanguageClassifier

doc_classifier = LangdetectDocumentLanguageClassifier()

In a Pipeline

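from haystack import Pipeline

# `retriever` and `doc_classifier` are nodes you initialized earlier.
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=doc_classifier, name="DocClassifier", inputs=["Retriever"])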
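Because the node has one outgoing edge per language, downstream nodes are typically attached to a specific edge rather than to the node as a whole. The following sketch shows one way to wire such branches. The helper `add_language_branches` and the `Node_<language>` names are hypothetical, and the sketch assumes that the edge for the i-th language is addressed as `<node name>.output_<i>`; it only relies on the `add_node(component, name, inputs)` call shown above.

```python
from typing import Any, Dict, List


def add_language_branches(
    pipeline: Any,
    classifier_name: str,
    languages_to_route: List[str],
    components_by_language: Dict[str, Any],
) -> None:
    """Attach one downstream node per language to the classifier's edges.

    Assumption: the edge for the i-th language is "<classifier_name>.output_<i>".
    """
    for position, language in enumerate(languages_to_route, start=1):
        component = components_by_language.get(language)
        if component is None:
            continue  # no branch configured for this language
        pipeline.add_node(
            component=component,
            name=f"Node_{language}",
            inputs=[f"{classifier_name}.output_{position}"],
        )
```

For the pipeline above, a call might look like `add_language_branches(pipeline, "DocClassifier", ["en", "de"], {"en": english_node, "de": german_node})`, where `english_node` and `german_node` stand for whatever language-specific nodes you want to run next.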