Module base

BaseQueryClassifier

class BaseQueryClassifier(BaseComponent)

Abstract class for Query Classifiers

Module sklearn

SklearnQueryClassifier

class SklearnQueryClassifier(BaseQueryClassifier)

A node that classifies an incoming query into one of two categories using a lightweight sklearn model. Depending on the result, the query flows to a different branch of your pipeline, so further processing can be customized per category. To define the branches, connect the downstream nodes to either output_1 or output_2 of this node.

Example:

pipe = Pipeline()
pipe.add_node(component=SklearnQueryClassifier(), name="QueryClassifier", inputs=["Query"])
pipe.add_node(component=bm25_retriever, name="BM25Retriever", inputs=["QueryClassifier.output_2"])
pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])

# Keyword queries will use the BM25Retriever
pipe.run("kubernetes aws")

# Semantic queries (questions, statements, sentences ...) will leverage the DPR retriever
pipe.run("How to manage kubernetes on aws")

Models:

Pass your own Sklearn binary classification model or use one of the following pretrained ones:

Keywords vs. Questions/Statements (Default)
    query_classifier can be found here
    query_vectorizer can be found here
    output_1 => question/statement
    output_2 => keyword query
    Readme

Questions vs. Statements
    query_classifier can be found here
    query_vectorizer can be found here
    output_1 => question
    output_2 => statement
    Readme
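To illustrate the "pass your own Sklearn binary classification model" option, the following is a minimal sketch, not the procedure used to build the pretrained models. The toy training data, the n-gram range, the hyperparameters, and the `route` helper are all hypothetical. The mapping of a positive prediction to output_1 follows the table above. Passing the fitted objects directly as `model_name_or_path` and `vectorizer_name_or_path` is an assumption based on their `Union[str, Any]` type hints.

```python
import pickle

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data: 1 = question/statement, 0 = keyword query.
queries = [
    "how do I manage kubernetes on aws",
    "what is the capital of france",
    "who wrote the origin of species",
    "the eiffel tower is located in paris",
    "kubernetes aws",
    "capital france",
    "origin species author",
    "eiffel tower paris",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# An n-gram based TF-IDF vectorizer extracts features from each query.
vectorizer = TfidfVectorizer(ngram_range=(1, 3))
features = vectorizer.fit_transform(queries)

# A gradient boosting based binary classifier is trained on those features.
model = GradientBoostingClassifier(n_estimators=20, random_state=0)
model.fit(features, labels)

# Serialize both objects so they can be stored and reloaded later.
model_bytes = pickle.dumps(model)
vectorizer_bytes = pickle.dumps(vectorizer)


def route(query, model, vectorizer):
    """Hypothetical helper: map the binary prediction to an output edge name."""
    is_question_or_statement = bool(model.predict(vectorizer.transform([query]))[0])
    return "output_1" if is_question_or_statement else "output_2"


restored_model = pickle.loads(model_bytes)
restored_vectorizer = pickle.loads(vectorizer_bytes)
edge = route("How to manage kubernetes on aws", restored_model, restored_vectorizer)
```

A model trained on so little data will not classify reliably; the sketch only shows the shape of the two objects the node expects.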

SklearnQueryClassifier.__init__

def __init__(
        model_name_or_path: Union[str, Any] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_2022/model.pickle",
        vectorizer_name_or_path: Union[str, Any] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_2022/vectorizer.pickle",
        batch_size: Optional[int] = None,
        progress_bar: bool = True)

Arguments:

model_name_or_path: Gradient boosting based binary classifier that distinguishes either keyword queries from statement/question queries, or statements from questions.
vectorizer_name_or_path: An n-gram based TF-IDF vectorizer for extracting features from the query.
batch_size: Number of queries to process at a time.
progress_bar: Whether to show a progress bar.
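As a rough illustration of what "number of queries to process at a time" means for `batch_size`, the sketch below splits a list of queries into consecutive batches. The `batched` helper is hypothetical and is not part of the library. Treating `batch_size=None` as "process everything in one batch" is an assumption, not documented behavior.

```python
from typing import Iterator, List, Optional


def batched(queries: List[str], batch_size: Optional[int] = None) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size queries.

    Assumption: batch_size=None means all queries form a single batch.
    """
    if batch_size is None:
        yield list(queries)
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, len(queries), batch_size):
        yield queries[start:start + batch_size]


queries = ["kubernetes aws", "How to manage kubernetes on aws", "capital france"]
batches = list(batched(queries, batch_size=2))
```

Each batch would then be vectorized and classified in one call, which keeps memory use bounded for long query lists.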

Module transformers

TransformersQueryClassifier

class TransformersQueryClassifier(BaseQueryClassifier)

A node that classifies an incoming query into categories using a transformer model. Depending on the result, the query flows to a different branch of your pipeline, so further processing can be customized per category. To define the branches, connect the downstream nodes to output_1, output_2, ..., output_n of this node. This node also supports zero-shot classification.
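The n-way routing described above can be sketched as a mapping from a predicted label to an output edge name. This is only an illustration: the `edge_for_label` helper and the example labels are hypothetical, and numbering the edges in the order the labels are listed is an assumption rather than documented behavior.

```python
from typing import List


def edge_for_label(predicted_label: str, labels: List[str]) -> str:
    """Hypothetical helper: the i-th label (1-based) maps to the edge output_i."""
    return f"output_{labels.index(predicted_label) + 1}"


# Hypothetical category labels for a three-way classifier.
labels = ["music", "cinema", "sports"]
edge = edge_for_label("cinema", labels)
```

Under this assumption, a downstream node connected to "QueryClassifier.output_2" would receive only the queries classified as the second label.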