
FastEmbed

haystack_integrations.components.embedders.fastembed.fastembed_document_embedder

FastembedDocumentEmbedder

FastembedDocumentEmbedder computes Document embeddings using Fastembed embedding models. Each Document's embedding is stored in its embedding field.

Usage example:

python
# To use this component, install the "fastembed-haystack" package.
# pip install fastembed-haystack

from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder
from haystack.dataclasses import Document

doc_embedder = FastembedDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    batch_size=256,
)
doc_embedder.warm_up()

# Text taken from PubMed QA Dataset (https://huggingface.co/datasets/pubmed_qa)
document_list = [
    Document(
        content=(
            "Oxidative stress generated within inflammatory joints can produce autoimmune phenomena and joint "
            "destruction. Radical species with oxidative activity, including reactive nitrogen species, "
            "represent mediators of inflammation and cartilage damage."
        ),
        meta={
            "pubid": "25,445,628",
            "long_answer": "yes",
        },
    ),
    Document(
        content=(
            "Plasma levels of pancreatic polypeptide (PP) rise upon food intake. Although other pancreatic "
            "islet hormones, such as insulin and glucagon, have been extensively investigated, PP secretion "
            "and actions are still poorly understood."
        ),
        meta={
            "pubid": "25,445,712",
            "long_answer": "yes",
        },
    ),
]

result = doc_embedder.run(document_list)
print(f"Document Text: {result['documents'][0].content}")
print(f"Document Embedding: {result['documents'][0].embedding}")
print(f"Embedding Dimension: {len(result['documents'][0].embedding)}")

init

python
__init__(
    model: str = "BAAI/bge-small-en-v1.5",
    cache_dir: str | None = None,
    threads: int | None = None,
    prefix: str = "",
    suffix: str = "",
    batch_size: int = 256,
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
) -> None

Create a FastembedDocumentEmbedder component.

Parameters:

  • model (str) – Local path or name of the model in Hugging Face's model hub, such as BAAI/bge-small-en-v1.5.
  • cache_dir (str | None) – The path to the cache directory. Can be set using the FASTEMBED_CACHE_PATH env variable. Defaults to fastembed_cache in the system's temp directory.
  • threads (int | None) – The number of threads a single onnxruntime session can use. Defaults to None.
  • prefix (str) – A string to add to the beginning of each text.
  • suffix (str) – A string to add to the end of each text.
  • batch_size (int) – Number of strings to encode at once.
  • progress_bar (bool) – If True, displays a progress bar during embedding.
  • parallel (int | None) – If > 1, data-parallel encoding is used; this is recommended for offline encoding of large datasets. If 0, all available cores are used. If None, data-parallel processing is disabled and the default onnxruntime threading is used instead.
  • local_files_only (bool) – If True, only use the model files in the cache_dir.
  • meta_fields_to_embed (list[str] | None) – List of meta fields that should be embedded along with the Document content (see the sketch after this list).
  • embedding_separator (str) – Separator used to concatenate the meta fields to the Document content.
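
As an illustration of the last two parameters, the following sketch folds a meta field into the text that gets embedded; the "category" field name is just an example, not a required key:

python
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder

# "category" is an arbitrary example meta field.
doc = Document(
    content="Treatment options for rheumatoid arthritis.",
    meta={"category": "rheumatology"},
)

embedder = FastembedDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    meta_fields_to_embed=["category"],
    embedding_separator="\n",  # the text embedded becomes "rheumatology\n<content>"
)
embedder.warm_up()
result = embedder.run([doc])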

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.
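
A round-trip sketch, under the assumption that the component follows Haystack's default serialization contract; only to_dict is documented on this page, and default_from_dict is Haystack's generic deserializer:

python
from haystack import default_from_dict
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder

embedder = FastembedDocumentEmbedder(model="BAAI/bge-small-en-v1.5")
data = embedder.to_dict()  # {"type": "...", "init_parameters": {...}}

# Assumption: default_from_dict can rebuild an equivalent instance from the dict.
restored = default_from_dict(FastembedDocumentEmbedder, data)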

warm_up

python
warm_up() -> None

Initializes the component.

run

python
run(documents: list[Document]) -> dict[str, list[Document]]

Embeds a list of Documents.

Parameters:

  • documents (list[Document]) – List of Documents to embed.

Returns:

  • dict[str, list[Document]] – A dictionary with the following keys:
  • documents: List of Documents, each with its embedding field set to the computed embedding.

Raises:

  • TypeError – If the input is not a list of Documents.
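
Beyond standalone calls, the embedder is typically placed at the start of an indexing pipeline, followed by a writer. A minimal sketch using Haystack's in-memory document store; the component names "embedder" and "writer" are arbitrary:

python
from haystack import Pipeline
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder

document_store = InMemoryDocumentStore()

indexing = Pipeline()
indexing.add_component("embedder", FastembedDocumentEmbedder(model="BAAI/bge-small-en-v1.5"))
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("embedder.documents", "writer.documents")

# The pipeline calls warm_up() on each component before running.
indexing.run({"embedder": {"documents": [Document(content="FastEmbed runs on ONNX Runtime.")]}})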

haystack_integrations.components.embedders.fastembed.fastembed_sparse_document_embedder

FastembedSparseDocumentEmbedder

FastembedSparseDocumentEmbedder computes sparse Document embeddings using Fastembed sparse models. The sparse embedding of each Document is stored in its sparse_embedding field.

Usage example:

python
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder
from haystack.dataclasses import Document

sparse_doc_embedder = FastembedSparseDocumentEmbedder(
    model="prithivida/Splade_PP_en_v1",
    batch_size=32,
)
sparse_doc_embedder.warm_up()

# Text taken from PubMed QA Dataset (https://huggingface.co/datasets/pubmed_qa)
document_list = [
    Document(
        content=(
            "Oxidative stress generated within inflammatory joints can produce autoimmune phenomena and joint "
            "destruction. Radical species with oxidative activity, including reactive nitrogen species, "
            "represent mediators of inflammation and cartilage damage."
        ),
        meta={
            "pubid": "25,445,628",
            "long_answer": "yes",
        },
    ),
    Document(
        content=(
            "Plasma levels of pancreatic polypeptide (PP) rise upon food intake. Although other pancreatic "
            "islet hormones, such as insulin and glucagon, have been extensively investigated, PP secretion "
            "and actions are still poorly understood."
        ),
        meta={
            "pubid": "25,445,712",
            "long_answer": "yes",
        },
    ),
]

result = sparse_doc_embedder.run(document_list)
print(f"Document Text: {result['documents'][0].content}")
print(f"Document Sparse Embedding: {result['documents'][0].sparse_embedding}")
print(f"Sparse Embedding Dimension: {len(result['documents'][0].sparse_embedding)}")

init

python
__init__(
    model: str = "prithivida/Splade_PP_en_v1",
    cache_dir: str | None = None,
    threads: int | None = None,
    batch_size: int = 32,
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    model_kwargs: dict[str, Any] | None = None,
) -> None

Create a FastembedSparseDocumentEmbedder component.

Parameters:

  • model (str) – Local path or name of the model in Hugging Face's model hub, such as prithivida/Splade_PP_en_v1.
  • cache_dir (str | None) – The path to the cache directory. Can be set using the FASTEMBED_CACHE_PATH env variable. Defaults to fastembed_cache in the system's temp directory.
  • threads (int | None) – The number of threads a single onnxruntime session can use. Defaults to None.
  • batch_size (int) – Number of strings to encode at once.
  • progress_bar (bool) – If True, displays a progress bar during embedding.
  • parallel (int | None) – If > 1, data-parallel encoding is used; this is recommended for offline encoding of large datasets. If 0, all available cores are used. If None, data-parallel processing is disabled and the default onnxruntime threading is used instead.
  • local_files_only (bool) – If True, only use the model files in the cache_dir.
  • meta_fields_to_embed (list[str] | None) – List of meta fields that should be embedded along with the Document content.
  • embedding_separator (str) – Separator used to concatenate the meta fields to the Document content.
  • model_kwargs (dict[str, Any] | None) – Dictionary containing model parameters such as k, b, avg_len, language (see the sketch after this list).
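
The k, b, and avg_len parameters apply to BM25-style sparse models rather than to SPLADE. A hedged sketch, assuming model_kwargs is forwarded to the underlying fastembed model and that Qdrant/bm25 (a BM25 implementation shipped with fastembed) accepts these keys:

python
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder

# Assumption: these kwargs are forwarded to fastembed's BM25 implementation.
bm25_embedder = FastembedSparseDocumentEmbedder(
    model="Qdrant/bm25",
    model_kwargs={"k": 1.2, "b": 0.75, "avg_len": 256.0},
)
bm25_embedder.warm_up()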

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

warm_up

python
warm_up() -> None

Initializes the component.

run

python
run(documents: list[Document]) -> dict[str, list[Document]]

Embeds a list of Documents.

Parameters:

  • documents (list[Document]) – List of Documents to embed.

Returns:

  • dict[str, list[Document]] – A dictionary with the following keys:
  • documents: List of Documents, each with its sparse_embedding field set to the computed sparse embedding.

Raises:

  • TypeError – If the input is not a list of Documents.
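
The resulting SparseEmbedding stores only the non-zero dimensions, as two parallel lists of indices and values, which can be inspected directly:

python
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder

embedder = FastembedSparseDocumentEmbedder(model="prithivida/Splade_PP_en_v1")
embedder.warm_up()
doc = embedder.run([Document(content="Sparse embeddings keep only non-zero weights.")])["documents"][0]

# indices are vocabulary positions; values are the corresponding weights.
print(doc.sparse_embedding.indices[:5])
print(doc.sparse_embedding.values[:5])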

haystack_integrations.components.embedders.fastembed.fastembed_sparse_text_embedder

FastembedSparseTextEmbedder

FastembedSparseTextEmbedder computes string embeddings using Fastembed sparse models.

Usage example:

python
from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder

text = ("It clearly says online this will work on a Mac OS system. "
"The disk comes and it does not, only Windows. Do Not order this if you have a Mac!!")

sparse_text_embedder = FastembedSparseTextEmbedder(
model="prithivida/Splade_PP_en_v1"
)

sparse_embedding = sparse_text_embedder.run(text)["sparse_embedding"]

init

python
__init__(
    model: str = "prithivida/Splade_PP_en_v1",
    cache_dir: str | None = None,
    threads: int | None = None,
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
    model_kwargs: dict[str, Any] | None = None,
) -> None

Create a FastembedSparseTextEmbedder component.

Parameters:

  • model (str) – Local path or name of the model in Fastembed's model hub, such as prithivida/Splade_PP_en_v1
  • cache_dir (str | None) – The path to the cache directory. Can be set using the FASTEMBED_CACHE_PATH env variable. Defaults to fastembed_cache in the system's temp directory.
  • threads (int | None) – The number of threads a single onnxruntime session can use. Defaults to None.
  • progress_bar (bool) – If True, displays a progress bar during embedding.
  • parallel (int | None) – If > 1, data-parallel encoding is used; this is recommended for offline encoding of large datasets. If 0, all available cores are used. If None, data-parallel processing is disabled and the default onnxruntime threading is used instead.
  • local_files_only (bool) – If True, only use the model files in the cache_dir.
  • model_kwargs (dict[str, Any] | None) – Dictionary containing model parameters such as k, b, avg_len, language.

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

warm_up

python
warm_up() -> None

Initializes the component.

run

python
run(text: str) -> dict[str, SparseEmbedding]

Embeds text using the Fastembed model.

Parameters:

  • text (str) – A string to embed.

Returns:

  • dict[str, SparseEmbedding] – A dictionary with the following keys:
  • sparse_embedding: A SparseEmbedding object representing the sparse embedding of the input text.

Raises:

  • TypeError – If the input is not a string.
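
Because a SparseEmbedding exposes explicit indices and values, the relevance score between a query and a document reduces to a sparse dot product. A minimal sketch:

python
from haystack.dataclasses import SparseEmbedding

def sparse_dot(query: SparseEmbedding, doc: SparseEmbedding) -> float:
    """Dot product over the dimensions the two sparse vectors share."""
    doc_weights = dict(zip(doc.indices, doc.values))
    return sum(v * doc_weights.get(i, 0.0) for i, v in zip(query.indices, query.values))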

haystack_integrations.components.embedders.fastembed.fastembed_text_embedder

FastembedTextEmbedder

FastembedTextEmbedder computes string embeddings using Fastembed embedding models.

Usage example:

python
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

text = ("It clearly says online this will work on a Mac OS system. "
"The disk comes and it does not, only Windows. Do Not order this if you have a Mac!!")

text_embedder = FastembedTextEmbedder(
model="BAAI/bge-small-en-v1.5"
)

embedding = text_embedder.run(text)["embedding"]

init

python
__init__(
    model: str = "BAAI/bge-small-en-v1.5",
    cache_dir: str | None = None,
    threads: int | None = None,
    prefix: str = "",
    suffix: str = "",
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
) -> None

Create a FastembedTextEmbedder component.

Parameters:

  • model (str) – Local path or name of the model in Fastembed's model hub, such as BAAI/bge-small-en-v1.5
  • cache_dir (str | None) – The path to the cache directory. Can be set using the FASTEMBED_CACHE_PATH env variable. Defaults to fastembed_cache in the system's temp directory.
  • threads (int | None) – The number of threads a single onnxruntime session can use. Defaults to None.
  • prefix (str) – A string to add to the beginning of each text (see the sketch after this list).
  • suffix (str) – A string to add to the end of each text.
  • progress_bar (bool) – If True, displays a progress bar during embedding.
  • parallel (int | None) – If > 1, data-parallel encoding is used; this is recommended for offline encoding of large datasets. If 0, all available cores are used. If None, data-parallel processing is disabled and the default onnxruntime threading is used instead.
  • local_files_only (bool) – If True, only use the model files in the cache_dir.
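
Some model families expect instruction prefixes; E5-style models, for example, are trained with "query: " and "passage: " markers. A sketch, assuming intfloat/multilingual-e5-large is available in fastembed and follows that convention:

python
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

# Assumption: this E5 model expects the "query: " prefix on the query side.
query_embedder = FastembedTextEmbedder(
    model="intfloat/multilingual-e5-large",
    prefix="query: ",
)
query_embedder.warm_up()
embedding = query_embedder.run("How does oxidative stress damage cartilage?")["embedding"]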

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

warm_up

python
warm_up() -> None

Initializes the component.

run

python
run(text: str) -> dict[str, list[float]]

Embeds text using the Fastembed model.

Parameters:

  • text (str) – A string to embed.

Returns:

  • dict[str, list[float]] – A dictionary with the following keys:
  • embedding: A list of floats representing the embedding of the input text.

Raises:

  • TypeError – If the input is not a string.
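
On the query side, the text embedder typically feeds a retriever. A minimal sketch with Haystack's in-memory embedding retriever; the document store is assumed to already contain embedded Documents:

python
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

document_store = InMemoryDocumentStore()  # assumed pre-populated with embedded Documents

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", FastembedTextEmbedder(model="BAAI/bge-small-en-v1.5"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

results = query_pipeline.run({"text_embedder": {"text": "mediators of cartilage damage"}})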

haystack_integrations.components.rankers.fastembed.ranker

FastembedRanker

Ranks Documents based on their similarity to the query using Fastembed models.

Documents are returned sorted from most to least semantically relevant to the query.

Usage example:

python
from haystack import Document
from haystack_integrations.components.rankers.fastembed import FastembedRanker

ranker = FastembedRanker(model_name="Xenova/ms-marco-MiniLM-L-6-v2", top_k=2)
ranker.warm_up()

docs = [Document(content="Paris"), Document(content="Berlin")]
query = "What is the capital of Germany?"
output = ranker.run(query=query, documents=docs)
print(output["documents"][0].content)

# Berlin

init

python
__init__(
    model_name: str = "Xenova/ms-marco-MiniLM-L-6-v2",
    top_k: int = 10,
    cache_dir: str | None = None,
    threads: int | None = None,
    batch_size: int = 64,
    parallel: int | None = None,
    local_files_only: bool = False,
    meta_fields_to_embed: list[str] | None = None,
    meta_data_separator: str = "\n",
)

Creates an instance of FastembedRanker.

Parameters:

  • model_name (str) – Fastembed model name. Check the list of supported models in the Fastembed documentation.
  • top_k (int) – The maximum number of documents to return.
  • cache_dir (str | None) – The path to the cache directory. Can be set using the FASTEMBED_CACHE_PATH env variable. Defaults to fastembed_cache in the system's temp directory.
  • threads (int | None) – The number of threads a single onnxruntime session can use. Defaults to None.
  • batch_size (int) – Number of strings to encode at once.
  • parallel (int | None) – If > 1, data-parallel encoding is used; this is recommended for offline encoding of large datasets. If 0, all available cores are used. If None, data-parallel processing is disabled and the default onnxruntime threading is used instead.
  • local_files_only (bool) – If True, only use the model files in the cache_dir.
  • meta_fields_to_embed (list[str] | None) – List of meta fields that should be concatenated with the document content for reranking (see the sketch after this list).
  • meta_data_separator (str) – Separator used to concatenate the meta fields to the Document content.
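
As with the embedders, selected meta fields can be prepended to each document's content before cross-encoder scoring. A sketch in which "title" is an arbitrary example field:

python
from haystack import Document
from haystack_integrations.components.rankers.fastembed import FastembedRanker

ranker = FastembedRanker(
    model_name="Xenova/ms-marco-MiniLM-L-6-v2",
    meta_fields_to_embed=["title"],  # "title" is an arbitrary example field
    meta_data_separator="\n",
)
ranker.warm_up()
docs = [Document(content="The capital is Berlin.", meta={"title": "Germany"})]
output = ranker.run(query="What is the capital of Germany?", documents=docs)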

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

python
from_dict(data: dict[str, Any]) -> FastembedRanker

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – The dictionary to deserialize from.

Returns:

  • FastembedRanker – The deserialized component.

warm_up

python
warm_up()

Initializes the component.

run

python
run(
    query: str, documents: list[Document], top_k: int | None = None
) -> dict[str, list[Document]]

Returns a list of documents ranked by their similarity to the given query, using FastEmbed.

Parameters:

  • query (str) – The input query to compare the documents to.
  • documents (list[Document]) – A list of documents to be ranked.
  • top_k (int | None) – The maximum number of documents to return. If provided, overrides the value set at initialization for this call.

Returns:

  • dict[str, list[Document]] – A dictionary with the following keys:
  • documents: A list of documents closest to the query, sorted from most similar to least similar.

Raises:

  • ValueError – If top_k is not > 0.
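
The top_k passed to run applies to that call only and overrides the value set at initialization:

python
from haystack import Document
from haystack_integrations.components.rankers.fastembed import FastembedRanker

ranker = FastembedRanker(model_name="Xenova/ms-marco-MiniLM-L-6-v2", top_k=10)
ranker.warm_up()
docs = [Document(content="Paris"), Document(content="Berlin"), Document(content="Rome")]

# Per-call top_k overrides the init value for this invocation only.
output = ranker.run(query="What is the capital of Germany?", documents=docs, top_k=1)
assert len(output["documents"]) == 1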