API Reference

Sweeps through a Document Store and returns a set of candidate Documents that are relevant to the query.

Module in_memory/bm25_retriever

InMemoryBM25Retriever

Retrieves documents using the BM25 algorithm.

Usage example:

from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]

doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs)
retriever = InMemoryBM25Retriever(doc_store)

result = retriever.run(query="Programmiersprache")

print(result["documents"])
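The BM25 step can be illustrated with a minimal, self-contained sketch of the standard Okapi BM25 formula in plain Python. This is not the code InMemoryDocumentStore runs. The tokenizer, the `k1` and `b` defaults, and the helper names `bm25_scores` and `top_k_bm25` are assumptions for illustration, and the store may use a different BM25 variant.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (an assumption; real stores tokenize differently).
    return text.lower().split()


def bm25_scores(query: str, corpus: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Length normalization: longer documents need more matches for the same score.
            norm = freq + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k_bm25(query: str, corpus: List[str], top_k: int = 10) -> List[Tuple[str, float]]:
    # Pair each document with its score and keep the highest-scoring ones.
    scored = zip(corpus, bm25_scores(query, corpus))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


corpus = [
    "Python is a popular programming language",
    "python ist eine beliebte Programmiersprache",
]
print(top_k_bm25("Programmiersprache", corpus, top_k=1))
```

Only the German sentence contains the query term, so it is the only document with a non-zero score. This purely lexical matching is why BM25 finds the German document for the query "Programmiersprache".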

InMemoryBM25Retriever.__init__

def __init__(document_store: InMemoryDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             scale_score: bool = False,
             filter_policy: FilterPolicy = FilterPolicy.REPLACE)

Create the InMemoryBM25Retriever component.

Arguments:

  • document_store: An instance of InMemoryDocumentStore.
  • filters: A dictionary with filters to narrow down the search space.
  • top_k: The maximum number of documents to retrieve.
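  • scale_score: Scales the BM25 score to the range 0 to 1, where 1 means highly relevant. If set to False, uses raw BM25 scores.
  • filter_policy: How filters passed to run are combined with the filters set here. With FilterPolicy.REPLACE (the default), runtime filters, if provided, replace the initialization filters; with FilterPolicy.MERGE, the two sets of filters are merged.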

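How filter_policy decides which filters a retrieval call uses can be sketched as follows. The `Policy` enum and the `resolve_filters` helper are hypothetical stand-ins for FilterPolicy, and wrapping both filters in a single AND condition is a simplification; the library's merge logic may combine nested conditions differently.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Policy(Enum):
    # Hypothetical stand-in mirroring the two FilterPolicy values.
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: Policy,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Decide which filters a retrieval call should use."""
    if policy is Policy.MERGE and init_filters and runtime_filters:
        # Simplification: require both filters to match by nesting them under AND.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE, or nothing to merge: runtime filters win when given,
    # otherwise the initialization filters apply.
    return runtime_filters or init_filters
```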
Raises:

  • ValueError: If top_k is not greater than 0.

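The top_k contract can be sketched in a few lines: reject values that are not greater than 0, otherwise keep at most top_k of the highest-scoring results. `select_top_k` is a hypothetical helper, not part of Haystack.

```python
import heapq
from typing import List, Tuple


def select_top_k(scored: List[Tuple[str, float]], top_k: int) -> List[Tuple[str, float]]:
    # Mirror the documented check: top_k must be greater than 0.
    if top_k <= 0:
        raise ValueError(f"top_k must be greater than 0, got {top_k}")
    # nlargest keeps at most top_k items, highest score first.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])
```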
InMemoryBM25Retriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

InMemoryBM25Retriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "InMemoryBM25Retriever"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.
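to_dict and from_dict are meant to round-trip: the dictionary records what is needed to rebuild an equivalent component, for example when a pipeline is saved and loaded again. The sketch below uses a hypothetical `ToyRetriever`; the `type`/`init_parameters` layout follows the convention Haystack components commonly use, but treat the exact keys as an assumption.

```python
from typing import Any, Dict, Optional


class ToyRetriever:
    """Hypothetical component used only to illustrate the serialization round trip."""

    def __init__(self, top_k: int = 10, filters: Optional[Dict[str, Any]] = None, scale_score: bool = False):
        self.top_k = top_k
        self.filters = filters
        self.scale_score = scale_score

    def to_dict(self) -> Dict[str, Any]:
        # Record the class path plus the arguments needed to rebuild the object.
        return {
            "type": f"{type(self).__module__}.{type(self).__qualname__}",
            "init_parameters": {
                "top_k": self.top_k,
                "filters": self.filters,
                "scale_score": self.scale_score,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyRetriever":
        # Pass the stored init parameters straight back to the constructor.
        return cls(**data["init_parameters"])
```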

InMemoryBM25Retriever.run

@component.output_types(documents=List[Document])
def run(query: str,
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
        scale_score: Optional[bool] = None)

Run the InMemoryBM25Retriever on the given input data.

Arguments:

  • query: The query string for the Retriever.
  • filters: A dictionary with filters to narrow down the search space.
  • top_k: The maximum number of documents to return. If not specified, the value provided at initialization is used.
  • scale_score: Scales the BM25 score to the range 0 to 1, where 1 means highly relevant. If set to False, uses raw BM25 scores. If not specified, the value provided at initialization is used.

Raises:

  • ValueError: If the specified DocumentStore is not found or is not an InMemoryDocumentStore instance.

Returns:

The retrieved documents.
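When scale_score is enabled, raw BM25 scores, which have no fixed upper bound, have to be squashed into the 0 to 1 range. One common approach, sketched here, applies a logistic sigmoid to the score divided by a constant; the constant `BM25_SCALING_FACTOR = 8.0` and both helper names are assumptions for illustration. The second helper shows the documented fallback: passing None at run time keeps the value given at initialization.

```python
import math
from typing import List, Optional

# Assumed divisor that spreads typical BM25 scores over the sigmoid; illustrative only.
BM25_SCALING_FACTOR = 8.0


def scale_bm25(raw_scores: List[float]) -> List[float]:
    # The logistic sigmoid maps any real number into (0, 1) and preserves ranking order.
    # BM25 scores are non-negative, so math.exp cannot overflow here.
    return [1 / (1 + math.exp(-s / BM25_SCALING_FACTOR)) for s in raw_scores]


def resolve_scale_score(init_value: bool, run_value: Optional[bool]) -> bool:
    # None at run time means "use the value given at initialization";
    # an explicit False must still be respected, so test for None, not falsiness.
    return init_value if run_value is None else run_value
```

With this mapping a raw score of 0 becomes 0.5, so scaled scores are best read as a relative ranking signal rather than a probability.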

Module in_memory/embedding_retriever

InMemoryEmbeddingRetriever

Retrieves documents using vector similarity.

Usage example:

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()
docs_with_embeddings = doc_embedder.run(docs)["documents"]

doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs_with_embeddings)
retriever = InMemoryEmbeddingRetriever(doc_store)

query = "Programmiersprache"
text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()
query_embedding = text_embedder.run(query)["embedding"]

result = retriever.run(query_embedding=query_embedding)

print(result["documents"])
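The vector-similarity idea behind InMemoryEmbeddingRetriever can be sketched without any models: score each stored embedding against the query embedding and keep the best top_k. The three-dimensional vectors below are made up (real SentenceTransformers embeddings have hundreds of dimensions), and cosine similarity is chosen only for illustration; which similarity function is used is configured on the document store, not on the retriever. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embedding_retrieval(
    query_embedding: Sequence[float],
    docs: List[Tuple[str, List[float]]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    # Score every stored (content, embedding) pair, then keep the top_k best.
    scored = [(content, cosine(query_embedding, emb)) for content, emb in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings, invented for this example.
docs = [
    ("Python is a popular programming language", [0.9, 0.1, 0.0]),
    ("python ist eine beliebte Programmiersprache", [0.1, 0.9, 0.1]),
]
print(embedding_retrieval([0.0, 1.0, 0.0], docs, top_k=1))
```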

InMemoryEmbeddingRetriever.__init__

def __init__(document_store: InMemoryDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             scale_score: bool = False,
             return_embedding: bool = False,
             filter_policy: FilterPolicy = FilterPolicy.REPLACE)

Create the InMemoryEmbeddingRetriever component.

Arguments:

  • document_store: An instance of InMemoryDocumentStore.
  • filters: A dictionary with filters to narrow down the search space.
  • top_k: The maximum number of documents to retrieve.
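  • scale_score: Scales the similarity score to the range 0 to 1, where 1 means highly relevant. If set to False, uses raw similarity scores.
  • return_embedding: Whether to return the embedding of the retrieved Documents.
  • filter_policy: How filters passed to run are combined with the filters set here. With FilterPolicy.REPLACE (the default), runtime filters, if provided, replace the initialization filters; with FilterPolicy.MERGE, the two sets of filters are merged.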

Raises:

  • ValueError: If top_k is not greater than 0.

InMemoryEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

InMemoryEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "InMemoryEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

InMemoryEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
        scale_score: Optional[bool] = None,
        return_embedding: Optional[bool] = None)

Run the InMemoryEmbeddingRetriever on the given input data.

Arguments:

  • query_embedding: Embedding of the query.
  • filters: A dictionary with filters to narrow down the search space.
  • top_k: The maximum number of documents to return. If not specified, the value provided at initialization is used.
  • scale_score: Scales the similarity score to the range 0 to 1, where 1 means highly relevant. If set to False, uses raw similarity scores. If not specified, the value provided at initialization is used.
  • return_embedding: Whether to return the embedding of the retrieved Documents. If not specified, the value provided at initialization is used.

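A sketch of what return_embedding controls: results are returned as copies, and the vector is dropped unless the caller asks for it, which keeps responses small. `Doc` is a minimal hypothetical stand-in for haystack.Document, and copying rather than mutating the stored objects is an assumption of this sketch, not a documented guarantee.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Doc:
    # Minimal stand-in for haystack.Document, for illustration only.
    content: str
    embedding: Optional[List[float]] = None
    score: Optional[float] = None


def finalize(docs: List[Doc], return_embedding: bool) -> List[Doc]:
    # dataclasses.replace builds a new object, so the stored documents are never mutated;
    # the embedding is cleared on the copy unless the caller asked for it.
    return [replace(d) if return_embedding else replace(d, embedding=None) for d in docs]
```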
Raises:

  • ValueError: If the specified DocumentStore is not found or is not an InMemoryDocumentStore instance.

Returns:

The retrieved documents.
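With scale_score enabled, the scaling has to suit the similarity function. A plausible sketch: cosine similarity is bounded in [-1, 1] and can be mapped linearly, while dot products are unbounded and need a squashing function such as the logistic sigmoid. The divisor `DOT_PRODUCT_SCALING_FACTOR = 100.0` and the helper `scale_similarity` are assumptions, not documented values.

```python
import math

# Assumed divisor for dot-product scores; illustrative, not a documented constant.
DOT_PRODUCT_SCALING_FACTOR = 100.0


def scale_similarity(score: float, similarity: str) -> float:
    if similarity == "cosine":
        # Cosine similarity lies in [-1, 1]; shift and halve it into [0, 1].
        return (score + 1) / 2
    if similarity == "dot_product":
        # Dot products are unbounded; squash them into (0, 1) with the logistic sigmoid.
        return 1 / (1 + math.exp(-score / DOT_PRODUCT_SCALING_FACTOR))
    raise ValueError(f"Unsupported similarity function: {similarity}")
```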

Module filter_retriever

FilterRetriever

Retrieves documents that match the provided filters.

Usage example:

from haystack import Document
from haystack.components.retrievers import FilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

docs = [
    Document(content="Python is a popular programming language", meta={"lang": "en"}),
    Document(content="python ist eine beliebte Programmiersprache", meta={"lang": "de"}),
]

doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs)
retriever = FilterRetriever(doc_store, filters={"field": "lang", "operator": "==", "value": "en"})

# if passed in the run method, filters will override those provided at initialization
result = retriever.run(filters={"field": "lang", "operator": "==", "value": "de"})

print(result["documents"])
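Filter dictionaries like the ones in this example follow a small expression syntax: comparison conditions with field, operator, and value, which can be nested under logical conditions with operator and conditions. The sketch below evaluates a subset of that syntax against documents stored as plain dicts. `matches`, `filter_documents`, and the dict layout are hypothetical, and the real implementation supports more operators and value types.

```python
import operator
from typing import Any, Dict, List

# Subset of comparison operators, mapped to Python functions.
COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
}


def matches(doc: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Evaluate a filter dictionary against a document stored as a plain dict."""
    op = filters["operator"]
    if op in ("AND", "OR"):
        # Logical condition: evaluate each nested condition recursively.
        results = [matches(doc, cond) for cond in filters["conditions"]]
        return all(results) if op == "AND" else any(results)
    # Accept both "meta.lang" and the short "lang" used in the example.
    field = filters["field"]
    if field.startswith("meta."):
        field = field[len("meta."):]
    value = doc["meta"].get(field)
    if value is None and op not in ("==", "!="):
        # Comparisons other than equality never match a missing field.
        return False
    return COMPARISONS[op](value, filters["value"])


def filter_documents(docs: List[Dict[str, Any]], filters: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [d for d in docs if matches(d, filters)]
```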

FilterRetriever.__init__

def __init__(document_store: DocumentStore,
             filters: Optional[Dict[str, Any]] = None)

Create the FilterRetriever component.

Arguments:

  • document_store: An instance of a DocumentStore.
  • filters: A dictionary with filters to narrow down the search space.

FilterRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

FilterRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "FilterRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

FilterRetriever.run

@component.output_types(documents=List[Document])
def run(filters: Optional[Dict[str, Any]] = None)

Run the FilterRetriever on the given input data.

Arguments:

  • filters: A dictionary with filters to narrow down the search space. If not specified, the FilterRetriever uses the value provided at initialization.

Returns:

The retrieved documents.
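The override rule described for run can be sketched with a hypothetical `ToyFilterRetriever`: runtime filters replace the ones given at initialization, and the initialization filters apply only when run receives none. Supporting only the == operator, and returning every document when no filters are set at all, are simplifications and assumptions of this sketch.

```python
from typing import Any, Dict, List, Optional


class ToyFilterRetriever:
    """Hypothetical stand-in showing how run() picks its filters."""

    def __init__(self, docs: List[Dict[str, Any]], filters: Optional[Dict[str, Any]] = None):
        self.docs = docs
        self.filters = filters

    def run(self, filters: Optional[Dict[str, Any]] = None) -> Dict[str, List[Dict[str, Any]]]:
        # Runtime filters win; otherwise fall back to the initialization filters.
        active = filters or self.filters
        if not active:
            # Assumption: with no filters at all, every document is returned.
            return {"documents": list(self.docs)}
        if active["operator"] != "==":
            raise NotImplementedError("This sketch only supports the '==' operator")
        # Equality-only matching on metadata keeps the sketch short.
        return {"documents": [d for d in self.docs if d["meta"].get(active["field"]) == active["value"]]}
```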