Sweeps through a Document Store and returns a set of candidate Documents that are relevant to the query.
Module in_memory/bm25_retriever
InMemoryBM25Retriever
Retrieves documents using the BM25 algorithm.
Usage example:
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]
doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs)
retriever = InMemoryBM25Retriever(doc_store)
result = retriever.run(query="Programmiersprache")
print(result["documents"])
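To make the ranking idea concrete, here is a minimal pure-Python sketch of textbook Okapi BM25 scoring. It is not the library's implementation: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the helper name `bm25_scores` are all assumptions chosen for illustration, and the sketch expects a non-empty list of documents.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook Okapi BM25 formula."""
    # Assumption: lowercase whitespace tokenization; real tokenizers differ.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Python is a popular programming language",
    "python ist eine beliebte Programmiersprache",
]
scores = bm25_scores("Programmiersprache", docs)
```

Only the second document contains the query term, so it is the only one that receives a non-zero score in this sketch.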
InMemoryBM25Retriever.__init__
def __init__(document_store: InMemoryDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             scale_score: bool = False,
             filter_policy: FilterPolicy = FilterPolicy.REPLACE)
Create the InMemoryBM25Retriever component.
Arguments:
document_store
: An instance of InMemoryDocumentStore.
filters
: A dictionary with filters to narrow down the search space.
top_k
: The maximum number of documents to retrieve.
scale_score
: Scales the BM25 score to a unit interval in the range of 0 to 1, where 1 means extremely relevant. If set to False, uses raw similarity scores.
filter_policy
: The filter policy to apply during retrieval.
Raises:
ValueError
: If the specified top_k is not > 0.
InMemoryBM25Retriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
InMemoryBM25Retriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "InMemoryBM25Retriever"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
InMemoryBM25Retriever.run
@component.output_types(documents=List[Document])
def run(query: str,
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
        scale_score: Optional[bool] = None)
Run the InMemoryBM25Retriever on the given input data.
Arguments:
query
: The query string for the Retriever.
filters
: A dictionary with filters to narrow down the search space.
top_k
: The maximum number of documents to return.
scale_score
: Scales the BM25 score to a unit interval in the range of 0 to 1, where 1 means extremely relevant. If set to False, uses raw similarity scores. If not specified, the value provided at initialization is used.
Raises:
ValueError
: If the specified DocumentStore is not found or is not an InMemoryDocumentStore instance.
Returns:
The retrieved documents.
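The scale_score option maps an unbounded raw score into the range 0 to 1. A sigmoid is one common way to do this, and the sketch below illustrates the idea under that assumption. The function name `scale_bm25_score` and the scaling factor of 8.0 are hypothetical, picked for illustration rather than taken from this reference.

```python
import math


def scale_bm25_score(raw_score, scaling_factor=8.0):
    """Squash a raw score into the open interval (0, 1) with a sigmoid."""
    # Assumption: dividing by scaling_factor keeps typical scores on the
    # steep part of the curve; the value 8.0 is illustrative only.
    return 1.0 / (1.0 + math.exp(-raw_score / scaling_factor))


raw_scores = [0.0, 2.5, 10.0]
scaled = [scale_bm25_score(score) for score in raw_scores]
```

The mapping is monotonic, so scaling changes the score values but not the order in which documents are ranked.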
Module in_memory/embedding_retriever
InMemoryEmbeddingRetriever
Retrieves documents using vector similarity.
Usage example:
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()
docs_with_embeddings = doc_embedder.run(docs)["documents"]
doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs_with_embeddings)
retriever = InMemoryEmbeddingRetriever(doc_store)
query = "Programmiersprache"
text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()
query_embedding = text_embedder.run(query)["embedding"]
result = retriever.run(query_embedding=query_embedding)
print(result["documents"])
InMemoryEmbeddingRetriever.__init__
def __init__(document_store: InMemoryDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             scale_score: bool = False,
             return_embedding: bool = False,
             filter_policy: FilterPolicy = FilterPolicy.REPLACE)
Create the InMemoryEmbeddingRetriever component.
Arguments:
document_store
: An instance of InMemoryDocumentStore.
filters
: A dictionary with filters to narrow down the search space.
top_k
: The maximum number of documents to retrieve.
scale_score
: Scales the similarity score to a unit interval in the range of 0 to 1, where 1 means extremely relevant. If set to False, uses raw similarity scores.
return_embedding
: Whether to return the embedding of the retrieved Documents.
filter_policy
: The filter policy to apply during retrieval.
Raises:
ValueError
: If the specified top_k is not > 0.
InMemoryEmbeddingRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
InMemoryEmbeddingRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "InMemoryEmbeddingRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
InMemoryEmbeddingRetriever.run
@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
        scale_score: Optional[bool] = None,
        return_embedding: Optional[bool] = None)
Run the InMemoryEmbeddingRetriever on the given input data.
Arguments:
query_embedding
: Embedding of the query.
filters
: A dictionary with filters to narrow down the search space.
top_k
: The maximum number of documents to return.
scale_score
: Scales the similarity score to a unit interval in the range of 0 to 1, where 1 means extremely relevant. If set to False, uses raw similarity scores. If not specified, the value provided at initialization is used.
return_embedding
: Whether to return the embedding of the retrieved Documents.
Raises:
ValueError
: If the specified DocumentStore is not found or is not an InMemoryDocumentStore instance.
Returns:
The retrieved documents.
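As a rough illustration of what retrieval by vector similarity involves, the following pure-Python sketch ranks stored embeddings against a query embedding and keeps the best top_k. Cosine similarity is assumed here as the similarity function, and the helper names `cosine_similarity` and `top_k_by_similarity` are hypothetical; the library may use a different metric or implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_similarity(query_embedding, doc_embeddings, top_k=10):
    """Return (index, score) pairs for the top_k most similar embeddings."""
    if top_k <= 0:
        raise ValueError("top_k must be greater than 0")
    scored = [
        (idx, cosine_similarity(query_embedding, emb))
        for idx, emb in enumerate(doc_embeddings)
    ]
    # Highest similarity first, then truncate to the requested size.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional embeddings; real embeddings have hundreds of dimensions.
doc_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k_by_similarity([1.0, 0.0], doc_embeddings, top_k=2)
```

Here the first embedding points in the same direction as the query and ranks first, followed by the third, which is partially aligned with it.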
Module filter_retriever
FilterRetriever
Retrieves documents that match the provided filters.
Usage example:
from haystack import Document
from haystack.components.retrievers import FilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
docs = [
    Document(content="Python is a popular programming language", meta={"lang": "en"}),
    Document(content="python ist eine beliebte Programmiersprache", meta={"lang": "de"}),
]
doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs)
retriever = FilterRetriever(doc_store, filters={"field": "lang", "operator": "==", "value": "en"})
# Filters passed to the run method override those provided at initialization.
result = retriever.run(filters={"field": "lang", "operator": "==", "value": "de"})
print(result["documents"])
FilterRetriever.__init__
def __init__(document_store: DocumentStore,
             filters: Optional[Dict[str, Any]] = None)
Create the FilterRetriever component.
Arguments:
document_store
: An instance of a DocumentStore.
filters
: A dictionary with filters to narrow down the search space.
FilterRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
FilterRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "FilterRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
FilterRetriever.run
@component.output_types(documents=List[Document])
def run(filters: Optional[Dict[str, Any]] = None)
Run the FilterRetriever on the given input data.
Arguments:
filters
: A dictionary with filters to narrow down the search space. If not specified, the FilterRetriever uses the value provided at initialization.
Returns:
The retrieved documents.
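The override behavior described above (filters passed to run take precedence over those given at initialization) can be sketched in plain Python. This is an illustration only, not the library's filter engine: documents are modeled as plain dictionaries, only the `==` and `!=` operators are handled, and the helper names `matches` and `retrieve` are hypothetical.

```python
import operator

# Assumption: only two comparison operators are supported in this sketch.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
}


def matches(meta, filters):
    """Check one metadata dict against a single comparison filter."""
    compare = OPERATORS[filters["operator"]]
    return compare(meta.get(filters["field"]), filters["value"])


def retrieve(docs, init_filters=None, run_filters=None):
    """Return matching documents; run-time filters replace init-time ones."""
    filters = run_filters if run_filters is not None else init_filters
    if filters is None:
        return list(docs)
    return [doc for doc in docs if matches(doc["meta"], filters)]


docs = [
    {"content": "Python is a popular programming language", "meta": {"lang": "en"}},
    {"content": "python ist eine beliebte Programmiersprache", "meta": {"lang": "de"}},
]
init_filters = {"field": "lang", "operator": "==", "value": "en"}
run_filters = {"field": "lang", "operator": "==", "value": "de"}

default_result = retrieve(docs, init_filters)
override_result = retrieve(docs, init_filters, run_filters)
```

With only the initialization filters, the English document is returned; once run-time filters are supplied, they replace the initialization filters and the German document is returned instead.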