
SentenceWindowRetrieval

Use this component to retrieve neighboring sentences around relevant sentences to get the full context.

Name: SentenceWindowRetrieval
Folder path: /retrievers/
Most common position in a pipeline: After the main Retriever component, such as the InMemoryEmbeddingRetriever or any other Retriever
Mandatory input variables: "retrieved_documents": A list of already retrieved documents for which you want to get a context window
Output variables: "context_windows": A list of strings

Overview

"Sentence window" retrieval is a technique that returns not only the sentences that match a query but also the context around them.

During indexing, documents are split into smaller chunks, such as sentences, and each chunk is indexed. During retrieval, the chunks most relevant to a given query, according to a similarity metric, are returned.
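To make the two phases concrete, here is a toy sketch in plain Python: it splits a text into sentences and picks the sentence that best matches a query. The word-overlap score is an assumption standing in for the embedding or BM25 similarity a real Retriever would use, and split_sentences, tokens, and top_sentence are hypothetical helpers, not Haystack APIs.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text: str) -> set[str]:
    # Lowercased words, used as a crude bag-of-words representation.
    return set(re.findall(r"\w+", text.lower()))


def top_sentence(sentences: list[str], query: str) -> int:
    # Return the index of the sentence sharing the most words with the query.
    # Word overlap is a hypothetical stand-in for a real similarity metric.
    q = tokens(query)
    scores = [len(q & tokens(s)) for s in sentences]
    return max(range(len(sentences)), key=lambda i: scores[i])


text = ("This is a text with some words. There is a second sentence. "
        "And there is also a third sentence. It also contains a fourth sentence.")
sentences = split_sentences(text)
best = top_sentence(sentences, "third")
print(best, sentences[best])
```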

Once we have the relevant sentences, we can retrieve their neighbors to restore the full context. How many neighbors are fetched is fixed in advance by the window_size parameter, which sets the number of chunks to include before and after each relevant one.
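Given the position of a relevant sentence, the window is a slice that extends window_size positions in each direction, clipped at the start and end of the document. A minimal sketch, assuming the sentences are held in an ordered Python list; sentence_window is a hypothetical helper, not part of Haystack:

```python
def sentence_window(sentences: list[str], hit: int, window_size: int) -> str:
    # Clip the window so it never runs past the start or end of the document.
    start = max(0, hit - window_size)
    end = min(len(sentences), hit + window_size + 1)
    # Join the relevant sentence and its neighbors in their original order.
    return " ".join(sentences[start:end])


sentences = ["S0.", "S1.", "S2.", "S3.", "S4.", "S5.", "S6."]
print(sentence_window(sentences, hit=1, window_size=2))  # -> S0. S1. S2. S3.
print(sentence_window(sentences, hit=4, window_size=1))  # -> S3. S4. S5.
```

Near the edges the window is simply shorter; the first call gets only one sentence before the hit because there is nothing earlier to include.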

This component is meant to be used with other Retrievers, such as the InMemoryEmbeddingRetriever. These Retrievers find relevant sentences by comparing a query against indexed sentences using a similarity metric. Then, the SentenceWindowRetrieval component retrieves neighboring sentences around the relevant ones by leveraging metadata stored in the Document object.
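The metadata lookup can be pictured with plain dictionaries. This sketch assumes each chunk carries a source_id (the ID of the original document) and a split_id (the chunk's position within that document), which is how Haystack's DocumentSplitter is understood to label its output; treat those key names as assumptions. The neighbors function and the dict layout are illustrative only: a real component would presumably ask the Document Store for these chunks through a metadata filter rather than scanning a list.

```python
def neighbors(chunks: list[dict], hit: dict, window_size: int) -> list[dict]:
    # Keep chunks from the same source document whose position lies within
    # window_size of the retrieved chunk (inclusive), then order them by position.
    source_id = hit["meta"]["source_id"]
    split_id = hit["meta"]["split_id"]
    selected = [
        c for c in chunks
        if c["meta"]["source_id"] == source_id
        and abs(c["meta"]["split_id"] - split_id) <= window_size
    ]
    return sorted(selected, key=lambda c: c["meta"]["split_id"])


# Two hypothetical source documents, each split into six numbered chunks.
chunks = [
    {"content": f"doc{d}-chunk{i}", "meta": {"source_id": f"doc{d}", "split_id": i}}
    for d in (1, 2)
    for i in range(6)
]
hit = chunks[3]  # doc1, split_id 3
window = neighbors(chunks, hit, window_size=1)
print([c["content"] for c in window])  # -> ['doc1-chunk2', 'doc1-chunk3', 'doc1-chunk4']
```

Matching on source_id is what keeps a window from spilling into an unrelated document that happens to have chunks with nearby split_id values.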

Usage

On its own

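from haystack import Document
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.retrievers import SentenceWindowRetrieval
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Split the text into overlapping 10-word chunks; the splitter records each chunk's position in its metadata.
splitter = DocumentSplitter(split_length=10, split_overlap=5, split_by="word")
text = ("This is a text with some words. There is a second sentence. And there is also a third sentence. "
        "It also contains a fourth sentence. And a fifth sentence. And a sixth sentence. And a seventh sentence")
doc = Document(content=text)

docs = splitter.run([doc])
doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs["documents"])

# Expand a chunk (here the fourth one, as if another Retriever had returned it) with up to 3 neighbors on each side.
retriever = SentenceWindowRetrieval(document_store=doc_store, window_size=3)
result = retriever.run(retrieved_documents=[docs["documents"][3]])
print(result["context_windows"])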
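Because the splitter in this example uses split_length=10 and split_overlap=5, consecutive chunks share five words. The pure-Python sketch below approximates that sliding window so you can see where chunk boundaries fall and how a split_id numbers them. It is a simplified stand-in that splits on whitespace; split_words and the word_start key are hypothetical, and the output may differ from DocumentSplitter in details such as how the final chunk is handled.

```python
def split_words(text: str, split_length: int, split_overlap: int) -> list[dict]:
    # Slide a window of split_length words forward by (split_length - split_overlap)
    # words at a time, recording each chunk's position as split_id.
    words = text.split()
    step = split_length - split_overlap
    chunks = []
    for split_id, start in enumerate(range(0, len(words), step)):
        chunks.append({
            "content": " ".join(words[start:start + split_length]),
            "meta": {"split_id": split_id, "word_start": start},
        })
        if start + split_length >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


text = ("This is a text with some words. There is a second sentence. And there is also a third sentence. "
        "It also contains a fourth sentence. And a fifth sentence. And a sixth sentence. And a seventh sentence")
chunks = split_words(text, split_length=10, split_overlap=5)
for c in chunks:
    print(c["meta"]["split_id"], c["content"])
```

The overlap is why a context window cannot be built by naively concatenating neighboring chunks: the shared words would appear twice.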

In a Pipeline

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.retrievers import SentenceWindowRetrieval
from haystack.components.preprocessors import DocumentSplitter
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index the text as overlapping 10-word chunks.
splitter = DocumentSplitter(split_length=10, split_overlap=5, split_by="word")
text = (
    "This is a text with some words. There is a second sentence. And there is also a third sentence. "
    "It also contains a fourth sentence. And a fifth sentence. And a sixth sentence. And a seventh sentence"
)
doc = Document(content=text)
docs = splitter.run([doc])
doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs["documents"])

# BM25 finds the single best-matching chunk; SentenceWindowRetrieval then expands it with its neighbors.
rag = Pipeline()
rag.add_component("bm25_retriever", InMemoryBM25Retriever(doc_store, top_k=1))
rag.add_component("sentence_window_retriever", SentenceWindowRetrieval(document_store=doc_store, window_size=3))
rag.connect("bm25_retriever.documents", "sentence_window_retriever.retrieved_documents")

result = rag.run({"bm25_retriever": {"query": "third"}})
print(result["sentence_window_retriever"]["context_windows"])
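Each entry in context_windows is a single string, so overlapping chunks have to be stitched back together without repeating the words they share. A minimal sketch of that step, assuming each chunk records the character offset where it starts in the source text (Haystack's splitter is understood to store this as split_idx_start, but treat the key name as an assumption). merge_chunks is a hypothetical helper, and it assumes the chunks form a contiguous span with no gaps.

```python
def merge_chunks(chunks: list[dict]) -> str:
    # Order chunks by where they start in the original text.
    ordered = sorted(chunks, key=lambda c: c["meta"]["split_idx_start"])
    merged = ""
    end_so_far = 0  # character offset in the source text covered so far
    for c in ordered:
        start = c["meta"]["split_idx_start"]
        # Drop the prefix of this chunk that is already part of `merged`.
        skip = max(0, end_so_far - start)
        merged += c["content"][skip:]
        end_so_far = max(end_so_far, start + len(c["content"]))
    return merged


source = "one two three four five six seven"
# Hypothetical overlapping chunks, each tagged with its start offset in `source`.
spans = [(0, 14), (8, 24), (19, 33)]
chunks = [{"content": source[a:b], "meta": {"split_idx_start": a}} for a, b in spans]
print(merge_chunks(chunks[::-1]))  # -> one two three four five six seven
```

Sorting first means the merge does not depend on the order in which the Document Store happens to return the chunks.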

Related Links

See the parameter details in our API reference: