InMemoryEmbeddingRetriever

Use this Retriever with the InMemoryDocumentStore if you're looking for embedding-based retrieval.

Most common position in a pipeline: In query pipelines: in a RAG pipeline, before a PromptBuilder; in a semantic search pipeline, as the last component; in an extractive QA pipeline, after a Text Embedder and before an ExtractiveReader
Mandatory init variables: "document_store": An instance of InMemoryDocumentStore
Mandatory run variables: "query_embedding": A list of floating point numbers
Output variables: "documents": A list of documents
API reference: Retrievers
GitHub link: https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/in_memory/embedding_retriever.py

Overview

The InMemoryEmbeddingRetriever is an embedding-based Retriever compatible with the InMemoryDocumentStore. It compares the query embedding with the Document embeddings and returns the Documents in the InMemoryDocumentStore whose embeddings are most similar to the query embedding.

When using the InMemoryEmbeddingRetriever in your NLP system, make sure both the query embedding and the Document embeddings are available. You can do so by adding a Document Embedder to your indexing pipeline and a Text Embedder to your query pipeline. For details, see Embedders.

In addition to the query_embedding, the InMemoryEmbeddingRetriever accepts other optional parameters, including top_k (the maximum number of Documents to retrieve) and filters to narrow down the search space.

The embedding_similarity_function used for embedding retrieval is set on the Document Store, not on the Retriever: define it when you initialize the corresponding InMemoryDocumentStore.
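To see why the choice of similarity function matters, the following plain-Python sketch ranks the same two embeddings with dot product and with cosine similarity, assuming "dot_product" and "cosine" are the values the store accepts. It does not use Haystack's internal code; the helper functions and the toy vectors are hypothetical and only illustrate the difference.

```python
import math

def dot_product(a, b):
    # Sensitive to vector length: longer vectors score higher.
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product of the length-normalized vectors: only direction matters.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

query = [1.0, 0.0]
embeddings = {
    "short_aligned": [0.5, 0.0],   # same direction as the query, small norm
    "long_off_axis": [3.0, 3.0],   # 45 degrees away from the query, large norm
}

def rank(similarity):
    # Names ordered from most to least similar to the query.
    return sorted(embeddings, key=lambda name: similarity(query, embeddings[name]), reverse=True)

ranking_dot = rank(dot_product)
ranking_cosine = rank(cosine)

print(ranking_dot)
print(ranking_cosine)
```

The two rankings come out reversed here, which is why the function should match how the embedding model was trained: models that produce normalized embeddings behave the same under both, while unnormalized embeddings can rank differently.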

Usage

In a pipeline

Use this Retriever in a query pipeline like this:

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

# The similarity function is configured on the Document Store.
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
    Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."),
]

# Indexing: compute an embedding for each Document and store the results.
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()

documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

# Querying: embed the query text, then pass the embedding to the Retriever.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

# Print the Document ranked most similar to the query.
print(result["retriever"]["documents"][0])
