Qdrant integration for Haystack
Module haystack_integrations.components.retrievers.qdrant.retriever
QdrantEmbeddingRetriever
A component for retrieving documents from a QdrantDocumentStore using dense vectors.
Usage example:
from haystack import Document
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
document_store = QdrantDocumentStore(
":memory:",
recreate_index=True,
return_embedding=True,
)
document_store.write_documents([Document(content="test", embedding=[0.5]*768)])
retriever = QdrantEmbeddingRetriever(document_store=document_store)
# using a fake vector to keep the example simple
retriever.run(query_embedding=[0.1]*768)
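The fake query vector and the stored embedding point in the same direction, so under cosine similarity (assumed here to be the store's configured similarity) the dummy document should receive the maximum possible score even though the raw dot product is large. A small standard-library check of that arithmetic; the helper name is illustrative only.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


doc_embedding = [0.5] * 768    # the embedding written in the example
query_embedding = [0.1] * 768  # the fake query vector

raw_dot = sum(x * y for x, y in zip(query_embedding, doc_embedding))
cosine = cosine_similarity(query_embedding, doc_embedding)
print(round(raw_dot, 6), round(cosine, 6))  # 38.4 1.0
```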
QdrantEmbeddingRetriever.__init__
def __init__(document_store: QdrantDocumentStore,
filters: Optional[Dict[str, Any]] = None,
top_k: int = 10,
scale_score: bool = True,
return_embedding: bool = False)
Create a QdrantEmbeddingRetriever component.
Arguments:
document_store
: An instance of QdrantDocumentStore.
filters
: A dictionary with filters to narrow down the search space. Default is None.
top_k
: The maximum number of documents to retrieve. Default is 10.
scale_score
: Whether to scale the scores of the retrieved documents or not. Default is True.
return_embedding
: Whether to return the embedding of the retrieved Documents. Default is False.
Raises:
ValueError
: If 'document_store' is not an instance of QdrantDocumentStore.
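This reference does not state how scale_score maps raw scores. The following is a minimal sketch, assuming the convention Haystack document stores commonly follow: cosine scores in [-1, 1] are shifted to (s + 1) / 2, and unbounded dot-product scores are squashed with a sigmoid of s / 100. The function scale_score_sketch is hypothetical, so check the installed QdrantDocumentStore for the exact formula.

```python
import math


def scale_score_sketch(score: float, similarity: str) -> float:
    """Hypothetical helper: map a raw similarity score into [0, 1].

    Assumes the common Haystack convention; not copied from the Qdrant integration.
    """
    if similarity == "cosine":
        # Cosine lies in [-1, 1]; shift and halve it into [0, 1]
        return (score + 1) / 2
    # Dot product is unbounded; squash it with a scaled sigmoid
    return 1 / (1 + math.exp(-score / 100))


print(scale_score_sketch(1.0, "cosine"))       # 1.0
print(scale_score_sketch(0.0, "cosine"))       # 0.5
print(scale_score_sketch(0.0, "dot_product"))  # 0.5
```

Both mappings preserve the ranking order, so scaling changes only how scores read, not which documents come first.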
QdrantEmbeddingRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
QdrantEmbeddingRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "QdrantEmbeddingRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
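to_dict and from_dict are meant to round-trip, so from_dict(to_dict()) should rebuild an equivalent component. Below is a standard-library sketch of that contract. It assumes the {"type": ..., "init_parameters": ...} layout Haystack components generally use, with the document store nested as its own dictionary. ToyDocumentStore and ToyRetriever are hypothetical stand-ins, not part of the integration.

```python
from typing import Any, Dict, Optional


class ToyDocumentStore:
    """Hypothetical stand-in for a document store; only keeps a collection name."""

    def __init__(self, index: str = "Document"):
        self.index = index

    def to_dict(self) -> Dict[str, Any]:
        return {"type": "example.ToyDocumentStore", "init_parameters": {"index": self.index}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyDocumentStore":
        return cls(**data["init_parameters"])


class ToyRetriever:
    """Hypothetical retriever mirroring the to_dict/from_dict contract."""

    def __init__(self, document_store: ToyDocumentStore,
                 filters: Optional[Dict[str, Any]] = None, top_k: int = 10):
        self.document_store = document_store
        self.filters = filters
        self.top_k = top_k

    def to_dict(self) -> Dict[str, Any]:
        # The nested store is serialized with its own to_dict
        return {
            "type": "example.ToyRetriever",
            "init_parameters": {
                "document_store": self.document_store.to_dict(),
                "filters": self.filters,
                "top_k": self.top_k,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyRetriever":
        params = dict(data["init_parameters"])  # shallow copy; input stays untouched
        # Rebuild the nested store before calling __init__
        params["document_store"] = ToyDocumentStore.from_dict(params["document_store"])
        return cls(**params)


original = ToyRetriever(ToyDocumentStore(index="my_docs"), top_k=5)
restored = ToyRetriever.from_dict(original.to_dict())
print(restored.document_store.index, restored.top_k)  # my_docs 5
```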
QdrantEmbeddingRetriever.run
@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
filters: Optional[Dict[str, Any]] = None,
top_k: Optional[int] = None,
scale_score: Optional[bool] = None,
return_embedding: Optional[bool] = None)
Run the Embedding Retriever on the given input data.
Arguments:
query_embedding
: Embedding of the query.
filters
: A dictionary with filters to narrow down the search space.
top_k
: The maximum number of documents to return.
scale_score
: Whether to scale the scores of the retrieved documents or not.
return_embedding
: Whether to return the embedding of the retrieved Documents.
Returns:
The retrieved documents.
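The filters argument takes a Haystack filter dictionary. The sketch below assumes the Haystack 2.x syntax: comparisons are written as field/operator/value, combined with AND/OR under conditions, and metadata fields are addressed as meta.<name>. The matches helper is hypothetical. It evaluates a filter against a plain metadata dict only to illustrate the semantics, whereas the integration itself translates filters into Qdrant conditions.

```python
from typing import Any, Callable, Dict

# Documents whose meta.category is "news" AND meta.year is at least 2023
filters: Dict[str, Any] = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}

_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a is not None and a > b,
    ">=": lambda a, b: a is not None and a >= b,
    "<": lambda a, b: a is not None and a < b,
    "<=": lambda a, b: a is not None and a <= b,
    "in": lambda a, b: a in b,
}


def matches(flt: Dict[str, Any], meta: Dict[str, Any]) -> bool:
    """Hypothetical evaluator for a subset of the filter syntax (illustration only)."""
    op = flt["operator"]
    if op == "AND":
        return all(matches(c, meta) for c in flt["conditions"])
    if op == "OR":
        return any(matches(c, meta) for c in flt["conditions"])
    # Comparison: strip the "meta." prefix to look up the metadata value
    key = flt["field"].removeprefix("meta.")
    return _COMPARISONS[op](meta.get(key), flt["value"])


print(matches(filters, {"category": "news", "year": 2024}))  # True
print(matches(filters, {"category": "news", "year": 2020}))  # False
# With a real retriever: retriever.run(query_embedding=[0.1] * 768, filters=filters)
```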
QdrantSparseEmbeddingRetriever
A component for retrieving documents from a QdrantDocumentStore using sparse vectors.
Usage example:
from haystack import Document
from haystack_integrations.components.retrievers.qdrant import QdrantSparseEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses.sparse_embedding import SparseEmbedding
document_store = QdrantDocumentStore(
":memory:",
use_sparse_embeddings=True,
recreate_index=True,
return_embedding=True,
)
doc = Document(content="test", sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]))
document_store.write_documents([doc])
retriever = QdrantSparseEmbeddingRetriever(document_store=document_store)
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_sparse_embedding=sparse_embedding)
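Without extra modifiers such as IDF, Qdrant scores sparse vectors with a dot product, and only the indices that the two vectors share contribute to it. The sketch below works through that arithmetic for the vectors in the example above. It represents a sparse embedding as parallel indices and values lists, the same shape SparseEmbedding uses. sparse_dot is a hypothetical helper, not part of Haystack.

```python
from typing import List


def sparse_dot(q_indices: List[int], q_values: List[float],
               d_indices: List[int], d_values: List[float]) -> float:
    """Dot product of two sparse vectors given as (indices, values) pairs."""
    doc = dict(zip(d_indices, d_values))  # index -> weight for fast lookup
    # Only indices present in both vectors contribute to the score
    return sum(v * doc[i] for i, v in zip(q_indices, q_values) if i in doc)


score = sparse_dot([0, 1, 2, 3], [0.1, 0.8, 0.05, 0.33],  # query
                   [0, 3, 5], [0.1, 0.5, 0.12])           # document
print(round(score, 6))  # 0.175
```

Here indices 0 and 3 overlap: 0.1 * 0.1 + 0.33 * 0.5 = 0.175. The large query weight at index 1 contributes nothing because the document has no entry there.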
QdrantSparseEmbeddingRetriever.__init__
def __init__(document_store: QdrantDocumentStore,
filters: Optional[Dict[str, Any]] = None,
top_k: int = 10,
scale_score: bool = True,
return_embedding: bool = False)
Create a QdrantSparseEmbeddingRetriever component.
Arguments:
document_store
: An instance of QdrantDocumentStore.
filters
: A dictionary with filters to narrow down the search space. Default is None.
top_k
: The maximum number of documents to retrieve. Default is 10.
scale_score
: Whether to scale the scores of the retrieved documents or not. Default is True.
return_embedding
: Whether to return the sparse embedding of the retrieved Documents. Default is False.
Raises:
ValueError
: If 'document_store' is not an instance of QdrantDocumentStore.
QdrantSparseEmbeddingRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
QdrantSparseEmbeddingRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "QdrantSparseEmbeddingRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
QdrantSparseEmbeddingRetriever.run
@component.output_types(documents=List[Document])
def run(query_sparse_embedding: SparseEmbedding,
filters: Optional[Dict[str, Any]] = None,
top_k: Optional[int] = None,
scale_score: Optional[bool] = None,
return_embedding: Optional[bool] = None)
Run the Sparse Embedding Retriever on the given input data.
Arguments:
query_sparse_embedding
: Sparse embedding of the query.
filters
: A dictionary with filters to narrow down the search space.
top_k
: The maximum number of documents to return.
scale_score
: Whether to scale the scores of the retrieved documents or not.
return_embedding
: Whether to return the embedding of the retrieved Documents.
Returns:
The retrieved documents.
QdrantHybridRetriever
A component for retrieving documents from a QdrantDocumentStore using both dense and sparse vectors, fusing the two result lists with Reciprocal Rank Fusion.
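Reciprocal Rank Fusion gives each document a score by summing 1 / (k + rank) over every ranked list it appears in. Documents ranked well by both the dense and the sparse search therefore rise to the top. The sketch below shows textbook RRF using only the standard library, assuming 1-based ranks and the commonly used k = 60. Qdrant may use a different constant or tie-breaking rule, and the document IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # hypothetical dense-search ranking
sparse_hits = ["b", "d", "a"]  # hypothetical sparse-search ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

RRF uses only ranks, never raw scores. This is why dense cosine scores and sparse dot-product scores, which live on different scales, can be combined without normalizing them first.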
Usage example:
from haystack import Document
from haystack_integrations.components.retrievers.qdrant import QdrantHybridRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses.sparse_embedding import SparseEmbedding
document_store = QdrantDocumentStore(
":memory:",
use_sparse_embeddings=True,
recreate_index=True,
return_embedding=True,
wait_result_from_api=True,
)
doc = Document(content="test",
embedding=[0.5]*768,
sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]))
document_store.write_documents([doc])
retriever = QdrantHybridRetriever(document_store=document_store)
embedding = [0.1]*768
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_embedding=embedding, query_sparse_embedding=sparse_embedding)
QdrantHybridRetriever.__init__
def __init__(document_store: QdrantDocumentStore,
filters: Optional[Dict[str, Any]] = None,
top_k: int = 10,
return_embedding: bool = False)
Create a QdrantHybridRetriever component.
Arguments:
document_store
: An instance of QdrantDocumentStore.
filters
: A dictionary with filters to narrow down the search space. Default is None.
top_k
: The maximum number of documents to retrieve. Default is 10.
return_embedding
: Whether to return the embeddings of the retrieved Documents. Default is False.
Raises:
ValueError
: If 'document_store' is not an instance of QdrantDocumentStore.
QdrantHybridRetriever.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
QdrantHybridRetriever.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "QdrantHybridRetriever"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
QdrantHybridRetriever.run
@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
query_sparse_embedding: SparseEmbedding,
filters: Optional[Dict[str, Any]] = None,
top_k: Optional[int] = None,
return_embedding: Optional[bool] = None)
Run the Hybrid Retriever on the given input data.
Arguments:
query_embedding
: Dense embedding of the query.
query_sparse_embedding
: Sparse embedding of the query.
filters
: A dictionary with filters to narrow down the search space.
top_k
: The maximum number of documents to return.
return_embedding
: Whether to return the embedding of the retrieved Documents.
Returns:
The retrieved documents.
Module haystack_integrations.document_stores.qdrant.document_store
get_batches_from_generator
def get_batches_from_generator(iterable, n)
Batch elements of an iterable into fixed-length chunks or blocks.
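The following sketches the batching behavior described above. It assumes each batch is yielded as a tuple of up to n elements and that the final batch may be shorter. It is built on itertools.islice and is not necessarily the module's exact implementation. batches_sketch is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def batches_sketch(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
    """Lazily yield successive tuples of at most n items from any iterable."""
    it = iter(iterable)
    batch = tuple(islice(it, n))
    while batch:  # an empty tuple means the source is exhausted
        yield batch
        batch = tuple(islice(it, n))


print(list(batches_sketch(range(7), 3)))  # [(0, 1, 2), (3, 4, 5), (6,)]
```

Because the source is consumed lazily, a long generator of documents can be sent in fixed-size requests without first materializing it in memory.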
Module haystack_integrations.document_stores.qdrant.migrate_to_sparse
migrate_to_sparse_embeddings_support
def migrate_to_sparse_embeddings_support(
old_document_store: QdrantDocumentStore, new_index: str)
Utility function to migrate an existing QdrantDocumentStore
to a new one with support for sparse embeddings.
With qdrant-haystack v3.3.0, support for sparse embeddings has been added to QdrantDocumentStore. This feature is disabled by default and can be enabled by setting use_sparse_embeddings=True in the init parameters. To store sparse embeddings, document stores (collections) created with this feature disabled must be migrated to a new collection with the feature enabled.
This utility function works with on-premises and cloud instances of Qdrant. It does not work with local in-memory or disk-persisted instances.
The utility function merely migrates the existing documents so that they are ready to store sparse embeddings. It does not compute sparse embeddings. To do this, you need to use a Sparse Embedder component.
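One way to picture why a new collection is needed: with sparse support enabled, each point is assumed to carry named vectors instead of a single unnamed dense vector. The conceptual sketch below uses plain dicts rather than real Qdrant records. The vector name "text-dense" is an assumption about the integration's naming, and reshape_point is hypothetical. No sparse vector is added, which matches the note above that sparse embeddings must be computed separately.

```python
from typing import Any, Dict, List

# Assumed vector name; verify against the installed integration
DENSE_VECTOR_NAME = "text-dense"


def reshape_point(point: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: move an unnamed dense vector under a named slot.

    No sparse vector is added here; a Sparse Embedder fills that in later.
    """
    return {
        "id": point["id"],
        "vector": {DENSE_VECTOR_NAME: point["vector"]},
        "payload": point["payload"],  # document content and metadata carried over unchanged
    }


old_points: List[Dict[str, Any]] = [
    {"id": 1, "vector": [0.5, 0.5], "payload": {"content": "test"}},
]
new_points = [reshape_point(p) for p in old_points]
print(new_points[0]["vector"])  # {'text-dense': [0.5, 0.5]}
```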
Example usage:
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.document_stores.qdrant import migrate_to_sparse_embeddings_support
old_document_store = QdrantDocumentStore(url="http://localhost:6333",
index="Document",
use_sparse_embeddings=False)
new_index = "Document_sparse"
migrate_to_sparse_embeddings_support(old_document_store, new_index)
# now you can use the new document store with sparse embeddings support
new_document_store = QdrantDocumentStore(url="http://localhost:6333",
index=new_index,
use_sparse_embeddings=True)
Arguments:
old_document_store
: The existing QdrantDocumentStore instance to migrate from.
new_index
: The name of the new index/collection to create with sparse embeddings support.