FAISS
haystack_integrations.components.retrievers.faiss.embedding_retriever
FAISSEmbeddingRetriever
Retrieves documents from the FAISSDocumentStore, based on their dense embeddings.
Example usage:
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.faiss import FAISSDocumentStore
from haystack_integrations.components.retrievers.faiss import FAISSEmbeddingRetriever
document_store = FAISSDocumentStore(embedding_dim=768)
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of intelligence."),
    Document(content="In certain places, you can witness the phenomenon of bioluminescent waves."),
]
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings, policy=DuplicatePolicy.OVERWRITE)
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", FAISSEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query = "How many languages are there?"
res = query_pipeline.run({"text_embedder": {"text": query}})
assert res["retriever"]["documents"][0].content == "There are over 7,000 languages spoken around the world today."
init
__init__(
    *,
    document_store: FAISSDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
)
Parameters:
- document_store (FAISSDocumentStore) – An instance of FAISSDocumentStore.
- filters (dict[str, Any] | None) – Filters applied to the retrieved Documents, set at initialization time. At runtime, these are combined with any runtime filters according to the filter_policy.
- top_k (int) – Maximum number of Documents to return.
- filter_policy (str | FilterPolicy) – Policy to determine how init-time and runtime filters are combined. See FilterPolicy for details. Defaults to FilterPolicy.REPLACE.
Raises:
- ValueError – If document_store is not an instance of FAISSDocumentStore.
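The interaction between init-time and runtime filters can be illustrated with a small standalone sketch. It does not call Haystack: `SketchFilterPolicy` and `combine_filters` are hypothetical names, and the MERGE branch simply ANDs the two filters, which is a simplifying assumption and not necessarily the library's exact merging logic.

```python
from enum import Enum
from typing import Any, Optional


class SketchFilterPolicy(Enum):
    """Hypothetical stand-in for FilterPolicy, for illustration only."""

    REPLACE = "replace"
    MERGE = "merge"


def combine_filters(
    policy: SketchFilterPolicy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Combine init-time and runtime filters under the given policy (simplified)."""
    if runtime_filters is None:
        # Nothing supplied at runtime: fall back to the init-time filters.
        return init_filters
    if policy is SketchFilterPolicy.REPLACE or init_filters is None:
        # REPLACE: runtime filters take over completely.
        return runtime_filters
    # MERGE (assumption): a document must satisfy both filters.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


init_f = {"field": "meta.lang", "operator": "==", "value": "en"}
run_f = {"field": "meta.year", "operator": ">=", "value": 2020}

replaced = combine_filters(SketchFilterPolicy.REPLACE, init_f, run_f)
merged = combine_filters(SketchFilterPolicy.MERGE, init_f, run_f)
```

With REPLACE, the init-time language filter is dropped as soon as a runtime filter is passed; with MERGE in this sketch, both conditions must hold.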
to_dict
Serializes the component to a dictionary.
Returns:
- dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
- FAISSEmbeddingRetriever – Deserialized component.
run
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents from the FAISSDocumentStore, based on their embeddings.
Parameters:
- query_embedding (list[float]) – Embedding of the query.
- filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
- top_k (int | None) – Maximum number of Documents to return. Overrides the value set at initialization.
Returns:
- dict[str, list[Document]] – A dictionary with the following keys:
  - documents: List of Documents that are similar to query_embedding.
run_async
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
Asynchronously retrieve documents from the FAISSDocumentStore, based on their embeddings.
Since FAISS search is CPU-bound and fully in-memory, this delegates directly to the synchronous
run() method. No I/O or network calls are involved.
Parameters:
- query_embedding (list[float]) – Embedding of the query.
- filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
- top_k (int | None) – Maximum number of Documents to return. Overrides the value set at initialization.
Returns:
- dict[str, list[Document]] – A dictionary with the following keys:
  - documents: List of Documents that are similar to query_embedding.
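The delegation pattern described for run_async can be sketched as follows. `SketchRetriever` is a hypothetical class written only for this illustration; it returns placeholder strings and does not reproduce the real retriever's internals.

```python
import asyncio


class SketchRetriever:
    """Hypothetical retriever used only to illustrate sync-to-async delegation."""

    def run(self, query_embedding: list[float], top_k: int = 10) -> dict[str, list[str]]:
        # Placeholder for a CPU-bound, in-memory search.
        return {"documents": [f"doc-{i}" for i in range(top_k)]}

    async def run_async(self, query_embedding: list[float], top_k: int = 10) -> dict[str, list[str]]:
        # There is no awaitable work, so call the synchronous method directly.
        return self.run(query_embedding, top_k=top_k)


result = asyncio.run(SketchRetriever().run_async([0.1, 0.2], top_k=2))
```

Because a coroutine written this way never yields, the event loop is blocked for the duration of the search. If that matters for a large index, one option is to wrap the synchronous call yourself, for example with `asyncio.to_thread`.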
haystack_integrations.document_stores.faiss.document_store
FAISSDocumentStore
A Document Store using FAISS for vector search and a simple JSON file for metadata storage.
This Document Store is suitable for small to medium-sized datasets where simplicity is preferred over scalability.
It supports basic persistence by saving the FAISS index to a .faiss file and documents to a .json file.
init
__init__(
    index_path: str | None = None,
    index_string: str = "Flat",
    embedding_dim: int = 768,
)
Initializes the FAISSDocumentStore.
Parameters:
- index_path (str | None) – Path to save/load the index and documents. If None, the store is in-memory only.
- index_string (str) – The FAISS index factory string. Default is "Flat".
- embedding_dim (int) – The dimension of the embeddings. Default is 768.
Raises:
- DocumentStoreError – If the FAISS index cannot be initialized.
- ValueError – If index_path points to a missing .faiss file when loading persisted data.
count_documents
Returns the number of documents in the store.
filter_documents
Returns documents that match the provided filters.
Parameters:
- filters (dict[str, Any] | None) – A dictionary of filters to apply.
Returns:
- list[Document] – A list of matching Documents.
Raises:
- FilterError – If the filter structure is invalid.
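For orientation, the sketch below shows the general shape of Haystack-style filter dictionaries (comparison nodes with `field`, `operator`, `value`, and logical nodes with `operator`, `conditions`) together with a tiny evaluator. The `matches` function is hypothetical and covers only a subset of operators; it raises `ValueError` as a stand-in for the store's FilterError and treats a missing field as a non-match, which is an assumption.

```python
import operator
from typing import Any

# Comparison operators supported by this sketch (a subset).
_COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(filters: dict[str, Any], meta: dict[str, Any]) -> bool:
    """Return True if `meta` satisfies `filters` (simplified evaluator)."""
    op = filters.get("operator")
    if op in ("AND", "OR") and "conditions" in filters:
        results = [matches(cond, meta) for cond in filters["conditions"]]
        return all(results) if op == "AND" else any(results)
    if op in _COMPARATORS and "field" in filters and "value" in filters:
        # Strip the "meta." prefix used to address metadata fields.
        key = filters["field"].removeprefix("meta.")
        value = meta.get(key)
        # Assumption: a missing field never matches.
        return value is not None and _COMPARATORS[op](value, filters["value"])
    # Stand-in for the FilterError raised on an invalid structure.
    raise ValueError(f"Invalid filter: {filters!r}")


docs_meta = [
    {"lang": "en", "year": 2021},
    {"lang": "de", "year": 2023},
    {"lang": "en", "year": 2018},
]
flt = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
selected = [m for m in docs_meta if matches(flt, m)]
```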
write_documents
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.FAIL
) -> int
Writes documents to the store.
Parameters:
- documents (list[Document]) – The list of documents to write.
- policy (DuplicatePolicy) – The policy to handle duplicate documents.
Returns:
- int – The number of documents written.
Raises:
- ValueError – If documents is not an iterable of Document objects.
- DuplicateDocumentError – If a duplicate document is found and policy is DuplicatePolicy.FAIL.
- DocumentStoreError – If the FAISS index is unexpectedly unavailable when adding embeddings.
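How a duplicate policy affects a write can be sketched with a plain dictionary keyed by document ID. `SketchDuplicatePolicy` and `write` are hypothetical names for this illustration; the way skipped documents are counted here is an assumption, and `ValueError` stands in for DuplicateDocumentError.

```python
from enum import Enum


class SketchDuplicatePolicy(Enum):
    """Hypothetical stand-in for DuplicatePolicy, for illustration only."""

    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(store: dict[str, str], docs: list[tuple[str, str]], policy: SketchDuplicatePolicy) -> int:
    """Write (id, content) pairs into `store` and return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is SketchDuplicatePolicy.FAIL:
                # Stand-in for DuplicateDocumentError.
                raise ValueError(f"Duplicate document id: {doc_id}")
            if policy is SketchDuplicatePolicy.SKIP:
                # Keep the existing document and do not count this one.
                continue
        # New document, or OVERWRITE of an existing one.
        store[doc_id] = content
        written += 1
    return written


store = {"a": "old"}
skipped = write(store, [("a", "new"), ("b", "x")], SketchDuplicatePolicy.SKIP)
overwritten = write(store, [("a", "new")], SketchDuplicatePolicy.OVERWRITE)
```

In this sketch the SKIP call writes only `"b"`, and the OVERWRITE call replaces the content stored under `"a"`.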
delete_documents
Deletes documents from the store.
Raises:
- DocumentStoreError – If the FAISS index is unexpectedly unavailable when removing embeddings.
delete_all_documents
Deletes all documents from the store.
search
search(
    query_embedding: list[float],
    top_k: int = 10,
    filters: dict[str, Any] | None = None,
) -> list[Document]
Performs a vector search.
Parameters:
- query_embedding (list[float]) – The query embedding.
- top_k (int) – The number of results to return.
- filters (dict[str, Any] | None) – Filters to apply.
Returns:
- list[Document] – A list of matching Documents.
Raises:
- FilterError – If the filter structure is invalid.
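Conceptually, a "Flat" index answers a query by comparing it against every stored vector. The pure-Python sketch below illustrates that exhaustive search; `flat_search` is a hypothetical helper, and the use of squared L2 distance is an assumption, since the metric used by the store is not stated here.

```python
import heapq


def flat_search(
    query: list[float],
    vectors: dict[str, list[float]],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Score every stored vector and return the top_k nearest by squared L2 distance."""

    def sq_l2(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Exhaustive scan: one distance computation per stored vector.
    scored = ((doc_id, sq_l2(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])


vectors = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.2, 0.1]}
hits = flat_search([0.0, 0.1], vectors, top_k=2)
nearest_ids = [doc_id for doc_id, _ in hits]
```

The cost of this approach grows linearly with the number of stored vectors, which fits the positioning of the store for small to medium-sized datasets.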
delete_by_filter
Deletes documents that match the provided filters from the store.
Parameters:
- filters (dict[str, Any]) – A dictionary of filters to apply to find documents to delete.
Returns:
- int – The number of documents deleted.
Raises:
- FilterError – If the filter structure is invalid.
- DocumentStoreError – If the FAISS index is unexpectedly unavailable when removing embeddings.
count_documents_by_filter
Returns the number of documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – A dictionary of filters to apply.
Returns:
- int – The number of matching documents.
Raises:
- FilterError – If the filter structure is invalid.
update_by_filter
Updates documents that match the provided filters with the new metadata.
Note: Updates are performed in-memory only. To persist these changes,
you must explicitly call save() after updating.
Parameters:
- filters (dict[str, Any]) – A dictionary of filters to apply to find documents to update.
- meta (dict[str, Any]) – A dictionary of metadata key-value pairs to update in the matching documents.
Returns:
- int – The number of documents updated.
Raises:
- FilterError – If the filter structure is invalid.
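The note about in-memory updates can be made concrete with a toy store that persists metadata to a JSON file. `SketchStore` and its methods are hypothetical and much simpler than the real class; the point is only that a freshly loaded copy does not see an update until save() has been called.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


class SketchStore:
    """Hypothetical in-memory store with explicit JSON persistence."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.docs: dict[str, dict[str, Any]] = {}
        if path.exists():
            self.docs = json.loads(path.read_text())

    def update_meta(self, key: str, value: Any, new_meta: dict[str, Any]) -> int:
        # Update every document whose metadata has meta[key] == value.
        hits = [m for m in self.docs.values() if m.get(key) == value]
        for meta in hits:
            meta.update(new_meta)
        return len(hits)

    def save(self) -> None:
        self.path.write_text(json.dumps(self.docs))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "docs.json"
    store = SketchStore(path)
    store.docs = {"1": {"lang": "en"}, "2": {"lang": "de"}}
    store.save()

    updated = store.update_meta("lang", "en", {"reviewed": True})
    before_save = SketchStore(path).docs["1"]  # reloaded copy lacks the update
    store.save()
    after_save = SketchStore(path).docs["1"]  # reloaded copy now includes it
```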
get_metadata_fields_info
Infers and returns the types of all metadata fields from the stored documents.
Returns:
- dict[str, dict[str, Any]] – A dictionary mapping field names to dictionaries with a "type" key (e.g. {"field": {"type": "long"}}).
get_metadata_field_min_max
Returns the minimum and maximum values for a specific metadata field.
Parameters:
- field_name (str) – The name of the metadata field.
Returns:
- dict[str, Any] – A dictionary with keys "min" and "max" containing the respective min and max values.
get_metadata_field_unique_values
Returns all unique values for a specific metadata field.
Parameters:
- field_name (str) – The name of the metadata field.
Returns:
- list[Any] – A list of unique values for the specified field.
count_unique_metadata_by_filter
count_unique_metadata_by_filter(
    filters: dict[str, Any], fields: list[str]
) -> dict[str, int]
Returns a count of unique values for multiple metadata fields, optionally scoped by a filter.
Parameters:
- filters (dict[str, Any]) – A dictionary of filters to apply.
- fields (list[str]) – A list of metadata field names to count unique values for.
Returns:
- dict[str, int] – A dictionary mapping each field name to the count of its unique values.
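The shape of the returned mapping can be illustrated with plain metadata dictionaries. `count_unique` is a hypothetical helper that assumes any filter has already been applied, that metadata values are hashable, and that documents lacking a field are ignored; the real method may handle these cases differently.

```python
from typing import Any


def count_unique(metas: list[dict[str, Any]], fields: list[str]) -> dict[str, int]:
    """Count distinct values per field, ignoring documents that lack the field."""
    return {f: len({m[f] for m in metas if f in m}) for f in fields}


# Metadata of the documents that remain after filtering (illustrative data).
metas = [
    {"lang": "en", "year": 2021},
    {"lang": "en", "year": 2023},
    {"lang": "de"},
]
counts = count_unique(metas, ["lang", "year"])
```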
to_dict
Serializes the store to a dictionary.
from_dict
Deserializes the store from a dictionary.
save
Saves the index and documents to disk.
Raises:
- DocumentStoreError – If the FAISS index is unexpectedly unavailable.
load
Loads the index and documents from disk.
Raises:
- ValueError – If the .faiss file does not exist.