Version: 2.25-unstable

Document Writers

document_writer

DocumentWriter

Writes documents to a DocumentStore.

Usage example

python
from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

docs = [
    Document(content="Python is a popular programming language"),
]
doc_store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=doc_store)
writer.run(docs)

__init__

python
__init__(
    document_store: DocumentStore,
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
)

Create a DocumentWriter component.

Parameters:

  • document_store (DocumentStore) – The instance of the document store where you want to store your documents.
  • policy (DuplicatePolicy) – The policy to apply when a Document with the same ID already exists in the DocumentStore:
      • DuplicatePolicy.NONE: Default policy, relies on the DocumentStore settings.
      • DuplicatePolicy.SKIP: Skips documents with the same ID and doesn't write them to the DocumentStore.
      • DuplicatePolicy.OVERWRITE: Overwrites documents with the same ID.
      • DuplicatePolicy.FAIL: Raises an error if a Document with the same ID is already in the DocumentStore.

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

python
from_dict(data: dict[str, Any]) -> DocumentWriter

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – The dictionary to deserialize from.

Returns:

  • DocumentWriter – The deserialized component.

Raises:

  • DeserializationError – If the document store is not properly specified in the serialization data or its type cannot be imported.

run

python
run(
    documents: list[Document], policy: DuplicatePolicy | None = None
) -> dict[str, int]

Run the DocumentWriter on the given input data.

Parameters:

  • documents (list[Document]) – A list of documents to write to the document store.
  • policy (DuplicatePolicy | None) – The policy to use when encountering duplicate documents. If None, the policy set when the component was created is used.

Returns:

  • dict[str, int] – A dictionary whose documents_written key holds the number of documents written to the document store.

Raises:

  • ValueError – If the specified document store is not found.

run_async

python
run_async(
    documents: list[Document], policy: DuplicatePolicy | None = None
) -> dict[str, int]

Asynchronously run the DocumentWriter on the given input data.

This is the asynchronous version of the run method. It has the same parameters and return values but can be used with await in async code.

Parameters:

  • documents (list[Document]) – A list of documents to write to the document store.
  • policy (DuplicatePolicy | None) – The policy to use when encountering duplicate documents. If None, the policy set when the component was created is used.

Returns:

  • dict[str, int] – A dictionary whose documents_written key holds the number of documents written to the document store.

Raises:

  • ValueError – If the specified document store is not found.
  • TypeError – If the specified document store does not implement write_documents_async.