API Reference

Writes Documents to a DocumentStore.

Module document_writer

DocumentWriter

Writes documents to a DocumentStore.

Usage example:

from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
docs = [
    Document(content="Python is a popular programming language"),
]
doc_store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=doc_store)
writer.run(docs)

DocumentWriter.__init__

def __init__(document_store: DocumentStore,
             policy: DuplicatePolicy = DuplicatePolicy.NONE)

Create a DocumentWriter component.

Arguments:

  • document_store: The DocumentStore where the documents are to be written.
  • policy: The policy to apply when a Document with the same id already exists in the DocumentStore.
      • DuplicatePolicy.NONE: Default policy, behaviour depends on the Document Store.
      • DuplicatePolicy.SKIP: If a Document with the same id already exists, it is skipped and not written.
      • DuplicatePolicy.OVERWRITE: If a Document with the same id already exists, it is overwritten.
      • DuplicatePolicy.FAIL: If a Document with the same id already exists, an error is raised.

DocumentWriter.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

DocumentWriter.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "DocumentWriter"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Raises:

  • DeserializationError: If the document store is not properly specified in the serialization data or its type cannot be imported.

Returns:

The deserialized component.

DocumentWriter.run

@component.output_types(documents_written=int)
def run(documents: List[Document], policy: Optional[DuplicatePolicy] = None)

Run the DocumentWriter on the given input data.

Arguments:

  • documents: A list of documents to write to the store.
  • policy: The policy to use when encountering duplicate documents. If None, the policy set when the DocumentWriter was created is used.

Raises:

  • ValueError: If the specified document store is not found.

Returns:

A dictionary with the key documents_written, holding the number of documents written to the document store.