API Reference

Writes Documents to a DocumentStore.

Module document_writer

DocumentWriter

Writes documents to a DocumentStore.

Usage example

from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

docs = [
    Document(content="Python is a popular programming language"),
]
doc_store = InMemoryDocumentStore()
# Wrap the store in a DocumentWriter and write through the component.
writer = DocumentWriter(document_store=doc_store)
writer.run(documents=docs)

DocumentWriter.__init__

def __init__(document_store: DocumentStore,
             policy: DuplicatePolicy = DuplicatePolicy.NONE)

Create a DocumentWriter component.

Arguments:

  • document_store: The instance of the document store where you want to store your documents.
  • policy: The policy to apply when a Document with the same ID already exists in the DocumentStore.
      • DuplicatePolicy.NONE: Default policy, relies on the DocumentStore settings.
      • DuplicatePolicy.SKIP: Skips documents with the same ID and doesn't write them to the DocumentStore.
      • DuplicatePolicy.OVERWRITE: Overwrites documents with the same ID.
      • DuplicatePolicy.FAIL: Raises an error if a Document with the same ID is already in the DocumentStore.
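
The four policies can be illustrated with a toy store. This is a minimal sketch, not Haystack's implementation: ToyStore, its default_policy attribute, and the (id, content) tuples are hypothetical stand-ins, the enum is redefined locally so the snippet runs on its own, and the choice that NONE resolves to FAIL is an assumption of this toy (in Haystack it depends on the DocumentStore settings).

```python
from enum import Enum


class DuplicatePolicy(Enum):
    # Local stand-in for Haystack's enum so the sketch is self-contained.
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class ToyStore:
    """Hypothetical dict-backed store; not a Haystack class."""

    def __init__(self, default_policy=DuplicatePolicy.FAIL):
        self.storage = {}
        # What NONE resolves to is a store-level setting (assumed FAIL here).
        self.default_policy = default_policy

    def write_documents(self, documents, policy=DuplicatePolicy.NONE):
        if policy is DuplicatePolicy.NONE:
            policy = self.default_policy
        written = 0
        for doc_id, content in documents:
            if doc_id in self.storage:
                if policy is DuplicatePolicy.FAIL:
                    raise ValueError(f"Document {doc_id!r} already exists")
                if policy is DuplicatePolicy.SKIP:
                    continue  # keep the stored version, write nothing
            self.storage[doc_id] = content  # new ID, or OVERWRITE
            written += 1
        return written


store = ToyStore()
store.write_documents([("1", "old")])
skipped = store.write_documents([("1", "new")], policy=DuplicatePolicy.SKIP)
after_skip = store.storage["1"]
overwritten = store.write_documents([("1", "new")], policy=DuplicatePolicy.OVERWRITE)
after_overwrite = store.storage["1"]
print(skipped, after_skip, overwritten, after_overwrite)  # prints: 0 old 1 new
```

SKIP leaves the stored document untouched and counts nothing as written, while OVERWRITE replaces it and counts the write. Calling write_documents again with the default policy on the same ID raises an error in this toy, because NONE falls through to the store's assumed FAIL default.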

DocumentWriter.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

DocumentWriter.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "DocumentWriter"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Raises:

  • DeserializationError: If the document store is not properly specified in the serialization data or its type cannot be imported.

Returns:

The deserialized component.
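
The two failure modes listed under Raises can be sketched with the standard library alone. This is an illustration under stated assumptions, not Haystack's code: the load_store helper, the local DeserializationError class, and the {"type": ..., "init_parameters": ...} layout are hypothetical, and collections.OrderedDict merely plays the role of an importable store class.

```python
import importlib
from typing import Any, Dict


class DeserializationError(Exception):
    """Local stand-in for the error named above."""


def load_store(data: Dict[str, Any]) -> Any:
    """Rebuild the nested store from an assumed 'type' / 'init_parameters' layout."""
    store_data = data.get("init_parameters", {}).get("document_store")
    if not isinstance(store_data, dict) or "type" not in store_data:
        # Case 1: the document store is not properly specified.
        raise DeserializationError("Missing 'type' for the document store")
    module_name, _, class_name = store_data["type"].rpartition(".")
    try:
        store_cls = getattr(importlib.import_module(module_name), class_name)
    except (ImportError, AttributeError, ValueError) as err:
        # Case 2: the store type cannot be imported.
        raise DeserializationError(f"Cannot import {store_data['type']!r}") from err
    return store_cls(**store_data.get("init_parameters", {}))


def try_load(data: Dict[str, Any]) -> str:
    """Return the rebuilt store's class name, or 'error' if loading fails."""
    try:
        return type(load_store(data)).__name__
    except DeserializationError:
        return "error"


ok = try_load({"init_parameters": {"document_store": {"type": "collections.OrderedDict"}}})
missing = try_load({"init_parameters": {}})
unimportable = try_load({"init_parameters": {"document_store": {"type": "no_such_pkg.Store"}}})
print(ok, missing, unimportable)  # prints: OrderedDict error error
```

Resolving the class from a dotted path at load time is what lets serialized data name any store type, and it is also why both a missing type entry and an unimportable one have to be reported as deserialization errors.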

DocumentWriter.run

@component.output_types(documents_written=int)
def run(documents: List[Document], policy: Optional[DuplicatePolicy] = None)

Writes the given documents to the document store.

Arguments:

  • documents: A list of documents to write to the document store.
  • policy: The policy to use when encountering duplicate documents. If None, the policy passed to __init__ is used.

Raises:

  • ValueError: If the specified document store is not found.

Returns:

A dictionary whose documents_written key holds the number of documents written to the document store.
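
How the optional policy argument of run interacts with the one given at construction can be sketched with a toy component. ToyWriter is a hypothetical stand-in that mirrors the signature above; it assumes that a policy passed to run takes precedence and that None falls back to the init-time value, it models only "skip" and "overwrite" as plain strings, and it returns a dictionary keyed by the documents_written output name declared in the decorator.

```python
from typing import Dict, List, Optional, Tuple


class ToyWriter:
    """Hypothetical stand-in mirroring the run() signature; not a Haystack class."""

    def __init__(self, storage: Dict[str, str], policy: str = "overwrite"):
        self.storage = storage
        self.policy = policy  # init-time default

    def run(
        self, documents: List[Tuple[str, str]], policy: Optional[str] = None
    ) -> Dict[str, int]:
        # Assumption: a per-call policy wins; None falls back to the init-time one.
        effective = self.policy if policy is None else policy
        written = 0
        for doc_id, content in documents:
            if doc_id in self.storage and effective == "skip":
                continue  # duplicate ID under "skip": leave the stored value alone
            self.storage[doc_id] = content
            written += 1
        return {"documents_written": written}


writer = ToyWriter(storage={}, policy="overwrite")
first = writer.run([("a", "v1")])
second = writer.run([("a", "v2")])                # uses the init-time "overwrite"
third = writer.run([("a", "v3")], policy="skip")  # per-call override
print(first, second, third, writer.storage)
```

Keeping the override per call means one writer instance can serve a pipeline that normally overwrites documents but occasionally needs a non-destructive write, without building a second component.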