Docling Serve
haystack_integrations.components.converters.docling_serve.converter
ExportType
Bases: str, Enum
Enumeration of export formats supported by DoclingServe.
- MARKDOWN: Converts documents to Markdown format.
- TEXT: Extracts plain text.
- JSON: Returns the full Docling document as a JSON string.
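Because ExportType derives from both str and Enum, its members behave like plain strings in comparisons and JSON serialization. The stand-in class below, ExportTypeSketch, is hypothetical and only illustrates that behavior. Its member values ("markdown", "text", "json") are assumptions and may differ from the real enumeration.

```python
import json
from enum import Enum


class ExportTypeSketch(str, Enum):
    """Hypothetical stand-in for ExportType; the member values are assumed."""

    MARKDOWN = "markdown"
    TEXT = "text"
    JSON = "json"


# A str-based Enum member compares equal to its plain string value.
same_as_string = ExportTypeSketch.MARKDOWN == "markdown"

# A member can be looked up from its string value, e.g. when reading config.
from_string = ExportTypeSketch("json")

# The json module serializes the member as an ordinary string.
as_json = json.dumps({"export_type": ExportTypeSketch.TEXT})
```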
DoclingServeConverter
Converts documents to Haystack Documents using a DoclingServe server.
See DoclingServe.
DoclingServe hosts Docling in a scalable HTTP server, supporting PDFs, Office documents, HTML, and many other
formats. Unlike the local DoclingConverter, this component has no heavy ML dependencies — all processing
happens on the remote server.
Local files and ByteStreams are uploaded via the /v1/convert/file endpoint. URL strings are sent to
/v1/convert/source.
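The routing rule above can be sketched as a small function. This is an illustration, not the component's actual code: choose_endpoint is a hypothetical helper, and the check for an http:// or https:// prefix is an assumption about how a URL string is recognized.

```python
from pathlib import Path
from typing import Any

SOURCE_ENDPOINT = "/v1/convert/source"  # documented endpoint for URL strings
FILE_ENDPOINT = "/v1/convert/file"  # documented endpoint for uploads


def choose_endpoint(source: Any) -> str:
    """Hypothetical helper: pick the DoclingServe endpoint for one source."""
    # Assumption: a "URL string" is a str that starts with http:// or https://.
    if isinstance(source, str) and source.lower().startswith(("http://", "https://")):
        return SOURCE_ENDPOINT
    # Everything else (local paths as str or Path, ByteStream objects) is uploaded.
    return FILE_ENDPOINT


examples = ["https://arxiv.org/pdf/2206.01062", Path("report.pdf"), "notes.docx"]
routes = {str(s): choose_endpoint(s) for s in examples}
```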
Supports both synchronous (run) and asynchronous (run_async) execution.
Usage example
from haystack_integrations.components.converters.docling_serve import DoclingServeConverter
converter = DoclingServeConverter(base_url="http://localhost:5001")
result = converter.run(sources=["https://arxiv.org/pdf/2206.01062"])
print(result["documents"][0].content[:200])
__init__
__init__(
    *,
    base_url: str = "http://localhost:5001",
    export_type: ExportType = ExportType.MARKDOWN,
    convert_options: dict[str, Any] | None = None,
    timeout: float = 120.0,
    api_key: Secret | None = Secret.from_env_var(
        "DOCLING_SERVE_API_KEY", strict=False
    )
) -> None
Initializes the DoclingServeConverter.
Parameters:
- base_url (str) – Base URL of the DoclingServe instance. Defaults to "http://localhost:5001".
- export_type (ExportType) – The output format for converted documents. One of ExportType.MARKDOWN (default), ExportType.TEXT, or ExportType.JSON.
- convert_options (dict[str, Any] | None) – Optional dictionary of conversion options passed directly to the DoclingServe API (e.g. {"do_ocr": True, "ocr_engine": "tesseract"}). See DoclingServe options. Note: to_formats is set automatically based on export_type and should not be included here.
- timeout (float) – HTTP request timeout in seconds. Defaults to 120.0.
- api_key (Secret | None) – API key for authenticating with a secured DoclingServe instance. Reads from the DOCLING_SERVE_API_KEY environment variable by default. Set to None to disable authentication.
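A non-default configuration might look like the sketch below. It uses only the keyword arguments documented above. The timeout of 300.0 is an arbitrary illustrative value, and build_converter is a hypothetical helper that imports the integration lazily so the snippet loads even where the package is not installed.

```python
from typing import Any

# Keyword arguments taken from the documented __init__ signature.
# Note that convert_options does not contain to_formats: per the parameter
# description, it is derived from export_type.
converter_kwargs: dict[str, Any] = {
    "base_url": "http://localhost:5001",
    "convert_options": {"do_ocr": True, "ocr_engine": "tesseract"},
    "timeout": 300.0,  # illustrative value for slow OCR jobs
}


def build_converter() -> Any:
    """Hypothetical helper: create a converter that exports JSON."""
    # Lazy imports: only needed when a converter is actually built.
    from haystack_integrations.components.converters.docling_serve import (
        DoclingServeConverter,
    )
    from haystack_integrations.components.converters.docling_serve.converter import (
        ExportType,
    )

    return DoclingServeConverter(export_type=ExportType.JSON, **converter_kwargs)
```

Since api_key is not passed here, the default applies and the key is read from the DOCLING_SERVE_API_KEY environment variable if it is set.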
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – A dictionary representation of the component.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary representation of the component.
Returns:
DoclingServeConverter – A new DoclingServeConverter instance.
run
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[Document]]
Converts documents by sending them to DoclingServe and returns Haystack Documents.
Parameters:
- sources (list[str | Path | ByteStream]) – List of sources to convert. Each item can be a URL string, a local file path, or a ByteStream. URL strings are sent to /v1/convert/source; all other sources are uploaded to /v1/convert/file.
- meta (dict[str, Any] | list[dict[str, Any]] | None) – Optional metadata to attach to the output Documents. Can be a single dict applied to all documents, or a list of dicts with one entry per source.
Returns:
dict[str, list[Document]] – A dictionary with key "documents" containing the converted Haystack Documents.
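The two accepted shapes of meta can be illustrated with a short sketch. normalize_meta is a hypothetical helper, not the component's internal code, and raising ValueError on a length mismatch is an assumption about how such input would be rejected.

```python
from typing import Any, Optional, Union

Meta = Optional[Union[dict[str, Any], list[dict[str, Any]]]]


def normalize_meta(meta: Meta, sources_count: int) -> list[dict[str, Any]]:
    """Hypothetical helper: expand meta into one dict per source."""
    if meta is None:
        # No metadata given: every source gets an empty dict.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dict is copied for every source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        # Assumption: a list must contain exactly one entry per source.
        raise ValueError("meta list must have one entry per source")
    return [dict(m) for m in meta]


shared = normalize_meta({"project": "demo"}, 2)
per_source = normalize_meta([{"lang": "en"}, {"lang": "de"}], 2)
```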
run_async
run_async(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[Document]]
Asynchronously converts documents by sending them to DoclingServe.
This is the async equivalent of run(), useful when DoclingServe requests should not
block the event loop.
Parameters:
- sources (list[str | Path | ByteStream]) – List of sources to convert. Each item can be a URL string, a local file path, or a ByteStream. URL strings are sent to /v1/convert/source; all other sources are uploaded to /v1/convert/file.
- meta (dict[str, Any] | list[dict[str, Any]] | None) – Optional metadata to attach to the output Documents.
Returns:
dict[str, list[Document]] – A dictionary with key "documents" containing the converted Haystack Documents.
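One way to benefit from run_async is to issue several conversion calls concurrently with asyncio.gather. The sketch below assumes only the documented run_async signature and return shape. convert_batches is a hypothetical helper, and EchoConverter is a stand-in that returns the source strings instead of Documents so the example runs without a server.

```python
import asyncio
from typing import Any


async def convert_batches(converter: Any, batches: list[list[str]]) -> list[Any]:
    """Hypothetical helper: run one run_async call per batch concurrently."""
    results = await asyncio.gather(
        *(converter.run_async(sources=batch) for batch in batches)
    )
    # Flatten the "documents" lists; gather preserves the order of the batches.
    return [doc for result in results for doc in result["documents"]]


class EchoConverter:
    """Stand-in with the same run_async shape as the real component."""

    async def run_async(self, sources: list[str], meta: Any = None) -> dict[str, list[Any]]:
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        return {"documents": list(sources)}


docs = asyncio.run(convert_batches(EchoConverter(), [["a.pdf"], ["b.pdf", "c.pdf"]]))
```

With a real DoclingServeConverter passed in place of EchoConverter, the returned list would hold Haystack Documents instead of strings.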