Data Classes

Haystack 2.0 uses data classes to help components communicate with each other in a simple and modular way, so that data flows seamlessly through Haystack Pipelines. This page covers the data classes available in Haystack 2.0: ByteStream, Answer (along with its variants ExtractedAnswer, ExtractedTableAnswer, and GeneratedAnswer), ChatMessage, Document, StreamingChunk, and SparseEmbedding, and explains how each contributes to the Haystack ecosystem.

ByteStream

Overview

ByteStream is Haystack's abstraction for binary objects and is essential for handling various binary data formats.

Key Features

Holds binary data and associated metadata.
Optional MIME type specification for flexibility.
Methods for creating and saving data (from_file_path, from_string, to_file) for easy data manipulation.

Attributes

@dataclass(frozen=True)
class ByteStream:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)
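To illustrate how binary data, metadata, and a MIME type travel together, here is a minimal stdlib-only sketch that mirrors the attributes listed above. It does not import Haystack: from_string, from_file_path, and to_file are simplified re-implementations written for this example (assuming UTF-8 text), not Haystack's own code or exact signatures.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ByteStream:
    """Simplified stand-in for Haystack's ByteStream (illustrative only)."""

    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)

    @classmethod
    def from_string(cls, text: str, mime_type: Optional[str] = None) -> "ByteStream":
        # Encode the text as UTF-8 bytes (an assumption of this sketch).
        return cls(data=text.encode("utf-8"), mime_type=mime_type)

    @classmethod
    def from_file_path(cls, path: Path, mime_type: Optional[str] = None) -> "ByteStream":
        # Read the raw bytes of the file.
        return cls(data=Path(path).read_bytes(), mime_type=mime_type)

    def to_file(self, path: Path) -> None:
        # Write the raw bytes back to disk unchanged.
        Path(path).write_bytes(self.data)


stream = ByteStream.from_string("Hello, Haystack!", mime_type="text/plain")
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hello.txt"
    stream.to_file(target)
    restored = ByteStream.from_file_path(target, mime_type="text/plain")

print(restored.data.decode("utf-8"))  # Hello, Haystack!
print(restored == stream)  # True
```

Because metadata is declared with hash=False, two streams with the same bytes and MIME type hash identically even if their metadata differs.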

Answer

Overview

The Answer class serves as the base for responses generated within Haystack, containing the answer's data, the originating query, and additional metadata.

Key Features

Flexible data field that can hold any data type (data).
Tracking of the query that produced the answer (query).
Metadata dictionary for additional information about the answer (meta).

Attributes

@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
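Since Answer is declared as a frozen dataclass, its fields cannot be reassigned after creation, while data may hold any type. The sketch below re-declares the class with the standard library, mirroring the listing above rather than importing Haystack, to show both properties.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Answer:
    """Stand-in mirroring the Answer attributes listed above."""

    data: Any
    query: str
    meta: Dict[str, Any]


# data can hold any type: here an integer rather than a string.
answer = Answer(data=42, query="What is six times seven?", meta={"source": "calculator"})

immutable = False
try:
    answer.data = 43  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    immutable = True

print(immutable)  # True
```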

ExtractedAnswer

Overview

ExtractedAnswer is a subclass of Answer that deals explicitly with answers derived from Documents, offering more detailed attributes.

Key Features

Includes a reference to the originating Document.
Score attribute to quantify the answer's confidence level.
Optional offsets (document_offset, context_offset) that locate the answer within the source Document and its context.

Attributes

@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
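The offsets make it possible to recover the answer's position in its source. The following sketch mirrors the ExtractedAnswer listing with the standard library. Span (assumed here to hold start and end character positions) and the text-only Document are hypothetical stand-ins, and the score is an illustrative value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Span:
    """Hypothetical character span [start, end) used by this sketch."""

    start: int
    end: int


@dataclass
class Document:
    """Minimal stand-in holding only text content."""

    content: str


@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional[Span] = None
    context_offset: Optional[Span] = None
    meta: Dict[str, Any] = field(default_factory=dict)


doc = Document(content="Berlin is the capital of Germany.")
start = doc.content.index("Berlin")
end = start + len("Berlin")

answer = ExtractedAnswer(
    query="What is the capital of Germany?",
    score=0.93,  # illustrative confidence value
    data=doc.content[start:end],
    document=doc,
    document_offset=Span(start, end),
)

# The offset lets you recover the answer text from the source document.
offset = answer.document_offset
print(answer.document.content[offset.start:offset.end])  # Berlin
```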

ExtractedTableAnswer

Overview

ExtractedTableAnswer is a subclass of Answer for table question answering. Like ExtractedAnswer, it represents answers extracted from Documents and offers more detailed attributes.

Key Features

Includes reference to the originating Document.
Score attribute to quantify the answer's confidence level.
Cell attributes (document_cells, context_cells) that locate the answer within the table.

Attributes

@dataclass
class ExtractedTableAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[DataFrame] = None
    document_cells: List["Cell"] = field(default_factory=list)
    context_cells: List["Cell"] = field(default_factory=list)
    meta: Dict[str, Any] = field(default_factory=dict)
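The sketch below shows how cell coordinates point back into a table. To stay dependency-free it makes two assumptions: a list of rows replaces the pandas DataFrame used by context, and Cell is a hypothetical (row, column) pair. The document and document_cells fields are omitted for brevity, and the score is an illustrative value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Cell:
    """Hypothetical (row, column) coordinate of a table cell."""

    row: int
    column: int


@dataclass
class ExtractedTableAnswer:
    query: str
    score: float
    data: Optional[str] = None
    # A list of rows stands in for the pandas DataFrame of the listing above.
    context: Optional[List[List[str]]] = None
    context_cells: List[Cell] = field(default_factory=list)
    meta: Dict[str, Any] = field(default_factory=dict)


table = [
    ["City", "Country"],
    ["Berlin", "Germany"],
    ["Paris", "France"],
]

answer = ExtractedTableAnswer(
    query="Which city is in France?",
    score=0.88,  # illustrative confidence value
    data="Paris",
    context=table,
    context_cells=[Cell(row=2, column=0)],
)

# Each cell coordinate points back into the table that produced the answer.
for cell in answer.context_cells:
    print(answer.context[cell.row][cell.column])  # Paris
```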

GeneratedAnswer

Overview

GeneratedAnswer extends the Answer class to accommodate answers generated from multiple Documents.

Key Features

Stores the answer as a string (data).
Keeps the list of Document objects the answer was generated from (documents), making the answer traceable to its sources.

Attributes

@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)
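Keeping the source Documents alongside the generated text is what makes a GeneratedAnswer traceable. This stdlib-only sketch mirrors the listing above with a minimal, hypothetical Document stand-in that holds only an id and text content.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in with an id and text content."""

    id: str
    content: str


@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)


sources = [
    Document(id="doc-1", content="Python was created by Guido van Rossum."),
    Document(id="doc-2", content="Python was first released in 1991."),
]

answer = GeneratedAnswer(
    data="Python was created by Guido van Rossum and first released in 1991.",
    query="Who created Python and when was it released?",
    documents=sources,
)

# Trace the generated answer back to the documents it was based on.
print([doc.id for doc in answer.documents])  # ['doc-1', 'doc-2']
```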

ChatMessage

Overview

ChatMessage is the central abstraction for chat-based LLMs. It holds the message content, the sender's role, and additional attributes, and it follows the ChatML format.

Key Features

Detailed role categorization using ChatRole enum (assistant, user, system, function).
Optional name attribute for function calls.
Metadata support for enriched message context.
Role-specific message creation methods (e.g., from_assistant, from_user).

Attributes

@dataclass
class ChatMessage:
    content: str
    role: ChatRole
    name: Optional[str]
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
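The following sketch mirrors ChatMessage and ChatRole with the standard library to show how role-specific constructors build a conversation. The enum values and the classmethod bodies (including from_system) are assumptions made for illustration; they are not Haystack's source code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class ChatRole(str, Enum):
    """Roles listed above; the string values are assumptions of this sketch."""

    ASSISTANT = "assistant"
    USER = "user"
    SYSTEM = "system"
    FUNCTION = "function"


@dataclass
class ChatMessage:
    content: str
    role: ChatRole
    name: Optional[str]
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)

    # Hypothetical role-specific constructors; name is only needed for function calls.
    @classmethod
    def from_user(cls, content: str) -> "ChatMessage":
        return cls(content=content, role=ChatRole.USER, name=None)

    @classmethod
    def from_assistant(cls, content: str) -> "ChatMessage":
        return cls(content=content, role=ChatRole.ASSISTANT, name=None)

    @classmethod
    def from_system(cls, content: str) -> "ChatMessage":
        return cls(content=content, role=ChatRole.SYSTEM, name=None)


conversation: List[ChatMessage] = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user("What is Haystack?"),
    ChatMessage.from_assistant("Haystack is an open-source framework for building LLM applications."),
]

for message in conversation:
    print(f"{message.role.value}: {message.content}")
```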

Document

Overview

Document is the central data abstraction in Haystack. It can hold text, tables, and binary data.

Key Features

Unique ID for each document.
Multiple content types are supported: text (content), tables (dataframe), and binary data (blob).
Custom metadata and scoring for advanced document management.
Optional embedding vector (embedding) for embedding-based retrieval.

Attributes

@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    dataframe: Optional[pandas.DataFrame] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
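The sketch below mirrors the text-related Document fields with the standard library (dataframe and blob are omitted). The ID scheme is an assumption for illustration: when no ID is given, it hashes the content and metadata with SHA-256, so identical documents receive identical IDs. Haystack's actual ID generation may use different inputs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Text-only stand-in; dataframe and blob are omitted from this sketch."""

    id: str = field(default="")
    content: Optional[str] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)

    def __post_init__(self) -> None:
        # Assumption for this sketch: derive an ID from content and meta when none is given.
        if not self.id:
            source = f"{self.content}{self.meta}"
            self.id = hashlib.sha256(source.encode("utf-8")).hexdigest()


a = Document(content="Haystack pipelines pass Documents between components.", meta={"lang": "en"})
b = Document(content="Haystack pipelines pass Documents between components.", meta={"lang": "en"})
c = Document(content="A different text.", embedding=[0.1, 0.3, -0.2])

print(a.id == b.id)  # True: same content and meta give the same ID
print(a.id == c.id)  # False
print(len(a.id))  # 64 hex characters
```

An explicitly provided ID, such as Document(id="custom", content="x"), is kept as-is.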

StreamingChunk

Overview

StreamingChunk represents one piece of a streamed LLM response, so the response can be processed in real time as it is generated.

Key Features

String-based content representation.
Accompanying metadata for additional context and management.

Attributes

@dataclass
class StreamingChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
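StreamingChunk objects are typically consumed one at a time as they arrive. In this sketch, fake_llm_stream is a hypothetical generator that stands in for a streaming LLM, and on_chunk is a hypothetical callback that collects each chunk's content; the StreamingChunk class mirrors the listing above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class StreamingChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)


def fake_llm_stream(text: str) -> Iterator[StreamingChunk]:
    """Hypothetical stand-in for an LLM that streams one word at a time."""
    words = text.split(" ")
    for i, word in enumerate(words):
        # Re-add the separating space for every word except the last.
        suffix = " " if i < len(words) - 1 else ""
        yield StreamingChunk(content=word + suffix, metadata={"index": i})


received: List[str] = []


def on_chunk(chunk: StreamingChunk) -> None:
    # A callback like this could also print each chunk as soon as it arrives.
    received.append(chunk.content)


for chunk in fake_llm_stream("Streaming lets users see partial output."):
    on_chunk(chunk)

print("".join(received))  # Streaming lets users see partial output.
```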

SparseEmbedding

Overview

The SparseEmbedding class represents a sparse embedding: a vector where most values are zeros.

Attributes

indices: List of indices of non-zero elements in the embedding.
values: List of values of non-zero elements in the embedding.
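Storing only the non-zero entries keeps sparse vectors compact, and similarity can be computed over the shared indices alone. The class below is a hypothetical stdlib sketch: the length check, from_dense, and dot helpers are illustrations, not a description of Haystack's SparseEmbedding API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SparseEmbedding:
    indices: List[int]
    values: List[float]

    def __post_init__(self) -> None:
        # Each non-zero value needs exactly one index.
        if len(self.indices) != len(self.values):
            raise ValueError("indices and values must have the same length")

    @classmethod
    def from_dense(cls, dense: List[float]) -> "SparseEmbedding":
        # Keep only the positions whose value is non-zero.
        nonzero = [(i, v) for i, v in enumerate(dense) if v != 0.0]
        return cls(indices=[i for i, _ in nonzero], values=[v for _, v in nonzero])

    def dot(self, other: "SparseEmbedding") -> float:
        # Only indices present in both vectors contribute to the product.
        lookup: Dict[int, float] = dict(zip(other.indices, other.values))
        return sum(v * lookup.get(i, 0.0) for i, v in zip(self.indices, self.values))


query = SparseEmbedding.from_dense([0.0, 0.5, 0.0, 0.0, 2.0])
doc = SparseEmbedding(indices=[1, 3, 4], values=[1.0, 7.0, 0.25])

print(query.indices, query.values)  # [1, 4] [0.5, 2.0]
print(query.dot(doc))  # 1.0
```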