Data Classes
In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with these as either the input or output of your pipeline.
Haystack 2.0 uses data classes to help components communicate with each other in a simple and modular way, so that data moves through Haystack Pipelines in well-defined types. This page goes over the available data classes in Haystack 2.0: ByteStream, Answer (along with its variants ExtractedAnswer, ExtractedTableAnswer, and GeneratedAnswer), ChatMessage, Document, StreamingChunk, and SparseEmbedding, and explains what each one holds.
ByteStream
Overview
ByteStream represents a binary object abstraction in the Haystack framework and is crucial for handling various binary data formats.
Key Features
- Holds binary data and associated metadata.
- Optional MIME type specification for flexibility.
- File interaction methods (to_file, from_file_path, from_string) for easy data manipulation.
Attributes
@dataclass(frozen=True)
class ByteStream:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)
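The round trip between a string, a ByteStream, and a file can be sketched as follows. This is a minimal stand-in that mirrors the attributes listed above using only the standard library, not the Haystack class itself; the method bodies and their parameters are assumptions made for illustration.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ByteStream:
    """Stand-in mirroring the attributes listed above (not the library class)."""

    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)

    @classmethod
    def from_string(cls, text: str, encoding: str = "utf-8",
                    mime_type: Optional[str] = None) -> "ByteStream":
        # Assumed behaviour: encode the text into bytes.
        return cls(data=text.encode(encoding), mime_type=mime_type)

    @classmethod
    def from_file_path(cls, path: Path,
                       mime_type: Optional[str] = None) -> "ByteStream":
        # Assumed behaviour: read the whole file as bytes.
        return cls(data=Path(path).read_bytes(), mime_type=mime_type)

    def to_file(self, destination: Path) -> None:
        # Assumed behaviour: write the raw bytes to disk.
        Path(destination).write_bytes(self.data)


stream = ByteStream.from_string("hello haystack", mime_type="text/plain")

# Write the bytes to a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "greeting.txt"
    stream.to_file(target)
    restored = ByteStream.from_file_path(target, mime_type="text/plain")

round_trip_ok = restored == stream
```

Because the dataclass is frozen, a ByteStream cannot be modified after creation; a changed payload means building a new instance.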
Answer
Overview
The Answer class serves as the base for responses generated within Haystack, containing the answer's data, the originating query, and additional metadata.
Key Features
- Adaptable data handling, accommodating any data type (data).
- Query tracking for contextual relevance (query).
- Extensive metadata support for detailed answer description.
Attributes
@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
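As a small illustration of the listing above, the sketch below builds an Answer with non-string data and shows what frozen=True implies. The class is a standard-library stand-in copied from the attributes shown here, and the query and meta values are made up for the example.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Answer:
    """Stand-in mirroring the attributes listed above (not the library class)."""

    data: Any
    query: str
    meta: Dict[str, Any]


# data is typed as Any, so it is not limited to strings.
answer = Answer(data=42, query="What is six times seven?", meta={"source": "calculator"})

# A frozen dataclass rejects attribute assignment after construction.
try:
    answer.query = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```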
ExtractedAnswer
Overview
ExtractedAnswer is a subclass of Answer that deals explicitly with answers derived from Documents, offering more detailed attributes.
Key Features
- Includes reference to the originating Document.
- Score attribute to quantify the answer's confidence level.
- Optional start and end indices for pinpointing answer location within the source.
Attributes
@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
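The way an offset pinpoints the answer inside its source can be sketched as below. Both Document and Span are simplified stand-ins; in particular, the shape of Span (a start and an end index treated as a half-open character range) is an assumption, since the listing above only references it by name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Reduced stand-in: only the text content is needed here."""

    content: Optional[str] = None


@dataclass
class Span:
    # Hypothetical shape: half-open character range [start, end).
    start: int
    end: int


@dataclass
class ExtractedAnswer:
    """Stand-in mirroring the attributes listed above (not the library class)."""

    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional[Span] = None
    context_offset: Optional[Span] = None
    meta: Dict[str, Any] = field(default_factory=dict)


doc = Document(content="The capital of France is Paris.")
start = doc.content.index("Paris")

answer = ExtractedAnswer(
    query="What is the capital of France?",
    score=0.93,  # illustrative confidence value
    data="Paris",
    document=doc,
    document_offset=Span(start=start, end=start + len("Paris")),
)

# Slice the source text with the stored offsets to recover the answer.
span = answer.document_offset
located = answer.document.content[span.start:span.end]
```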
ExtractedTableAnswer
Overview
ExtractedTableAnswer is a subclass of Answer focused on table question answering. It deals explicitly with answers derived from Documents, offering more detailed attributes.
Key Features
- Includes reference to the originating Document.
- Score attribute to quantify the answer's confidence level.
- Cell attributes (document_cells, context_cells) for locating the answer within a table.
Attributes
@dataclass
class ExtractedTableAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[DataFrame] = None
    document_cells: List["Cell"] = field(default_factory=list)
    context_cells: List["Cell"] = field(default_factory=list)
    meta: Dict[str, Any] = field(default_factory=dict)
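The role of the cell lists can be sketched as follows. To keep the example dependent only on the standard library, a nested list stands in for the DataFrame, and the fields that would hold a Document or DataFrame are typed as Any. The Cell shape (a zero-based row and column) is an assumption, and the table contents are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Cell:
    # Hypothetical shape: zero-based row and column of one table cell.
    row: int
    col: int


@dataclass
class ExtractedTableAnswer:
    """Stand-in mirroring the attributes listed above (not the library class)."""

    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Any] = None  # a Document in the listing above
    context: Optional[Any] = None   # a DataFrame in the listing above
    document_cells: List[Cell] = field(default_factory=list)
    context_cells: List[Cell] = field(default_factory=list)
    meta: Dict[str, Any] = field(default_factory=dict)


# Illustrative table; a nested list replaces the DataFrame in this sketch.
table = [
    ["item", "price"],
    ["widget", "4.50"],
    ["gadget", "9.00"],
]

answer = ExtractedTableAnswer(
    query="How much does a widget cost?",
    score=0.88,  # illustrative confidence value
    data="4.50",
    context=table,
    context_cells=[Cell(row=1, col=1)],
)

# Resolve each referenced cell back to its value in the table.
selected = [answer.context[cell.row][cell.col] for cell in answer.context_cells]
```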
GeneratedAnswer
Overview
GeneratedAnswer extends the Answer class to accommodate answers generated from multiple Documents.
Key Features
- Handles string-type data.
- Links to a list of Document objects, enhancing answer traceability.
Attributes
@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)
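The traceability that the documents list provides can be sketched like this. Both classes are standard-library stand-ins built from the attributes listed on this page; the document IDs, texts, and the "model" meta key are hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Reduced stand-in: only id and content are needed here."""

    id: str = ""
    content: Optional[str] = None


@dataclass
class GeneratedAnswer:
    """Stand-in mirroring the attributes listed above (not the library class)."""

    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)


sources = [
    Document(id="doc-1", content="Data classes carry data between components."),
    Document(id="doc-2", content="A pipeline connects components together."),
]

answer = GeneratedAnswer(
    data="Components exchange data classes inside a pipeline.",
    query="How do components communicate?",
    documents=sources,
    meta={"model": "example-model"},  # hypothetical metadata key and value
)

# Trace the generated text back to the documents it was based on.
source_ids = [doc.id for doc in answer.documents]
```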
ChatMessage
Overview
ChatMessage is the central abstraction for chat-based LLMs. It encompasses the message content, the sender's role, and additional attributes, and it follows the ChatML format.
Key Features
- Detailed role categorization using the ChatRole enum (assistant, user, system, function).
- Optional name attribute for function calls.
- Metadata support for enriched message context.
- Role-specific message creation methods (e.g., from_assistant, from_user).
Attributes
@dataclass
class ChatMessage:
    content: str
    role: ChatRole
    name: Optional[str]
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
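The role-specific constructors named above can be sketched as follows. ChatRole and ChatMessage here are standard-library stand-ins; the enum values follow the four roles listed in the key features, while the constructor signatures and the role/content dictionary layout at the end are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class ChatRole(str, Enum):
    # The four roles named in the key features above.
    ASSISTANT = "assistant"
    USER = "user"
    SYSTEM = "system"
    FUNCTION = "function"


@dataclass
class ChatMessage:
    """Stand-in mirroring the attributes listed above (not the library class)."""

    content: str
    role: ChatRole
    name: Optional[str]
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)

    @classmethod
    def from_user(cls, content: str) -> "ChatMessage":
        # Assumed signature: a user message carries no function name.
        return cls(content, ChatRole.USER, None)

    @classmethod
    def from_assistant(cls, content: str,
                       metadata: Optional[Dict[str, Any]] = None) -> "ChatMessage":
        # Assumed signature: optional metadata, no function name.
        return cls(content, ChatRole.ASSISTANT, None, metadata or {})


conversation = [
    ChatMessage.from_user("What is a data class?"),
    ChatMessage.from_assistant("A class that mainly holds data."),
]

# Flatten the messages into role/content pairs, as a chat API might expect.
payload = [{"role": m.role.value, "content": m.content} for m in conversation]
```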
Document
Overview
Document represents a central data abstraction in Haystack, capable of holding text, tables, and binary data.
Key Features
- Unique ID for each document.
- Multiple content types are supported: text, dataframe, binary (blob).
- Custom metadata and scoring for advanced document management.
- Optional embedding for AI-based applications.
Attributes
@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    dataframe: Optional[pandas.DataFrame] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
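Since id defaults to an empty string, something has to fill it in for each document to end up with a unique ID. The sketch below shows one way that could work, assuming the ID is derived from a hash of the content when none is supplied; that derivation is an assumption for illustration, not a statement of the library's exact scheme. The stand-in omits the dataframe and blob fields and the metaclass so that it runs on the standard library alone.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Reduced stand-in for the listing above (dataframe, blob, metaclass omitted)."""

    id: str = field(default="")
    content: Optional[str] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)

    def __post_init__(self) -> None:
        # Assumption: derive a deterministic ID from the content if none is given.
        if not self.id:
            text = self.content or ""
            self.id = hashlib.sha256(text.encode("utf-8")).hexdigest()


first = Document(content="Haystack data classes", meta={"lang": "en"})
second = Document(content="Haystack data classes")
other = Document(content="Something else")
explicit = Document(id="custom-id", content="Haystack data classes")
```

With a content-derived ID, two documents with the same text share an ID, which makes duplicates easy to spot, while an explicitly supplied ID is left untouched.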
StreamingChunk
Overview
StreamingChunk represents one partial piece of a streamed LLM response, so that output can be handled in real time as it arrives.
Key Features
- String-based content representation.
- Accompanying metadata for additional context and management.
Attributes
@dataclass
class StreamingChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
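Consuming a stream chunk by chunk can be sketched as below. StreamingChunk is a standard-library stand-in for the listing above; fake_stream and on_chunk are hypothetical helpers that imitate a model emitting pieces of text and a callback receiving them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class StreamingChunk:
    """Stand-in mirroring the attributes listed above (not the library class)."""

    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)


def fake_stream() -> Iterator[StreamingChunk]:
    # Hypothetical source that imitates a model emitting text piece by piece.
    for index, piece in enumerate(["Hay", "stack ", "streams."]):
        yield StreamingChunk(content=piece, metadata={"index": index})


received: List[str] = []


def on_chunk(chunk: StreamingChunk) -> None:
    # Hypothetical callback: handle each piece as soon as it arrives.
    received.append(chunk.content)


for chunk in fake_stream():
    on_chunk(chunk)

# Joining the pieces in arrival order rebuilds the full response.
full_text = "".join(received)
```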
SparseEmbedding
Overview
The SparseEmbedding class represents a sparse embedding: a vector where most values are zeros.
Attributes
- indices: List of indices of non-zero elements in the embedding.
- values: List of values of non-zero elements in the embedding.
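How indices and values together describe a mostly-zero vector can be sketched as follows. The class is a standard-library stand-in with the two attributes listed above; to_dense is a hypothetical helper added for illustration, and the vector size has to be supplied by the caller because the sparse form does not record it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseEmbedding:
    """Stand-in with the two attributes listed above (not the library class)."""

    indices: List[int]
    values: List[float]

    def to_dense(self, size: int) -> List[float]:
        # Hypothetical helper: expand into a full vector of the given size.
        dense = [0.0] * size
        for index, value in zip(self.indices, self.values):
            dense[index] = value
        return dense


# Only positions 1 and 4 are non-zero, so only two pairs are stored.
embedding = SparseEmbedding(indices=[1, 4], values=[0.5, 2.0])
dense = embedding.to_dense(size=6)
```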