TwelveLabs
haystack_integrations.components.converters.twelvelabs.video_converter
TwelveLabsVideoConverter
Converts videos to Haystack Documents using TwelveLabs Pegasus.
Pegasus is a video-language model that analyzes a video on the fly, using both its visuals and its audio (transcribed by Pegasus's own speech recognition), and returns text. Each source video becomes one Document whose content is Pegasus's analysis (for example, a description plus a transcript). No frame extraction or separate transcription step is needed.
Sources may be publicly accessible direct video URLs or local file paths (uploaded to TwelveLabs, up to 200 MB).
Usage example
from haystack_integrations.components.converters.twelvelabs import TwelveLabsVideoConverter
# Set the TWELVELABS_API_KEY environment variable
converter = TwelveLabsVideoConverter()
result = converter.run(sources=["https://example.com/clip.mp4"])
print(result["documents"][0].content)
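Because local files are uploaded and capped at 200 MB, it can help to screen local paths before calling run. The following is a minimal sketch and not part of the integration: split_by_size is a hypothetical helper, and reading 200 MB as 200 * 1024 * 1024 bytes is an assumption (the service may count decimal megabytes instead). Strings starting with http:// or https:// are treated as URLs and passed through unchecked.

```python
import os

# Assumption: the 200 MB upload cap is interpreted as binary megabytes.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def split_by_size(sources, max_bytes=MAX_UPLOAD_BYTES):
    """Hypothetical helper: separate usable sources from oversized local files."""
    accepted, rejected = [], []
    for source in sources:
        if source.startswith(("http://", "https://")):
            # URLs are fetched remotely, so there is no local size to check.
            accepted.append(source)
        elif os.path.getsize(source) <= max_bytes:
            accepted.append(source)
        else:
            rejected.append(source)
    return accepted, rejected
```

The accepted list can then be passed as sources to run. A path that does not exist makes os.path.getsize raise OSError, which this sketch deliberately leaves unhandled.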
init
__init__(
*,
api_key: Secret = Secret.from_env_var("TWELVELABS_API_KEY"),
model: str = DEFAULT_MODEL,
prompt: str = DEFAULT_PROMPT,
temperature: float = 0.2,
max_tokens: int = 16384
) -> None
Create a TwelveLabsVideoConverter.
Parameters:
- api_key (Secret) – The TwelveLabs API key. Read from the TWELVELABS_API_KEY environment variable by default.
- model (str) – The Pegasus model name (pegasus1.5 or pegasus1.2).
- prompt (str) – The analysis prompt sent to Pegasus for each video.
- temperature (float) – Sampling temperature (0-1).
- max_tokens (int) – Maximum output tokens per analysis.
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
TwelveLabsVideoConverter – Deserialized component.
run
run(
sources: list[str],
meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[Document]]
Convert videos to Documents with Pegasus.
Parameters:
- sources (list[str]) – Video sources: publicly accessible direct video URLs or local file paths.
- meta (dict[str, Any] | list[dict[str, Any]] | None) – Optional metadata to attach to the produced Documents. Either a single dict applied to all, or a list aligned with sources.
Returns:
dict[str, list[Document]] – A dictionary with key documents: the produced Documents.
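The two accepted shapes of meta can be illustrated with a small sketch. This is not the integration's code: align_meta is a hypothetical helper, and raising ValueError on a length mismatch is an assumption about reasonable behavior rather than something this reference states.

```python
from typing import Any, Optional, Union


def align_meta(
    sources: list,
    meta: Optional[Union[dict, list]] = None,
) -> list:
    """Hypothetical helper: expand meta to one dict per source."""
    if meta is None:
        return [{} for _ in sources]
    if isinstance(meta, dict):
        # A single dict is copied for every source so Documents do not share state.
        return [dict(meta) for _ in sources]
    if len(meta) != len(sources):
        # Assumption: a misaligned list is treated as an error.
        raise ValueError("meta list must have the same length as sources")
    return [dict(m) for m in meta]


print(align_meta(["a.mp4", "b.mp4"], {"project": "demo"}))
# [{'project': 'demo'}, {'project': 'demo'}]
```

With a list, the first dict goes to the first source, the second to the second, and so on.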
haystack_integrations.components.embedders.twelvelabs.document_embedder
TwelveLabsDocumentEmbedder
Embeds the text content of Documents using TwelveLabs Marengo.
Computes a Marengo embedding for each Document's content and stores it on
Document.embedding. Because Marengo embeds text, images, audio, and video
into one shared space, these embeddings support cross-modal retrieval.
Usage example
from haystack import Document
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsDocumentEmbedder
# Set the TWELVELABS_API_KEY environment variable
doc_embedder = TwelveLabsDocumentEmbedder()
docs = [Document(content="a cat playing piano")]
docs = doc_embedder.run(documents=docs)["documents"]
print(docs[0].embedding)
init
__init__(
*,
api_key: Secret = Secret.from_env_var("TWELVELABS_API_KEY"),
model: str = DEFAULT_MODEL,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: list[str] | None = None,
embedding_separator: str = "\n"
) -> None
Create a TwelveLabsDocumentEmbedder.
Parameters:
- api_key (Secret) – The TwelveLabs API key. Read from the TWELVELABS_API_KEY environment variable by default.
- model (str) – The Marengo model name.
- prefix (str) – A string to add to the beginning of each text before embedding.
- suffix (str) – A string to add to the end of each text before embedding.
- batch_size (int) – Number of Documents per batch; within a batch, run_async embeds concurrently.
- progress_bar (bool) – Whether to show a progress bar while embedding. Can be helpful to disable in production deployments to keep the logs clean.
- meta_fields_to_embed (list[str] | None) – List of meta fields that should be embedded along with the Document text.
- embedding_separator (str) – Separator used to concatenate the meta fields to the Document text.
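How prefix, suffix, meta_fields_to_embed, and embedding_separator might combine into the text that gets embedded can be sketched as follows. This follows the convention used by other Haystack document embedders; that this integration assembles the string in exactly this way is an assumption. Doc is a stand-in for haystack.Document, and text_to_embed is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Stand-in for haystack.Document, holding only the fields used here."""
    content: Optional[str] = None
    meta: dict = field(default_factory=dict)


def text_to_embed(doc, prefix="", suffix="", meta_fields_to_embed=None, embedding_separator="\n"):
    """Hypothetical helper: build the string sent to the embedding model."""
    meta_values = [
        str(doc.meta[key])
        for key in (meta_fields_to_embed or [])
        if doc.meta.get(key) is not None  # skip fields that are missing or None
    ]
    # Selected meta values come first, then the Document content.
    return prefix + embedding_separator.join(meta_values + [doc.content or ""]) + suffix


doc = Doc(content="a cat playing piano", meta={"title": "Funny pets", "year": 2024})
print(repr(text_to_embed(doc, meta_fields_to_embed=["title"], embedding_separator=" | ")))
# 'Funny pets | a cat playing piano'
```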
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
TwelveLabsDocumentEmbedder – Deserialized component.
run
Embed a list of Documents.
Parameters:
- documents (list[Document]) – The Documents to embed (their content is embedded).
Returns:
dict[str, Any] – A dictionary with keys:
  - documents: New Documents that are copies of the inputs with embedding populated.
  - meta: Metadata about the request (the model used).
Raises:
TypeError – If the input is not a list of Documents.
run_async
Asynchronously embed a list of Documents.
Documents within each batch of batch_size are embedded concurrently.
Parameters:
- documents (list[Document]) – The Documents to embed.
Returns:
dict[str, Any] – A dictionary with keys documents (copies with embedding populated) and meta.
Raises:
TypeError – If the input is not a list of Documents.
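The batching behavior described for run_async (batches handled one after another, Documents within a batch embedded concurrently) can be sketched with asyncio. This is an illustration of the pattern under stated assumptions, not the component's implementation: fake_embed stands in for a real embedding request, and embed_in_batches is a hypothetical helper.

```python
import asyncio


async def fake_embed(text):
    """Stand-in for one embedding request; returns a tiny made-up vector."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [float(len(text))]


async def embed_in_batches(texts, batch_size=32):
    """Hypothetical helper: batches run in order, items within a batch concurrently."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # gather returns results in input order, so vectors stay aligned with texts.
        embeddings.extend(await asyncio.gather(*(fake_embed(t) for t in batch)))
    return embeddings


vectors = asyncio.run(embed_in_batches(["a", "bb", "ccc"], batch_size=2))
print(vectors)  # [[1.0], [2.0], [3.0]]
```

Under this pattern, a larger batch_size means more requests in flight at once, which is worth weighing against any rate limits on the account.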
haystack_integrations.components.embedders.twelvelabs.text_embedder
TwelveLabsTextEmbedder
Embeds strings using TwelveLabs Marengo.
Marengo embeds text, images, audio, and video into a single shared vector space, so embeddings from this component are directly comparable (cosine similarity) with image/video embeddings from the same model — enabling cross-modal retrieval. Use it to embed a query before searching a document store populated with Marengo embeddings.
Usage example
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder
# Set the TWELVELABS_API_KEY environment variable
text_embedder = TwelveLabsTextEmbedder()
result = text_embedder.run(text="a cat playing piano")
print(result["embedding"])
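Since embeddings in the shared space are compared by cosine similarity, ranking stored items against a query embedding might look like the sketch below. The vectors are made up and only three-dimensional for readability (real embeddings have far more dimensions), and the names clip_a and clip_b are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for a text query and two stored video embeddings.
query = [1.0, 0.0, 0.0]
stored = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
}

# Sort stored items from most to least similar to the query.
ranked = sorted(stored, key=lambda name: cosine_similarity(query, stored[name]), reverse=True)
print(ranked)  # ['clip_a', 'clip_b']
```

In practice a document store performs this comparison internally; the sketch only shows what "directly comparable" means for two vectors produced by the same model.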
init
__init__(
*,
api_key: Secret = Secret.from_env_var("TWELVELABS_API_KEY"),
model: str = DEFAULT_MODEL,
prefix: str = "",
suffix: str = ""
) -> None
Create a TwelveLabsTextEmbedder.
Parameters:
- api_key (Secret) – The TwelveLabs API key. Read from the TWELVELABS_API_KEY environment variable by default.
- model (str) – The Marengo model name.
- prefix (str) – A string to add to the beginning of the text before embedding.
- suffix (str) – A string to add to the end of the text before embedding.
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
TwelveLabsTextEmbedder – Deserialized component.
run
Embed a single string.
Parameters:
- text (str) – The string to embed.
Returns:
dict[str, Any] – A dictionary with keys:
  - embedding: The embedding vector for the input string.
  - meta: Metadata about the request (the model used).
Raises:
TypeError – If the input is not a string.
run_async
Asynchronously embed a single string.
Parameters:
- text (str) – The string to embed.
Returns:
dict[str, Any] – A dictionary with keys embedding and meta.
Raises:
TypeError – If the input is not a string.