Skip to main content
Version: 2.30

TwelveLabs

haystack_integrations.components.converters.twelvelabs.video_converter

TwelveLabsVideoConverter

Converts videos to Haystack Documents using TwelveLabs Pegasus.

Pegasus is a video-language model that analyzes a video on the fly (its visuals and its own audio ASR) and returns text. Each source video becomes one Document whose content is Pegasus's analysis (e.g. a description plus a transcript) — no frame extraction or separate transcription step.

Sources may be publicly accessible direct video URLs or local file paths (uploaded to TwelveLabs, up to 200 MB).

Usage example

python
from haystack_integrations.components.converters.twelvelabs import TwelveLabsVideoConverter

# Set the TWELVELABS_API_KEY environment variable
converter = TwelveLabsVideoConverter()
result = converter.run(sources=["https://example.com/clip.mp4"])
print(result["documents"][0].content)

init

python
__init__(
*,
api_key: Secret = Secret.from_env_var("TWELVELABS_API_KEY"),
model: str = DEFAULT_MODEL,
prompt: str = DEFAULT_PROMPT,
temperature: float = 0.2,
max_tokens: int = 16384
) -> None

Create a TwelveLabsVideoConverter.

Parameters:

  • api_key (Secret) – The TwelveLabs API key. Read from the TWELVELABS_API_KEY environment variable by default.
  • model (str) – The Pegasus model name (pegasus1.5 or pegasus1.2).
  • prompt (str) – The analysis prompt sent to Pegasus for each video.
  • temperature (float) – Sampling temperature (0-1).
  • max_tokens (int) – Maximum output tokens per analysis.

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

python
from_dict(data: dict[str, Any]) -> TwelveLabsVideoConverter

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

  • TwelveLabsVideoConverter – Deserialized component.

run

python
run(
sources: list[str],
meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[Document]]

Convert videos to Documents with Pegasus.

Parameters:

  • sources (list[str]) – Video sources — publicly accessible direct video URLs or local file paths.
  • meta (dict[str, Any] | list[dict[str, Any]] | None) – Optional metadata to attach to the produced Documents. Either a single dict applied to all, or a list aligned with sources.

Returns:

  • dict[str, list[Document]] – A dictionary with key documents: the produced Documents.

haystack_integrations.components.embedders.twelvelabs.document_embedder

TwelveLabsDocumentEmbedder

Embeds the text content of Documents using TwelveLabs Marengo.

Computes a Marengo embedding for each Document's content and stores it on Document.embedding. Because Marengo embeds text, images, audio, and video into one shared space, these embeddings support cross-modal retrieval.

Usage example

python
from haystack import Document
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsDocumentEmbedder

# Set the TWELVELABS_API_KEY environment variable
doc_embedder = TwelveLabsDocumentEmbedder()
docs = [Document(content="a cat playing piano")]
docs = doc_embedder.run(documents=docs)["documents"]
print(docs[0].embedding)

init

python
__init__(
*,
api_key: Secret = Secret.from_env_var("TWELVELABS_API_KEY"),
model: str = DEFAULT_MODEL,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: list[str] | None = None,
embedding_separator: str = "\n"
) -> None

Create a TwelveLabsDocumentEmbedder.

Parameters:

  • api_key (Secret) – The TwelveLabs API key. Read from the TWELVELABS_API_KEY environment variable by default.
  • model (str) – The Marengo model name.
  • prefix (str) – A string to add to the beginning of each text before embedding.
  • suffix (str) – A string to add to the end of each text before embedding.
  • batch_size (int) – Number of Documents per batch; within a batch run_async embeds concurrently.
  • progress_bar (bool) – Whether to show a progress bar while embedding. Can be helpful to disable in production deployments to keep the logs clean.
  • meta_fields_to_embed (list[str] | None) – List of meta fields that should be embedded along with the Document text.
  • embedding_separator (str) – Separator used to concatenate the meta fields to the Document text.

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

python
from_dict(data: dict[str, Any]) -> TwelveLabsDocumentEmbedder

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

  • TwelveLabsDocumentEmbedder – Deserialized component.

run

python
run(documents: list[Document]) -> dict[str, Any]

Embed a list of Documents.

Parameters:

  • documents (list[Document]) – The Documents to embed (their content is embedded).

Returns:

  • dict[str, Any] – A dictionary with keys:
  • documents: New Documents that are copies of the inputs with embedding populated.
  • meta: Metadata about the request (the model used).

Raises:

  • TypeError – If the input is not a list of Documents.

run_async

python
run_async(documents: list[Document]) -> dict[str, Any]

Asynchronously embed a list of Documents.

Documents within each batch of batch_size are embedded concurrently.

Parameters:

  • documents (list[Document]) – The Documents to embed.

Returns:

  • dict[str, Any] – A dictionary with keys documents (copies with embedding populated) and meta.

Raises:

  • TypeError – If the input is not a list of Documents.

haystack_integrations.components.embedders.twelvelabs.text_embedder

TwelveLabsTextEmbedder

Embeds strings using TwelveLabs Marengo.

Marengo embeds text, images, audio, and video into a single shared vector space, so embeddings from this component are directly comparable (cosine similarity) with image/video embeddings from the same model — enabling cross-modal retrieval. Use it to embed a query before searching a document store populated with Marengo embeddings.

Usage example

python
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder

# Set the TWELVELABS_API_KEY environment variable
text_embedder = TwelveLabsTextEmbedder()
result = text_embedder.run(text="a cat playing piano")
print(result["embedding"])

init

python
__init__(
*,
api_key: Secret = Secret.from_env_var("TWELVELABS_API_KEY"),
model: str = DEFAULT_MODEL,
prefix: str = "",
suffix: str = ""
) -> None

Create a TwelveLabsTextEmbedder.

Parameters:

  • api_key (Secret) – The TwelveLabs API key. Read from the TWELVELABS_API_KEY environment variable by default.
  • model (str) – The Marengo model name.
  • prefix (str) – A string to add to the beginning of the text before embedding.
  • suffix (str) – A string to add to the end of the text before embedding.

to_dict

python
to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

python
from_dict(data: dict[str, Any]) -> TwelveLabsTextEmbedder

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

  • TwelveLabsTextEmbedder – Deserialized component.

run

python
run(text: str) -> dict[str, Any]

Embed a single string.

Parameters:

  • text (str) – The string to embed.

Returns:

  • dict[str, Any] – A dictionary with keys:
  • embedding: The embedding vector for the input string.
  • meta: Metadata about the request (the model used).

Raises:

  • TypeError – If the input is not a string.

run_async

python
run_async(text: str) -> dict[str, Any]

Asynchronously embed a single string.

Parameters:

  • text (str) – The string to embed.

Returns:

  • dict[str, Any] – A dictionary with keys embedding and meta.

Raises:

  • TypeError – If the input is not a string.