Version: 3.0-unstable

TwelveLabsTextEmbedder

This component transforms a string into a vector using the TwelveLabs Marengo multimodal embedding model. Because Marengo embeds text, images, audio, and video into one shared vector space, the resulting embeddings support cross-modal retrieval (for example, searching a video collection with a text query). Use this component to embed a query before searching with an embedding Retriever.


Most common position in a pipeline	Before an embedding Retriever in a query/RAG pipeline
Mandatory init variables	`api_key`: The TwelveLabs API key. Can be set with `TWELVELABS_API_KEY` env var.
Mandatory run variables	`text`: A string
Output variables	`embedding`: A list of float numbers; `meta`: A dictionary of metadata
API reference	TwelveLabs
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/twelvelabs
Package name	`twelvelabs-haystack`

Overview

TwelveLabsTextEmbedder embeds a simple string (such as a query) into a vector. For embedding lists of documents, use the TwelveLabsDocumentEmbedder, which enriches each document with the computed embedding. The default model is marengo3.0.

Because Marengo embeds every modality into a single shared space, embeddings produced from text are directly comparable, for example by cosine similarity, with embeddings of images, audio, and video from the same model.
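To make the comparison concrete, here is a minimal, standard-library sketch of cosine similarity between a text query embedding and embeddings of other media. The three-dimensional vectors and the item names (`video_cat_piano`, `image_city_skyline`) are made-up placeholders for illustration, not real Marengo output, and `cosine_similarity` is a helper defined here rather than part of the integration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (assumption: real embeddings are much longer
# and come from the embedder's "embedding" output).
query_embedding = [1.0, 0.0, 0.0]
media_embeddings = {
    "video_cat_piano": [0.9, 0.1, 0.0],
    "image_city_skyline": [0.0, 1.0, 0.0],
}

# Rank the media items from most to least similar to the query.
ranked = sorted(
    media_embeddings,
    key=lambda name: cosine_similarity(query_embedding, media_embeddings[name]),
    reverse=True,
)
print(ranked)
```

In practice an embedding Retriever performs this ranking for you; the sketch only shows why vectors from different modalities can be scored against one another when they come from the same model.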

To start using this integration with Haystack, install the package with:

shell

pip install twelvelabs-haystack

The component uses a TWELVELABS_API_KEY environment variable by default. Otherwise, you can pass an API key at initialization with api_key:

python

from haystack.utils import Secret
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder

embedder = TwelveLabsTextEmbedder(api_key=Secret.from_token("<your-api-key>"))

To get an API key, head to playground.twelvelabs.io.

Usage

On its own

Here is how you can use the component on its own:

python

from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder

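# With no arguments, the API key is read from the TWELVELABS_API_KEY environment variable.
text_embedder = TwelveLabsTextEmbedder()

result = text_embedder.run(text="a cat playing piano")

# The embedding is a list of floats.
print(result["embedding"])
# [-0.043398008, -0.025287028, -0.0061081843, ...]

# The metadata records which model produced the embedding.
print(result["meta"])
# {'model': 'marengo3.0'}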

info

We recommend setting TWELVELABS_API_KEY as an environment variable instead of setting it as a parameter.
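If you rely on the environment variable, it can help to fail fast with a clear message when it is missing. The following is a small sketch using only the standard library; `require_env_var` is a hypothetical helper introduced for illustration and is not part of the integration.

```python
import os


def require_env_var(name: str = "TWELVELABS_API_KEY") -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the pipeline.")
    return value


# Example usage (assumption: called once at startup, before building any pipeline):
# require_env_var()
```

The check does not pass the key anywhere; it only confirms that the variable is present before the component is constructed.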

In a pipeline

python

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.twelvelabs import (
    TwelveLabsDocumentEmbedder,
    TwelveLabsTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="a cat playing piano"),
    Document(content="a dog catching a frisbee at the beach"),
    Document(content="a timelapse of a city skyline at night"),
]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", TwelveLabsDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", TwelveLabsTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "feline making music"}})
print(result["retriever"]["documents"][0].content)
# a cat playing piano
