Version: 2.31-unstable

TwelveLabsTextEmbedder

This component transforms a string into a vector using the TwelveLabs Marengo multimodal embedding model. Because Marengo embeds text, images, audio, and video into one shared vector space, the resulting embeddings support cross-modal retrieval (for example, searching a video collection with a text query). Use this component to embed a query before searching with an embedding Retriever.

Most common position in a pipeline: Before an embedding Retriever in a query/RAG pipeline
Mandatory init variables: api_key (the TwelveLabs API key; can be set with the TWELVELABS_API_KEY env var)
Mandatory run variables: text (a string)
Output variables: embedding (a list of float numbers), meta (a dictionary of metadata)
API reference: TwelveLabs
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/twelvelabs
Package name: twelvelabs-haystack

Overview

TwelveLabsTextEmbedder embeds a simple string (such as a query) into a vector. For embedding lists of documents, use the TwelveLabsDocumentEmbedder, which enriches each document with the computed embedding. The default model is marengo3.0.

Because Marengo embeds into a single shared space, embeddings produced from text are directly comparable (for example, by cosine similarity) with embeddings of images, audio, and video produced by the same model.
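As a rough illustration of what "directly comparable" means, the sketch below computes cosine similarity between vectors in plain Python. The three-dimensional vectors and the names `cosine_similarity`, `text_vec`, `video_vec`, and `unrelated_vec` are made up for this example; real embeddings have many more dimensions and would come from the embedder's `embedding` output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (assumed values, not real model output).
text_vec = [1.0, 0.0, 1.0]       # e.g. an embedded text query
video_vec = [2.0, 0.0, 2.0]      # same direction, different magnitude
unrelated_vec = [0.0, 3.0, 0.0]  # orthogonal to the query

same_direction = cosine_similarity(text_vec, video_vec)
orthogonal = cosine_similarity(text_vec, unrelated_vec)
print(same_direction, orthogonal)
```

Cosine similarity ignores vector magnitude, so the first pair scores approximately 1 even though the vectors differ in length, while the orthogonal pair scores 0.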

To start using this integration with Haystack, install the package with:

shell
pip install twelvelabs-haystack

By default, the component reads the API key from the TWELVELABS_API_KEY environment variable. Alternatively, you can pass an API key at initialization with api_key:

python
from haystack.utils import Secret
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder

embedder = TwelveLabsTextEmbedder(api_key=Secret.from_token("<your-api-key>"))

To get an API key, head to playground.twelvelabs.io.

Usage

On its own

Here is how you can use the component on its own:

python
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder

text_embedder = TwelveLabsTextEmbedder()

result = text_embedder.run(text="a cat playing piano")

print(result["embedding"])
# [-0.043398008, -0.025287028, -0.0061081843, ...]

print(result["meta"])
# {'model': 'marengo3.0'}

Info: We recommend setting TWELVELABS_API_KEY as an environment variable instead of setting it as a parameter.
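If you rely on the environment variable, a small pre-flight check can catch a missing key before any request is made. The sketch below uses only the standard library; `api_key_is_set` is a hypothetical helper written for this example and is not part of the integration.

```python
import os
from collections.abc import Mapping
from typing import Optional

API_KEY_ENV_VAR = "TWELVELABS_API_KEY"  # variable name used by the component


def api_key_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True if the variable exists and is not blank."""
    source = os.environ if env is None else env
    return bool(source.get(API_KEY_ENV_VAR, "").strip())


# Explicit mappings keep the example independent of the real environment.
print(api_key_is_set({"TWELVELABS_API_KEY": "placeholder-key"}))
print(api_key_is_set({}))
```

Called without arguments, the helper inspects `os.environ`, so it could run once at application start-up.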

In a pipeline

python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.twelvelabs import (
    TwelveLabsDocumentEmbedder,
    TwelveLabsTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="a cat playing piano"),
    Document(content="a dog catching a frisbee at the beach"),
    Document(content="a timelapse of a city skyline at night"),
]

# Indexing: embed the documents and write them to the document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", TwelveLabsDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"embedder": {"documents": documents}})

# Querying: embed the query text, then retrieve the closest documents.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", TwelveLabsTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "feline making music"}})

print(result["retriever"]["documents"][0].content)
# a cat playing piano
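To give an intuition for the retrieval step, the following sketch ranks a few documents against a query by cosine similarity, computed as a dot product of unit-normalized vectors. It is a conceptual illustration only, not Haystack's implementation: the two-dimensional vectors are invented, and `normalize`, `rank_by_similarity`, `toy_docs`, and `toy_query` are hypothetical names.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_by_similarity(query, docs, top_k=2):
    """Return (content, score) pairs sorted by descending cosine similarity."""
    q = normalize(query)
    scored = [
        # For unit vectors, the dot product equals the cosine similarity.
        (content, sum(a * b for a, b in zip(q, normalize(vec))))
        for content, vec in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up 2-D vectors standing in for document embeddings.
toy_docs = {
    "a cat playing piano": [0.9, 0.1],
    "a dog catching a frisbee at the beach": [0.1, 0.9],
    "a timelapse of a city skyline at night": [-0.5, 0.5],
}
toy_query = [1.0, 0.2]  # stands in for the embedding of "feline making music"

ranked = rank_by_similarity(toy_query, toy_docs)
print(ranked[0][0])
# a cat playing piano
```

Normalizing once and then using dot products is a common way to make cosine ranking cheap when many documents are compared against the same query.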