TwelveLabsTextEmbedder
This component transforms a string into a vector using the TwelveLabs Marengo multimodal embedding model. Because Marengo embeds text, images, audio, and video into one shared vector space, the resulting embeddings support cross-modal retrieval (for example, searching a video collection with a text query). Use this component to embed a query before searching with an embedding Retriever.
| Most common position in a pipeline | Before an embedding Retriever in a query/RAG pipeline |
| Mandatory init variables | api_key: The TwelveLabs API key. Can be set with the TWELVELABS_API_KEY environment variable. |
| Mandatory run variables | text: A string |
| Output variables | embedding: A list of float numbers; meta: A dictionary of metadata |
| API reference | TwelveLabs |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/twelvelabs |
| Package name | twelvelabs-haystack |
Overview
TwelveLabsTextEmbedder embeds a single string (such as a query) into a vector. To embed a list of documents, use the TwelveLabsDocumentEmbedder, which enriches each document with its computed embedding. The default model is marengo3.0.
Because Marengo embeds every modality into a single shared space, text embeddings can be compared directly, for example by cosine similarity, with image, audio, and video embeddings from the same model.
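As a rough illustration of what "directly comparable" means, the sketch below ranks two candidate vectors against a text query vector by cosine similarity. The three-dimensional vectors and the candidate names are made up for the example; real Marengo embeddings have many more dimensions and would come from the embedder components.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings from the same model (illustrative values only).
text_query = [0.9, 0.1, 0.0]
candidates = {
    "video_clip": [0.8, 0.2, 0.1],
    "image": [0.0, 0.1, 0.9],
}

# Sort candidate names from most to least similar to the text query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(text_query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Because every vector lives in the same space, the ranking logic does not need to know which modality a candidate came from.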
To start using this integration with Haystack, install the package with:
pip install twelvelabs-haystack
By default, the component reads the API key from the TWELVELABS_API_KEY environment variable. Alternatively, you can pass an API key at initialization with api_key:
from haystack.utils import Secret
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder
embedder = TwelveLabsTextEmbedder(api_key=Secret.from_token("<your-api-key>"))
To get an API key, head to playground.twelvelabs.io.
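If you rely on the environment variable, it can help to check that it is set before building a pipeline. The following is a minimal sketch using only the standard library; api_key_is_configured is a hypothetical helper written for this example, not part of the integration.

```python
import os

ENV_VAR = "TWELVELABS_API_KEY"


def api_key_is_configured(environ=None):
    """Return True if the environment mapping holds a non-empty API key.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    return bool(env.get(ENV_VAR, "").strip())


if not api_key_is_configured():
    print(f"{ENV_VAR} is not set; export it or pass api_key at initialization.")
```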
Usage
On its own
Here is how you can use the component on its own:
from haystack_integrations.components.embedders.twelvelabs import TwelveLabsTextEmbedder
text_embedder = TwelveLabsTextEmbedder()
result = text_embedder.run(text="a cat playing piano")
print(result["embedding"])
# [-0.043398008, -0.025287028, -0.0061081843, ...]
print(result["meta"])
# {'model': 'marengo3.0'}
We recommend setting TWELVELABS_API_KEY as an environment variable instead of setting it as a parameter.
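To make the output shape concrete, here is a small sketch that summarizes a result dictionary with the embedding and meta keys described above. The describe_embedding_result helper is hypothetical, and fake_result is a hand-written stand-in truncated to three values, so the example runs without calling the API.

```python
def describe_embedding_result(result):
    """Summarize a result dict that has 'embedding' and 'meta' keys.

    Hypothetical helper for illustration, not part of the integration.
    """
    embedding = result["embedding"]
    if not embedding or not all(isinstance(value, float) for value in embedding):
        raise ValueError("expected a non-empty list of floats under 'embedding'")
    return {
        "dimension": len(embedding),
        "model": result.get("meta", {}).get("model"),
    }


# Stand-in for the dictionary returned by run(); truncated to three values.
fake_result = {
    "embedding": [-0.043398008, -0.025287028, -0.0061081843],
    "meta": {"model": "marengo3.0"},
}

summary = describe_embedding_result(fake_result)
print(summary)
```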
In a pipeline
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.twelvelabs import (
    TwelveLabsDocumentEmbedder,
    TwelveLabsTextEmbedder,
)
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
documents = [
    Document(content="a cat playing piano"),
    Document(content="a dog catching a frisbee at the beach"),
    Document(content="a timelapse of a city skyline at night"),
]
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", TwelveLabsDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"embedder": {"documents": documents}})
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", TwelveLabsTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
result = query_pipeline.run({"text_embedder": {"text": "feline making music"}})
print(result["retriever"]["documents"][0].content)
# a cat playing piano