Version: 3.0

FunASRTranscriber

Transcribe audio files into Haystack Documents using FunASR, a local, open-source speech recognition toolkit. Its default model supports 50+ languages.


Most common position in a pipeline	As the first component in an indexing pipeline
Mandatory run variables	`sources`: A list of audio file paths (`str` or `Path`) or `ByteStream` objects
Output variables	`documents`: A list of Haystack Documents, one per source, with transcript text in `content`
API reference	FunASR integration
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/funasr/src/haystack_integrations/components/audio/funasr/transcriber.py
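`sources` takes a list of audio file paths, so a common first step is collecting every audio file in a folder. The sketch below uses only the standard library. The helper name `collect_audio_sources` and the extension set are assumptions for illustration; they are not part of the FunASR integration.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of extensions; adjust it to the formats your FunASR setup can decode.
AUDIO_EXTENSIONS = frozenset({".wav", ".mp3", ".flac", ".m4a"})


def collect_audio_sources(
    directory: str | Path,
    extensions: frozenset[str] = AUDIO_EXTENSIONS,
) -> list[Path]:
    """Recursively collect audio files under `directory`, sorted for a stable order."""
    root = Path(directory)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare suffixes case-insensitively so "clip.WAV" is picked up too.
        if path.is_file() and path.suffix.lower() in extensions
    )
```

For example, `transcriber.run(sources=collect_audio_sources("recordings"))` would transcribe every matching file in a hypothetical `recordings` folder. Sorting keeps the order of `sources` reproducible across runs.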

Overview

FunASRTranscriber uses FunASR, an open-source speech recognition toolkit from Alibaba DAMO Academy, to transcribe audio files into Haystack `Document` objects. It runs entirely locally, so no API key is required.

The default model is `iic/SenseVoiceSmall`, a multilingual model that supports 50+ languages and runs 5–10x faster than Whisper. Models are downloaded from ModelScope on first use and cached in `~/.cache/modelscope`, so the first run needs network access.

The component accepts audio file paths (str or Path) as well as ByteStream objects. The model is loaded into memory automatically the first time the component runs.
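Because the model is loaded on the first run rather than when the component is created, construction is cheap and the first `run()` call absorbs the download and load cost. The sketch below illustrates that general lazy-loading pattern with a stand-in loader. It is not the integration's actual source; `LazyTranscriber`, `load_model`, and the `transcripts` output key are hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Optional

# A stand-in "model": any callable mapping a source path to transcript text.
Model = Callable[[str], str]


class LazyTranscriber:
    """Generic lazy-loading sketch: the model is created on the first run() call."""

    def __init__(self, load_model: Callable[[], Model]) -> None:
        # Constructing the object is cheap: nothing is loaded yet.
        self._load_model = load_model
        self._model: Optional[Model] = None

    def _ensure_loaded(self) -> Model:
        # Load once, then reuse the same model for every later call.
        if self._model is None:
            self._model = self._load_model()
        return self._model

    def run(self, sources: list[str]) -> dict[str, list[str]]:
        model = self._ensure_loaded()
        return {"transcripts": [model(src) for src in sources]}
```

In practice this means the first call is noticeably slower than later ones, so in a long-running service it can be worth running the component once at startup.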

Usage

On its own

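```python
from haystack_integrations.components.audio.funasr import FunASRTranscriber

# Uses the default iic/SenseVoiceSmall model, downloaded from ModelScope on first use.
transcriber = FunASRTranscriber()

# One Document is returned per source, with the transcript in `content`.
result = transcriber.run(sources=["speech.wav"])
print(result["documents"][0].content)
```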
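Since the component returns one Document per source, you can map each transcript back to the file it came from. This sketch assumes documents come back in the same order as file-path `sources`, which the output description implies but this page does not state outright. The helper name `pair_transcripts` is hypothetical; it only reads each document's `content` attribute, so it works with Haystack `Document` objects.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Sequence


def pair_transcripts(sources: Sequence[str | Path], documents: Sequence[Any]) -> dict[str, str]:
    """Map each file-path source to the `content` of its Document, assuming matching order."""
    if len(sources) != len(documents):
        # Guard against silently misaligned results, e.g. a source that failed to decode.
        raise ValueError(f"expected {len(sources)} documents, got {len(documents)}")
    return {str(src): doc.content for src, doc in zip(sources, documents)}
```

For example, `pair_transcripts(["speech.wav"], result["documents"])` returns a dictionary keyed by `"speech.wav"`.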

In a pipeline

```python
from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack_integrations.components.audio.funasr import FunASRTranscriber

pipe = Pipeline()
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("transcriber", FunASRTranscriber())

# LinkContentFetcher emits the downloaded files as ByteStream objects in `streams`,
# which the transcriber accepts through its `sources` input.
pipe.connect("fetcher.streams", "transcriber.sources")

result = pipe.run(
    data={
        "fetcher": {
            "urls": ["https://example.com/interview.wav"],
        },
    },
)
print(result["transcriber"]["documents"][0].content)
```
