API Reference

Transcribes audio files.

Module whisper_local

LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the Whisper API documentation and the official Whisper GitHub repository.

Usage example:

from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()
transcription = whisper.run(sources=["path/to/audio/file"])

LocalWhisperTranscriber.__init__

def __init__(model: WhisperLocalModel = "large",
             device: Optional[ComponentDevice] = None,
             whisper_params: Optional[Dict[str, Any]] = None)

Creates an instance of the LocalWhisperTranscriber component.

Arguments:

  • model (Literal["tiny", "small", "medium", "large", "large-v2"]): Name of the model to use; choose one of the listed values.
  • device: The device on which the model is loaded. If None, the default device is automatically selected.
  • whisper_params: Additional keyword arguments passed to the Whisper model at transcription time.

LocalWhisperTranscriber.warm_up

def warm_up() -> None

Loads the model in memory.

LocalWhisperTranscriber.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.
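As a rough illustration, Haystack components typically serialize to a dictionary containing a fully qualified type path plus the constructor arguments. The sketch below uses plain Python dicts and assumed key names, not the component's verbatim output:

```python
# Hypothetical sketch of the serialized form: a "type" path identifying the
# component class, plus the arguments needed to rebuild it (assumed keys).
serialized = {
    "type": "haystack.components.audio.whisper_local.LocalWhisperTranscriber",
    "init_parameters": {
        "model": "small",
        "device": None,
        "whisper_params": {},
    },
}

# from_dict() would rebuild an equivalent component from this information.
model_name = serialized["init_parameters"]["model"]
```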

LocalWhisperTranscriber.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "LocalWhisperTranscriber"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

LocalWhisperTranscriber.run

@component.output_types(documents=List[Document])
def run(sources: List[Union[str, Path, ByteStream]],
        whisper_params: Optional[Dict[str, Any]] = None)

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the Whisper API documentation and the official Whisper GitHub repository.

Arguments:

  • sources: A list of paths or binary streams to transcribe.

Returns:

A dictionary with the following keys:

  • documents: A list of Documents, one for each file. The content of the document is the transcription text, while the document's metadata contains the values returned by the Whisper model, such as the alignment data and the path to the audio file used for the transcription.
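To make the return shape concrete, here is a minimal sketch that uses plain dicts as stand-ins for Haystack Document objects (the real objects expose .content and .meta attributes rather than dictionary keys):

```python
# Plain-dict stand-ins for the Documents returned by run(); the real objects
# are Haystack Documents with .content and .meta attributes.
result = {
    "documents": [
        {"content": "First transcription.", "meta": {"audio_file": "a.wav"}},
        {"content": "Second transcription.", "meta": {"audio_file": "b.wav"}},
    ]
}

# One Document per input file: collect the transcription texts.
transcripts = [doc["content"] for doc in result["documents"]]
```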

LocalWhisperTranscriber.transcribe

def transcribe(sources: List[Union[str, Path, ByteStream]],
               **kwargs) -> List[Document]

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the Whisper API documentation and the official Whisper GitHub repository.

Arguments:

  • sources: A list of paths or binary streams to transcribe.

Returns:

A list of Documents, one for each file.

Module whisper_remote

RemoteWhisperTranscriber

Transcribes audio files using the Whisper API from OpenAI.

The component requires an API key; see the related OpenAI documentation for more details. For the supported audio formats, languages, and other parameters, see the Whisper API documentation.

Usage example:

from haystack.components.audio import RemoteWhisperTranscriber
from haystack.utils import Secret

whisper = RemoteWhisperTranscriber(api_key=Secret.from_token("<your-api-key>"), model="whisper-1")
transcription = whisper.run(sources=["path/to/audio/file"])
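Alternatively, because the api_key default reads the OPENAI_API_KEY environment variable, you can keep the token out of your code entirely (the key value below is a placeholder):

```shell
# The default api_key is Secret.from_env_var("OPENAI_API_KEY"), so exporting
# the variable is enough; no token needs to appear in the source.
export OPENAI_API_KEY="sk-..."
```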

RemoteWhisperTranscriber.__init__

def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "whisper-1",
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             **kwargs)

Creates an instance of the RemoteWhisperTranscriber component.

Arguments:

  • api_key: OpenAI API key.
  • model: Name of the model to use. Currently, only whisper-1 is accepted.
  • organization: Your OpenAI organization ID. See the OpenAI documentation on production best practices.
  • api_base_url: An optional URL to use as the API base. See the OpenAI docs.
  • kwargs: Other parameters for the model. They are sent directly to the OpenAI endpoint; see the OpenAI documentation for details. Some of the supported parameters:
      • language: The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency.
      • prompt: Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
      • response_format: The format of the transcript output: json, text, srt, verbose_json, or vtt. Defaults to json. Currently, only json is supported.
      • temperature: The sampling temperature, between 0 and 1. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.
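As an illustration, the pass-through keyword arguments above could be collected like this; the values are a sketch of typical choices, not an authoritative or exhaustive list:

```python
# Hypothetical example values for the pass-through parameters listed above.
# They are forwarded verbatim to the OpenAI transcription endpoint.
openai_params = {
    "language": "en",           # ISO-639-1 code for the input audio
    "prompt": "Haystack, RAG",  # vocabulary hint in the audio's language
    "response_format": "json",  # currently the only supported format
    "temperature": 0.2,         # lower = more focused and deterministic
}

# RemoteWhisperTranscriber(**openai_params) would store these parameters and
# send them with every transcription request.
```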

RemoteWhisperTranscriber.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

RemoteWhisperTranscriber.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "RemoteWhisperTranscriber"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

RemoteWhisperTranscriber.run

@component.output_types(documents=List[Document])
def run(sources: List[Union[str, Path, ByteStream]])

Transcribes the audio files into a list of Documents, one for each input file.

Arguments:

  • sources: A list of file paths or ByteStreams containing the audio files to transcribe.

Returns:

A dictionary with the following keys:

  • documents: A list of Documents, one for each file. The content of the document is the transcribed text.