API Reference

Transcribes audio files.

Module whisper_local

LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the Whisper API documentation and the official Whisper GitHub repository.

Usage example

from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()
transcription = whisper.run(sources=["path/to/audio/file"])

LocalWhisperTranscriber.__init__

def __init__(model: WhisperLocalModel = "large",
             device: Optional[ComponentDevice] = None,
             whisper_params: Optional[Dict[str, Any]] = None)

Creates an instance of the LocalWhisperTranscriber component.

Arguments:

  • model: The name of the model to use. Set to one of the following models: "tiny", "base", "small", "medium", "large" (default). For details on the models and their modifications, see the Whisper documentation.
  • device: The device for loading the model. If None, automatically selects the default device.

LocalWhisperTranscriber.warm_up

def warm_up() -> None

Loads the model in memory.

LocalWhisperTranscriber.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

LocalWhisperTranscriber.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "LocalWhisperTranscriber"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.
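As a rough illustration of how to_dict and from_dict fit together, the sketch below pushes a component through JSON text and rebuilds it. It assumes the dictionary returned by to_dict is JSON-serializable, which this reference does not state. roundtrip_via_json and StubComponent are hypothetical names introduced only for this example; the stub stands in for a real component so the snippet runs without loading a Whisper model, and its dictionary layout is illustrative rather than the layout the library uses.

```python
import json
from typing import Any, Dict


def roundtrip_via_json(component: Any) -> Any:
    """Serialize a component to JSON text, then rebuild it from that text.

    Works with any object that exposes to_dict() and a from_dict() classmethod.
    Assumption: the dictionary from to_dict() contains only JSON-friendly values.
    """
    payload: Dict[str, Any] = component.to_dict()
    text = json.dumps(payload)
    return type(component).from_dict(json.loads(text))


class StubComponent:
    """Hypothetical stand-in with the same to_dict/from_dict shape."""

    def __init__(self, model: str = "small") -> None:
        self.model = model

    def to_dict(self) -> Dict[str, Any]:
        # Illustrative layout only; not the layout the real component emits.
        return {"model": self.model}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "StubComponent":
        return cls(model=data["model"])


restored = roundtrip_via_json(StubComponent(model="tiny"))
```

With the real component, the same helper would presumably be called as roundtrip_via_json(LocalWhisperTranscriber(model="small")).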

LocalWhisperTranscriber.run

@component.output_types(documents=List[Document])
def run(sources: List[Union[str, Path, ByteStream]],
        whisper_params: Optional[Dict[str, Any]] = None)

Transcribes a list of audio files into a list of documents.

For the supported audio formats, languages, and other parameters, see the Whisper API documentation and the official Whisper GitHub repository.

Arguments:

  • sources: A list of paths or binary streams to transcribe.

Returns:

A dictionary with the following keys:

  • documents: A list of documents where each document is a transcribed audio file. The content of the document is the transcription text, and the document's metadata contains the values returned by the Whisper model, such as the alignment data and the path to the audio file used for the transcription.
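One way to feed run with a whole folder of recordings is sketched below. collect_audio_sources and transcribe_directory are hypothetical helpers, and the extension list is an assumption rather than the set of formats Whisper supports. The transcriber argument is any object whose run(sources=...) returns a dictionary with a "documents" key, as described above, and each document is assumed to expose its text as content.

```python
from pathlib import Path
from typing import Any, Iterable, List, Union

# Assumption: a small illustrative subset, not the official list of formats.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a", ".flac")


def collect_audio_sources(
    directory: Union[str, Path],
    extensions: Iterable[str] = AUDIO_EXTENSIONS,
) -> List[Path]:
    """Return the audio files directly inside `directory`, sorted by path."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )


def transcribe_directory(transcriber: Any, directory: Union[str, Path]) -> List[str]:
    """Run the transcriber on every audio file found and return the texts."""
    sources = collect_audio_sources(directory)
    if not sources:
        # Nothing to transcribe, so skip the call entirely.
        return []
    result = transcriber.run(sources=sources)
    # One document per input file; the content holds the transcription text.
    return [doc.content for doc in result["documents"]]
```

With a warmed-up LocalWhisperTranscriber, the call would look like transcribe_directory(whisper, "path/to/recordings").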

LocalWhisperTranscriber.transcribe

def transcribe(sources: List[Union[str, Path, ByteStream]],
               **kwargs) -> List[Document]

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the Whisper API documentation and the official Whisper GitHub repository.

Arguments:

  • sources: A list of paths or binary streams to transcribe.

Returns:

A list of Documents, one for each file.

Module whisper_remote

RemoteWhisperTranscriber

Transcribes audio files using OpenAI's Whisper API.

The component requires an OpenAI API key. See the OpenAI documentation for more details. For the supported audio formats, languages, and other parameters, see the Whisper API documentation.

Usage example

from haystack.components.audio import RemoteWhisperTranscriber
from haystack.utils import Secret

whisper = RemoteWhisperTranscriber(api_key=Secret.from_token("<your-api-key>"), model="whisper-1")
transcription = whisper.run(sources=["path/to/audio/file"])

RemoteWhisperTranscriber.__init__

def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "whisper-1",
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             **kwargs)

Creates an instance of the RemoteWhisperTranscriber component.

Arguments:

  • api_key: OpenAI API key. You can set it with the environment variable OPENAI_API_KEY, or pass it with this parameter during initialization.
  • model: Name of the model to use. Currently accepts only whisper-1.
  • organization: Your OpenAI organization ID. See OpenAI's documentation on Setting Up Your Organization.
  • api_base_url: An optional URL to use as the API base. For details, see the OpenAI documentation.
  • kwargs: Other optional parameters for the model. These are sent directly to the OpenAI endpoint. See the OpenAI documentation for more details. Some of the supported parameters are:
      • language: The language of the input audio. Provide the input language in ISO-639-1 format to improve transcription accuracy and latency.
      • prompt: An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
      • response_format: The format of the transcript output. This component only supports json.
      • temperature: The sampling temperature, between 0 and 1. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.
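Because the extra keyword arguments are sent directly to the OpenAI endpoint, it can help to check them locally first. The following is a minimal sketch of such a check, based only on the constraints listed above (a two-letter ISO-639-1 language code, a temperature between 0 and 1). build_whisper_kwargs is a hypothetical helper, not part of the library, and it does not verify that the code is an assigned ISO-639-1 language.

```python
from typing import Any, Dict, Optional


def build_whisper_kwargs(
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate optional transcription parameters and return them as a dict."""
    kwargs: Dict[str, Any] = {}

    if language is not None:
        code = language.strip().lower()
        # ISO-639-1 codes are two ASCII letters, for example "en" or "de".
        if len(code) != 2 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"Expected a two-letter ISO-639-1 code, got {language!r}")
        kwargs["language"] = code

    if prompt:
        kwargs["prompt"] = prompt

    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError(f"temperature must be between 0 and 1, got {temperature}")
        kwargs["temperature"] = float(temperature)

    return kwargs
```

The resulting dictionary could then be unpacked into the constructor, for example RemoteWhisperTranscriber(**build_whisper_kwargs(language="en", temperature=0.2)), with the API key read from OPENAI_API_KEY by default.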

RemoteWhisperTranscriber.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

RemoteWhisperTranscriber.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "RemoteWhisperTranscriber"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

RemoteWhisperTranscriber.run

@component.output_types(documents=List[Document])
def run(sources: List[Union[str, Path, ByteStream]])

Transcribes the list of audio files into a list of documents.

Arguments:

  • sources: A list of file paths or ByteStream objects containing the audio files to transcribe.

Returns:

A dictionary with the following keys:

  • documents: A list of documents, one document for each file. The content of each document is the transcribed text.
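Since each returned document carries the transcribed text as its content, a small post-processing step can write the transcripts to disk. The sketch below is one possible approach. save_transcripts and the transcript_NNN.txt naming scheme are hypothetical choices made for illustration, and the function only assumes that each document has a content attribute.

```python
from pathlib import Path
from typing import Any, Iterable, List, Union


def save_transcripts(documents: Iterable[Any], output_dir: Union[str, Path]) -> List[Path]:
    """Write each document's text to its own UTF-8 file and return the paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    for index, doc in enumerate(documents):
        # Files are numbered in input order, so the mapping back to sources is positional.
        target = out / f"transcript_{index:03d}.txt"
        target.write_text(doc.content or "", encoding="utf-8")
        written.append(target)
    return written
```

A typical call would be save_transcripts(transcription["documents"], "transcripts"), where transcription is the dictionary returned by run.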