Transcribes audio files.
Module whisper_local
LocalWhisperTranscriber
Transcribes audio files using OpenAI's Whisper model on your local machine.
For the supported audio formats, languages, and other parameters, see the
Whisper API documentation and the official Whisper
GitHub repository.
Usage example
from haystack.components.audio import LocalWhisperTranscriber
whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()
transcription = whisper.run(sources=["path/to/audio/file"])LocalWhisperTranscriber.__init__
def __init__(model: WhisperLocalModel = "large",
device: Optional[ComponentDevice] = None,
whisper_params: Optional[dict[str, Any]] = None)Creates an instance of the LocalWhisperTranscriber component.
Arguments:
model: The name of the model to use. Set to one of the following models:
"tiny", "base", "small", "medium", "large" (default).
For details on the models and their modifications, see the
Whisper documentation.device: The device for loading the model. IfNone, automatically selects the default device.
LocalWhisperTranscriber.warm_up
def warm_up() -> NoneLoads the model in memory.
LocalWhisperTranscriber.to_dict
def to_dict() -> dict[str, Any]Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
LocalWhisperTranscriber.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "LocalWhisperTranscriber"Deserializes the component from a dictionary.
Arguments:
data: The dictionary to deserialize from.
Returns:
The deserialized component.
LocalWhisperTranscriber.run
@component.output_types(documents=list[Document])
def run(sources: list[Union[str, Path, ByteStream]],
whisper_params: Optional[dict[str, Any]] = None)Transcribes a list of audio files into a list of documents.
Arguments:
sources: A list of paths or binary streams to transcribe.whisper_params: For the supported audio formats, languages, and other parameters, see the
Whisper API documentation and the official Whisper
GitHup repo.
Returns:
A dictionary with the following keys:
documents: A list of documents where each document is a transcribed audio file. The content of
the document is the transcription text, and the document's metadata contains the values returned by
the Whisper model, such as the alignment data and the path to the audio file used
for the transcription.
LocalWhisperTranscriber.transcribe
def transcribe(sources: list[Union[str, Path, ByteStream]],
**kwargs) -> list[Document]Transcribes the audio files into a list of Documents, one for each input file.
For the supported audio formats, languages, and other parameters, see the
Whisper API documentation and the official Whisper
github repo.
Arguments:
sources: A list of paths or binary streams to transcribe.
Returns:
A list of Documents, one for each file.
Module whisper_remote
RemoteWhisperTranscriber
Transcribes audio files using the OpenAI's Whisper API.
The component requires an OpenAI API key, see the
OpenAI documentation for more details.
For the supported audio formats, languages, and other parameters, see the
Whisper API documentation.
Usage example
from haystack.components.audio import RemoteWhisperTranscriber
whisper = RemoteWhisperTranscriber(api_key=Secret.from_token("<your-api-key>"), model="tiny")
transcription = whisper.run(sources=["path/to/audio/file"])RemoteWhisperTranscriber.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "whisper-1",
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
http_client_kwargs: Optional[dict[str, Any]] = None,
**kwargs)Creates an instance of the RemoteWhisperTranscriber component.
Arguments:
api_key: OpenAI API key.
You can set it with an environment variableOPENAI_API_KEY, or pass with this parameter
during initialization.model: Name of the model to use. Currently accepts onlywhisper-1.organization: Your OpenAI organization ID. See OpenAI's documentation on
Setting Up Your Organization.api_base: An optional URL to use as the API base. For details, see the
OpenAI documentation.http_client_kwargs: A dictionary of keyword arguments to configure a customhttpx.Clientorhttpx.AsyncClient.
For more information, see the HTTPX documentation.kwargs: Other optional parameters for the model. These are sent directly to the OpenAI
endpoint. See OpenAI documentation for more details.
Some of the supported parameters are:language: The language of the input audio.
Provide the input language in ISO-639-1 format
to improve transcription accuracy and latency.prompt: An optional text to guide the model's
style or continue a previous audio segment.
The prompt should match the audio language.response_format: The format of the transcript
output. This component only supportsjson.temperature: The sampling temperature, between 0
and 1. Higher values like 0.8 make the output more
random, while lower values like 0.2 make it more
focused and deterministic. If set to 0, the model
uses log probability to automatically increase the
temperature until certain thresholds are hit.
RemoteWhisperTranscriber.to_dict
def to_dict() -> dict[str, Any]Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
RemoteWhisperTranscriber.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "RemoteWhisperTranscriber"Deserializes the component from a dictionary.
Arguments:
data: The dictionary to deserialize from.
Returns:
The deserialized component.
RemoteWhisperTranscriber.run
@component.output_types(documents=list[Document])
def run(sources: list[Union[str, Path, ByteStream]])Transcribes the list of audio files into a list of documents.
Arguments:
sources: A list of file paths orByteStreamobjects containing the audio files to transcribe.
Returns:
A dictionary with the following keys:
documents: A list of documents, one document for each file.
The content of each document is the transcribed text.
