Transcribes audio files.
Module whisper_transcriber
WhisperTranscriber
class WhisperTranscriber(BaseComponent)
Transcribes audio files using OpenAI's Whisper. This class supports two underlying implementations:
- API (default): Uses the OpenAI API and requires an API key. See the [OpenAI API reference](https://beta.openai.com/docs/api-reference/whisper) for more details.
- Local (requires installing Whisper): Uses the local installation of Whisper.
To use Whisper locally, install it following the instructions on the Whisper GitHub repo and omit the api_key parameter.
To use the API implementation, provide an api_key. You can get one by signing up for an OpenAI account.
For the supported audio formats, languages, and other parameters, see the Whisper API documentation and the official Whisper GitHub repo.
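The choice between the two implementations comes down to whether an API key is supplied. As an illustrative sketch (not the class's actual internals), the dispatch could look like this:

```python
from typing import Optional

def choose_whisper_backend(api_key: Optional[str]) -> str:
    """Pick the transcription backend the way the docs describe:
    an API key means the OpenAI API, no key means a local Whisper install."""
    if api_key is not None:
        return "api"    # remote inference via the OpenAI API
    return "local"      # requires a local Whisper installation

# choose_whisper_backend("sk-...") -> "api"
# choose_whisper_backend(None)     -> "local"
```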
WhisperTranscriber.__init__
def __init__(api_key: Optional[str] = None,
model_name_or_path: WhisperModel = "medium",
device: Optional[Union[str, "torch.device"]] = None,
api_base: str = "https://api.openai.com/v1") -> None
Creates a WhisperTranscriber instance.
Arguments:
- api_key: OpenAI API key. If None, a local installation of Whisper is used.
- model_name_or_path: Name of the model to use. If using a local installation of Whisper, set this to one of the following values: "tiny", "small", "medium", "large", "large-v2". If using the API, set this value to "whisper-1" (default).
- device: Device to use for inference. Only used if you're using a local installation of Whisper. If None, the device is automatically selected.
- api_base: The OpenAI API base URL. Defaults to https://api.openai.com/v1.
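Because the set of valid model names differs between the two backends, a constructor along these lines might validate model_name_or_path against the values listed above. This is a hypothetical helper for illustration, not the class's actual code:

```python
from typing import Optional

# Allowed model names per backend, per the parameter description above
LOCAL_MODELS = {"tiny", "small", "medium", "large", "large-v2"}
API_MODELS = {"whisper-1"}

def validate_model_name(model_name_or_path: str, api_key: Optional[str]) -> str:
    """Check that the model name matches the chosen backend (illustrative only)."""
    allowed = API_MODELS if api_key is not None else LOCAL_MODELS
    if model_name_or_path not in allowed:
        backend = "API" if api_key is not None else "local"
        raise ValueError(
            f"'{model_name_or_path}' is not valid for the {backend} backend; "
            f"choose one of {sorted(allowed)}"
        )
    return model_name_or_path
```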
WhisperTranscriber.transcribe
def transcribe(audio_file: Union[str, BinaryIO],
language: Optional[str] = None,
return_segments: bool = False,
translate: bool = False,
**kwargs) -> Dict[str, Any]
Transcribe an audio file.
Arguments:
- audio_file: Path to the audio file or a binary file-like object.
- language: Language of the audio file. If None, the language is automatically detected.
- return_segments: If True, returns the transcription for each segment of the audio file. Supported only with a local installation of Whisper.
- translate: If True, translates the transcription to English.
Returns:
A dictionary containing the transcription text and metadata such as timings and segments.
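The returned dictionary can be pictured as a plain dict keyed by "text", with per-segment entries added when return_segments is enabled. The exact field names beyond "text" depend on the backend; the segment keys below are an assumption shown for illustration:

```python
def build_transcription_result(text, segments=None, return_segments=False):
    """Assemble a result dict in the shape described above (illustrative sketch)."""
    result = {"text": text}
    if return_segments and segments is not None:
        # Each segment carries its own text plus start/end timings (assumed keys)
        result["segments"] = segments
    return result

result = build_transcription_result(
    "Hello world.",
    segments=[{"start": 0.0, "end": 1.2, "text": "Hello world."}],
    return_segments=True,
)
```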
WhisperTranscriber.run
def run(query: Optional[str] = None,
file_paths: Optional[List[str]] = None,
labels: Optional[MultiLabel] = None,
documents: Optional[List[Document]] = None,
meta: Optional[dict] = None)
Transcribe audio files.
Arguments:
- query: Ignored.
- file_paths: List of paths to audio files.
- labels: Ignored.
- documents: Ignored.
- meta: Ignored.
Returns:
A dictionary containing a list of Document objects, one for each input file.
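Conceptually, run maps each input file to one Document whose content is the transcription text. The sketch below uses a stand-in Document dataclass rather than importing Haystack, and takes the transcription function as a parameter so it stays self-contained:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Document:  # stand-in for Haystack's Document, for illustration only
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)

def run(file_paths: List[str],
        transcribe: Callable[[str], Dict[str, Any]]) -> Dict[str, List[Document]]:
    """Transcribe each file and wrap the text in a Document (sketch)."""
    documents = [
        Document(content=transcribe(path)["text"], meta={"audio_file": path})
        for path in file_paths
    ]
    return {"documents": documents}

# Usage with a fake transcriber standing in for the real Whisper call:
fake = lambda path: {"text": f"transcript of {path}"}
out = run(["a.wav", "b.wav"], fake)
```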
WhisperTranscriber.run_batch
def run_batch(queries: Optional[Union[str, List[str]]] = None,
file_paths: Optional[List[str]] = None,
labels: Optional[Union[MultiLabel, List[MultiLabel]]] = None,
documents: Optional[Union[List[Document],
List[List[Document]]]] = None,
meta: Optional[Union[Dict[str, Any], List[Dict[str,
Any]]]] = None,
params: Optional[dict] = None,
debug: Optional[bool] = None)
Transcribe audio files.
Arguments:
- queries: Ignored.
- file_paths: List of paths to audio files.
- labels: Ignored.
- documents: Ignored.
- meta: Ignored.
- params: Ignored.
- debug: Ignored.