WhisperTranscriber

Use WhisperTranscriber to transcribe audio files using OpenAI's Whisper model.

Position in a Pipeline: Wherever there's a need to transcribe a file.
Input: Either an audio file path or a binary file-like object
Output: Document when used in a pipeline; dictionary with transcription fields when used stand-alone
Classes: WhisperTranscriber

WhisperTranscriber supports two modes of operation:

  • API (default): Uses the OpenAI API and requires an API key.
  • Local (requires installing Whisper): Uses the local installation of Whisper.
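
Which mode runs depends on whether an API key is passed to the constructor. As a minimal sketch, the choice can be driven by the environment. This assumes the key is stored in an OPENAI_API_KEY variable; the helper name transcriber_kwargs is hypothetical and not part of Haystack:

```python
import os
from typing import Dict, Mapping, Optional


def transcriber_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build constructor arguments for WhisperTranscriber (hypothetical helper).

    Returns {"api_key": ...} when OPENAI_API_KEY is set (API mode) and an
    empty dict otherwise, which leaves the transcriber in local mode.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    return {"api_key": api_key} if api_key else {}


# Usage (requires Haystack; local mode also requires a Whisper installation):
# from haystack.nodes.audio import WhisperTranscriber
# whisper = WhisperTranscriber(**transcriber_kwargs())
```

Keeping the key out of the source file also avoids committing it by accident.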

Usage

Stand-Alone

You can use WhisperTranscriber as a stand-alone node to transcribe audio files.

To use WhisperTranscriber in API mode, provide an OpenAI API key. You can get one by signing up for an OpenAI account.

To run Whisper locally, install it following the instructions on the Whisper GitHub repo and omit the api_key parameter.

These examples show how to run WhisperTranscriber in API mode and in local mode:

# API mode: pass your OpenAI API key
from haystack.nodes.audio import WhisperTranscriber

whisper = WhisperTranscriber(api_key="YOUR_API_KEY")
transcription = whisper.transcribe(audio_file="path/to/audio/file")

# Local mode: as a prerequisite, you must install Whisper from the OpenAI GitHub repo.
# This example skips this step.
# In the local mode, WhisperTranscriber works on both a CPU and a GPU
# without any additional settings.
from haystack.nodes.audio import WhisperTranscriber

whisper = WhisperTranscriber()
transcription = whisper.transcribe(audio_file="path/to/audio/file")

The transcribe() method transcribes audio files. You can provide the audio file as a path or as a binary file-like object. You can set the language of the file with the language parameter. If you don't specify the language, WhisperTranscriber automatically detects it.
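
To illustrate the binary input, here is a minimal sketch that opens the audio file itself and hands the file object to transcribe() together with a language hint. The helper name transcribe_from_binary is hypothetical, and the transcriber argument is assumed to be a WhisperTranscriber instance created as shown earlier:

```python
from typing import Any, Dict, Optional


def transcribe_from_binary(
    transcriber: Any, path: str, language: Optional[str] = None
) -> Dict[str, Any]:
    """Open the audio file in binary mode and pass the file object on.

    `transcriber` is assumed to expose transcribe(audio_file=..., language=...),
    as WhisperTranscriber does. Leaving `language` as None keeps automatic
    language detection.
    """
    with open(path, "rb") as audio:
        return transcriber.transcribe(audio_file=audio, language=language)


# Usage (requires Haystack and either an API key or a local Whisper install):
# result = transcribe_from_binary(whisper, "path/to/audio/file", language="en")
```

Passing a path string directly is equivalent and shorter; the file-object form is mainly useful when the audio is already open, for example an uploaded file.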

By default, the transcribe() method returns the transcription for the entire audio file. To get the transcription for each audio file segment, set the return_segments parameter to True.

If the source audio is not in English, you can translate the transcription to English by setting the translate parameter to True.
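
The two options can be combined in one call. The sketch below formats a segmented result as timestamped lines. It assumes the returned dictionary carries a "text" field and, when segments are requested, a "segments" list whose items have "start", "end", and "text" keys (the shape Whisper itself produces); verify this against your own output. The helper name format_segments is hypothetical:

```python
from typing import Any, Dict, List


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Render each segment as '[start-end] text'.

    Falls back to the full transcription when no segments are present.
    The key names used here are an assumption about the result dictionary.
    """
    segments = result.get("segments")
    if not segments:
        return [result.get("text", "").strip()]
    return [
        f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text'].strip()}"
        for seg in segments
    ]


# Usage (requires Haystack and either an API key or a local Whisper install):
# result = whisper.transcribe(
#     audio_file="path/to/audio/file",
#     return_segments=True,  # include per-segment transcriptions
#     translate=True,        # translate non-English speech to English
# )
# print("\n".join(format_segments(result)))
```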

In a Pipeline

Let's download a short video from YouTube, extract audio from it, and summarize the transcribed text using PromptNode. As a prerequisite, install the pytube package with pip install pytube. We'll use WhisperTranscriber in API mode, since PromptNode needs the OpenAI key anyway.

from pytube import YouTube
from haystack.nodes import PromptNode
from haystack.nodes.audio import WhisperTranscriber
from haystack.pipelines import Pipeline

def youtube2audio(url: str):
    # Download the last stream that matches 160 kbps audio and return its local file path
    yt = YouTube(url)
    video = yt.streams.filter(abr="160kbps").last()
    return video.download()

# Both nodes use the same OpenAI API key
whisper = WhisperTranscriber(api_key="<your-openai-api-key-here>")
prompt_node = PromptNode(
    "gpt-3.5-turbo-instruct",
    default_prompt_template="summarization",
    api_key="<your-openai-api-key-here>",
)

file_path = youtube2audio("https://www.youtube.com/watch?v=8jbyxchYblM")

pipeline = Pipeline()
pipeline.add_node(component=whisper, name="whisper", inputs=["File"])
pipeline.add_node(component=prompt_node, name="prompt", inputs=["whisper"])

output = pipeline.run(file_paths=[file_path])

print(output["results"])

>The Fed does not need to raise rates much more as short rates are near inflation. 
>The best way to address the issue of full employment and rising wages is to open 
>up immigration rather than forcing people out of work and slowing the economy.