WhisperTranscriber
Use WhisperTranscriber to transcribe audio files using OpenAI's Whisper model.
Position in a Pipeline | Wherever there's a need to transcribe a file. |
Input | Audio file path or binary file-like object |
Output | When used in a pipeline: Document. When used stand-alone: Dictionary with transcription fields. |
Classes | WhisperTranscriber |
WhisperTranscriber supports two modes of operation:
- API (default): Uses the OpenAI API and requires an API key.
- Local (requires installing Whisper): Uses the local installation of Whisper.
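The choice between the two modes comes down to whether an API key is passed to the constructor. Here is a minimal sketch of that selection, assuming the key is stored in an environment variable named OPENAI_API_KEY. The variable name and both helper functions are illustrative and not part of Haystack.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the OpenAI key from the environment, or None if it is unset or blank."""
    # OPENAI_API_KEY is an assumed variable name, not something Haystack requires.
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None


def make_transcriber(api_key: Optional[str] = None):
    """Build a WhisperTranscriber: API mode with a key, local mode without one."""
    # Imported lazily so this module loads even when Haystack is not installed.
    from haystack.nodes.audio import WhisperTranscriber

    if api_key:
        return WhisperTranscriber(api_key=api_key)
    # No key: rely on the local Whisper installation instead.
    return WhisperTranscriber()


mode = "API" if resolve_api_key() else "local"
print(f"WhisperTranscriber would run in {mode} mode")
```

Calling make_transcriber(resolve_api_key()) then yields an API-mode node when the variable is set and a local-mode node otherwise.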
Usage
Stand-Alone
You can use Whisper as a stand-alone node to transcribe audio files.
To use WhisperTranscriber in API mode, provide an OpenAI API key. You can get one by signing up for an OpenAI account.
To run Whisper locally, install it following the instructions on the Whisper GitHub repo and omit the api_key parameter.
These examples show how to run WhisperTranscriber in the API and the local mode:
# API mode: pass your OpenAI API key
from haystack.nodes.audio import WhisperTranscriber
whisper = WhisperTranscriber(api_key="YOUR_API_KEY")
transcription = whisper.transcribe(audio_file="path/to/audio/file")
# Local mode: as a prerequisite, install Whisper from the OpenAI GitHub repo
# (this example skips that step)
# In local mode, WhisperTranscriber works on both a CPU and a GPU
# without any additional settings
from haystack.nodes.audio import WhisperTranscriber
whisper = WhisperTranscriber()
transcription = whisper.transcribe(audio_file="path/to/audio/file")
The transcribe() method transcribes audio files. You can provide the audio file as a path or as a binary file-like object. You can set the language of the file with the language parameter. If you don't specify the language, WhisperTranscriber automatically detects it.
By default, the transcribe() method returns the transcription for the entire audio file. To get the transcription for each audio file segment, set the return_segments parameter to True.
If the source audio is not in English, you can translate the transcription to English by setting the translate parameter to True.
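The options described above can be exercised side by side. The following is a sketch, assuming `transcriber` is a WhisperTranscriber (or any object with the same transcribe() parameters). The helper name `transcribe_variants` and the "de" language code are illustrative choices, not part of Haystack.

```python
from typing import Any, Dict


def transcribe_variants(transcriber: Any, path: str, language: str = "de") -> Dict[str, Any]:
    """Call transcribe() once per option described above and collect the results."""
    results: Dict[str, Any] = {}
    # Whole-file transcription with automatic language detection.
    results["auto"] = transcriber.transcribe(audio_file=path)
    # Explicit language instead of auto-detection ("de" is an assumed example value).
    results["language"] = transcriber.transcribe(audio_file=path, language=language)
    # Transcription for each audio segment.
    results["segments"] = transcriber.transcribe(audio_file=path, return_segments=True)
    # Non-English audio translated to English.
    results["english"] = transcriber.transcribe(audio_file=path, translate=True)
    return results
```

To pass a binary file-like object instead of a path, open the file in "rb" mode and hand the open file to the audio_file parameter of a single transcribe() call.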
In a Pipeline
Let's download a short video from YouTube, extract audio from it, and summarize the transcribed text using PromptNode. As a prerequisite, install the pytube package using pip install pytube. We'll use WhisperTranscriber in API mode with the same OpenAI key that PromptNode needs.
from pytube import YouTube
from haystack.nodes import PromptNode
from haystack.nodes.audio import WhisperTranscriber
from haystack.pipelines import Pipeline
def youtube2audio(url: str):
    yt = YouTube(url)
    video = yt.streams.filter(abr='160kbps').last()
    return video.download()
whisper = WhisperTranscriber(api_key="<your-openai-api-key-here>")
prompt_node = PromptNode("text-davinci-003", default_prompt_template="summarization",
                         api_key="<your-openai-api-key-here>")
file_path = youtube2audio("https://www.youtube.com/watch?v=8jbyxchYblM")
pipeline = Pipeline()
pipeline.add_node(component=whisper, name="whisper", inputs=["File"])
pipeline.add_node(component=prompt_node, name="prompt", inputs=["whisper"])
output = pipeline.run(file_paths=[file_path])
print(output["results"])
>The Fed does not need to raise rates much more as short rates are near inflation.
>The best way to address the issue of full employment and rising wages is to open
>up immigration rather than forcing people out of work and slowing the economy.