RemoteWhisperTranscriber
Use RemoteWhisperTranscriber to transcribe audio files using OpenAI's Whisper model.
Most common position in a pipeline | As the first component in an indexing pipeline |
Mandatory init variables | "api_key": An OpenAI API key. Can be set with the OPENAI_API_KEY environment variable. |
Mandatory run variables | "sources": A list of paths or binary streams that you want to transcribe |
Output variables | "documents": A list of documents |
API reference | Audio |
GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/audio/whisper_remote.py |
Overview
RemoteWhisperTranscriber works with OpenAI-compatible clients and isn't limited to OpenAI as a provider. For example, Groq offers a drop-in replacement that can be used as well. You can set the API key in one of two ways:
- Through the api_key initialization parameter, where the key is resolved using the Secret API.
- By setting it in the OPENAI_API_KEY environment variable, which the component uses to access the key.
from haystack.components.audio import RemoteWhisperTranscriber
transcriber = RemoteWhisperTranscriber()
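The two options above can be sketched as follows. This is a minimal sketch, assuming Haystack 2.x, where `Secret` is importable from `haystack.utils`. The helper name `make_transcriber` is hypothetical and not part of Haystack; the Haystack imports sit inside the function so that the environment check runs first.

```python
import os
from typing import Optional


def make_transcriber(token: Optional[str] = None):
    """Hypothetical helper (not part of Haystack): build a transcriber from
    an explicit token or from the OPENAI_API_KEY environment variable."""
    if token is None and not os.environ.get("OPENAI_API_KEY"):
        # Fail early with a readable message when no key is available.
        raise RuntimeError("Pass a token or set the OPENAI_API_KEY environment variable.")

    # Imported here so the check above runs before Haystack is loaded.
    from haystack.components.audio import RemoteWhisperTranscriber
    from haystack.utils import Secret

    if token is not None:
        # Option 1: pass the key explicitly through the api_key parameter.
        return RemoteWhisperTranscriber(api_key=Secret.from_token(token))

    # Option 2: let the Secret read OPENAI_API_KEY when the key is resolved.
    return RemoteWhisperTranscriber(api_key=Secret.from_env_var("OPENAI_API_KEY"))
```

The environment-variable form keeps the key out of source code, which is usually the safer choice for shared or versioned projects.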
Additionally, the component requires the following parameters to work:
- model specifies the Whisper model.
- api_base_url specifies the OpenAI base URL and defaults to "https://api.openai.com/v1". If you are using a Whisper provider other than OpenAI, set this parameter according to the provider's documentation.
See other optional parameters in our API documentation.
See the Whisper API documentation and the official Whisper GitHub repo for the supported audio formats and languages.
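As a sketch of how model and api_base_url might be set for a provider other than OpenAI, the snippet below keeps per-provider settings in a small table. The Groq base URL, the Groq model name, and the `GROQ_API_KEY` variable name are assumptions made for illustration and should be checked against the provider's documentation; `PROVIDERS` and `transcriber_for` are hypothetical names, not part of Haystack.

```python
from typing import Dict

# Assumed values for illustration; confirm them in each provider's documentation.
PROVIDERS: Dict[str, Dict[str, str]] = {
    "openai": {
        "model": "whisper-1",
        "api_base_url": "https://api.openai.com/v1",
        "env_var": "OPENAI_API_KEY",
    },
    "groq": {
        "model": "whisper-large-v3",
        "api_base_url": "https://api.groq.com/openai/v1",
        "env_var": "GROQ_API_KEY",
    },
}


def transcriber_for(provider: str):
    """Hypothetical helper: build a transcriber for a configured provider."""
    # Look up the settings first; an unknown provider raises KeyError.
    cfg = PROVIDERS[provider]

    from haystack.components.audio import RemoteWhisperTranscriber
    from haystack.utils import Secret

    return RemoteWhisperTranscriber(
        api_key=Secret.from_env_var(cfg["env_var"]),
        model=cfg["model"],
        api_base_url=cfg["api_base_url"],
    )
```

With the matching environment variable set, `transcriber_for("groq")` would return a component that sends requests to the configured base URL instead of OpenAI's.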
Usage
On its own
Here’s an example of how to use RemoteWhisperTranscriber to transcribe a local file:
import requests
from haystack.components.audio import RemoteWhisperTranscriber

# Download a sample speech and save it as a local file.
response = requests.get("https://ia903102.us.archive.org/19/items/100-Best--Speeches/EK_19690725_64kb.mp3")
with open("kennedy_speech.mp3", "wb") as file:
    file.write(response.content)

# The API key is read from the OPENAI_API_KEY environment variable.
transcriber = RemoteWhisperTranscriber()
transcription = transcriber.run(sources=["./kennedy_speech.mp3"])
print(transcription["documents"][0].content)
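Because sources is a list, several files can be transcribed in one call. The sketch below pairs each input path with its transcript. It assumes the returned documents come back in the same order as the sources; `transcribe_files` is a hypothetical helper, and any object with a compatible `run` method can stand in for the transcriber.

```python
from typing import Any, Dict, Sequence


def transcribe_files(transcriber: Any, paths: Sequence[str]) -> Dict[str, str]:
    """Hypothetical helper: map each audio path to its transcribed text."""
    result = transcriber.run(sources=list(paths))
    documents = result["documents"]
    if len(documents) != len(paths):
        raise ValueError("Expected one document per source.")
    # Assumption: documents are returned in the same order as the sources.
    return {path: doc.content for path, doc in zip(paths, documents)}
```

For example, `transcribe_files(RemoteWhisperTranscriber(), ["./kennedy_speech.mp3"])` would return a dictionary with one entry keyed by that path.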
In a pipeline
The pipeline below fetches an audio file from a specified URL and transcribes it. It first retrieves the audio file using LinkContentFetcher, then transcribes the audio into text with RemoteWhisperTranscriber, and finally outputs the transcription text.
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.components.fetchers import LinkContentFetcher
from haystack import Pipeline

pipe = Pipeline()
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("transcriber", RemoteWhisperTranscriber())
pipe.connect("fetcher", "transcriber")

result = pipe.run(
    data={"fetcher": {"urls": ["https://ia903102.us.archive.org/19/items/100-Best--Speeches/EK_19690725_64kb.mp3"]}}
)
print(result["transcriber"]["documents"][0].content)
Additional References
🧑🍳 Cookbook: Multilingual RAG from a podcast with Whisper, Qdrant and Mistral