DocumentToSpeech

DocumentToSpeech lives in the haystack-extras GitHub repo and is not part of Haystack core. The node is experimental because the data classes it uses (SpeechDocument) might change in the future.


Position in a Pipeline	The last node in a document search pipeline, after a Retriever in a single-Retriever pipeline; or at the end of an indexing pipeline, before the DocumentStore
Input	Document
Output	SpeechDocument
Classes	DocumentToSpeech

Installation

DocumentToSpeech is not installed as part of Haystack core. It lives in the separate haystack-extras repo, so you need to install it separately:

# First, install the audio system dependencies (apt-get is for Debian-based systems such as Ubuntu):
sudo apt-get install libsndfile1 ffmpeg

# Then, install the node:
pip install farm-haystack-text2speech
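
Before initializing the node, it can help to confirm that the prerequisites are visible from Python. The following is a minimal sketch that uses only the standard library. The helper name check_text2speech_prerequisites is hypothetical and not part of Haystack or the text2speech package, and the module name text2speech is taken from the import used in the usage examples on this page.

```python
import ctypes.util
import importlib.util
import shutil
from typing import Dict


def check_text2speech_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites are visible from this Python environment.

    Hypothetical helper for illustration; not part of Haystack.
    """
    return {
        # ffmpeg must be on PATH for shutil.which to find it
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libsndfile is a shared library; find_library returns None if it is not found
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
        # find_spec returns None if no top-level module with this name is installed
        "text2speech": importlib.util.find_spec("text2speech") is not None,
    }


if __name__ == "__main__":
    for name, found in check_text2speech_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing entry only means that Python could not locate the dependency in the current environment. It does not verify that the installed version is compatible.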

Usage

To initialize DocumentToSpeech, run:

from text2speech import DocumentToSpeech

model_name = 'espnet/kan-bayashi_ljspeech_vits'
answer_dir = './generated_audio_answers'

audio_document = DocumentToSpeech(model_name_or_path=model_name, generated_audio_dir=answer_dir)

To use DocumentToSpeech in a pipeline, run:

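from pathlib import Path

from haystack import Pipeline
from haystack.nodes import BM25Retriever
from text2speech import DocumentToSpeech

# document_store is assumed to be an already initialized DocumentStore that supports BM25 retrieval
retriever = BM25Retriever(document_store=document_store)
document2speech = DocumentToSpeech(
    model_name_or_path="espnet/kan-bayashi_ljspeech_vits",
    generated_audio_dir=Path(__file__).parent / "audio_documents",
)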

audio_pipeline = Pipeline()
audio_pipeline.add_node(retriever, name="Retriever", inputs=["Query"])
audio_pipeline.add_node(document2speech, name="DocumentToSpeech", inputs=["Retriever"])
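
The generated_audio_dir parameter names the folder where the node is expected to store the audio it produces. A minimal sketch for inspecting that folder after a pipeline run follows. The helper name list_generated_audio is hypothetical, and the ".wav" default is an assumption; adjust the suffix to whatever format your setup writes.

```python
from pathlib import Path
from typing import List


def list_generated_audio(audio_dir: Path, suffix: str = ".wav") -> List[Path]:
    """Return the audio files found directly in audio_dir, sorted by path.

    Hypothetical helper for illustration; the ".wav" default is an assumption.
    """
    if not audio_dir.is_dir():
        # Nothing has been generated yet, or the path is wrong
        return []
    return sorted(
        p for p in audio_dir.iterdir() if p.is_file() and p.suffix == suffix
    )
```

For example, list_generated_audio(Path("./audio_documents")) returns an empty list until the pipeline has written at least one matching file to that folder.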

Here's an example of an indexing pipeline with DocumentToSpeech:

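from pathlib import Path

from haystack import Pipeline
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import FileTypeClassifier, PreProcessor, TextConverter
from text2speech import DocumentToSpeech

# documents_path is assumed to point to the folder with the files to index.
# Collect regular files only; a bare glob("**/*") would also return directories.
file_paths = [p for p in Path(documents_path).glob("**/*") if p.is_file()]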

indexing_pipeline = Pipeline()

classifier = FileTypeClassifier()
indexing_pipeline.add_node(classifier, name="classifier", inputs=["File"])

text_converter = TextConverter(remove_numeric_tables=True)
indexing_pipeline.add_node(text_converter, name="text_converter", inputs=["classifier.output_1"])

preprocessor = PreProcessor(
    clean_whitespace=True,
    clean_empty_lines=True,
    split_length=100,
    split_overlap=50,
    split_respect_sentence_boundary=True,
)
indexing_pipeline.add_node(preprocessor, name="preprocessor", inputs=["text_converter"])

doc2speech = DocumentToSpeech(model_name_or_path="espnet/kan-bayashi_ljspeech_vits", generated_audio_dir=Path("./audio_documents"))
indexing_pipeline.add_node(doc2speech, name="doc2speech", inputs=["preprocessor"])

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
indexing_pipeline.add_node(document_store, name="document_store", inputs=["doc2speech"])

# files_metadata is assumed to hold the metadata for the files in file_paths
indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)
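
The indexing example uses file_paths and files_metadata without showing how they are built. A minimal sketch of building both with the standard library follows, assuming one metadata dictionary per file. The helper name collect_files and the "name" metadata key are hypothetical and chosen only for illustration.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def collect_files(documents_path: str) -> Tuple[List[Path], List[Dict[str, str]]]:
    """Collect regular files under documents_path and one metadata dict per file.

    Hypothetical helper for illustration; the "name" key is an assumption.
    """
    # Recurse into subfolders, skip directories, and sort for a stable order
    file_paths = sorted(p for p in Path(documents_path).glob("**/*") if p.is_file())
    # Keep metadata in the same order as file_paths so the two lists line up
    files_metadata = [{"name": p.name} for p in file_paths]
    return file_paths, files_metadata
```

The two returned lists have the same length and order, so the entry at a given index in files_metadata describes the file at that index in file_paths.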