DocumentToSpeech
Use this node in document retrieval pipelines to convert text Documents into SpeechDocuments. The document's content is read out into an audio file. This page explains how to use this node.
This node is experimental because of the data classes it uses (SpeechDocument
). Bear in mind that they might change in the future.
Position in a Pipeline | The last node in a document search pipeline, after a Retriever in a single-Retriever pipeline; or at the end of an indexing pipeline, before the Document Store |
Input | Document |
Output | SpeechDocument |
Classes | DocumentToSpeech |
Usage
To initialize DocumentToSpeech
, run:
from haystack.nodes import DocumentToSpeech
model_name = 'espnet/kan-bayashi_ljspeech_vits'
answer_dir = './generated_audio_answers'
audio_document = DocumentToSpeech(model_name_or_path=model_name, generated_audio_dir=answer_dir)
To use DocumentToSpeech
in a pipeline, run:
from haystack.nodes import DocumentToSpeech
retriever = BM25Retriever(document_store=document_store)
document2speech = DocumentToSpeech(
model_name_or_path="espnet/kan-bayashi_ljspeech_vits",
generated_audio_dir=Path(__file__).parent / "audio_documents",
)
audio_pipeline = Pipeline()
audio_pipeline.add_node(retriever, name="Retriever", inputs=["Query"])
audio_pipeline.add_node(document2speech, name="DocumentToSpeech", inputs=["Retriever"])
Here's an example of an indexing pipeline with DocumentToSpeech
:
file_paths = [p for p in Path(documents_path).glob("**/*")]
indexing_pipeline = Pipeline()
classifier = FileTypeClassifier()
indexing_pipeline.add_node(classifier, name="classifier", inputs=["File"])
text_converter = TextConverter(remove_numeric_tables=True)
indexing_pipeline.add_node(text_converter, name="text_converter", inputs=["classifier.output_1"])
preprocessor = PreProcessor(
clean_whitespace=True,
clean_empty_lines=True,
split_length=100,
split_overlap=50,
split_respect_sentence_boundary=True,
)
indexing_pipeline.add_node(preprocessor, name="preprocessor", inputs=["text_converter"])
doc2speech = DocumentToSpeech(model_name_or_path="espnet/kan-bayashi_ljspeech_vits", generated_audio_dir=Path("./audio_documents"))
indexing_pipeline.add_node(doc2speech, name="doc2speech", inputs=["preprocessor"])
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
indexing_pipeline.add_node(document_store, name="document_store", inputs=["doc2speech"])
indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)
Updated about 1 year ago