Use `DocumentCleaner` to make text documents more readable. It removes extra whitespace, empty lines, specified substrings, text matching a regular expression, and repeated page headers and footers, in that order. This is useful for preparing documents for further processing by LLMs.
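To make the order of operations concrete, here is a minimal standard-library sketch of the first four steps. It is not Haystack's implementation: `clean_text` and its parameters are hypothetical names chosen for illustration, and header and footer removal is left out because it needs page boundaries to compare against.

```python
import re


def clean_text(text, remove_substrings=(), remove_regex=None):
    """Hypothetical helper (not Haystack code) that applies the cleaning steps in order."""
    # 1. Collapse runs of spaces and tabs within each line and trim the line ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # 2. Drop lines that are empty after whitespace cleanup.
    lines = [line for line in lines if line]
    text = "\n".join(lines)
    # 3. Remove each specified substring.
    for substring in remove_substrings:
        text = text.replace(substring, "")
    # 4. Remove everything matching the regular expression, if one is given.
    if remove_regex is not None:
        text = re.sub(remove_regex, "", text)
    return text


raw = "Page 1\n\nQuarterly   report  \n\n\nDRAFT Revenue grew.\n"
cleaned = clean_text(raw, remove_substrings=["DRAFT "], remove_regex=r"Page \d+\n")
print(cleaned)
```

The order matters: because whitespace and empty lines are handled first, the substring and regex steps run against already normalized text, so patterns should be written to match the cleaned form. In Haystack itself you would pass `Document` objects to the component rather than call a function like this one.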