ElasticsearchDocumentStore
Use an Elasticsearch database with Haystack.
ElasticsearchDocumentStore is excellent if you want to evaluate the performance of different retrieval options (dense vs. sparse) and aim for a smooth transition from PoC to production.
It supports approximate nearest neighbor (ANN) search.
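ANN search trades a little accuracy for speed. Instead of comparing the query embedding with every stored embedding, Elasticsearch uses an index structure (HNSW graphs for indexed dense_vector fields) to find close matches quickly. As a reference point, the exact search that ANN approximates can be sketched in plain Python as follows. This is an illustrative sketch that assumes cosine similarity as the metric. It is not how Elasticsearch implements search, and `cosine_similarity` and `exact_top_k` are hypothetical helpers, not Haystack APIs.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Hypothetical helper: score EVERY document (O(N) per query) and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding models produce hundreds of dimensions.
docs = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.7, 0.7, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}
print([doc_id for doc_id, _ in exact_top_k([1.0, 0.1, 0.0], docs, k=2)])  # → ['doc1', 'doc2']
```

Exact search like this becomes expensive as the number of stored documents grows, which is the cost an ANN index is designed to avoid.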
Initialization
Install Elasticsearch and then start an instance. Haystack 2.0 supports Elasticsearch 8.
If you have Docker set up, we recommend pulling the Docker image and running it.
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.11.1
docker run -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.11.1
Alternatively, go to the Elasticsearch integration on GitHub and start a Docker container running Elasticsearch with the provided docker-compose.yml file:
docker compose up
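Whichever way you start the container, Elasticsearch can take several seconds to accept connections. A small readiness check avoids connection errors when you create the document store too early. The sketch below uses only the Python standard library and assumes security is disabled (as in the docker run command above), so the root endpoint answers plain HTTP without credentials. `wait_for_elasticsearch` is a hypothetical helper, not part of Haystack or the Elasticsearch client.

```python
import json
import time
import urllib.request


def wait_for_elasticsearch(url: str = "http://localhost:9200", timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Hypothetical helper: poll `url` until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    # The root endpoint returns cluster info, including the server version.
                    info = json.load(response)
                    version = info.get("version", {}).get("number", "unknown")
                    print(f"Elasticsearch {version} is up at {url}")
                    return True
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError and refused connections while the node boots;
            # ValueError covers a body that is not valid JSON yet.
            pass
        time.sleep(interval)
    return False
```

For example, calling `wait_for_elasticsearch()` before initializing the document store returns `True` once the node responds, and the printed version lets you confirm you are running Elasticsearch 8.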
Once you have a running Elasticsearch instance, install the elasticsearch-haystack integration:
pip install elasticsearch-haystack
Then, initialize an ElasticsearchDocumentStore object connected to the Elasticsearch instance and write documents to it:
from haystack import Document
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
# Connect to the local Elasticsearch instance started above.
document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
# Write two example documents to the store.
document_store.write_documents([
    Document(content="This is first"),
    Document(content="This is second"),
])
# Print how many documents the index now holds.
print(document_store.count_documents())
Supported Retrievers
ElasticsearchBM25Retriever: A keyword-based Retriever that fetches documents matching a query from the Document Store.
ElasticsearchEmbeddingRetriever: Compares the query and document embeddings and fetches the documents most relevant to the query.
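ElasticsearchBM25Retriever relies on Elasticsearch's full-text search, whose default relevance function is BM25. To make "keyword-based" concrete, here is a minimal, self-contained sketch of textbook BM25 scoring with the commonly used defaults k1=1.2 and b=0.75. It is an illustration only: Elasticsearch tokenizes text with analyzers rather than whitespace splitting, and Lucene's scoring formula differs in minor details. `bm25_scores` is a hypothetical helper, not part of Haystack.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[Tuple[int, float]]:
    """Hypothetical helper: rank docs by textbook BM25; returns (doc_index, score), best first."""
    tokenized = [doc.lower().split() for doc in docs]  # naive lowercase + whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    results = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes for document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        results.append((idx, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


docs = ["This is first", "This is second", "A second example document"]
print([idx for idx, _ in bm25_scores("second document", docs)])  # → [2, 1, 0]
```

The third document ranks first because it matches both query terms, including the rarer term "document"; the first document matches neither term and scores zero.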