OracleKeywordRetriever
A keyword-based Retriever that fetches documents matching a query from the Oracle Document Store.
| Most common position in a pipeline | 1. Before a PromptBuilder in a RAG pipeline; 2. As the last component in a keyword search pipeline; 3. Before an ExtractiveReader in an extractive QA pipeline |
| Mandatory init variables | document_store: An instance of an OracleDocumentStore |
| Mandatory run variables | query: A string |
| Output variables | documents: A list of documents matching the query |
| API reference | Oracle |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/oracle |
| Package name | oracle-haystack |
Overview
The OracleKeywordRetriever is a keyword-based Retriever compatible with OracleDocumentStore. It uses Oracle's DBMS_SEARCH full-text index, which is created automatically when the document store is initialized, to rank documents by keyword relevance.
This retriever works without embeddings, making it suitable for keyword-only pipelines or as the keyword branch of a hybrid search pipeline.
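To make the hybrid case concrete, here is a minimal sketch of reciprocal rank fusion, one common way to merge a keyword ranking with an embedding ranking. It is plain Python for illustration only, not the integration's code: the document IDs, the function name, and the constant k=60 are assumptions. In a real pipeline, Haystack's DocumentJoiner can perform this kind of merge for you.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both branches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two branches of a hybrid pipeline.
keyword_ranking = ["d1", "d2", "d3"]    # e.g. from a keyword retriever
embedding_ranking = ["d3", "d1", "d4"]  # e.g. from an embedding retriever

fused = reciprocal_rank_fusion([keyword_ranking, embedding_ranking])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents found by both branches (d1 and d3) outrank those found by only one, which is the main reason to add a keyword branch next to an embedding branch.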
In addition to query, the retriever accepts top_k (the maximum number of documents to return) and filters (conditions that narrow the search space).
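As a rough illustration of these run parameters, the sketch below assembles the arguments for a retriever call. It assumes the document store accepts Haystack's standard filter dictionaries. The metadata fields `language` and `year` and the helper `build_run_kwargs` are hypothetical names used only for this example.

```python
from typing import Any, Dict, Optional


def build_run_kwargs(
    query: str,
    top_k: Optional[int] = None,
    filters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Collect the arguments for a retriever run, omitting unset options."""
    kwargs: Dict[str, Any] = {"query": query}
    if top_k is not None:
        if top_k <= 0:
            raise ValueError("top_k must be a positive integer")
        kwargs["top_k"] = top_k
    if filters is not None:
        kwargs["filters"] = filters
    return kwargs


# Haystack-style filter: both conditions must hold.
# "meta.language" and "meta.year" are hypothetical metadata fields.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}

run_kwargs = build_run_kwargs("bioluminescent waves", top_k=5, filters=filters)
print(run_kwargs)

# With a live document store, the call would then be:
# retriever.run(**run_kwargs)
```

Leaving top_k and filters out of the call lets the retriever fall back to whatever defaults it was initialized with.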
Installation
To run Oracle Database 23ai locally with Docker:
docker run -d --name oracle23ai \
  -p 1521:1521 \
  -e ORACLE_PWD=oracle \
  container-registry.oracle.com/database/free:latest
Install the Oracle integration for Haystack:
pip install oracle-haystack
Usage
On its own
This Retriever needs an OracleDocumentStore and indexed documents to run.
from haystack.utils import Secret
from haystack_integrations.document_stores.oracle import (
    OracleDocumentStore,
    OracleConnectionConfig,
)
from haystack_integrations.components.retrievers.oracle import OracleKeywordRetriever

document_store = OracleDocumentStore(
    connection_config=OracleConnectionConfig(
        user=Secret.from_env_var("ORACLE_USER"),
        password=Secret.from_env_var("ORACLE_PASSWORD"),
        dsn=Secret.from_env_var("ORACLE_DSN"),
    ),
    embedding_dim=768,
)

retriever = OracleKeywordRetriever(document_store=document_store)

retriever.run(query="my keyword query")
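The example above reads its connection settings from three environment variables. A missing variable typically surfaces only when the secret is resolved, so a quick pre-flight check can save a confusing connection error. The sketch below uses only the standard library; the helper name `missing_oracle_env` is hypothetical.

```python
import os

# Variable names taken from the example above.
REQUIRED_VARS = ("ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN")


def missing_oracle_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_oracle_env()
    if missing:
        print("Set these variables before creating the store:", ", ".join(missing))
    else:
        print("All Oracle connection variables are set.")
```

Running the check before constructing OracleDocumentStore keeps configuration problems separate from database problems.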
In a RAG pipeline
from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils import Secret
from haystack_integrations.document_stores.oracle import (
    OracleDocumentStore,
    OracleConnectionConfig,
)
from haystack_integrations.components.retrievers.oracle import OracleKeywordRetriever

prompt_template = [
    ChatMessage.from_user(
        """
        Given these documents, answer the question.\nDocuments:
        {% for doc in documents %}
            {{ doc.content }}
        {% endfor %}
        \nQuestion: {{question}}
        \nAnswer:
        """,
    ),
]

document_store = OracleDocumentStore(
    connection_config=OracleConnectionConfig(
        user=Secret.from_env_var("ORACLE_USER"),
        password=Secret.from_env_var("ORACLE_PASSWORD"),
        dsn=Secret.from_env_var("ORACLE_DSN"),
    ),
    embedding_dim=768,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]
document_store.write_documents(documents=documents, policy=DuplicatePolicy.SKIP)

retriever = OracleKeywordRetriever(document_store=document_store)

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(
    instance=ChatPromptBuilder(template=prompt_template, required_variables="*"),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIChatGenerator(), name="llm")

rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "How many languages are there?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    },
)
print(result["llm"]["replies"][0].text)