Version: 2.30

OracleKeywordRetriever

A keyword-based Retriever that fetches documents matching a query from the Oracle Document Store.

Most common position in a pipeline:
1. Before a PromptBuilder in a RAG pipeline
2. The last component in a keyword search pipeline
3. Before an ExtractiveReader in an extractive QA pipeline
Mandatory init variables: document_store: An instance of an OracleDocumentStore
Mandatory run variables: query: A string
Output variables: documents: A list of documents matching the query
API reference: Oracle
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/oracle
Package name: oracle-haystack

Overview

The OracleKeywordRetriever is a keyword-based Retriever compatible with OracleDocumentStore. It searches documents by keyword relevance using Oracle's DBMS_SEARCH full-text index, which is created automatically when the document store is initialized.

This retriever works without embeddings, making it suitable for keyword-only pipelines or as the keyword branch of a hybrid search pipeline.
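To illustrate what "the keyword branch of a hybrid search pipeline" can mean in practice, here is a minimal pure-Python sketch of reciprocal rank fusion, one common way to merge a keyword ranking with an embedding ranking. It is not part of the Oracle integration: the function name, the document IDs, and the constant k=60 are assumptions chosen for illustration. In a real Haystack pipeline, the DocumentJoiner component offers a reciprocal_rank_fusion join mode that fills this role.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not an Oracle
    or Haystack setting.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical IDs standing in for the output of each branch.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, embedding_hits])
print(fused)
```

Documents that rank well in both branches, such as doc_a and doc_b here, rise to the top of the fused list.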

In addition to query, the retriever accepts top_k (maximum documents to return) and filters to narrow the search space.
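As a rough sketch of how these parameters fit together, the snippet below builds a filter in Haystack's standard filter syntax and collects it with query and top_k. The metadata fields category and year and the helper build_filters are hypothetical. Whether filters and top_k are passed to run(), to the constructor, or to both is an assumption here, so check the API reference before relying on it.

```python
def build_filters(category, min_year):
    """Return a filter in Haystack's standard filter syntax.

    The metadata fields "category" and "year" are hypothetical.
    """
    return {
        "operator": "AND",
        "conditions": [
            {"field": "meta.category", "operator": "==", "value": category},
            {"field": "meta.year", "operator": ">=", "value": min_year},
        ],
    }


filters = build_filters(category="science", min_year=2020)
run_kwargs = {"query": "bioluminescent waves", "filters": filters, "top_k": 3}
print(run_kwargs)

# With a configured retriever, the call would then look like this
# (assuming run() accepts these keyword arguments):
# result = retriever.run(**run_kwargs)
```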

Installation

To run Oracle Database 23ai locally with Docker:

shell
docker run -d --name oracle23ai \
  -p 1521:1521 \
  -e ORACLE_PWD=oracle \
  container-registry.oracle.com/database/free:latest

Install the Oracle integration for Haystack:

shell
pip install oracle-haystack

Usage

On its own

This Retriever needs an OracleDocumentStore and indexed documents to run.

python
from haystack.utils import Secret
from haystack_integrations.document_stores.oracle import (
    OracleDocumentStore,
    OracleConnectionConfig,
)
from haystack_integrations.components.retrievers.oracle import OracleKeywordRetriever

document_store = OracleDocumentStore(
    connection_config=OracleConnectionConfig(
        user=Secret.from_env_var("ORACLE_USER"),
        password=Secret.from_env_var("ORACLE_PASSWORD"),
        dsn=Secret.from_env_var("ORACLE_DSN"),
    ),
    embedding_dim=768,
)

retriever = OracleKeywordRetriever(document_store=document_store)
retriever.run(query="my keyword query")
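The example above reads its connection settings from three environment variables. The following sketch shows one way to check that they are set before building the document store, and how an Easy Connect string of the form host:port/service_name is put together. The helper names are hypothetical, as are the example values: the user system and the service name FREEPDB1 are assumptions about a local Oracle Database Free container, so adjust them to your setup.

```python
import os

REQUIRED_VARS = ("ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def easy_connect_dsn(host, port, service_name):
    """Build an Easy Connect string of the form host:port/service_name."""
    return f"{host}:{port}/{service_name}"


# Hypothetical values for a local container; FREEPDB1 is assumed to be
# the service name of the pluggable database.
example_env = {
    "ORACLE_USER": "system",
    "ORACLE_DSN": easy_connect_dsn("localhost", 1521, "FREEPDB1"),
}
print(missing_env_vars(env=example_env))
```

Calling missing_env_vars() without arguments checks the real process environment, which can give a clearer error message than a failed connection attempt.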

In a RAG pipeline

python
from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils import Secret

from haystack_integrations.document_stores.oracle import (
    OracleDocumentStore,
    OracleConnectionConfig,
)
from haystack_integrations.components.retrievers.oracle import OracleKeywordRetriever

prompt_template = [
    ChatMessage.from_user(
        """
        Given these documents, answer the question.\nDocuments:
        {% for doc in documents %}
            {{ doc.content }}
        {% endfor %}

        \nQuestion: {{question}}
        \nAnswer:
        """,
    ),
]

document_store = OracleDocumentStore(
    connection_config=OracleConnectionConfig(
        user=Secret.from_env_var("ORACLE_USER"),
        password=Secret.from_env_var("ORACLE_PASSWORD"),
        dsn=Secret.from_env_var("ORACLE_DSN"),
    ),
    embedding_dim=768,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

document_store.write_documents(documents=documents, policy=DuplicatePolicy.SKIP)

retriever = OracleKeywordRetriever(document_store=document_store)

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(
    instance=ChatPromptBuilder(template=prompt_template, required_variables="*"),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIChatGenerator(), name="llm")

rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "How many languages are there?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    },
)
print(result["llm"]["replies"][0].text)