Version: 2.31

OracleKeywordRetriever

A keyword-based Retriever that fetches documents matching a query from the Oracle Document Store.


Most common position in a pipeline	1. Before a `PromptBuilder` in a RAG pipeline 2. The last component in a keyword search pipeline 3. Before an `ExtractiveReader` in an extractive QA pipeline
Mandatory init variables	`document_store`: An instance of `OracleDocumentStore`
Mandatory run variables	`query`: A string
Output variables	`documents`: A list of documents matching the query
API reference	Oracle
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/oracle
Package name	`oracle-haystack`

Overview

The `OracleKeywordRetriever` is a keyword-based Retriever compatible with `OracleDocumentStore`. It searches documents by keyword relevance using Oracle's DBMS_SEARCH full-text index, which is created automatically when the document store is initialized.

This retriever works without embeddings, making it suitable for keyword-only pipelines or as the keyword branch of a hybrid search pipeline.
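To illustrate the hybrid case, here is a minimal sketch of reciprocal rank fusion, one common way to merge a keyword ranking with an embedding ranking. It is a plain-Python illustration, not the integration's code: the document IDs, the function name, and the constant `k=60` are assumptions chosen for the example. In a Haystack pipeline, the `DocumentJoiner` component with `join_mode="reciprocal_rank_fusion"` would normally play this role.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one list, best first.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by several branches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two branches of a hybrid pipeline.
keyword_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. from a keyword retriever
embedding_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. from an embedding retriever

fused = reciprocal_rank_fusion([keyword_hits, embedding_hits])
print(fused)
```

Documents found by both branches (`doc_a` and `doc_b` here) end up ahead of documents found by only one.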

In addition to `query`, the retriever accepts `top_k` (the maximum number of documents to return) and `filters` to narrow the search space.
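As a rough sketch of how these arguments fit together, the snippet below builds the keyword arguments for a retriever call. It assumes that the retriever accepts the standard Haystack 2.x filter syntax and that `top_k` and `filters` can be passed at run time, as with other Haystack keyword retrievers. The metadata fields `language` and `year` are hypothetical.

```python
# Haystack 2.x filter syntax: comparison conditions combined by a logical operator.
# The metadata fields "language" and "year" are hypothetical examples.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}

run_kwargs = {
    "query": "bioluminescent waves",
    "top_k": 3,          # return at most three documents
    "filters": filters,  # only consider documents matching both conditions
}

# With a retriever configured as in the Usage section (assumed call shape):
# result = retriever.run(**run_kwargs)
print(run_kwargs)
```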

Installation

To run Oracle Database 23ai locally with Docker:

shell

docker run -d --name oracle23ai \
  -p 1521:1521 \
  -e ORACLE_PWD=oracle \
  container-registry.oracle.com/database/free:latest

Install the Oracle integration for Haystack:

shell

pip install oracle-haystack

Usage

On its own

This Retriever needs an OracleDocumentStore and indexed documents to run.

python

from haystack.utils import Secret
from haystack_integrations.document_stores.oracle import (
    OracleDocumentStore,
    OracleConnectionConfig,
)
from haystack_integrations.components.retrievers.oracle import OracleKeywordRetriever

document_store = OracleDocumentStore(
    connection_config=OracleConnectionConfig(
        user=Secret.from_env_var("ORACLE_USER"),
        password=Secret.from_env_var("ORACLE_PASSWORD"),
        dsn=Secret.from_env_var("ORACLE_DSN"),
    ),
    embedding_dim=768,
)

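retriever = OracleKeywordRetriever(document_store=document_store)

# The result is a dictionary; the matching documents are under the "documents" key.
result = retriever.run(query="my keyword query")
print(result["documents"])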
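The example above reads its connection settings from the `ORACLE_USER`, `ORACLE_PASSWORD`, and `ORACLE_DSN` environment variables. A small pre-flight check can report any that are missing before the document store is created. This is a sketch only: the helper name `missing_oracle_env` is hypothetical and not part of the integration.

```python
import os

# Environment variables read by the usage example.
REQUIRED_VARS = ("ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN")


def missing_oracle_env(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_oracle_env()
if missing:
    print("Set these variables before creating the document store:", ", ".join(missing))
else:
    print("All Oracle connection variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests.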

In a RAG pipeline

python

from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils import Secret

from haystack_integrations.document_stores.oracle import (
    OracleDocumentStore,
    OracleConnectionConfig,
)
from haystack_integrations.components.retrievers.oracle import OracleKeywordRetriever

prompt_template = [
    ChatMessage.from_user(
        """
    Given these documents, answer the question.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}

    \nQuestion: {{question}}
    \nAnswer:
        """,
    ),
]

document_store = OracleDocumentStore(
    connection_config=OracleConnectionConfig(
        user=Secret.from_env_var("ORACLE_USER"),
        password=Secret.from_env_var("ORACLE_PASSWORD"),
        dsn=Secret.from_env_var("ORACLE_DSN"),
    ),
    embedding_dim=768,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

document_store.write_documents(documents=documents, policy=DuplicatePolicy.SKIP)

retriever = OracleKeywordRetriever(document_store=document_store)

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(
    instance=ChatPromptBuilder(template=prompt_template, required_variables="*"),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIChatGenerator(), name="llm")

rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "How many languages are there?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    },
)
print(result["llm"]["replies"][0].text)
