Oracle AI Vector Search
haystack_integrations.components.retrievers.oracle.embedding_retriever
OracleEmbeddingRetriever
Retrieves documents from an OracleDocumentStore using vector similarity.
Use inside a Haystack pipeline after a text embedder::
    pipeline.add_component("embedder", SentenceTransformersTextEmbedder())
    pipeline.add_component(
        "retriever",
        OracleEmbeddingRetriever(document_store=store, top_k=5),
    )
    pipeline.connect("embedder.embedding", "retriever.query_embedding")
run
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents by vector similarity.
Parameters:
- query_embedding (list[float]) – Dense float vector from an embedder component.
- filters (dict[str, Any] | None) – Runtime filters, merged with the constructor filters according to filter_policy.
- top_k (int | None) – Override the constructor top_k for this call.
Returns:
{"documents": [Document, ...]}
run_async
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
Async variant of :meth:`run`.
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
OracleEmbeddingRetriever – Deserialized component.
haystack_integrations.document_stores.oracle.document_store
OracleConnectionConfig
Connection parameters for Oracle Database.
Supports both thin (direct TCP) and thick (wallet / ADB-S) modes. Thin mode requires no Oracle Instant Client; thick mode is activated automatically when wallet_location is provided.
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
OracleConnectionConfig – Deserialized component.
OracleDocumentStore
Haystack DocumentStore backed by Oracle AI Vector Search.
Requires Oracle Database 23ai or later (for VECTOR data type and IF NOT EXISTS DDL support).
Usage::
    from haystack.utils import Secret
    from haystack_integrations.document_stores.oracle import (
        OracleDocumentStore,
        OracleConnectionConfig,
    )

    store = OracleDocumentStore(
        connection_config=OracleConnectionConfig(
            user=Secret.from_env_var("ORACLE_USER"),
            password=Secret.from_env_var("ORACLE_PASSWORD"),
            dsn=Secret.from_env_var("ORACLE_DSN"),
        ),
        embedding_dim=1536,
    )
init
__init__(
    *,
    connection_config: OracleConnectionConfig,
    table_name: str = "haystack_documents",
    embedding_dim: int,
    distance_metric: Literal["COSINE", "EUCLIDEAN", "DOT"] = "COSINE",
    create_table_if_not_exists: bool = True,
    create_index: bool = False,
    hnsw_neighbors: int = 32,
    hnsw_ef_construction: int = 200,
    hnsw_accuracy: int = 95,
    hnsw_parallel: int = 4,
) -> None
Initialise the document store and optionally create the backing table and indexes.
Parameters:
- connection_config (OracleConnectionConfig) – Oracle connection settings (user, password, DSN, optional wallet).
- table_name (str) – Name of the Oracle table used to store documents. Must be a valid Oracle identifier (letters, digits, _, $, #; max 128 chars; cannot start with a digit).
- embedding_dim (int) – Dimensionality of the embedding vectors. Must match the model producing them.
- distance_metric (Literal["COSINE", "EUCLIDEAN", "DOT"]) – Vector distance function used for similarity search. One of "COSINE", "EUCLIDEAN", or "DOT".
- create_table_if_not_exists (bool) – When True (default), creates the table and the DBMS_SEARCH keyword index on first use if they do not already exist. Set to False when connecting to a pre-existing table.
- create_index (bool) – When True, creates an HNSW vector index on initialisation. Equivalent to calling :meth:`create_hnsw_index` manually. Defaults to False.
- hnsw_neighbors (int) – Number of neighbours in the HNSW graph. Higher values improve recall at the cost of index size and build time. Defaults to 32.
- hnsw_ef_construction (int) – Size of the dynamic candidate list during HNSW index construction. Higher values improve recall at the cost of build time. Defaults to 200.
- hnsw_accuracy (int) – Target recall accuracy percentage for the HNSW index (0-100). Defaults to 95.
- hnsw_parallel (int) – Degree of parallelism used when building the HNSW index. Defaults to 4.
Raises:
ValueError – If table_name is not a valid Oracle identifier or embedding_dim is not a positive integer.
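The table_name constraint stated above can be modelled with a regular expression. This is an illustrative sketch of the rule as documented here (letters, digits, _, $, #; max 128 chars; no leading digit), not the integration's actual validation code:

```python
import re

# First character: anything allowed except a digit; then up to 127 more
# characters from the allowed set, for a 128-character maximum.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_$#][A-Za-z0-9_$#]{0,127}$")

def is_valid_oracle_identifier(name: str) -> bool:
    """Check a candidate table name against the documented constraint."""
    return _IDENTIFIER_RE.fullmatch(name) is not None
```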
create_keyword_index
Create the DBMS_SEARCH keyword index on this table.
Safe to call multiple times — silently skips if the index already exists.
Required for keyword retrieval. Called automatically when
create_table_if_not_exists=True, but must be called explicitly
when connecting to a pre-existing table.
create_hnsw_index
Create an HNSW vector index on the embedding column.
Safe to call multiple times — uses IF NOT EXISTS.
create_hnsw_index_async
Asynchronously creates an HNSW vector index on the embedding column.
Safe to call multiple times — uses IF NOT EXISTS.
write_documents
write_documents(
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
) -> int
Writes documents to the document store.
Parameters:
- documents (list[Document]) – A list of Documents to write to the document store.
- policy (DuplicatePolicy) – The duplicate policy to use when writing documents.
Returns:
int – The number of documents written to the document store.
Raises:
DuplicateDocumentError – If a document with the same id already exists in the document store and the policy is set to DuplicatePolicy.FAIL or DuplicatePolicy.NONE.
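The duplicate handling described above can be sketched with an in-memory dict standing in for the Oracle table. This is a simplified model of the documented semantics (NONE behaving like FAIL here, as this docstring states), not the store's actual implementation:

```python
from enum import Enum

class DuplicatePolicy(Enum):
    # Mirrors the Haystack DuplicatePolicy members referenced above.
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"

class DuplicateDocumentError(Exception):
    pass

def write(table: dict, docs: list[dict], policy: DuplicatePolicy) -> int:
    """Return how many docs were written, applying the given policy."""
    written = 0
    for doc in docs:
        if doc["id"] in table:
            if policy in (DuplicatePolicy.FAIL, DuplicatePolicy.NONE):
                raise DuplicateDocumentError(doc["id"])
            if policy is DuplicatePolicy.SKIP:
                continue  # leave the existing document untouched
        table[doc["id"]] = doc
        written += 1
    return written
```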
write_documents_async
write_documents_async(
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
) -> int
Asynchronously writes documents to the document store.
Parameters:
- documents (list[Document]) – A list of Documents to write to the document store.
- policy (DuplicatePolicy) – The duplicate policy to use when writing documents.
Returns:
int – The number of documents written to the document store.
Raises:
DuplicateDocumentError – If a document with the same id already exists in the document store and the policy is set to DuplicatePolicy.FAIL or DuplicatePolicy.NONE.
filter_documents
Returns the documents that match the filters provided.
For a detailed specification of the filters, refer to the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Parameters:
- filters (dict[str, Any] | None) – The filters to apply to the document list.
Returns:
list[Document] – A list of Documents that match the given filters.
filter_documents_async
Asynchronously returns the documents that match the filters provided.
For a detailed specification of the filters, refer to the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Parameters:
- filters (dict[str, Any] | None) – The filters to apply to the document list.
Returns:
list[Document] – A list of Documents that match the given filters.
delete_documents
Deletes documents that match the provided document_ids from the document store.
Parameters:
- document_ids (list[str]) – The document IDs to delete.
delete_documents_async
Asynchronously deletes documents that match the provided document_ids from the document store.
Parameters:
- document_ids (list[str]) – The document IDs to delete.
count_documents
Returns how many documents are present in the document store.
Returns:
int – Number of documents in the document store.
count_documents_async
Asynchronously returns how many documents are present in the document store.
Returns:
int – Number of documents in the document store.
delete_table
Permanently drops the document store table and its associated DBMS_SEARCH keyword index.
Uses DROP TABLE ... PURGE, which bypasses the Oracle recycle bin — the operation is
irreversible. The keyword index is dropped after the table; if either operation fails, a
:class:`DocumentStoreError` is raised.
Raises:
DocumentStoreError – If the table or keyword index cannot be dropped.
delete_table_async
Asynchronously permanently drops the document store table and its DBMS_SEARCH keyword index.
Uses DROP TABLE ... PURGE which bypasses the Oracle recycle bin — the operation is
irreversible.
Raises:
DocumentStoreError – If the table or keyword index cannot be dropped.
delete_all_documents
Removes all documents from the table using TRUNCATE.
TRUNCATE is non-recoverable — it cannot be rolled back and bypasses row-level triggers.
The table structure and indexes are preserved.
delete_all_documents_async
Asynchronously removes all documents from the table using TRUNCATE.
TRUNCATE is non-recoverable — it cannot be rolled back and bypasses row-level triggers.
The table structure and indexes are preserved.
count_documents_by_filter
Returns the number of documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict matches all documents. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Count of matching documents.
count_documents_by_filter_async
Asynchronously returns the number of documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict matches all documents. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Count of matching documents.
delete_by_filter
Deletes all documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict is treated as a no-op and returns 0 without touching the table. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Number of deleted documents.
delete_by_filter_async
Asynchronously deletes all documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict is treated as a no-op and returns 0 without touching the table. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Number of deleted documents.
update_by_filter
Merges meta into the metadata of all documents that match the provided filters.
Uses Oracle's JSON_MERGEPATCH — existing keys are updated, new keys are added,
and keys set to null in meta are removed.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that selects which documents to update. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- meta (dict[str, Any]) – Metadata patch to apply. Must be a non-empty dictionary.
Returns:
int – Number of updated documents.
Raises:
ValueError – If meta is empty.
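The merge behaviour described above (update existing keys, add new keys, remove keys patched with null) follows RFC 7396 merge-patch semantics, which JSON_MERGEPATCH implements. A pure-Python model of those semantics, for illustration only:

```python
def json_merge_patch(target: dict, patch: dict) -> dict:
    """Apply an RFC 7396-style merge patch to a metadata dict:
    None removes a key, nested dicts merge recursively, anything
    else replaces or adds the value."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = json_merge_patch(result[key], value)
        else:
            result[key] = value
    return result
```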
update_by_filter_async
Asynchronously merges meta into the metadata of all documents matching the provided filters.
Uses Oracle's JSON_MERGEPATCH — existing keys are updated, new keys are added,
and keys set to null in meta are removed.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that selects which documents to update. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- meta (dict[str, Any]) – Metadata patch to apply. Must be a non-empty dictionary.
Returns:
int – Number of updated documents.
Raises:
ValueError – If meta is empty.
count_unique_metadata_by_filter
count_unique_metadata_by_filter(
    filters: dict[str, Any],
    metadata_fields: list[str],
) -> dict[str, int]
Returns the number of distinct values for each requested metadata field among matching documents.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that scopes the document set. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- metadata_fields (list[str]) – List of metadata field names to count distinct values for. Fields may be prefixed with "meta." (e.g. "meta.lang" or "lang"). Must be a non-empty list.
Returns:
dict[str, int] – Dict mapping each field name to its distinct-value count.
Raises:
ValueError – If metadata_fields is empty.
ValueError – If any field name contains characters outside [A-Za-z0-9_.].
count_unique_metadata_by_filter_async
count_unique_metadata_by_filter_async(
    filters: dict[str, Any],
    metadata_fields: list[str],
) -> dict[str, int]
Asynchronously returns the number of distinct values for each metadata field among matching documents.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that scopes the document set. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- metadata_fields (list[str]) – List of metadata field names to count distinct values for. Fields may be prefixed with "meta." (e.g. "meta.lang" or "lang"). Must be a non-empty list.
Returns:
dict[str, int] – Dict mapping each field name to its distinct-value count.
Raises:
ValueError – If metadata_fields is empty.
ValueError – If any field name contains characters outside [A-Za-z0-9_.].
get_metadata_fields_info
Return a mapping of metadata field names to their detected types.
Uses Oracle's JSON_DATAGUIDE aggregate to introspect the stored metadata column.
Returns an empty dict when the table has no documents.
Returns:
dict[str, dict[str, str]] – Dict of the form {"field_name": {"type": "<type>"}, ...} where <type> is one of "text", "number", or "boolean".
get_metadata_field_min_max
Return the minimum and maximum values of a metadata field across all documents.
First attempts numeric comparison via TO_NUMBER so that MAX(1, 5, 10) returns 10
rather than "5" (which would win under lexicographic ordering). Falls back to plain string
comparison when the field contains non-numeric values. Numeric strings are automatically
converted to int or float in the result.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.year" or "year").
Returns:
dict[str, Any] – {"min": <value>, "max": <value>}. Both values are None when the table is empty or the field does not exist.
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
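The numeric-first comparison described above can be sketched in plain Python. This models only the documented fallback behaviour (numeric via TO_NUMBER-style conversion, string otherwise) over an in-memory list of values; it is not the store's SQL implementation:

```python
def field_min_max(values: list[str]) -> dict:
    """Return {"min": ..., "max": ...}, preferring numeric comparison
    so that max("1", "5", "10") is 10 rather than "5"."""
    if not values:
        return {"min": None, "max": None}

    def to_number(v: str):
        # Convert numeric strings to int or float, like the result
        # conversion described above.
        f = float(v)
        return int(f) if f.is_integer() else f

    try:
        nums = [to_number(v) for v in values]
        return {"min": min(nums), "max": max(nums)}
    except ValueError:
        # Non-numeric values present: fall back to string comparison.
        return {"min": min(values), "max": max(values)}
```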
get_metadata_field_unique_values
get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int | None = None,
) -> tuple[list[str], int]
Return a paginated list of distinct values for a metadata field, plus the total distinct count.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.lang" or "lang").
- search_term (str | None) – Optional substring filter applied to both the document text and the field value.
- from_ (int) – Zero-based offset for pagination. Defaults to 0.
- size (int | None) – Maximum number of values to return. When None, all values from from_ onward are returned.
Returns:
tuple[list[str], int] – A tuple (values, total) where values is the paginated list of distinct field values as strings and total is the overall distinct count (before pagination).
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
get_metadata_fields_info_async
Asynchronously returns a mapping of metadata field names to their detected types.
Uses Oracle's JSON_DATAGUIDE aggregate to introspect the stored metadata column.
Returns an empty dict when the table has no documents.
Returns:
dict[str, dict[str, str]] – Dict of the form {"field_name": {"type": "<type>"}, ...} where <type> is one of "text", "number", or "boolean".
get_metadata_field_min_max_async
Asynchronously returns the minimum and maximum values of a metadata field across all documents.
First attempts numeric comparison via TO_NUMBER, falling back to string comparison for
non-numeric fields. Numeric strings are automatically converted to int or float.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.year" or "year").
Returns:
dict[str, Any] – {"min": <value>, "max": <value>}. Both values are None when the table is empty or the field does not exist.
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
get_metadata_field_unique_values_async
get_metadata_field_unique_values_async(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int | None = None,
) -> tuple[list[str], int]
Asynchronously returns a paginated list of distinct values for a metadata field, plus the total count.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.lang" or "lang").
- search_term (str | None) – Optional substring filter applied to both the document text and the field value.
- from_ (int) – Zero-based offset for pagination. Defaults to 0.
- size (int | None) – Maximum number of values to return. When None, all values from from_ onward are returned.
Returns:
tuple[list[str], int] – A tuple (values, total) where values is the paginated list of distinct field values as strings and total is the overall distinct count (before pagination).
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
OracleDocumentStore – Deserialized component.