Oracle AI Vector Search
haystack_integrations.components.retrievers.oracle.embedding_retriever
OracleEmbeddingRetriever
Retrieves documents from an OracleDocumentStore using vector similarity.
Use inside a Haystack pipeline after a text embedder::
    pipeline.add_component("embedder", SentenceTransformersTextEmbedder())
    pipeline.add_component(
        "retriever",
        OracleEmbeddingRetriever(document_store=store, top_k=5),
    )
    pipeline.connect("embedder.embedding", "retriever.query_embedding")
run
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents by vector similarity.
Parameters:
- query_embedding (list[float]) – Dense float vector from an embedder component.
- filters (dict[str, Any] | None) – Runtime filters, merged with the constructor filters according to filter_policy.
- top_k (int | None) – Override the constructor top_k for this call.
Returns:
{"documents": [Document, ...]}
run_async
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
Async variant of :meth:`run`.
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
OracleEmbeddingRetriever – Deserialized component.
haystack_integrations.document_stores.oracle.document_store
OracleConnectionConfig
Connection parameters for Oracle Database.
Supports both thin (direct TCP) and thick (wallet / ADB-S) modes. Thin mode requires no Oracle Instant Client; thick mode is activated automatically when wallet_location is provided.
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
OracleConnectionConfig – Deserialized component.
OracleDocumentStore
Haystack DocumentStore backed by Oracle AI Vector Search.
Requires Oracle Database 23ai or later (for VECTOR data type and IF NOT EXISTS DDL support).
Usage::
    from haystack.utils import Secret
    from haystack_integrations.document_stores.oracle import (
        OracleDocumentStore,
        OracleConnectionConfig,
    )

    store = OracleDocumentStore(
        connection_config=OracleConnectionConfig(
            user=Secret.from_env_var("ORACLE_USER"),
            password=Secret.from_env_var("ORACLE_PASSWORD"),
            dsn=Secret.from_env_var("ORACLE_DSN"),
        ),
        embedding_dim=1536,
    )
init
__init__(
    *,
    connection_config: OracleConnectionConfig,
    table_name: str = "haystack_documents",
    embedding_dim: int,
    distance_metric: Literal["COSINE", "EUCLIDEAN", "DOT"] = "COSINE",
    create_table_if_not_exists: bool = True,
    create_index: bool = False,
    hnsw_neighbors: int = 32,
    hnsw_ef_construction: int = 200,
    hnsw_accuracy: int = 95,
    hnsw_parallel: int = 4,
) -> None
Initialise the document store and optionally create the backing table and indexes.
Parameters:
- connection_config (OracleConnectionConfig) – Oracle connection settings (user, password, DSN, optional wallet).
- table_name (str) – Name of the Oracle table used to store documents. Must be a valid Oracle identifier (letters, digits, _, $, #; max 128 chars; cannot start with a digit).
- embedding_dim (int) – Dimensionality of the embedding vectors. Must match the model producing them.
- distance_metric (Literal["COSINE", "EUCLIDEAN", "DOT"]) – Vector distance function used for similarity search. One of "COSINE", "EUCLIDEAN", or "DOT".
- create_table_if_not_exists (bool) – When True (default), creates the table and the DBMS_SEARCH keyword index on first use if they do not already exist. Set to False when connecting to a pre-existing table.
- create_index (bool) – When True, creates an HNSW vector index on initialisation. Equivalent to calling :meth:`create_hnsw_index` manually. Defaults to False.
- hnsw_neighbors (int) – Number of neighbours in the HNSW graph. Higher values improve recall at the cost of index size and build time. Defaults to 32.
- hnsw_ef_construction (int) – Size of the dynamic candidate list during HNSW index construction. Higher values improve recall at the cost of build time. Defaults to 200.
- hnsw_accuracy (int) – Target recall accuracy percentage for the HNSW index (0-100). Defaults to 95.
- hnsw_parallel (int) – Degree of parallelism used when building the HNSW index. Defaults to 4.
Raises:
ValueError – If table_name is not a valid Oracle identifier or embedding_dim is not a positive integer.
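The table_name constraint stated above can be modelled with a regular expression. This is an illustrative sketch of the rule as documented here (letters, digits, _, $, #; max 128 chars; no leading digit), not the integration's actual validation code:

```python
import re

# First character: anything allowed except a digit; then up to 127 more
# characters from the allowed set, for a 128-character maximum.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_$#][A-Za-z0-9_$#]{0,127}$")

def is_valid_oracle_identifier(name: str) -> bool:
    """Check a candidate table name against the documented constraint."""
    return _IDENTIFIER_RE.fullmatch(name) is not None
```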
create_keyword_index
Create the DBMS_SEARCH keyword index on this table.
Safe to call multiple times — silently skips if the index already exists.
Required for keyword retrieval. Called automatically when
create_table_if_not_exists=True, but must be called explicitly
when connecting to a pre-existing table.
create_hnsw_index
Create an HNSW vector index on the embedding column.
Safe to call multiple times — uses IF NOT EXISTS.
create_hnsw_index_async
Asynchronously creates an HNSW vector index on the embedding column.
Safe to call multiple times — uses IF NOT EXISTS.
write_documents
write_documents(
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
) -> int
Writes documents to the document store.
Parameters:
- documents (list[Document]) – A list of Documents to write to the document store.
- policy (DuplicatePolicy) – The duplicate policy to use when writing documents.
Returns:
int – The number of documents written to the document store.
Raises:
DuplicateDocumentError – If a document with the same id already exists in the document store and the policy is set to DuplicatePolicy.FAIL or DuplicatePolicy.NONE.
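The duplicate handling described above can be sketched with an in-memory dict standing in for the Oracle table. This is a simplified model of the documented semantics (NONE behaving like FAIL here, as this docstring states), not the store's actual implementation:

```python
from enum import Enum

class DuplicatePolicy(Enum):
    # Mirrors the Haystack DuplicatePolicy members referenced above.
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"

class DuplicateDocumentError(Exception):
    pass

def write(table: dict, docs: list[dict], policy: DuplicatePolicy) -> int:
    """Return how many docs were written, applying the given policy."""
    written = 0
    for doc in docs:
        if doc["id"] in table:
            if policy in (DuplicatePolicy.FAIL, DuplicatePolicy.NONE):
                raise DuplicateDocumentError(doc["id"])
            if policy is DuplicatePolicy.SKIP:
                continue  # leave the existing document untouched
        table[doc["id"]] = doc
        written += 1
    return written
```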
write_documents_async
write_documents_async(
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
) -> int
Asynchronously writes documents to the document store.
Parameters:
- documents (list[Document]) – A list of Documents to write to the document store.
- policy (DuplicatePolicy) – The duplicate policy to use when writing documents.
Returns:
int – The number of documents written to the document store.
Raises:
DuplicateDocumentError – If a document with the same id already exists in the document store and the policy is set to DuplicatePolicy.FAIL or DuplicatePolicy.NONE.
filter_documents
Returns the documents that match the filters provided.
For a detailed specification of the filters, refer to the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Parameters:
- filters (dict[str, Any] | None) – The filters to apply to the document list.
Returns:
list[Document] – A list of Documents that match the given filters.
filter_documents_async
Asynchronously returns the documents that match the filters provided.
For a detailed specification of the filters, refer to the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Parameters:
- filters (dict[str, Any] | None) – The filters to apply to the document list.
Returns:
list[Document] – A list of Documents that match the given filters.
delete_documents
Deletes documents that match the provided document_ids from the document store.
Parameters:
- document_ids (list[str]) – The document IDs to delete.
delete_documents_async
Asynchronously deletes documents that match the provided document_ids from the document store.
Parameters:
- document_ids (list[str]) – The document IDs to delete.
count_documents
Returns how many documents are present in the document store.
Returns:
int – Number of documents in the document store.
count_documents_async
Asynchronously returns how many documents are present in the document store.
Returns:
int – Number of documents in the document store.
delete_table
Permanently drops the document store table and its associated DBMS_SEARCH keyword index.
Uses DROP TABLE ... PURGE, which bypasses the Oracle recycle bin — the operation is
irreversible. The keyword index is dropped after the table; if either operation fails, a
:class:`DocumentStoreError` is raised.
Raises:
DocumentStoreError – If the table or keyword index cannot be dropped.
delete_table_async
Asynchronously permanently drops the document store table and its DBMS_SEARCH keyword index.
Uses DROP TABLE ... PURGE which bypasses the Oracle recycle bin — the operation is
irreversible.
Raises:
DocumentStoreError – If the table or keyword index cannot be dropped.
delete_all_documents
Removes all documents from the table using TRUNCATE.
TRUNCATE is non-recoverable — it cannot be rolled back and bypasses row-level triggers.
The table structure and indexes are preserved.
delete_all_documents_async
Asynchronously removes all documents from the table using TRUNCATE.
TRUNCATE is non-recoverable — it cannot be rolled back and bypasses row-level triggers.
The table structure and indexes are preserved.
count_documents_by_filter
Returns the number of documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict matches all documents. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Count of matching documents.
count_documents_by_filter_async
Asynchronously returns the number of documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict matches all documents. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Count of matching documents.
delete_by_filter
Deletes all documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict is treated as a no-op and returns 0 without touching the table. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Number of deleted documents.
delete_by_filter_async
Asynchronously deletes all documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict. An empty dict is treated as a no-op and returns 0 without touching the table. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
Returns:
int – Number of deleted documents.
update_by_filter
Merges meta into the metadata of all documents that match the provided filters.
Uses Oracle's JSON_MERGEPATCH — existing keys are updated, new keys are added,
and keys set to null in meta are removed.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that selects which documents to update. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- meta (dict[str, Any]) – Metadata patch to apply. Must be a non-empty dictionary.
Returns:
int – Number of updated documents.
Raises:
ValueError – If meta is empty.
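The merge behaviour described above (update existing keys, add new keys, remove keys patched with null) follows RFC 7396 merge-patch semantics, which JSON_MERGEPATCH implements. A pure-Python model of those semantics, for illustration only:

```python
def json_merge_patch(target: dict, patch: dict) -> dict:
    """Apply an RFC 7396-style merge patch to a metadata dict:
    None removes a key, nested dicts merge recursively, anything
    else replaces or adds the value."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = json_merge_patch(result[key], value)
        else:
            result[key] = value
    return result
```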
update_by_filter_async
Asynchronously merges meta into the metadata of all documents matching the provided filters.
Uses Oracle's JSON_MERGEPATCH — existing keys are updated, new keys are added,
and keys set to null in meta are removed.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that selects which documents to update. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- meta (dict[str, Any]) – Metadata patch to apply. Must be a non-empty dictionary.
Returns:
int – Number of updated documents.
Raises:
ValueError – If meta is empty.
count_unique_metadata_by_filter
count_unique_metadata_by_filter(
    filters: dict[str, Any],
    metadata_fields: list[str],
) -> dict[str, int]
Returns the number of distinct values for each requested metadata field among matching documents.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that scopes the document set. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- metadata_fields (list[str]) – List of metadata field names to count distinct values for. Fields may be prefixed with "meta." (e.g. "meta.lang" or "lang"). Must be a non-empty list.
Returns:
dict[str, int] – Dict mapping each field name to its distinct-value count.
Raises:
ValueError – If metadata_fields is empty.
ValueError – If any field name contains characters outside [A-Za-z0-9_.].
count_unique_metadata_by_filter_async
count_unique_metadata_by_filter_async(
    filters: dict[str, Any],
    metadata_fields: list[str],
) -> dict[str, int]
Asynchronously returns the number of distinct values for each metadata field among matching documents.
Parameters:
- filters (dict[str, Any]) – Haystack filter dict that scopes the document set. See the `metadata filtering docs <https://docs.haystack.deepset.ai/docs/metadata-filtering>`_.
- metadata_fields (list[str]) – List of metadata field names to count distinct values for. Fields may be prefixed with "meta." (e.g. "meta.lang" or "lang"). Must be a non-empty list.
Returns:
dict[str, int] – Dict mapping each field name to its distinct-value count.
Raises:
ValueError – If metadata_fields is empty.
ValueError – If any field name contains characters outside [A-Za-z0-9_.].
get_metadata_fields_info
Return a mapping of metadata field names to their detected types.
Uses Oracle's JSON_DATAGUIDE aggregate to introspect the stored metadata column.
Returns an empty dict when the table has no documents.
Returns:
dict[str, dict[str, str]] – Dict of the form {"field_name": {"type": "<type>"}, ...} where <type> is one of "text", "number", or "boolean".
get_metadata_field_min_max
Return the minimum and maximum values of a metadata field across all documents.
First attempts numeric comparison via TO_NUMBER so that MAX(1, 5, 10) returns 10
rather than "5" (which would win under lexicographic ordering). Falls back to plain string
comparison when the field contains non-numeric values. Numeric strings are automatically
converted to int or float in the result.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.year" or "year").
Returns:
dict[str, Any] – {"min": <value>, "max": <value>}. Both values are None when the table is empty or the field does not exist.
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
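The numeric-first comparison described above can be sketched in plain Python. This models only the documented fallback behaviour (numeric via TO_NUMBER-style conversion, string otherwise) over an in-memory list of values; it is not the store's SQL implementation:

```python
def field_min_max(values: list[str]) -> dict:
    """Return {"min": ..., "max": ...}, preferring numeric comparison
    so that max("1", "5", "10") is 10 rather than "5"."""
    if not values:
        return {"min": None, "max": None}

    def to_number(v: str):
        # Convert numeric strings to int or float, like the result
        # conversion described above.
        f = float(v)
        return int(f) if f.is_integer() else f

    try:
        nums = [to_number(v) for v in values]
        return {"min": min(nums), "max": max(nums)}
    except ValueError:
        # Non-numeric values present: fall back to string comparison.
        return {"min": min(values), "max": max(values)}
```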
get_metadata_field_unique_values
get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int | None = None,
) -> tuple[list[str], int]
Return a paginated list of distinct values for a metadata field, plus the total distinct count.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.lang" or "lang").
- search_term (str | None) – Optional substring filter applied to both the document text and the field value.
- from_ (int) – Zero-based offset for pagination. Defaults to 0.
- size (int | None) – Maximum number of values to return. When None, all values from from_ onward are returned.
Returns:
tuple[list[str], int] – A tuple (values, total) where values is the paginated list of distinct field values as strings and total is the overall distinct count (before pagination).
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
get_metadata_fields_info_async
Asynchronously returns a mapping of metadata field names to their detected types.
Uses Oracle's JSON_DATAGUIDE aggregate to introspect the stored metadata column.
Returns an empty dict when the table has no documents.
Returns:
dict[str, dict[str, str]] – Dict of the form {"field_name": {"type": "<type>"}, ...} where <type> is one of "text", "number", or "boolean".
get_metadata_field_min_max_async
Asynchronously returns the minimum and maximum values of a metadata field across all documents.
First attempts numeric comparison via TO_NUMBER, falling back to string comparison for
non-numeric fields. Numeric strings are automatically converted to int or float.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.year" or "year").
Returns:
dict[str, Any] – {"min": <value>, "max": <value>}. Both values are None when the table is empty or the field does not exist.
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
get_metadata_field_unique_values_async
get_metadata_field_unique_values_async(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int | None = None,
) -> tuple[list[str], int]
Asynchronously returns a paginated list of distinct values for a metadata field, plus the total count.
Parameters:
- metadata_field (str) – Metadata field name. May be prefixed with "meta." (e.g. "meta.lang" or "lang").
- search_term (str | None) – Optional substring filter applied to both the document text and the field value.
- from_ (int) – Zero-based offset for pagination. Defaults to 0.
- size (int | None) – Maximum number of values to return. When None, all values from from_ onward are returned.
Returns:
tuple[list[str], int] – A tuple (values, total) where values is the paginated list of distinct field values as strings and total is the overall distinct count (before pagination).
Raises:
ValueError – If metadata_field contains characters outside [A-Za-z0-9_.].
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
OracleDocumentStore – Deserialized component.