AlloyDBDocumentStore
| API reference | AlloyDB |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/alloydb |
AlloyDB is a fully managed, PostgreSQL-compatible database service on Google Cloud. The AlloyDBDocumentStore uses the pgvector extension to perform vector similarity search.
Connection is handled securely via the AlloyDB Python Connector, which provides TLS encryption and IAM-based authorization without requiring manual SSL certificate management, firewall rules, or IP allowlisting.
The AlloyDBDocumentStore supports embedding retrieval, keyword retrieval, and metadata filtering.
Installation
Install the alloydb-haystack integration:
pip install alloydb-haystack
To set up an AlloyDB cluster and instance, follow the AlloyDB quickstart.
Usage
Authentication
The AlloyDBDocumentStore uses Secrets and reads connection details from environment variables by default:
- ALLOYDB_INSTANCE_URI: the AlloyDB instance URI in the format projects/PROJECT/locations/REGION/clusters/CLUSTER/instances/INSTANCE.
- ALLOYDB_USER: the database user. When using IAM database authentication, use the service account email (omitting .gserviceaccount.com) or the full IAM user email.
- ALLOYDB_PASSWORD: the database password. Not required when enable_iam_auth=True.
export ALLOYDB_INSTANCE_URI="projects/MY_PROJECT/locations/MY_REGION/clusters/MY_CLUSTER/instances/MY_INSTANCE"
export ALLOYDB_USER="my-db-user"
export ALLOYDB_PASSWORD="my-db-password"
To authenticate with IAM instead of a password, set enable_iam_auth=True and grant the IAM principal the AlloyDB Client role. See the AlloyDB IAM authentication documentation for details.
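Before constructing the store, it can help to check that the environment matches the rules above. The following is a minimal sketch, not part of the integration: the helper name check_alloydb_env and its messages are hypothetical, and the pattern only checks the overall shape of the instance URI.

```python
import re

# Shape of the instance URI described above (segment contents are not validated).
_INSTANCE_URI = re.compile(
    r"^projects/[^/]+/locations/[^/]+/clusters/[^/]+/instances/[^/]+$"
)


def check_alloydb_env(env, enable_iam_auth=False):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    if not _INSTANCE_URI.match(env.get("ALLOYDB_INSTANCE_URI", "")):
        problems.append("ALLOYDB_INSTANCE_URI is missing or malformed")
    if not env.get("ALLOYDB_USER"):
        problems.append("ALLOYDB_USER is not set")
    # The password is only needed when IAM database authentication is off.
    if not enable_iam_auth and not env.get("ALLOYDB_PASSWORD"):
        problems.append("ALLOYDB_PASSWORD is not set and IAM auth is disabled")
    return problems
```

Calling check_alloydb_env(os.environ, enable_iam_auth=True) at startup would report configuration mistakes before the first lazy connection attempt.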
Initialization
Initialize an AlloyDBDocumentStore and write Documents to it. Connection to AlloyDB is established lazily on first use, and the table that stores Haystack Documents is created automatically if it doesn't exist:
from haystack import Document
from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore

document_store = AlloyDBDocumentStore(
    db="my-database",
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

document_store.write_documents(
    [
        Document(content="This is first", embedding=[0.1] * 768),
        Document(content="This is second", embedding=[0.3] * 768),
    ],
)
print(document_store.count_documents())
To learn more about the initialization parameters, see our API docs.
To compute embeddings for your Documents, you can use a Document Embedder, such as the SentenceTransformersDocumentEmbedder.
Search Strategy
The AlloyDBDocumentStore supports two search strategies for embedding retrieval:
- "exact_nearest_neighbor" (default): provides perfect recall but can be slow on large numbers of documents.
- "hnsw": an approximate nearest neighbor search strategy that trades off some accuracy for speed. Recommended for large numbers of documents.
When using "hnsw", an index is created based on the vector_function you choose, so subsequent queries should keep using the same vector similarity function in order to take advantage of the index. You can tune index creation through hnsw_index_creation_kwargs (see the pgvector documentation).
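As a rough illustration of an HNSW configuration, the sketch below collects the relevant arguments in one place. The parameter name search_strategy is an assumption (this page only names the strategy values), and hnsw_store_kwargs and build_hnsw_store are hypothetical helpers. The m and ef_construction keys are pgvector's HNSW build options, shown here with pgvector's default values.

```python
def hnsw_store_kwargs(vector_function="cosine_similarity"):
    """Build keyword arguments for an HNSW-backed store (hypothetical helper)."""
    return {
        "db": "my-database",
        "embedding_dimension": 768,
        # The index is built for this function, so keep it stable across queries.
        "vector_function": vector_function,
        # Assumed parameter name for selecting the strategy described above.
        "search_strategy": "hnsw",
        # pgvector HNSW build options; 16 and 64 are pgvector's defaults.
        "hnsw_index_creation_kwargs": {"m": 16, "ef_construction": 64},
    }


def build_hnsw_store():
    # Imported lazily so the helper above works without the integration installed.
    from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore

    return AlloyDBDocumentStore(**hnsw_store_kwargs())
```

Raising m or ef_construction generally improves recall at the cost of slower index builds and more memory, so the defaults are a reasonable starting point.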
Metadata Filtering
The AlloyDBDocumentStore fully supports comparison operators (==, !=, >, >=, <, <=, in, not in, like, not like) and the logical operators AND and OR. The like and not like operators are PostgreSQL-specific extensions to the standard Haystack filter syntax and map to the SQL LIKE / NOT LIKE pattern-matching operators.
The NOT logical operator is not supported. Because every comparison operator already has a negated counterpart (==/!=, >/<=, >=/<, in/not in, like/not like), any filter that wraps a single condition in NOT can be rewritten by inverting the comparison operator instead. To negate a nested AND/OR group, apply De Morgan's laws: for example, NOT (A AND B) becomes (NOT A) OR (NOT B), where each NOT A / NOT B is expressed via the inverted comparison.
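The rewrite described above can be automated. The sketch below is not part of the integration: remove_not and negate are hypothetical helpers that operate on standard Haystack filter dictionaries, and they assume that a NOT group with several conditions means NOT (A AND B AND ...). How documents with missing metadata fields behave under the rewritten filter has not been verified here.

```python
# Each comparison operator paired with its negation.
INVERSE = {
    "==": "!=", "!=": "==",
    ">": "<=", "<=": ">",
    ">=": "<", "<": ">=",
    "in": "not in", "not in": "in",
    "like": "not like", "not like": "like",
}


def _group(operator, conditions):
    # Collapse single-condition groups so the output stays readable.
    if len(conditions) == 1:
        return conditions[0]
    return {"operator": operator, "conditions": conditions}


def negate(f):
    """Return a NOT-free filter equivalent to the negation of f."""
    op = f["operator"]
    if op == "AND":  # NOT (A AND B) -> (NOT A) OR (NOT B)
        return _group("OR", [negate(c) for c in f["conditions"]])
    if op == "OR":  # NOT (A OR B) -> (NOT A) AND (NOT B)
        return _group("AND", [negate(c) for c in f["conditions"]])
    if op == "NOT":  # double negation cancels out
        return _group("AND", [remove_not(c) for c in f["conditions"]])
    return {**f, "operator": INVERSE[op]}


def remove_not(f):
    """Rewrite a filter so that it no longer contains the NOT operator."""
    op = f["operator"]
    if op == "NOT":  # assumed to mean NOT (AND of all conditions)
        return _group("OR", [negate(c) for c in f["conditions"]])
    if op in ("AND", "OR"):
        return _group(op, [remove_not(c) for c in f["conditions"]])
    return f
```

For example, a NOT group over meta.lang == "en" and meta.year > 2020 becomes an OR group over meta.lang != "en" and meta.year <= 2020, which can then be passed as the filters argument.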
For more details on filter syntax, refer to Metadata Filtering.
Supported Retrievers
- AlloyDBEmbeddingRetriever: An embedding-based Retriever that fetches Documents from the Document Store based on a query embedding.
- AlloyDBKeywordRetriever: A keyword-based Retriever that fetches Documents matching a query using PostgreSQL full-text search.
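A minimal sketch of how the two Retrievers might be called follows. The import path, the document_store and top_k constructor arguments, and the run() parameter names are assumptions based on the conventions of other Haystack integrations, not confirmed by this page, so check the API reference before relying on them. The wrapper function retrieve is hypothetical.

```python
def retrieve(document_store, query_embedding, query_text, top_k=5):
    """Query one store with both Retrievers (hypothetical wrapper)."""
    # Assumed import path, mirroring other Haystack integrations.
    from haystack_integrations.components.retrievers.alloydb import (
        AlloyDBEmbeddingRetriever,
        AlloyDBKeywordRetriever,
    )

    # Assumed constructor arguments: document_store and top_k.
    embedding_retriever = AlloyDBEmbeddingRetriever(
        document_store=document_store, top_k=top_k
    )
    keyword_retriever = AlloyDBKeywordRetriever(
        document_store=document_store, top_k=top_k
    )

    # Assumed run() signatures; Haystack Retrievers return {"documents": [...]}.
    by_embedding = embedding_retriever.run(query_embedding=query_embedding)["documents"]
    by_keyword = keyword_retriever.run(query=query_text)["documents"]
    return by_embedding, by_keyword
```

The query embedding must have the same dimension as the store's embedding_dimension and should come from the same model used to embed the Documents.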