Overview: Repo-based Standby Cluster
I am designing a cross-cluster PostgreSQL architecture using the Crunchy Postgres for Kubernetes (PGO) operator. I am currently evaluating the resiliency of our deployment in a multi-cluster scenario.
Architecture:
Cluster 1 (OpenShift): Hosts the Primary Postgres instance and the local S3-compatible Object Storage (ODF/NooBaa) used for WAL archiving and backups.
Cluster 2 (OpenShift): Hosts a Standby instance and a secondary pgBackRest instance configured to pull from the S3 bucket in Cluster 1.
Scenario:
We are evaluating the blast radius of a total control-plane and storage-plane failure in Cluster 1.
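To make the blast radius concrete, here is a minimal sketch of the dependency chain as I understand it. The component labels and the `impacted` helper are my own illustrative names, not PGO resource names, and the dependency edges are an assumption based on the architecture described above.

```python
# Hypothetical model of the architecture above; labels are illustrative only.
HOSTED_IN = {
    "primary-postgres": "cluster-1",
    "s3-repo": "cluster-1",            # ODF/NooBaa bucket used by pgBackRest
    "standby-postgres": "cluster-2",
    "standby-pgbackrest": "cluster-2",
}

# Assumed dependencies: each component needs everything in its list.
DEPENDS_ON = {
    "primary-postgres": ["s3-repo"],          # WAL archiving and backups
    "standby-pgbackrest": ["s3-repo"],        # pulls WAL/backups from the bucket
    "standby-postgres": ["standby-pgbackrest"],
}


def impacted(failed_cluster):
    """Return components that are down or have lost a dependency."""
    affected = {c for c, k in HOSTED_IN.items() if k == failed_cluster}
    changed = True
    while changed:  # propagate the failure along dependency edges
        changed = False
        for component, deps in DEPENDS_ON.items():
            if component not in affected and any(d in affected for d in deps):
                affected.add(component)
                changed = True
    return affected


if __name__ == "__main__":
    for cluster in ("cluster-1", "cluster-2"):
        print(cluster, "->", sorted(impacted(cluster)))
```

Under these assumptions, losing Cluster 1 affects every component, including those hosted in Cluster 2, whereas losing Cluster 2 leaves the primary and its repository untouched.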
Questions:
1. Storage Dependency: Given that our pgBackRest repository is pinned to the S3 bucket inside Cluster 1, does this architecture inherently violate our Disaster Recovery (DR) RTO? In the event of a total Cluster 1 failure, the Standby in Cluster 2 loses both its replication stream and access to its WAL/backup repository. Is there a "native" PGO/pgBackRest configuration to handle S3 repository failover, or is external, independent object storage the only production-grade path forward?
2. Replica Topology: For a production-grade deployment targeting high availability and resilience against node-level failures, what is the recommended instance count per cluster? We are debating between 1 Primary + 1 Replica versus 1 Primary + 2 Replicas. Does PGO specifically benefit from the quorum provided by the 3-node configuration in terms of preventing split-brain scenarios during network partitions in an OpenShift environment?
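For reference, this is the generic majority-quorum arithmetic behind question 2. It is only a sketch of the textbook rule, with function names of my own choosing. It assumes leader election is decided by a majority of the Postgres instances themselves, which may not be how PGO actually arbitrates failover. That is part of what I am asking.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a majority can still be formed."""
    return members - majority(members)


if __name__ == "__main__":
    # 2 = 1 Primary + 1 Replica, 3 = 1 Primary + 2 Replicas
    for n in (2, 3):
        print(f"{n} instances: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

By this rule, a 2-instance group cannot lose any member and keep a majority, while a 3-instance group can lose one.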
Context:
Our environment is built on OpenShift with ODF/NooBaa, and we are aiming for a high-security, sovereign infrastructure design. Any guidance on achieving storage-level redundancy during a failover would be greatly appreciated.
Regards,
Zoheb Shaik