PostgreSQL v18 updates; Database restores #720

Open
GUI wants to merge 13 commits into deploy-updates from pg-v18

@GUI GUI commented Jun 4, 2026

Member

This builds on the deployment updates in #714, with two main additions:

  1. Updates to use the new PostgreSQL v18 database server, and also use PostgreSQL v18 in the Docker development environment.
  2. Adds support for a database restore command that can be used for local development or for staging branches. This will restore a snapshot of production data to these environments, making it easier to test staging branches or perform local development with some real data in place.
    • Since the production data is large, the snapshot is limited to a subset: all runs from the past 15 days, plus runs from every 26th day over the past year (roughly equivalent to another 14 days of data, but spread across the year for broader sampling). We could tweak these numbers; I was just trying to find a compromise that wasn't too large.
    • For local development, authorized NLR users can restore this snapshot by running `docker compose run --rm db-syncer rake db:data:restore`
    • For staging deployments, if you add the `deploy-db-restore` label to the deployment PR, the next deployment will restore the latest snapshot once. The label is then removed, but you can add it again to trigger another restore at a later date.
    • This will hopefully simplify staging testing and eliminate the need for some of the custom databases and deployment environments, since you can now test against real production data on any branch at any time. But still happy to chat more if there were other use-cases you all had.
    • The database restore solution here relies on some Ruby gems our team uses in our other projects. I know it's maybe a bit funky to shoehorn this into this Python/Julia app, but it's fairly separate, so hopefully this won't matter a ton. It mostly boils down to raw SQL, `pg_dump`, and `pg_restore` commands; this was just the easiest way to leverage the process our team uses in our Ruby projects. I'm happy to revisit if this feels too odd or we run into any issues, but I'm hoping that with the Docker abstraction it's mostly irrelevant how it's written (you can just run that `docker compose run --rm db-syncer` command locally and not worry about Ruby or anything else).
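
As a rough illustration of the sampling described above, the set of days included in a snapshot could be sketched as follows. This is not the actual implementation (that lives in the Ruby tooling and SQL, which is not shown here); the function name, its parameters, and the exact day boundaries are assumptions made purely for illustration.

```python
from datetime import date, timedelta


def sampled_dates(today, recent_days=15, stride_days=26, lookback_days=365):
    """Hypothetical sketch: which calendar days a snapshot would cover.

    Assumes "the past 15 days" means today plus the 14 days before it, and
    that the sparse sample starts right after that window and steps back
    one day out of every `stride_days` until `lookback_days` is reached.
    """
    # Dense window: every day in the most recent `recent_days` days.
    recent = {today - timedelta(days=offset) for offset in range(recent_days)}

    # Sparse window: one day per stride across the rest of the past year.
    sparse = {
        today - timedelta(days=offset)
        for offset in range(recent_days, lookback_days + 1, stride_days)
    }

    return sorted(recent | sparse)


if __name__ == "__main__":
    dates = sampled_dates(date(2026, 6, 4))
    print(len(dates), "days sampled, oldest:", dates[0])
```

Under these assumptions the sparse window contributes 14 days (offsets 15, 41, ..., 353), which lines up with the "roughly another 14 days" estimate, for 29 sampled days in total.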

@GUI GUI requested a review from Bill-Becker June 4, 2026 04:07
GUI added 3 commits June 4, 2026 08:10
Since this is an optional container only used in local development,
shift it to a profile so that it doesn't start by default. This
container will only work for NLR developers, so we don't want it
starting/building by default in the CI environment or for other users.
Also update the nojulia docker compose variant to align with the main
docker compose file that uses the default postgres superuser now (to
better align with other environments).
@Bill-Becker

Collaborator

@GUI when I try to run "docker compose run --rm db-syncer rake db:data:restore" locally, I get this error: "SSL certificate problem: self-signed certificate in certificate chain".

@Bill-Becker

Bill-Becker commented Jun 10, 2026

Collaborator

Thoughts on why this deploy failed?
https://github.nrel.gov/TADA/REopt_API-deploy-mirror/actions/runs/495070

Was this deployed with the deploy-db-restore label? I'm not sure how to tell once the label has been removed. I was trying to GET a run_uuid result through the staging pg-v18 branch for a run that happened in the production db within the last 15 days, but it said the run_uuid was not in the db.

I do know we added a migration after the May 15th deploy, when you had to manually fix some migrations for the db snapshot, so we will need to repeat what you did then.

@Bill-Becker

Collaborator

I just wanted to confirm that the celery (+julia) and julia pods are still restarting nightly. Based on the age of the pods there, it doesn't look like they are, whereas on the master branch on staging the pods seem to have restarted at ~1am as expected.

@Bill-Becker Bill-Becker left a comment

Collaborator

This looks good, but I left some comments separately/outside of this review to follow up on.
