[DCP] Auto create tables when DB is empty#494
[DCP] Auto create tables when DB is empty#494gmechali merged 7 commits intodatacommonsorg:masterfrom
Conversation
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| CodeStyle | 3 minor |
| Complexity | 3 medium |
🟢 Metrics 22 complexity · 0 duplication
Metric Results Complexity 22 Duplication 0
TIP This summary will be updated as you push new changes. Give us feedback
There was a problem hiding this comment.
Code Review
This pull request enhances the SpannerClient to handle existing but uninitialized databases by checking for the presence of core tables and applying DDL statements if the database is empty. It also introduces unit tests for these new initialization checks. Review feedback suggests using configured table names instead of hardcoded strings for consistency, optimizing I/O by deferring DDL reading until needed, removing redundant fully qualified class names, and adding a missing assertion in the test suite.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the Spanner database initialization process by replacing the createDatabase method with a more comprehensive validateOrInitializeDatabase method. This new logic verifies the existence of required tables and proto bundles, handling cases where the database might be partially initialized or empty. Additionally, the PR introduces unit tests for SpannerClient and updates the project's dependencies to support testing. Review feedback suggests improving the robustness of DDL statement detection by using more specific string matching and adhering to the principle of programming to an interface by using List instead of ArrayList in method signatures.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
Thanks for making this change. This was one of the pending items since the previous logic was incompatible with the Terraform setup. FYI, Terraform deployment script creates additional tables in the DB for import automation. |
Got it - so the DCP Terraform script now can also just keep its current logic of creating the DataBase, but letting the ingestion script initialize the tables. We also could have initialized the tables in the Terraform script, but I think that holds some pretty significant risk. If a user ends up making a change to the Table Schema and reapplies terraform, it does not modify the table, it recreates it, and therefore could drop all the data in the tables. I will put some thought into it in a few days when I get to the TF Import Status setup for DCP. |
Modify the ingestion pipeline's logic for DB creation to be a bit more flexible. Before this change, it just checks whether the DB exists, and if not it creates it with the appropriate scheme and protodescriptor.
With this change, we retain the ability to create the DB and initialize it properly, but we also allow customers to provide an empty DB, and have it be initialized with the proper schema and proto descriptor.
This is important for DCP, where the DB should be created via Terraform at setup time in order to also connect the DCP service to the DB, however the schema is not necessary at that point, and so we can expect the ingestion pipeline to validate the schema and create it when necessary.
I've tested this on all the following scenarios: