[BUG]: Proguard is flaky in CI due to OOMs #6186

@BenHenning

Description

Describe the bug

We're seeing an uptick in Proguard CI builds failing with an error code 14 like so:

Server terminated abruptly (error code: 14, error message: 'Socket closed', log file: '/home/runner/.cache/bazel/_bazel_runner/ef69b31fa14ea8341d6589b0a8832bdd/server/jvm.out')

My supposition is that this is an OOM, and Gemini supports this conclusion.

Steps To Reproduce

Not easy unfortunately because it's fundamentally a flake.

Expected Behavior

No crashes--builds are usually quite stable.

We presumably need to reduce our memory overhead for these tasks but it's a bit challenging because Proguard itself requires substantial memory.

Gemini's suggested fixes don't seem reasonable at this stage (it suggested adjusting max memory for Bazel vs. Proguard, or upgrading the workers the actions use). I think something that might work better is to change the CI action to build everything the AAB needs before running Proguard, shut down Bazel, and then run the final build command. The shutdown forces Bazel to free the memory it accumulated during the build, which leaves much more for Proguard to use. An example of what that might look like for the target //:oppia_beta:

bazel build //:oppia_beta_deployable.aab
bazel shutdown
bazel build //:oppia_beta

Some caveats:

  • I haven't tested this.
  • Each of those should probably be a distinct step in the workflow rather than all being run together.
  • This will need to be adapted to work for each build flavor, though it needs to be specialized for //:oppia_dev because it doesn't have an intermediary "deployable" variant (since it doesn't go through Proguard optimizing).
  • This will slow down the build CIs a bit due to the shutdown and restart but it probably won't even be noticeable (maybe a 15-20s penalty for runs that easily take 15-20 minutes).
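
To make the per-flavor adaptation concrete, here is a minimal, untested sketch of the three commands above wrapped in a shell function. The function name `staged_build` and the `BAZEL` override variable are hypothetical, introduced only for illustration. It also assumes that other flavors follow the same `<flavor>_deployable.aab` naming as //:oppia_beta, and that //:oppia_dev can simply be built directly since it has no "deployable" variant; neither assumption has been verified against the actual build files.

```shell
#!/bin/sh
# Sketch only: not tested against the real CI workflow.
set -eu

# Hypothetical override so the Bazel binary can be swapped (or stubbed for a dry run).
BAZEL="${BAZEL:-bazel}"

# Builds one flavor in phases, shutting the Bazel server down before the final build.
# $1: target name without the "//:" prefix, e.g. "oppia_beta".
staged_build() {
  flavor="$1"
  if [ "$flavor" = "oppia_dev" ]; then
    # //:oppia_dev has no intermediary "deployable" variant (no Proguard pass),
    # so this sketch assumes it is built in a single step.
    "$BAZEL" build "//:${flavor}"
    return
  fi
  # Phase 1: first build command from the issue (assumed naming for other flavors).
  "$BAZEL" build "//:${flavor}_deployable.aab"
  # Phase 2: stop the Bazel server so the memory it accumulated is released.
  "$BAZEL" shutdown
  # Phase 3: final build command, started on a fresh Bazel server.
  "$BAZEL" build "//:${flavor}"
}
```

Calling `staged_build oppia_beta` would run the same three commands shown above. In an actual workflow each phase would more likely be its own step, as noted in the caveats, so the function is mainly useful for trying the idea locally.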

Metadata

Labels

  • Impact: High. High perceived user impact (breaks a critical feature or blocks a release).
  • Work: Medium. The means to find the solution is clear, but it isn't at good-first-issue level yet.
  • bug. End user-perceivable behaviors which are not desirable.

Projects

Status

Todo
