fix: add retry with || true for post-snapshot invoke in test-04 by Slambot01 · Pull Request #736 · hyperledger-labs/fablo

Slambot01 · 2026-04-28T13:03:39Z

Fixes #734

The first expectInvokeRest after snapshot restore intermittently fails with DEADLINE_EXCEEDED. waitForContainer confirms the CCaaS gRPC server is listening, but the peer's gRPC client reconnects asynchronously, so the sleep 2 added in #648 is insufficient under variable CI load.

The previous retry attempt in #648 used && break without || true, which couldn't work because expect-invoke-rest.sh calls exit 1 on failure and the test runs under set -e, aborting before the loop can retry.

Fix

Added expectInvokeRestWithRetry using the same || true pattern proven in test-01-v3-simple.sh (expectQueryWithRetry, lines 48–61). It is applied only to the first invoke after snapshot restore. The second invoke runs normally, since the connection is established by then.
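For illustration, here is a minimal sketch of this kind of retry under set -e. It is not the code from this PR: the names retry, flaky, and COUNTER_FILE are hypothetical, and flaky only simulates a check that calls exit 1 on failure. The sketch uses `|| status=$?`, which plays the same role as `|| true` while keeping the exit code, and runs the command in a subshell so that an exit inside it cannot terminate the calling script.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: retry <max_attempts> <delay_seconds> <command...>
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1 status
  while [ "$attempt" -le "$max_attempts" ]; do
    status=0
    # The subshell contains any `exit 1` from the command; `|| status=$?`
    # keeps set -e from aborting the script when the command fails.
    ( "$@" ) || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed (status $status)" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical stand-in for a check that exits on failure:
# it fails on the first two calls and succeeds on the third.
flaky() {
  local n
  n=$(cat "$COUNTER_FILE")
  n=$((n + 1))
  echo "$n" > "$COUNTER_FILE"
  [ "$n" -ge 3 ] || exit 1
}

COUNTER_FILE=$(mktemp)
echo 0 > "$COUNTER_FILE"

retry 5 0 flaky
echo "succeeded after $(cat "$COUNTER_FILE") attempts"
```

In a real test, the delay and attempt count would need tuning for CI. As the later comments in this thread show, a retry like this only helps when the failure is transient.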

Signed-off-by: Ritesh Pandit <riteshpandit1708@gmail.com>

Sorry, I missed that it didn't resolve the underlying issue.

dzikowski · 2026-05-11T16:55:30Z

I initially approved, but then I realized it does not solve the underlying issue. The test still fails, probably because the error is in a different place: not in the test or in waiting until the container is ready, but somewhere in the network boot/restore process. There is probably a race condition when the chaincode container starts. Maybe as a workaround we should try restarting it in the test to verify whether that's the issue, and then fix it properly by eliminating the root cause?

Slambot01 · 2026-05-12T02:50:40Z

Makes sense. I’ll look into the chaincode container restart angle. I’ll check if restarting the CCaaS container after snapshot restore fixes the failure, and if it does, I’ll trace it back to the actual issue in the restore flow.

Slambot01 · 2026-05-12T04:39:09Z

The CCaaS restart approach worked locally, but CI is still failing. All 10 retries end up hitting DEADLINE_EXCEEDED.
The CCaaS containers bootstrap properly after the restart, and the peer's gRPC state shows READY, so the issue doesn't look like the CCaaS ↔ peer connection itself. At this point the problem seems to be somewhere deeper in the restore flow. I need to dig into the state the peer or fablo-rest ends up in after restore, because something there is preventing the endorsement flow from completing. I'll need to investigate this further.

fix: add retry with || true for post-snapshot invoke in test-04

a5b86dc

Signed-off-by: Ritesh Pandit <riteshpandit1708@gmail.com>

dzikowski previously approved these changes May 11, 2026

Merge branch 'main' into fix/e2e-snapshot-ccaas-retry

7ceaadd

fix: restart CCaaS containers after snapshot restore to clear stale g…

1f498c1

…RPC connections Signed-off-by: Ritesh Pandit <riteshpandit1708@gmail.com>

Slambot01 and others added 2 commits May 12, 2026 23:17

Merge branch 'main' into fix/e2e-snapshot-ccaas-retry

c8bea81

Merge branch 'main' into fix/e2e-snapshot-ccaas-retry

1407774

Slambot01 wants to merge 5 commits into hyperledger-labs:main from Slambot01:fix/e2e-snapshot-ccaas-retry

2 participants
