Merged
2 changes: 1 addition & 1 deletion .github/workflows/e2e.yml
@@ -49,7 +49,7 @@ jobs:

env:
CGO_ENABLED: '1'
-CELER_INSECURE_TLS: '1'
+AGENTPAY_INSECURE_TLS: '1'

steps:
- name: Checkout
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -45,7 +45,7 @@ Wire and admin contracts live in [proto/](proto) and [webapi/proto/](webapi/prot

- **macOS amd64 cgo linker bug.** The local `go1.25.5` toolchain has been observed to fail with duplicate `runtime/cgo` symbols on this codebase (reproduces on trivial `import "C"` programs too — it's a toolchain issue, not a repo regression). Before `go build` / `go test`, export `GOTOOLCHAIN=go1.24.9` for the shell. Documented in [docs/backend-troubleshooting.md](docs/backend-troubleshooting.md).
- **CGO is required for SQLite-backed builds.** `-storedir` mode fails fast with `sqlite3 requires cgo` if you accidentally build with `CGO_ENABLED=0`. CI sets `CGO_ENABLED=1` explicitly.
-- **`CELER_INSECURE_TLS=1` is normal for localhost.** Inter-OSP and client→OSP localhost dials use the built-in self-signed localhost cert; `test/manual/run_osp.sh` and the e2e harness already set this. Set it yourself when launching binaries directly against `localhost`/`127.0.0.1`.
+- **`AGENTPAY_INSECURE_TLS=1` is normal for localhost.** Inter-OSP and client→OSP localhost dials use the built-in self-signed localhost cert; `test/manual/run_osp.sh` and the e2e harness already set this. Set it yourself when launching binaries directly against `localhost`/`127.0.0.1`.
- **e2e is slow.** CI gives the default suite 30 minutes (40-minute job cap) and crossnet 30/45. For a local validation loop, prefer the focused single-test runs in [AGENTS.md §Build And Test](AGENTS.md) over `go test ./test/e2e`. On failed runs the harness keeps `/tmp/celer_e2e_*` and prints a `-reuse` path — use it instead of paying the rebuild cost again.

## Logging convention
8 changes: 4 additions & 4 deletions config/config.go
@@ -45,10 +45,10 @@ var (

// Safe-margin knobs are env-var tunable so e2e tests can shrink them. Production
// defaults (60s) absorb chain-confirmation slack past a deadline; tests typically
-// set CELER_*_SAFE_MARGIN_S=5 to keep the timeout-and-sweep flow snappy.
-WithdrawTimeoutSafeMargin = envUint("CELER_WITHDRAW_SAFE_MARGIN_S", 60) // seconds
-PaySendTimeoutSafeMargin = envUint("CELER_PAY_SEND_SAFE_MARGIN_S", 60) // seconds
-PayRecvTimeoutSafeMargin = envUint("CELER_PAY_RECV_SAFE_MARGIN_S", 60) // seconds
+// set AGENTPAY_*_SAFE_MARGIN_S=5 to keep the timeout-and-sweep flow snappy.
+WithdrawTimeoutSafeMargin = envUint("AGENTPAY_WITHDRAW_SAFE_MARGIN_S", 60) // seconds
+PaySendTimeoutSafeMargin = envUint("AGENTPAY_SEND_SAFE_MARGIN_S", 60) // seconds
+PayRecvTimeoutSafeMargin = envUint("AGENTPAY_RECV_SAFE_MARGIN_S", 60) // seconds
)

const (
4 changes: 2 additions & 2 deletions docs/backend-troubleshooting.md
@@ -137,7 +137,7 @@ Checks:
Known-good local example:

```bash
-CELER_INSECURE_TLS=1 go run ./server/server.go \
+AGENTPAY_INSECURE_TLS=1 go run ./server/server.go \
-profile $AGENTPAY_MANUAL_ROOT/profile/o1_profile.json \
-ks ./testing/env/keystore/osp1.json \
-port 10001 \
@@ -196,7 +196,7 @@ What these usually mean:

- `celer stream already exists`: the server already has a live or remembered stream for that peer and RPC address.
- `grpcDial ... failed`: the target host or port is wrong, the peer process is not listening, or TLS/networking is broken.
-- When the target is `localhost` or `127.0.0.1`, a dial timeout can also mean the process is using the built-in self-signed localhost certificate without `CELER_INSECURE_TLS=1` on the dialing side.
+- When the target is `localhost` or `127.0.0.1`, a dial timeout can also mean the process is using the built-in self-signed localhost certificate without `AGENTPAY_INSECURE_TLS=1` on the dialing side.
- `waitRecvWithTimeout failed`: the transport connected, but the auth handshake did not complete.
- `peer not online` or `no celer stream`: later traffic depends on a stream that was never established or was dropped.

10 changes: 5 additions & 5 deletions docs/backend-usage.md
@@ -203,7 +203,7 @@ SQLite-backed example:
./run_osp.sh 2
```

-For localhost manual runs, `test/manual/run_osp.sh` defaults `CELER_INSECURE_TLS=1` so inter-OSP dials work with the built-in self-signed localhost certificate.
+For localhost manual runs, `test/manual/run_osp.sh` defaults `AGENTPAY_INSECURE_TLS=1` so inter-OSP dials work with the built-in self-signed localhost certificate.

CockroachDB-backed example:

@@ -279,7 +279,7 @@ Optional phase-1 seller-OSP WebAPI listener for a same-host caller:
-webapigrpc 127.0.0.1:12000
```

-If this process will dial localhost peers using the built-in localhost certificate, prefix the command with `CELER_INSECURE_TLS=1` unless you are using `test/manual/run_osp.sh`, which already does that for local manual runs.
+If this process will dial localhost peers using the built-in localhost certificate, prefix the command with `AGENTPAY_INSECURE_TLS=1` unless you are using `test/manual/run_osp.sh`, which already does that for local manual runs.

For a CockroachDB-backed node, replace `-storedir` with `-storesql`:

@@ -322,13 +322,13 @@ The on-chain contracts use `block.timestamp` (unix seconds) for every challenge
- rtconfig `min_dispute_timeout` / `max_dispute_timeout` / `max_payment_timeout`
- per-token rtconfig `min_deadline_delta` / `max_deadline_delta` (open-channel policy)
- `config.OpenChannelTimeout`, `CooperativeWithdrawTimeout`, `PayResolveTimeout`, `AdminSendTokenTimeout`, `TcbTimeoutSeconds`
-- env-var safe-margin knobs (`CELER_PAY_RECV_SAFE_MARGIN_S`, `CELER_PAY_SEND_SAFE_MARGIN_S`, `CELER_WITHDRAW_SAFE_MARGIN_S`, default `60` each)
+- env-var safe-margin knobs (`AGENTPAY_RECV_SAFE_MARGIN_S`, `AGENTPAY_SEND_SAFE_MARGIN_S`, `AGENTPAY_WITHDRAW_SAFE_MARGIN_S`, default `60` each)

When tuning rtconfig for a new chain, retune in seconds — not blocks. There is no implicit blocks-per-second multiplier in the off-chain code.

#### Test environment overrides

-The e2e test harness sets `CELER_*_SAFE_MARGIN_S=5` in `TestMain` so the timeout-and-sweep flow runs in seconds instead of minutes. Production deployments should leave these unset (default `60`).
+The e2e test harness sets `AGENTPAY_*_SAFE_MARGIN_S=5` in `TestMain` so the timeout-and-sweep flow runs in seconds instead of minutes. Production deployments should leave these unset (default `60`).

## Server Flags That Matter Most

@@ -428,7 +428,7 @@ These clients still use the same backend protocol pipeline and storage model des

## Practical Notes

-- The e2e tests set `CELER_INSECURE_TLS=1` so localhost clients can talk to the server's built-in localhost certificate without CA setup.
+- The e2e tests set `AGENTPAY_INSECURE_TLS=1` so localhost clients can talk to the server's built-in localhost certificate without CA setup.
- OSP routing behavior only becomes meaningful after the OSP is registered in the on-chain `RouterRegistry`.
- The server starts periodic OSP cleanup that clears expired or on-chain-resolved payments with peer OSPs.
- `rtconfig` is operationally important. Payment timeout, refill, and deposit behavior are not hardcoded solely in Go constants.
8 changes: 4 additions & 4 deletions test/e2e/e2e_setup_test.go
@@ -30,13 +30,13 @@ func TestMain(m *testing.M) {
log.EnableColor()
// Allow client dials to use insecure transport for localhost during e2e,
// avoiding CA mismatches with the server's self-signed localhost cert.
-os.Setenv("CELER_INSECURE_TLS", "1")
+os.Setenv("AGENTPAY_INSECURE_TLS", "1")
// Shrink the chain-confirmation slack past pay/withdraw deadlines so the
// timeout-and-sweep e2e flow runs in seconds instead of minutes. Production
// defaults (60s) are restored automatically when these env vars are unset.
-os.Setenv("CELER_PAY_RECV_SAFE_MARGIN_S", "5")
-os.Setenv("CELER_PAY_SEND_SAFE_MARGIN_S", "5")
-os.Setenv("CELER_WITHDRAW_SAFE_MARGIN_S", "5")
+os.Setenv("AGENTPAY_RECV_SAFE_MARGIN_S", "5")
+os.Setenv("AGENTPAY_SEND_SAFE_MARGIN_S", "5")
+os.Setenv("AGENTPAY_WITHDRAW_SAFE_MARGIN_S", "5")
// Ensure DEBUG and above from app are visible in test output by default
log.SetLevelByName("debug")
if *reuse != "" {
2 changes: 1 addition & 1 deletion test/e2e/send_pay_timeout.go
@@ -152,7 +152,7 @@ func sendPayTimeout(t *testing.T, tokenType entity.TokenType, tokenAddr string)
return
}

-// Wait past the pay deadline + receiver-side safe margin (CELER_PAY_RECV_SAFE_MARGIN_S=5
+// Wait past the pay deadline + receiver-side safe margin (AGENTPAY_RECV_SAFE_MARGIN_S=5
// in TestMain) so SettleExpiredPays sees the pay as expired.
err = c1.WaitUntilDeadline(payTime + timeout + 10)
if err != nil {
2 changes: 1 addition & 1 deletion test/manual/README.md
@@ -38,7 +38,7 @@ Then do the same for OSP2, run **`./osp-cli -profile $AGENTPAY_MANUAL_ROOT/profi

Run **`./run_osp.sh 1`** and **`./run_osp.sh 2`** in two new terminals respectively to start OSP1 and OSP2. OSP data store is created at `$AGENTPAY_MANUAL_ROOT/store/[ospAddr]`.

-For local manual runs, `run_osp.sh` defaults `CELER_INSECURE_TLS=1` so inter-OSP dials work with the built-in self-signed localhost certificate. If you start the servers directly instead of using this script, set that env var yourself.
+For local manual runs, `run_osp.sh` defaults `AGENTPAY_INSECURE_TLS=1` so inter-OSP dials work with the built-in self-signed localhost certificate. If you start the servers directly instead of using this script, set that env var yourself.

### Option 2: use CockroachDB as storage backend (higher performance)

22 changes: 11 additions & 11 deletions test/manual/run_osp.sh
@@ -1,11 +1,11 @@
#!/bin/sh

MANUAL_ROOT="${AGENTPAY_MANUAL_ROOT:-/tmp/celer_manual_test}"
-LOCAL_INSECURE_TLS="${CELER_INSECURE_TLS:-1}"
+LOCAL_INSECURE_TLS="${AGENTPAY_INSECURE_TLS:-1}"

run_osp_1() {
echo "run OSP 1"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o1_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp1.json" \
-port 10001 \
@@ -21,7 +21,7 @@ run_osp_1() {

run_osp_1_crdb() {
echo "run OSP 1 w/ cockroach db"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o1_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp1.json" \
-port 10001 \
@@ -37,7 +37,7 @@ run_osp_1_crdb() {

run_osp_2() {
echo "run OSP 2"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o2_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp2.json" \
-port 10002 \
@@ -53,7 +53,7 @@ run_osp_2() {

run_osp_2_crdb() {
echo "run OSP 2 w/ cockroach db"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o2_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp2.json" \
-port 10002 \
@@ -69,7 +69,7 @@ run_osp_2_crdb() {

run_osp_3() {
echo "run OSP 3"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o3_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp3.json" \
-port 10003 \
@@ -85,7 +85,7 @@ run_osp_3() {

run_osp_3_crdb() {
echo "run OSP 3 w/ cockroach db"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o3_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp3.json" \
-port 10003 \
@@ -101,7 +101,7 @@ run_osp_3_crdb() {

run_osp_4() {
echo "run OSP 4"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o4_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp4.json" \
-port 10004 \
@@ -117,7 +117,7 @@ run_osp_4() {

run_osp_4_crdb() {
echo "run OSP 4 w/ cockroach db"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o4_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp4.json" \
-port 10004 \
@@ -133,7 +133,7 @@ run_osp_4_crdb() {

run_osp_5() {
echo "run OSP 5"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o5_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp5.json" \
-port 10005 \
@@ -149,7 +149,7 @@ run_osp_5() {

run_osp_5_crdb() {
echo "run OSP 5 w/ cockroach db"
-CELER_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
+AGENTPAY_INSECURE_TLS="${LOCAL_INSECURE_TLS}" go run "${AGENTPAY}/server/server.go" \
-profile "${MANUAL_ROOT}/profile/o5_profile.json" \
-ks "${AGENTPAY}/testing/env/keystore/osp5.json" \
-port 10005 \
6 changes: 3 additions & 3 deletions utils/utils.go
@@ -115,12 +115,12 @@ func GetClientTlsOption() grpc.DialOption {
}

func IsPermissiveClientTLS() bool {
-v := os.Getenv("CELER_INSECURE_TLS")
+v := os.Getenv("AGENTPAY_INSECURE_TLS")
return v == "1" || strings.EqualFold(v, "true")
}

// GetClientTlsOptionPermissive returns insecure transport credentials when the
-// environment variable CELER_INSECURE_TLS is set ("1"/"true"). This is useful
+// environment variable AGENTPAY_INSECURE_TLS is set ("1"/"true"). This is useful
// for local e2e tests where the server uses a self-signed localhost certificate
// that may not chain to CAs available to the client.
func GetClientTlsOptionPermissive() grpc.DialOption {
@@ -149,7 +149,7 @@ func WrapLocalTLSDialError(target string, err error) error {
if err == nil || IsPermissiveClientTLS() || !IsLoopbackTarget(target) {
return err
}
-return fmt.Errorf("%w; local TLS hint: %s uses the built-in self-signed localhost certificate, set CELER_INSECURE_TLS=1 on the dialing process or configure a trusted cert via -tlscert/-tlskey", err, target)
+return fmt.Errorf("%w; local TLS hint: %s uses the built-in self-signed localhost certificate, set AGENTPAY_INSECURE_TLS=1 on the dialing process or configure a trusted cert via -tlscert/-tlskey", err, target)
}

// GetClientTlsConfig returns tls.Config with system and celerCA, for https interaction