Handover

Snapshot

Date: 2026-04-10
Repo root: C:\Users\justi\dev\risk-api
Branch: master
Repo baseline: d6e11f8 (Update handover after dashboard proof deploy; handover-only)
Repo app-code baseline: 0ab058c (Update handover for dashboard proof follow-up; app code from c97098c)
Deployed app baseline: 0ab058c (Update handover for dashboard proof follow-up; app code from c97098c)
Status: deployed production is green on 0ab058c. The live app is healthy, and Fly is on machine version 110 with 1 passing health check.
The public first-party surfaces now include the dashboard Traffic Quality Classes panel, the /reports/base-weth-before-after proof artifact, the concrete action-aware approve example, the canonical first successful paid-call Base WETH path, and the "Call before pay / approve / interact" positioning copy across /, /skill.md, /llms.txt, /llms-full.txt, /how-payment-works, /.well-known/x402, OpenAPI examples, and the sitemap.
Live /stats exposes traffic_classes for health checks, evaluator bots, malformed probes, unpaid conversion attempts, paid requests, and other traffic.
A real paid production smoke on 2026-04-06 succeeded on the live action-aware approve request shape (402 -> PAYMENT-SIGNATURE -> 200) against Base WETH with:
- top-level decision: allow
- action-level decision: warn
- action-level reason codes: action_approve_requested
Live /stats now also shows the new request-log observability fields for that paid approve request:
- action_spender_trust: unchecked
- action_decision: warn
The deploy quirk still matters operationally, but the 2026-04-10 flyctl deploy --remote-only --app augurrisk run from clean detached worktree C:\Users\justi\AppData\Local\Temp\risk-api-deploy-0ab058c completed cleanly and cleared its lease.
Live verification immediately afterward returned /health 200, /dashboard containing Traffic Quality Classes, /reports/base-weth-before-after containing both before/after scores, homepage and llms.txt containing "Call before pay. Call before approve. Call before interact.", sitemap containing the new proof route, and /stats with populated traffic_classes.
The main product question is now whether the clearer first-call path plus proof artifact reduces malformed /analyze probes and whether real unpaid conversion attempts turn into repeated paid calls before widening the action-aware API.
External registry copy was intentionally left alone because the product positioning did not change.
The only remaining local leftovers are unrelated autoresearch files (auto/README.md, src/risk_api/auto_bench.py, tests/test_auto_bench.py) plus scratch dirs/files such as .claude/, .codex/live_db/, .codex/research.local/, .codex/tmp/, and .playwright-mcp/.
Traffic read on 2026-04-09: pulled the live durable SQLite analytics store from Fly machine 287d341f3e0ed8 (/data/analytics.sqlite3 plus WAL state) and analyzed the last 10 local days (2026-03-31 through 2026-04-09 Central). Main judgment: the product direction is still correct, but conversion is weak. The traffic validates the existing agent-first admission-control wedge rather than suggesting a pivot. Key counts in that 10-day window:
- 4,363 total requests
- 132 /analyze requests
- 110 422 /analyze responses
- 17 unpaid 402 /analyze attempts
- 2 paid /analyze requests
- 1 paid action-aware approve request, which is most likely the known smoke path
Strongest recurring actors were machine evaluators and ecosystem probers checking /.well-known/x402, /.well-known/agent-card.json, openapi.json, llms.txt, and then /analyze, not retail users.
The March 29 false-positive fix clearly helped: paid Base WETH checks moved from score=25 / level=low before dee071e to score=0 / level=safe on 2026-04-04 and 2026-04-06.
The April 3 method-contract fix is also visible in successful OPTIONS /analyze probes from ScoutScore-FidelityCheck/1.0.
Product implication: no strategy change, but next work should prioritize first-call conversion on /analyze and analytics segmentation for evaluator traffic versus real demand before widening the action-aware API.

What Changed

Continued and deployed the conversion execution plan on 2026-04-10:
- added a /dashboard Traffic Quality Classes section that renders traffic_classes for:
  - real unpaid conversion attempts
  - paid requests
  - malformed probes
  - known directory/evaluator bots
  - known health checks
  - other traffic
- added a new registry-backed proof page at /reports/base-weth-before-after showing:
  - exact Base WETH request
  - before output: score=25, level=low, decision=block, reason_codes=["honeypot_signal"]
  - after output: score=0, level=safe, decision=allow, reason_codes=[]
- added the proof link to the homepage Proof of Work section and sitemap through the existing REPORT_PAGES registry path
- added the positioning copy "Call before pay. Call before approve. Call before interact. Augur is deterministic preflight for Base contract actions." to first-party public/machine docs
- local verification passed: python -m pytest -q -> 401 passed
- deployment verification passed:
  - pushed c97098c and 0ab058c to origin/master
  - deployed from clean detached worktree C:\Users\justi\AppData\Local\Temp\risk-api-deploy-0ab058c
  - flyctl status --app augurrisk -> machine version 110, state started, 1 passing health check
  - https://augurrisk.com/health -> 200 {"status":"ok"}
  - live /dashboard, /reports/base-weth-before-after, homepage, llms.txt, sitemap, and /stats.traffic_classes verified
Started and deployed the conversion-focused follow-up on 2026-04-09:
- tightened the canonical first successful paid-call path around Base WETH (GET /analyze?address=0x4200000000000000000000000000000000000006) across homepage copy, llms.txt, llms-full.txt, skill.md, /how-payment-works, /.well-known/x402, and OpenAPI examples
- added first-class analytics traffic_class labels and /stats.traffic_classes counts for:
  - known_health_check
  - known_directory_evaluator_bot
  - malformed_probe
  - real_unpaid_conversion_attempt
  - paid_request
  - other_traffic
- /health is now request-logged as funnel_stage=health_check / traffic_class=known_health_check, so total analytics volume will include health checks but the traffic-class breakdown separates them from conversion signals
- local verification passed: python -m pytest -q -> 400 passed
- deployment verification passed:
  - pushed commit e552f78 to origin/master
  - deployed from clean detached worktree C:\Users\justi\AppData\Local\Temp\risk-api-deploy-e552f78
  - flyctl status --app augurrisk -> machine version 106, state started, 1 passing health check
  - https://augurrisk.com/health -> 200 {"status":"ok"}
  - live llms.txt contains First Successful Paid Call and Action-Aware Example: Approve
  - live openapi.json exposes the canonical Base WETH address example and the first-call 402 description
  - live /stats exposes traffic_classes
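The six traffic_class labels from the 2026-04-09 follow-up could be assigned by a small rule chain like the sketch below. The label strings and the user-agent names come from these notes; the function name classify_traffic, its parameters, and the rule order (for example, evaluator bots winning over unpaid 402s) are assumptions for illustration, not the shipped logic.

```python
# Illustrative sketch only: the rule order and matching are assumptions.
# User-agent fragments are evaluators and health checkers named in these notes.
EVALUATOR_AGENTS = ("ScoutScore-FidelityCheck", "x402audit", "AgentScore-Enrichment", "Satring-Scraper")
HEALTH_AGENTS = ("ScoutScore-HealthCheck", "X402-HealthCheck")


def classify_traffic(path: str, status: int, user_agent: str = "", paid: bool = False) -> str:
    """Map one logged request to a traffic_class label."""
    if path == "/health" or any(tag in user_agent for tag in HEALTH_AGENTS):
        return "known_health_check"
    if paid and status == 200:
        return "paid_request"
    if any(tag in user_agent for tag in EVALUATOR_AGENTS):
        return "known_directory_evaluator_bot"
    if path == "/analyze" and status == 422:
        return "malformed_probe"
    if path == "/analyze" and status == 402:
        return "real_unpaid_conversion_attempt"
    return "other_traffic"
```

Checking the bot rule before the 402 rule is the design choice that keeps evaluator probes out of the conversion signal; whether the live classifier orders its rules this way is not stated here.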
Analyzed live production traffic on 2026-04-09 using the Fly volume-backed SQLite store:
- source of truth:
  - copied /data/analytics.sqlite3 from Fly machine 287d341f3e0ed8
  - also pulled the SQLite -wal / -shm files so the latest rows were included
  - confirmed live /stats was ahead of the older local snapshot and matched the live durable store shape (storage_backend=sqlite, storage_durable=true)
- 10-day window analyzed:
  - 2026-03-31 through 2026-04-09 local time (America/Chicago)
- key findings:
  - product direction remains aligned with the repo's current positioning: agent-first deterministic admission control, not a retail destination screener
  - evaluator and crawler traffic dominate the machine-facing surfaces; repeated fetches of /.well-known/x402, /.well-known/agent-card.json, openapi.json, llms.txt, skill.md, and the intent pages look like ecosystem evaluation rather than end-user demand
  - /analyze conversion is still weak:
    - 132 total /analyze requests in the window
    - 110 422 responses
    - 17 unpaid 402 attempts
    - 2 paid requests
  - most recent /analyze failures are mixed probe traffic and brittle integrations, not one clean buyer cohort:
    - python-httpx/0.28.1 alone produced 52 invalid-address 422s
    - most 422s were missing_address
    - malformed examples like 0x4200000000000000000000000000000000000006/openapi.json show URL-construction bugs or crawler misuse
    - repeated node, python-httpx, Go-http-client, Satring-Scraper, ScoutScore-*, X402-HealthCheck, and Thinkbot traffic means raw request counts overstate real product demand
  - the strongest current hidden batch / product-use signal is infrastructure evaluation:
    - repeated paid and unpaid checks of canonical Base contracts, especially Base WETH
    - recurring evaluator names like ScoutScore-FidelityCheck/1.0, ScoutScore-HealthCheck/1.0, x402audit/1.0, and AgentScore-Enrichment/1.0
    - practical read: Augur is being tested as a machine trust primitive inside agent systems, routers, or listings, which is consistent with the existing product wedge
- commit / product read:
  - dee071e (Fix false positives and align admission-control metadata) appears validated by live demand evidence:
    - paid Base WETH responses before the fix were still score=25 / level=low
    - paid Base WETH responses after the fix were score=0 / level=safe
  - 9e1d59f (Fix analyze method contract behavior) also appears validated:
    - successful OPTIONS /analyze rows from ScoutScore-FidelityCheck/1.0 now appear before unpaid probes, which is the intended compatibility path
  - 93ba6f0 / 1af2be0 action-aware approve work is live and observable, but still not externally validated by demand:
    - only 1 paid action-aware approve request is visible in the analyzed window
    - it used Base WETH plus spender 0x1111111111111111111111111111111111111111
    - action_decision=warn
    - action_spender_trust=unchecked
    - current read: this is most likely the production smoke, not broad market usage
- recommended next move:
  - no pivot
  - make the first successful /analyze call path even more explicit on every machine-facing surface
  - segment evaluator / health-check / crawler traffic from true conversion in analytics before using request counts as traction evidence
  - keep action-aware scope narrow until repeated non-smoke paid usage appears
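To make "conversion is still weak" concrete, the /analyze counts from the 10-day window can be turned into shares. The sketch below uses only the four counts reported above; treating paid requests and unpaid 402 attempts as disjoint rows is an assumption, and the remainder is simply what the notes do not break out.

```python
# Counts copied from the 2026-03-31 through 2026-04-09 window in these notes.
analyze_total = 132
rejected_422 = 110
unpaid_402 = 17
paid = 2


def share(part: int, whole: int) -> float:
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


funnel = {
    "rejected_422_pct": share(rejected_422, analyze_total),            # 83.3
    "unpaid_402_pct": share(unpaid_402, analyze_total),                # 12.9
    "paid_pct": share(paid, analyze_total),                            # 1.5
    # Assumes paid and unpaid rows are disjoint: paid / (paid + unpaid).
    "paid_of_payment_stage_pct": share(paid, paid + unpaid_402),       # 10.5
    # Requests not covered by the three reported buckets.
    "not_broken_out": analyze_total - (rejected_422 + unpaid_402 + paid),  # 3
}
```

Roughly five of every six /analyze requests never reach the paywall, which is why the recommended next move targets the first-call path rather than pricing.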
Added and deployed a concrete first-party action-aware approve example on 2026-04-06 / 2026-04-07 UTC:
- intent:
  - make the current narrow action-aware product shape legible without widening the API
  - show exactly how top-level contract policy and action-level policy can differ on approve
  - keep the change on first-party surfaces instead of churning external registry copy again
- updated local surfaces:
  - homepage (/)
  - skill.md
  - llms.txt
  - llms-full.txt
- message shape:
  - exact GET /analyze?...&action=approve&spender=...&chain=base request example
  - example JSON with action_context and action_evaluation
  - explicit note that V1 currently supports only approve on Base
  - explicit note that spender trust remains unchecked when no allowlist is configured
- external/discovery follow-up:
  - intentionally not yet applied to x402.jobs / MoltMart / Work402 / ERC-8004 / x402.org because this is supporting evidence for the current product message, not a new positioning change
- local verification passed:
  - python -m pytest tests/test_app.py -k "llms_txt or llms_full_txt or skill_md or landing_documents_action_aware_approve_example or landing_links_llms_txt" -q -> 15 passed
  - python -m pytest tests/test_app.py -q -> 168 passed
- deployment status:
  - deployed from a clean detached worktree so unrelated local auto_bench changes were not shipped
- live verification after deploy:
  - flyctl status --app augurrisk -> machine version 104, state started, 1 passing health check
  - https://augurrisk.com/health returned {"status":"ok"}
  - homepage now contains Action-Aware Example: Approve
  - live skill.md, llms.txt, and llms-full.txt now all contain the concrete approve example and action_evaluation output
Deployed the narrow approve refinement set on 2026-04-06:
- commit: 1af2be0 (Refine action-aware approve policy)
- git push origin master succeeded
- flyctl deploy --remote-only --app augurrisk timed out during health polling again, but the new image still landed and the app recovered to healthy on machine version 103
- live verification after deploy:
  - flyctl status --app augurrisk -> machine version 103, state started, 1 passing health check
  - https://augurrisk.com/health returned {"status":"ok"}
  - https://augurrisk.com/openapi.json now includes action_approve_spender_allowlisted and action_approve_spender_not_allowlisted
- real paid production smoke on the live action-aware request shape succeeded:
  - endpoint: https://augurrisk.com/analyze?address=0x4200000000000000000000000000000000000006&action=approve&spender=0x1111111111111111111111111111111111111111&chain=base
  - observed flow: 402 -> signed PAYMENT-SIGNATURE -> 200
  - live result:
    - decision: allow
    - action_evaluation.decision: warn
    - action_evaluation.recommended_policy.reason_codes: ["action_approve_requested"]
    - score: 0
    - level: safe
- live observability proof:
  - /stats durable recent-entry view now shows the paid approve request with:
    - action_spender_trust: unchecked
    - action_decision: warn
    - funnel_stage: paid_request
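The observed 402 -> signed PAYMENT-SIGNATURE -> 200 flow can be sketched as a two-step client. This is a minimal outline, not the logic in scripts/test_x402_client.py: the send and sign callables are hypothetical stand-ins for an HTTP transport and a wallet signer, and how the payment requirements are encoded in the 402 response is deliberately left to the signer.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]            # (status, headers, body)
Send = Callable[[str, Dict[str, str]], Response]      # hypothetical HTTP transport
Sign = Callable[[Dict[str, str], str], str]           # hypothetical payment signer


def paid_get(url: str, send: Send, sign: Sign) -> Response:
    """Request url, and retry once with a signed payment header on 402."""
    status, headers, body = send(url, {})
    if status != 402:
        # Validation errors (422) and free routes return before any payment.
        return status, headers, body
    # The signer reads the payment requirements from the 402 response.
    signature = sign(headers, body)
    return send(url, {"PAYMENT-SIGNATURE": signature})
```

Injecting the transport and signer keeps the sketch testable without a wallet or network access.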
Added action-aware request observability locally on 2026-04-06:
- /analyze request logging now records action-aware approve context more explicitly when present
- new structured request-log fields:
  - action_spender_trust
    - unchecked
    - allowlisted
    - not_allowlisted
  - action_decision
    - populated from action_evaluation.decision on successful 200 responses
- current purpose:
  - observe whether approve requests are actually using configured allowlists
  - see what action-level decision the current narrow policy is producing in practice
  - avoid freezing extra public API surface before seeing live evidence
- implementation shape:
  - src/risk_api/app.py computes/logs spender trust for validated action-aware requests
  - src/risk_api/analysis/action_policy.py now exports the spender-trust classifier used by both policy derivation and logging
- local verification passed:
  - python -m pytest tests/test_config.py tests/test_logging.py tests/test_action_policy.py tests/test_app.py -q -> 196 passed
  - python -m pytest -q -> 395 passed
- deployment status:
  - deployed in 1af2be0
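The three action_spender_trust values above suggest a classifier along these lines. It is a sketch under assumptions: the name classify_spender_trust and its signature are hypothetical (the real function exported from src/risk_api/analysis/action_policy.py is not shown in these notes), and the allowlist is assumed to hold lowercase addresses.

```python
from typing import FrozenSet


def classify_spender_trust(spender: str, allowlist: FrozenSet[str] = frozenset()) -> str:
    """Return the request-log value for action_spender_trust."""
    if not allowlist:
        # No allowlist configured: the spender was never checked.
        return "unchecked"
    # Allowlist entries are assumed to be stored lowercase.
    return "allowlisted" if spender.lower() in allowlist else "not_allowlisted"
```

With no allowlist configured this returns unchecked, which matches the value logged for the 2026-04-06 paid smoke.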
Refined the action-aware approve layer locally on 2026-04-06 with an opt-in spender allowlist path:
- new config env: APPROVE_SPENDER_ALLOWLIST
  - comma-separated Base spender addresses
  - validated at startup and normalized to lowercase
  - no behavior change when unset
- action-policy behavior is now narrower and more useful when the allowlist is configured:
  - clean base-policy allow + allowlisted spender stays action-level allow
  - clean base-policy allow + spender not on the allowlist escalates to action-level manual_review
  - warn/manual-review/block contracts still do not downgrade below the base contract policy
- new action-level reason codes:
  - action_approve_spender_allowlisted
  - action_approve_spender_not_allowlisted
- route wiring now passes the configured spender allowlist into derive_action_evaluation()
- local verification passed:
  - python -m pytest tests/test_config.py tests/test_action_policy.py tests/test_app.py -q -> 181 passed
  - python -m pytest -q -> 393 passed
- deployment status:
  - deployed in 1af2be0
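The allowlist parsing and the resulting approve escalation rules can be sketched together. The function names, the address regex, and the choice to skip empty comma segments are assumptions; the sketch also assumes that warn, manual_review, and block contracts keep the V1 escalation when an allowlist is configured, which these notes imply but do not spell out. It is not the signature of derive_action_evaluation().

```python
import re
from typing import FrozenSet

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
SEVERITY = ["allow", "warn", "manual_review", "block"]  # least to most severe


def parse_spender_allowlist(raw: str) -> FrozenSet[str]:
    """Parse a comma-separated APPROVE_SPENDER_ALLOWLIST value."""
    entries = [part.strip() for part in raw.split(",") if part.strip()]
    for entry in entries:
        if not ADDRESS_RE.fullmatch(entry):
            raise ValueError(f"invalid spender address: {entry!r}")
    return frozenset(entry.lower() for entry in entries)


def approve_decision(base: str, spender: str, allowlist: FrozenSet[str] = frozenset()) -> str:
    """Action-level decision for approve, never below the base decision."""
    if base == "allow" and allowlist:
        action = "allow" if spender.lower() in allowlist else "manual_review"
    elif base == "allow":
        action = "warn"            # V1 behavior when no allowlist is set
    elif base == "warn":
        action = "manual_review"
    else:
        action = base              # manual_review and block stay as-is
    return max(base, action, key=SEVERITY.index)
```

An unset variable parses to an empty set, so approve_decision falls through to the V1 rules, consistent with "no behavior change when unset".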
Implemented a narrow action-aware admission-control V1 locally on 2026-04-06:
- GET and POST /analyze now accept optional action context fields:
  - action
  - spender
  - chain
- V1 is intentionally narrow:
  - only action=approve is accepted
  - only chain=base is accepted when chain is supplied
  - spender is required for approve
  - unsupported action values, unsupported chain values, missing spender, malformed spender, and query/body conflicts on the new fields all return 422 before the x402 paywall
- the contract-level decision and recommended_policy remain unchanged when no action context is provided
- valid action context now adds:
  - action_context
  - action_evaluation
- action_evaluation is additive rather than replacing the core contract engine:
  - clean allow contracts escalate to action-level warn for approve
  - contract-level warn escalates to action-level manual_review
  - manual_review and block remain at least as severe
  - new reason code: action_approve_requested
- implementation shape:
  - new module src/risk_api/analysis/action_policy.py
  - shared request-field parser / validator in src/risk_api/app.py
  - additive wire serialization in src/risk_api/api_contract.py
  - OpenAPI and Bazaar discovery schemas now expose the optional action-aware fields and response objects
- coverage added in:
  - tests/test_action_policy.py
  - tests/test_app.py
  - tests/conftest.py fake Bazaar schema updated to match the new input contract
- verification passed:
  - python -m pytest tests/test_action_policy.py tests/test_app.py -k "approve or spender or action_evaluation or action_aware or unsupported_action or unsupported_chain or action_context or bazaar or openapi" -q -> 26 passed
  - python -m pytest tests/test_app.py -q -> 163 passed
  - python -m pytest -q -> 387 passed
- shipped result:
  - committed as 93ba6f0 (Add action-aware approve policy layer)
  - git push origin master succeeded
  - flyctl deploy --remote-only --app augurrisk succeeded cleanly
  - live verification after deploy:
    - flyctl status --app augurrisk -> machine version 100, state started, 1 passing health check
    - https://augurrisk.com/health returned {"status":"ok"}
    - https://augurrisk.com/openapi.json now includes ActionContext
- follow-up caveat:
  - if public machine/discovery docs beyond openapi.json start describing the action-aware layer more explicitly, review the duplicated machine-facing copy outside src/risk_api/app.py for alignment before the next messaging/discovery push
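The V1 validation rules for action, spender, and chain can be sketched as a single pre-paywall check. The function name, the error strings, the lowercase normalization, and the choice to default chain to base when omitted are all hypothetical; the notes only establish which inputs are rejected with 422 before the x402 paywall. Query/body conflict detection is left out of this sketch.

```python
import re
from typing import Dict, Optional, Tuple

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def validate_action_context(
    params: Dict[str, str],
) -> Tuple[Optional[Dict[str, str]], Optional[str]]:
    """Return (action_context, None) or (None, error); an error maps to a 422."""
    action = params.get("action")
    if action is None:
        # No action context: the contract-level response is unchanged.
        return None, None
    if action != "approve":
        return None, "unsupported_action"
    chain = params.get("chain", "base")  # assumed default when chain is omitted
    if chain != "base":
        return None, "unsupported_chain"
    spender = params.get("spender")
    if not spender:
        return None, "missing_spender"
    if not ADDRESS_RE.fullmatch(spender):
        return None, "invalid_spender"
    return {"action": action, "spender": spender.lower(), "chain": chain}, None
```

Running this before the payment gate means a malformed action-aware request costs the caller nothing.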
Hardened /analyze method-contract behavior and docs locally on 2026-04-03:
- src/risk_api/app.py now skips address validation and x402 gating for methods outside the real /analyze contract, so Flask handles OPTIONS and unsupported methods normally instead of returning misleading 422 errors
- /analyze now responds with the default Flask OPTIONS behavior and returns 405 Method Not Allowed for unsupported methods like PUT, PATCH, and DELETE
- the POST 422 OpenAPI examples now explicitly document conflicting query/body addresses, malformed JSON bodies, and non-object JSON bodies
- test coverage added in tests/test_app.py for:
  - ungated OPTIONS /analyze
  - unsupported-method 405 behavior with and without x402 enabled
  - POST 422 OpenAPI body-error examples
- the fake x402 test gate in tests/conftest.py now mirrors the real method contract instead of intercepting unsupported methods
- verification passed:
  - python -m pytest tests/test_app.py -q -> 151 passed
Confirmed a real paid production smoke test on 2026-04-03 using the Conway wallet against the live deployed agent wallet:
- command path used the existing local flow from scripts/test_x402_client.py
- payer wallet: 0x79301Cf19Aaea29fbe40F0F5B78F73e2c3b0a2b8
- payee / agent wallet: 0x13580b9C6A9AfBfE4C739e74136C1dA174dB9891
- target endpoint: https://augurrisk.com/analyze?address=0x4200000000000000000000000000000000000006
- observed flow: 402 -> signed PAYMENT-SIGNATURE -> 200
- live result:
  - decision: allow
  - score: 0
  - level: safe
  - findings: 0
- practical read:
  - the x402 payment path is currently working end to end on production
  - the current live policy for Base WETH now returns the clean result, not the older deployer-reputation warning result captured in older notes
Re-checked the Coinbase public discovery feed on 2026-04-03 local time (2026-04-04T04:06:55Z script timestamp) after the successful paid smoke:
- command: python scripts/check_cdp_discovery.py --max-pages 5 --limit 100 --page-delay 0.75 --max-retries 4 --retry-delay 5
- result: status=NOT_FOUND
- coverage scanned: 5 pages / 500 items
- keyword matches: 0
- practical read:
  - successful paid settlement plus correct live x402 metadata still do not imply public-feed visibility
  - this remains CDP feed/indexing or support-escalation territory, not a repo/runtime bug
Hardened /analyze input handling and endpoint resilience locally on 2026-03-30:
- src/risk_api/app.py now rejects malformed POST JSON bodies and conflicting address values between query params and JSON body before the x402 paywall instead of silently choosing one
- fixed request logging to keep recording only the resolved address after the parser signature change
- added app regressions in tests/test_app.py for:
  - matching query/body POSTs still succeeding
  - conflicting query/body POSTs returning 422
  - malformed JSON bodies returning 422
  - non-object JSON bodies returning 422
  - malformed JSON still returning 422 before x402 payment
- added /stats resilience coverage in tests/test_logging.py so malformed JSONL lines are ignored instead of breaking the endpoint
- extended the ignored local hidden holdout batch with two deployer-reputation partial-failure cases:
  - age lookup fails but low-tx warning survives
  - tx-count lookup fails but young-wallet warning survives
- verification passed:
  - python -m pytest tests/test_app.py tests/test_logging.py -q -> 155 passed
  - python auto/loop.py -> 59/59
  - python -m pytest -q -> 363 passed
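The POST input rules from this hardening pass (malformed JSON, non-object JSON, and conflicting query/body addresses all rejected before payment) can be sketched as one resolver. The function name and most error strings are hypothetical; missing_address is the only error name that appears in these notes, and comparing addresses case-insensitively is an assumption.

```python
import json
from typing import Optional, Tuple


def resolve_address(
    query_address: Optional[str], raw_body: bytes
) -> Tuple[Optional[str], Optional[str]]:
    """Return (address, None) or (None, error); an error maps to a 422."""
    body_address = None
    if raw_body:
        try:
            payload = json.loads(raw_body)
        except ValueError:  # covers JSONDecodeError and invalid UTF-8
            return None, "malformed_json"
        if not isinstance(payload, dict):
            return None, "non_object_json"
        body_address = payload.get("address")
        if body_address is not None and not isinstance(body_address, str):
            return None, "invalid_address"
    if query_address and body_address and query_address.lower() != body_address.lower():
        # Refuse to silently pick one of two different addresses.
        return None, "conflicting_address"
    address = query_address or body_address
    if not address:
        return None, "missing_address"
    return address, None
```

Returning a single resolved address also gives request logging one unambiguous value to record.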
Extended the local autoresearch harness on 2026-03-30 so hidden cases can run a fully mocked analyze_contract() pass instead of only pure bytecode/policy checks:
- src/risk_api/auto_bench.py now supports an analysis case kind with mocked RPC and Blockscout responses
- auto/README.md documents the new case kind for explorer-backed and proxy-runtime coverage
- tests/test_auto_bench.py now verifies the mocked analysis path
- local hidden batch added under ignored auto/corpus/*.local.json now pressures:
  - deployer_reputation creator NOT_FOUND
  - deployer_reputation fresh wallet + low tx count
  - proxy implementation NO_CODE
  - proxy FETCH_FAILED vs NO_CODE reason-code separation
  - a reentrancy lookahead-boundary sanity case
- verification passed:
  - python -m pytest tests/test_auto_bench.py -q -> 4 passed
  - python auto/loop.py -> 57/57
  - python -m pytest -q -> 357 passed
Fixed the paid blue-chip false-positive path locally on 2026-03-29:
- narrowed detect_honeypot_patterns() so ordinary dispatcher/default-REVERT flow no longer counts as a honeypot; the detector now only fires on blacklist-style transfer controls until a stronger transfer-path heuristic exists
- changed hidden_mint from automatic block to manual_review when it is the main signal, so governed mint-capability tokens do not hard-block by default
- added/updated reproducible regressions in auto/corpus/public_cases.json, tests/test_patterns.py, tests/test_policy.py, tests/test_engine.py, and tests/test_app.py
- local live re-check against the paid examples now returns:
  - Base WETH (0x4200000000000000000000000000000000000006) -> allow
  - AERO (0x940181a94A35A4569E4529A3CDfB74e38FD98631) -> manual_review
  - 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984 -> allow
- verification passed:
  - python -m pytest -q -> 340 passed
  - python auto/loop.py -> 43/43
Committed, deployed, and pushed the combined false-positive + metadata-alignment release on 2026-03-29:
- commit: dee071e (Fix false positives and align admission-control metadata)
- flyctl deploy --remote-only succeeded for augurrisk
- git push origin master succeeded
- live internal route verification passed for:
  - https://augurrisk.com/
  - https://augurrisk.com/openapi.json
  - https://augurrisk.com/skill.md
  - https://augurrisk.com/llms.txt
  - https://augurrisk.com/llms-full.txt
  - https://augurrisk.com/.well-known/agent-card.json
  - https://augurrisk.com/agent-metadata.json
  - https://augurrisk.com/.well-known/x402
  - https://augurrisk.com/health
- those live surfaces now reflect the admission control / decision / recommended_policy wording that was still missing before deploy
- stale registration-script test expectations were also updated so the current metadata payloads and examples validate cleanly
Ran the external alignment pass on 2026-03-29:
- pinned new IPFS metadata CID: QmfCBvB5wdBCTeT1XUiXyXY3z2TmUm1rUnQsqrW58reL6S
- verified the pinned metadata via https://gateway.pinata.cloud/ipfs/QmfCBvB5wdBCTeT1XUiXyXY3z2TmUm1rUnQsqrW58reL6S
- updated ERC-8004 agent 19074 to ipfs://QmfCBvB5wdBCTeT1XUiXyXY3z2TmUm1rUnQsqrW58reL6S
  - tx: 24cc2388aa6a2b714f783ffb4f24888c35ed9c761a50854156e01adcce79d733
- updated x402.jobs resource 4964c164-c748-4cd6-a7a5-0ac33e118b6a
  - public API path: https://api.x402.jobs/api/v1/resources/augurrisk-com/augur-base
- updated MoltMart service 984bf985-8f69-4237-b0e4-cd5452f1c489
  - authenticated API now shows the admission-control / decision / recommended_policy wording
- Work402 seller alias Augur already exists as did:erc8004:37906
  - rerunning onboarding returns 409 alias conflict, which confirms the existing DID rather than indicating a fresh failure
- current external audit status after the update pass and browser recheck:
  - x402.jobs public page shows the new admission-control wording
  - MoltMart public page shows the new admission-control wording
  - Work402 public page shows the new admission-control wording for did:erc8004:37906
  - 8004scan now shows the refreshed Augur page and current admission-control wording on the public agent profile
  - x402.org/ecosystem now shows the refreshed Augur admission-control wording on the public card
  - Coinbase public discovery feed still returned NOT_FOUND over the first 5 pages / 500 items when checked with python scripts/check_cdp_discovery.py --max-pages 5
  - x402list.fun is still stale on risk-api.life.conway.tech and still does not mention augurrisk.com
Opened the follow-up curated-listing PR on 2026-03-29:
- coinbase/x402 PR #1869 (Refresh Augur ecosystem listing copy)
- updates typescript/site/app/ecosystem/partners-data/augur/metadata.json in the upstream site repo
- purpose: move the Augur ecosystem card from the old score-first wording to the current admission-control wording
- follow-up status as of 2026-03-30: the public x402.org/ecosystem card is now showing the refreshed copy, so this no longer blocks the external audit pass
Landed a policy-quality follow-up locally on 2026-03-29:
- commit: f248e80 (Refine managed proxy policy decisions)
- added a new tracked autoresearch regression for high-score managed proxy admin surfaces
- policy now returns manual_review instead of auto-block when the signal set is:
  - upgradeable proxy
  - hidden mint / admin-control surface
  - delegatecall and optional suspicious selector signals
  - no harder stop condition like honeypot, selfdestruct, fee manipulation, or unresolved proxy logic
- local real-contract recheck from the current code:
  - Base USDC (0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913) -> manual_review
  - Base cbBTC (0xcbb7c0000ab88b473b1f5afd9ef808440eed33bf) -> manual_review
- validation passed:
  - python auto/loop.py -> 48/48
  - python -m pytest tests/test_policy.py tests/test_engine.py -q -> 43 passed
  - python -m pytest -q -> 347 passed
Landed the next hidden-batch follow-up locally on 2026-03-29:
- standard 45-byte EIP-1167 clone wrappers are now recognized as proxies instead of raw DELEGATECALL shells
- resolve_implementation() now extracts clone targets from runtime bytecode, so wrapper contracts can surface their shared implementation analysis without depending on storage-slot proxies
- added/promoted regressions in:
  - auto/corpus/public_cases.json
  - tests/test_patterns.py
  - tests/test_scoring.py
  - tests/test_engine.py
- local holdout that originally failed was minimal_proxy_clone_raw_delegatecall
- real spot-check after the fix:
  - Beefy/Aerodrome-style Base wrapper 0x09139A80454609B69700836a9eE12Db4b5DBB15f now resolves implementation 0x9818df1bdce8d0e79b982e2c3a93ac821b3c17e0
  - the wrapper shell now reports delegatecall as expected minimal-proxy behavior plus proxy detection, instead of tiny_bytecode + raw-shell framing
  - resulting score on that wrapper moved from a misleading shell-only 25 to an implementation-aware 45; decision stays manual_review because the shared implementation still has real delegatecall/suspicious-selector surface
- validation passed:
  - python auto/loop.py -> 50/50
  - python -m pytest tests/test_patterns.py tests/test_scoring.py tests/test_engine.py -q -> 70 passed
  - python -m pytest -q -> 353 passed
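Recognizing a standard 45-byte EIP-1167 clone and reading its target from runtime bytecode can be sketched as a fixed prefix/suffix match. The byte patterns are the standard EIP-1167 runtime code; the function name is hypothetical, and this is not the body of resolve_implementation().

```python
from typing import Optional

# Standard EIP-1167 runtime: 10-byte prefix + 20-byte target + 15-byte suffix.
EIP1167_PREFIX = bytes.fromhex("363d3d373d3d3d363d73")
EIP1167_SUFFIX = bytes.fromhex("5af43d82803e903d91602b57fd5bf3")


def extract_clone_target(runtime_hex: str) -> Optional[str]:
    """Return the implementation address of a standard minimal proxy, else None."""
    raw = runtime_hex[2:] if runtime_hex.startswith("0x") else runtime_hex
    code = bytes.fromhex(raw)
    if len(code) != 45:
        return None
    if not (code.startswith(EIP1167_PREFIX) and code.endswith(EIP1167_SUFFIX)):
        return None
    # The 20 bytes between prefix and suffix are the delegate target.
    return "0x" + code[10:30].hex()
```

Because the target is embedded in the code itself, no storage-slot read is needed, which is why storage-slot proxy resolution alone missed these wrappers. Non-standard clone variants (for example, shortened vanity-address forms) would not match this exact pattern.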
Landed the next hidden-batch follow-up locally on 2026-03-30:
- Solidity CBOR metadata trailers are now stripped before disassembly, so metadata bytes no longer create fake opcode findings like raw DELEGATECALL
- added/promoted regressions in:
  - auto/corpus/public_cases.json
  - tests/test_disassembler.py
  - tests/test_engine.py
- local holdout that originally failed was solidity_metadata_false_delegatecall
- real effect on the shared wrapper family behind the previous clone-proxy pass:
  - shared implementation 0x9818df1bdce8d0e79b982e2c3a93ac821b3c17e0 now moves from manual_review at 25 down to warn at 10
  - wrapper 0x09139A80454609B69700836a9eE12Db4b5DBB15f now moves from manual_review at 45 down to warn at 30
  - sampled Base clone-wrapper family result after the fix: 80/80 wrappers in the sample now land on the same warn cluster with reason codes upgradeable_proxy, delegatecall_surface, and suspicious_selector_signal
- validation passed:
  - python auto/loop.py -> 52/52
  - python -m pytest tests/test_disassembler.py tests/test_engine.py tests/test_patterns.py tests/test_scoring.py -q -> 80 passed
  - python -m pytest -q -> 356 passed
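Stripping the Solidity metadata trailer before disassembly can be sketched as follows. Solidity appends CBOR-encoded metadata followed by a 2-byte big-endian length; the plausibility checks below (CBOR map marker plus a known key) are a heuristic chosen for this sketch, and the function name is hypothetical rather than the repo's disassembler code.

```python
def strip_solidity_metadata(code: bytes) -> bytes:
    """Drop a trailing Solidity CBOR metadata section if one is present."""
    if len(code) < 2:
        return code
    # The final two bytes hold the length of the CBOR payload before them.
    cbor_len = int.from_bytes(code[-2:], "big")
    start = len(code) - 2 - cbor_len
    if cbor_len == 0 or start < 0:
        return code
    trailer = code[start:-2]
    # Heuristic guard: a CBOR map header and at least one key solc emits.
    looks_like_map = 0xA0 <= trailer[0] <= 0xBF
    has_known_key = any(key in trailer for key in (b"ipfs", b"bzzr", b"solc"))
    return code[:start] if looks_like_map and has_known_key else code
```

Hash bytes inside the trailer are arbitrary, so any of them can equal 0xf4 (DELEGATECALL); removing the trailer first is what eliminates that class of fake opcode finding.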
Committed, pushed, and deployed the Solidity-metadata follow-up on 2026-03-30:
- commit: fef6a10 (Ignore Solidity metadata during disassembly)
- git push origin master succeeded
- flyctl deploy --remote-only --app augurrisk updated the machine image but timed out while polling health because the Machines API hit lease / rate-limit errors
- flyctl machine start 287d341f3e0ed8 --app augurrisk recovered the app cleanly
- final live verification:
  - flyctl status --app augurrisk -> machine version 93, state started, 1 passing health check
  - https://augurrisk.com/openapi.json returned the live API document after recovery
Found an ops-script safety bug during the external pass:
- scripts/register_erc8004.py --help did not behave like help: the script ignored --help, executed the default register() path, and created a second ERC-8004 agent
- accidental tx: 0d09b847ae49c28dfba251485076170ae0ea45aa3eefe4a131a560c3d3fc45b2
- decoded minted agent id: 37905
- fix landed locally the same session:
  - scripts/pin_metadata_ipfs.py, scripts/register_erc8004.py, scripts/register_moltmart.py, and scripts/register_work402.py now use safe argparse entrypoints
  - targeted script tests plus full python -m pytest -q passed afterward
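A safe argparse entrypoint of the kind described above can be sketched like this. The subcommand names, the --agent-id flag, and the injected register/update callables are hypothetical; the point is only that argument parsing finishes (and --help exits) before any side-effecting path can run, and that there is no default action.

```python
import argparse
from typing import Callable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Register or update an agent (sketch)")
    # required=True: running with no subcommand is an error, not a default register.
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("register", help="mint a new agent (sends a transaction)")
    update = sub.add_parser("update", help="update an existing agent")
    update.add_argument("--agent-id", type=int, required=True)
    return parser


def main(
    argv: Optional[List[str]],
    register: Callable[[], None],
    update: Callable[[int], None],
) -> int:
    # parse_args raises SystemExit for --help or bad input before any action runs.
    args = build_parser().parse_args(argv)
    if args.command == "register":
        register()
    else:
        update(args.agent_id)
    return 0
```

Requiring an explicit subcommand means a mistyped or exploratory invocation fails closed instead of sending a transaction.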
Finished the duplicated registration/discovery metadata wording pass locally on 2026-03-29:
- updated stale registration surfaces in:
  - scripts/pin_metadata_ipfs.py
  - scripts/register_erc8004.py
  - scripts/register_x402jobs.py
  - scripts/register_moltmart.py
  - scripts/register_work402.py
- aligned those scripts around admission control language and the current output shape (decision, recommended_policy, supporting findings, score)
- scripts/register_moltmart.py and scripts/register_x402jobs.py no longer advertise the stale risk_level-style response shape
- syntax check passed:
  - python -m py_compile scripts/pin_metadata_ipfs.py scripts/register_erc8004.py scripts/register_x402jobs.py scripts/register_moltmart.py scripts/register_work402.py
Added a concrete registration/discovery alignment checklist on 2026-03-29:
- docs/REGISTRATIONS.md now explicitly separates:
  - internal app surfaces that update on deploy
  - script-driven third-party listings that require rerunning update flows
  - manual/curated or external-indexed surfaces that only support verification or escalation
- practical consequence:
  - repo alignment is now deployed on augurrisk.com
  - external listing alignment is still not proven until we rerun the script-driven listing updates and manually audit the live surfaces
- currently tracked external surfaces that matter for this pass:
  - ERC-8004 / 8004scan
  - x402.jobs
  - MoltMart
  - Work402 (testnet, if still worth maintaining)
  - x402.org/ecosystem
  - Coinbase public discovery feed
  - x402list.fun as an external stale-state check, not a repo-controlled surface
Captured the 2026-03-26 production traffic and outcome review so the next session does not over-prioritize copy/distribution cleanup:
- pulled the durable production analytics store from Fly volume data (/data/analytics.sqlite3) and checked Fly logs around the same window
- analyzed 2026-03-16 through 2026-03-26 UTC:
  - 3563 total logged events
  - 95 /analyze requests
  - 7 paid /analyze successes, all on 2026-03-25
  - top traffic was mostly discovery/crawler activity on /, /.well-known/agent-card.json, robots.txt, and sitemap.xml
- confirmed the Fly OOM alert was real but brief:
  - 2026-03-26 07:21:00 UTC: Fly logged "Out of memory: Killed process 786 (gunicorn)"
  - one proxy-side "connection closed before message completed" error appeared at the same moment
  - a new worker booted at 2026-03-26 07:21:01 UTC
  - the OOM happened during a burst of / requests from x402audit/1.0, not during the paid /analyze burst
- operational read from the production window:
  - no app-level 500 or 502 rows appeared in the durable analytics DB for the analyzed window
  - current live route checks returned 200 for /, /openapi.json, /skill.md, /llms.txt, /llms-full.txt, /.well-known/agent-card.json, /agent-metadata.json, and /.well-known/x402
  - paid /analyze calls averaged about 1.25s; unpaid 402 and invalid 422 paths stayed fast
- product-quality read from the same evidence:
  - the meaningful issue is not uptime; it is that real paid testers hit major contracts and got bad block decisions
  - the paid contracts in the traffic review included Base WETH (0x4200000000000000000000000000000000000006), AERO (0x940181a94A35A4569E4529A3CDfB74e38FD98631), and 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984
  - local re-check from the current code still blocks all three, so this is not just a historical artifact in old logs
- plan adjustment from this review:
  - keep the full admission-control wording/metadata alignment pass as valid cleanup
  - but move result quality ahead of metadata alignment and ahead of broader distribution work
  - treat the OOM as a secondary ops follow-up unless it repeats
Captured the 2026-03-26 growth/distribution cleanup so the next session starts with the right framing:
- docs/GrowthExecutionPlan.md now points at docs/PRODUCT_DIRECTION_UPDATE.md as the current strategy source instead of treating the older wedge memo as the newest call
- the growth plan now separates three channels that should not be conflated:
  - registry/directory maintenance
  - AI-answer visibility / citation work
  - operator ecosystems such as OpenClaw and installable workflow surfaces
- the old checklist is now framed as baseline history rather than the current sprint, and the next explicit growth items include measured AI-answer visibility work plus one operator-ecosystem experiment
- bookmark review takeaways from this session:
  - CrowdReply belongs in the LLM-discoverability / AI-answer-visibility bucket, not the registry-cleanup bucket
  - the half-remembered "Larry something" bookmark was almost certainly Larrybrain / Larry marketing in the OpenClaw ecosystem
  - Clawdmarket was not found by that exact name in the repo or the current bookmark export
- the temporary scratch note docs/X_BOOKMARK_DISTRIBUTION_NOTES.md was merged into docs/GrowthExecutionPlan.md and deleted to avoid doc sprawl
Added a product-direction follow-up memo and aligned the public/docs wording with it on 2026-03-19:
- new memo: docs/PRODUCT_DIRECTION_UPDATE.md
- supporting market-read memo: docs/SELLING_TO_AGENTS_MEMO.md
- docs/agent-economy-primer.md now points to the new memo for market/product implications so the primer stays focused on payment/discovery stack layers
- the new memo keeps the engine/wedge but makes the current direction explicit: Augur should be sold as pre-transaction contract admission control for agents on Base, not mainly as a generic risk-score/report API
- README.md now leads with admission control and treats the score as supporting output instead of the main product
- src/risk_api/app.py now aligns the homepage, OpenAPI description, skill.md, llms.txt, llms-full.txt, agent card, plugin manifest, ERC-8004 metadata, and x402 discovery copy around decision / recommended_policy first and the 0-100 score second
- docs/PRODUCT_WEDGE_MEMO.md and docs/llm_discoverability_synthesis.md now point readers at the newer direction update so older strategy notes do not compete with the current call
- the new memo captures a durable market test: Augur needs to be a service agents are rational to call instead of computing around, so roadmap work should favor speed, reliability, clearer policy output, and maintained judgment over generic breadth
- the strategy docs now make the next narrow extension more explicit: destination-aware preflight for actions like deposit, approve, route, or pay can fit the wedge when it validates claimed protocol + chain + recipient consistency, but Augur should still avoid drifting into a generic phishing browser, wallet shield, or broad anti-scam suite
- docs/SELLING_TO_AGENTS_MEMO.md now also captures the article's concrete trust checklist: publish uptime history, latency percentiles, and accuracy evidence, expose provenance where useful, and return confidence metadata when uncertainty is real
- docs/X402_ECOSYSTEM_SUBMISSION.md and docs/submissions/x402-ecosystem/metadata.json now describe Augur as admission control rather than primarily as risk scoring
- docs/outreach.md now uses admission-control language for reusable post angles instead of calling Augur a generic risk screen
- verification for this wording pass: python -m pytest tests/test_app.py -q passed (137 passed)
- not yet committed or deployed
The previous committed/deployed baseline was the first Etherscan V2 pass:
- commit: a0547a4 (Move deployer reputation to Etherscan V2)
- docs-only follow-up commit: aef6f28 (Refresh handover after reputation deploy)
Landed the Blockscout follow-up:
- commit: 936f00b (Switch deployer reputation to Blockscout)
- src/risk_api/analysis/reputation.py now uses public Base Blockscout endpoints for creator lookup and transaction-history probes
- get_tx_count() no longer depends on explorer proxy RPC; it now uses a cheap tx-history probe that is exact for low-count wallets and clamped at the threshold for busy wallets
- missing explorer keys no longer disable deployer reputation; BLOCKSCOUT_API_KEY is optional and only used for higher-rate access
- src/risk_api/config.py now prefers BLOCKSCOUT_API_KEY and only falls back to legacy ETHERSCAN_API_KEY / BASESCAN_API_KEY
- public wording in src/risk_api/app.py is now generic explorer-backed copy instead of vendor-specific copy
- local verification passed:
  - python -m pytest -q
  - python auto/loop.py --allow-failures
  - real no-key smoke check: creator lookup, first-tx lookup, and low-tx probe all worked against Base Blockscout for a live Base contract
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
- live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, https://augurrisk.com/, and https://augurrisk.com/deployer-reputation-api
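The "exact for low-count wallets, clamped at the threshold for busy wallets" behavior of get_tx_count() described above can be sketched as follows. This is an illustration under assumptions: the fetch_page callable, its signature, and the default threshold of 50 are hypothetical and are not taken from src/risk_api/analysis/reputation.py.

```python
from typing import Callable, Sequence


def probe_tx_count(
    fetch_page: Callable[[str, int], Sequence[object]],
    address: str,
    threshold: int = 50,  # hypothetical cutoff; the real value is not in these notes
) -> int:
    """Return the exact tx count below the threshold, else the threshold itself.

    fetch_page(address, limit) is an assumed helper that returns at most
    `limit` transactions from the explorer's tx-history endpoint.
    """
    # One cheap request: ask for just enough rows to tell "few" from "many".
    txs = fetch_page(address, threshold)
    # min() also guards against an explorer that ignores the limit.
    return min(len(txs), threshold)
```

The design trade-off is that busy wallets are only ever reported as "at least threshold", which is enough when the detector only needs to tell fresh deployers from established ones.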
Re-ran the hidden discovery loop from the current baseline on 2026-03-18:
- python auto/loop.py
- result stayed green at 32/32
- no blind spots, holdout disagreements, policy regressions, or serializer/doc drifts surfaced
- next action remains: add new hidden candidate or holdout cases before changing implementation again
Landed the next hidden-batch follow-up on 2026-03-18:
- commit: c92d499 (Cover more limit-control selector aliases)
- new local hidden candidates surfaced a fresh alias gap where setMaxWalletAmount(uint256), setMaxHoldAmount(uint256), and setMaxTransferAmount(uint256) still returned clean allow
- src/risk_api/analysis/selectors.py now treats those selectors as the same fee/limit manipulation family
- expanded tracked regressions in tests/test_patterns.py, tests/test_scoring.py, and tests/test_engine.py
- local hidden corpus is now green again at 35/35
- full local test suite passed (333 passed)
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
- live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, and https://augurrisk.com/
Landed the next hidden-batch follow-up on 2026-03-18:
- commit: 23c1657 (Warn on trading toggle selector aliases)
- new local hidden candidates surfaced a fresh suspicious-selector gap where setTradingEnabled(bool) and enableTrading() still returned clean allow
- src/risk_api/analysis/selectors.py now routes those selectors through the existing suspicious-selector warning path alongside other admin trading toggles
- expanded tracked regressions in tests/test_selectors.py, tests/test_scoring.py, tests/test_engine.py, and tests/test_app.py
- local hidden corpus is now green again at 37/37
- full local test suite passed (336 passed)
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
- live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, and https://augurrisk.com/
Landed the next hidden-batch follow-up on 2026-03-18:
- commit: 3cefda6 (Warn on fee bypass selector aliases)
- new local hidden candidates surfaced a fresh suspicious-selector gap where excludeFromFees(address,bool) and setIsExcludedFromFee(address,bool) still returned clean allow
- src/risk_api/analysis/selectors.py now routes those selectors through the existing suspicious-selector warning path alongside other selective fee-bypass controls
- expanded tracked regressions in tests/test_selectors.py, tests/test_scoring.py, tests/test_engine.py, and tests/test_app.py
- local hidden corpus is now green again at 39/39
- full local test suite passed (339 passed)
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
- live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, and https://augurrisk.com/
Landed the next hidden-batch follow-up on 2026-03-18:
- commit: 1da269f (Warn on whitelist and cooldown toggle aliases)
- new local hidden candidates surfaced a fresh suspicious-selector gap where setWhitelistEnabled(bool), setTxCooldownEnabled(bool), and setCooldownEnabled(bool) still returned clean allow
- src/risk_api/analysis/selectors.py now routes those selectors through the existing suspicious-selector warning path alongside other admin trading controls
- expanded tracked regressions in tests/test_selectors.py, tests/test_scoring.py, tests/test_engine.py, and tests/test_app.py
- local hidden corpus is now green again at 42/42
- full local test suite passed (342 passed)
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
Landed an operational follow-up on 2026-03-18:
- commit: be812ee (Keep Fly machine running for production)
- root cause observed after deploy: the only Fly machine fell into the stopped state, and proxy wake-up attempts failed with "machines API returned an error: rate limit exceeded"
- immediate recovery was flyctl machine start 78469e3f419218 --app augurrisk
- permanent repo-side fix: fly.toml now sets auto_stop_machines = 'off' so the single production machine stays up instead of relying on the flaky auto-start path
- after redeploy, flyctl status --app augurrisk stayed started with checks passing, and public health, openapi.json, and homepage checks returned 200
Added a strategy memo that locks the current wedge:
- docs/PRODUCT_WEDGE_MEMO.md
- frames Augur as Base contract admission control for agents
- keeps the product narrow rather than broadening into a full execution-security platform
Updated the public copy in README.md, homepage/skill.md/llms.txt/llms-full.txt generation in src/risk_api/app.py, and the growth-plan pointer so the same wedge appears across repo docs and public machine-readable surfaces.
Tightened the public wording pass across the main docs and discovery surfaces:
- standardizes the public headline around "Deterministic Base contract risk screening for agents"
- standardizes the short explainer around "Screen Base contracts before your agent buys, routes funds, approves, or interacts"
- adds compact use-case education on the homepage and README so the product need is clearer at a glance
Did a second public-copy pass after review findings:
- homepage setup language is plainer
- use-case pages no longer use Buyer Intent framing
- llms-full.txt now describes Augur as a paid HTTP API instead of an agent-to-agent API
- examples/javascript/augur-mcp/README.md is more customer-facing
Patched the in-repo MCP wrapper so startup and tool discovery no longer hard-fail when CLIENT_PRIVATE_KEY is unset:
- examples/javascript/augur-mcp/index.mjs now requires the key only when analyze_base_contract_risk is actually called
- npm run smoke now passes locally on the read-only path without a wallet key
- examples/javascript/augur-mcp/README.md now documents the split between read-only startup and paid analyze calls
Ran a 12-chat ChatGPT discoverability check for Augur and distilled the results into:
- docs/llm_discoverability_synthesis.md
- docs/llm_discoverability_runs_filled.csv
- docs/llm_discoverability_summary_filled.csv
Moved the raw LLM transcript dumps out of docs/ and into the local archive:
- .codex/research.local/llm-discoverability/
Shipped the live proof-of-work report:
- https://augurrisk.com/reports/base-bluechip-bytecode-snapshot
Added a report-specific Open Graph card for the proof page:
- image route: https://augurrisk.com/og/base-bluechip-bytecode-snapshot.png
- report pages now use that asset instead of the generic /avatar.png
The proof report now:
- uses the live /analyze response shape in its embedded snapshot JSON
- includes nested implementation output for proxy examples
- clearly labels the JSON as a dated snapshot, not a live rerun
Added registry-backed report routing in src/risk_api/app.py via /reports/<slug>.
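A registry-backed /reports/<slug> route can be sketched without any web framework. The registry contents, the title string, and the resolve_report() helper below are hypothetical; only the slug itself comes from these notes.

```python
from typing import Dict, Tuple

# Hypothetical registry: slug -> page title. The real registry lives in the
# app code and is not reproduced here.
REPORT_REGISTRY: Dict[str, str] = {
    "base-bluechip-bytecode-snapshot": "Base blue-chip bytecode snapshot",
}

REPORT_PREFIX = "/reports/"


def resolve_report(path: str) -> Tuple[int, str]:
    """Map a request path to (status_code, title_or_error)."""
    if not path.startswith(REPORT_PREFIX):
        return 404, "not a report path"
    slug = path[len(REPORT_PREFIX):].strip("/")
    title = REPORT_REGISTRY.get(slug)
    if title is None:
        # Unknown slugs fail closed instead of rendering a generic page.
        return 404, "unknown report"
    return 200, title
```

With this shape, adding a report is a one-line registry entry rather than a new route function.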
Added a public MCP discovery/install surface:
- live page: https://augurrisk.com/mcp
- linked from the homepage, llms.txt, and llms-full.txt
Added a root agent-facing skill document:
- live doc: https://augurrisk.com/skill.md
- linked from the homepage, sitemap, robots, llms.txt, and llms-full.txt
Tightened the homepage visual hierarchy:
- added a stronger brand lockup, hero stats, and denser section intros
- kept the same public routes and machine-readable entrypoints
Clarified homepage wording around capability vs entry pages:
- renamed the misleading "Use Augur For" block to "Public Entry Pages"
- explicitly states that those pages are task-specific fronts for the same full 8-detector /analyze pass
Brought /mcp into the same visual system as the homepage without adding human-first promo sections:
- keeps the page focused on local stdio setup, client-side x402, and canonical machine docs
Deployed the latest public-surface pass to Fly from master:
- live commit: 572b206
- verified live https://augurrisk.com/, https://augurrisk.com/skill.md, and https://augurrisk.com/mcp
Packaged and published the MCP wrapper as augurrisk-mcp:
- npm: https://www.npmjs.com/package/augurrisk-mcp
- current version: 1.0.1
- public install path: npx -y augurrisk-mcp
Updated the homepage, MCP page, README.md, llms.txt, and llms-full.txt to surface the MCP package directly.
Recorded the first Coinbase x402 Discord post in docs/outreach.md.
Added OpenClaw (r/OpenClaw / OpenClaw Discord) as a secondary outreach target; avoid treating the AI-only OpenClaw forum as the primary posting surface.
Re-verified Coinbase discovery surfaces: x402.org/ecosystem now lists Augur, while the CDP Bazaar feed still does not reliably show an Augur match in public queries.
Verified the live deploys on augurrisk.com.
Deployed the latest copy pass to Fly from the local worktree and verified live:
- homepage hero and Explore by Use Case
- https://augurrisk.com/skill.md
- https://augurrisk.com/llms-full.txt
- https://augurrisk.com/honeypot-detection-api
Added a real first-pass policy layer to the live /analyze response:
- new top-level fields: decision and recommended_policy
- default mapping is now rule-based rather than score-band-only:
  - allow only for clean safe results with no reason codes
  - warn for low results and safe results that still carry non-blocking signals
  - manual_review for medium, unresolved proxy logic, raw DELEGATECALL, or SELFDESTRUCT
  - block for high / critical or honeypot signals
- recommended_policy now returns action, summary, and stable reason_codes
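The rule-based mapping above can be sketched as a small precedence chain. This is an illustration of the rules as first described here, not the actual derive_policy() code: the function names, keyword flags, and summary text are assumptions, and later entries in these notes adjust precedence further.

```python
from __future__ import annotations


def derive_decision(
    level: str,
    reason_codes: list[str],
    *,
    honeypot: bool = False,
    unresolved_proxy: bool = False,
    raw_delegatecall: bool = False,
    selfdestruct: bool = False,
) -> str:
    # Hard stops are checked first so a low score cannot mask them.
    if level in ("high", "critical") or honeypot:
        return "block"
    if level == "medium" or unresolved_proxy or raw_delegatecall or selfdestruct:
        return "manual_review"
    # allow is reserved for clean safe results with no reason codes.
    if level == "safe" and not reason_codes:
        return "allow"
    # Everything else: low results, or safe results with residual signals.
    return "warn"


def recommended_policy(level: str, reason_codes: list[str], **flags: bool) -> dict:
    action = derive_decision(level, reason_codes, **flags)
    return {
        "action": action,
        "summary": f"first-pass recommendation: {action}",  # placeholder wording
        "reason_codes": list(reason_codes),
    }
```

For example, `derive_decision("low", [], selfdestruct=True)` yields manual_review rather than warn, because the opcode override outranks the score band.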
Updated all machine-readable surfaces and examples to reflect the real policy output:
- OpenAPI examples and AnalysisResult schema in src/risk_api/app.py
- x402 Bazaar discovery examples
- README.md, skill.md, llms.txt, and llms-full.txt
- proof report snapshots in src/risk_api/proof_reports.py
Added coverage for the policy layer:
- new unit tests in tests/test_policy.py
- engine and app tests now verify decision / recommended_policy
Tightened the first-pass policy and example contract after review:
- raw singleton delegatecall now forces at least manual_review via a policy override instead of slipping through as allow
- proxy handling now carries structured resolution status (resolved, unresolved, fetch_failed, nested_proxy) from the engine into policy/reason-code derivation
- OpenAPI examples, machine docs, and proof-report snapshot JSON now round-trip through the live serializer so implementation omission and nested implementation shapes stay aligned
- OpenAPI now publishes a PolicyReasonCode enum for recommended_policy.reason_codes
Put the new autoresearch harness to work with hidden local corpora:
- added local ignored files auto/corpus/holdout.local.json and auto/candidates/discovered-2026-03-16.local.json
- first run surfaced four real policy blind spots: hidden_mint_permissive_policy, honeypot_permissive_policy, selfdestruct_warn_regression, and fee_manipulation_safe_allow
- after tightening derive_policy(), python auto/bench.py --json-out auto/runs/latest.json is green again with those local holdouts loaded
Ran the next hidden holdout discovery batch locally:
- found a new selector gap where pause() silently returned allow because it never reached a detector or policy signal
- moved pause() onto the existing suspicious-selector path so it now warns with suspicious_selector_signal instead of passing clean
- expanded the private local corpora with fresh pause(), reentrancy, and proxy fetch_failed holdouts/candidates
- python auto/loop.py --allow-failures is green again at 26/26 checks
Ran a second hidden holdout pass after deploying fccbbb0:
- found a follow-on selector gap where raw blacklist(address) / addToBlacklist(address) selectors still returned allow if no transfer path was visible
- kept full honeypot blocking unchanged when transfer selectors are present, but now routes orphan blacklist controls through the existing suspicious-selector warning path
- expanded the private local corpora again with blacklist-without-transfer holdouts/candidates
- python auto/loop.py --allow-failures is green again at 28/28 checks
Committed, pushed, and deployed two more hidden-holdout batches:
- commit: fccbbb0 (Warn on pause selectors in autoresearch batch)
- change: pause() now warns via suspicious_selector_signal instead of silently allowing
- commit: a3fb26d (Warn on orphan blacklist selectors)
- change: orphan blacklist(address) / addToBlacklist(address) selectors now warn via the suspicious-selector path when no concrete detector surfaces them
- both commits were pushed to origin/master
- both flyctl deploy --remote-only runs succeeded for augurrisk
- live checks passed for https://augurrisk.com/health and https://augurrisk.com/openapi.json
Ran a third hidden holdout pass after deploying a3fb26d:
- found a fee/limit alias gap where raw setMaxSellAmount(uint256) and setWalletLimit(uint256) selectors still returned clean allow
- added those aliases to the fee-manipulation family and shared the label matcher between detector surfacing and orphan-selector suppression so they warn at score 15 without extra suspicious-selector points
- expanded the private local candidate corpus with the new limit-control cases
- python auto/loop.py --allow-failures is green again at 30/30 checks
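The shared-matcher idea above, one alias predicate used both to surface fee_manipulation and to suppress the orphan-selector fallback, can be sketched as follows. The watchlist contents, the 5-point suspicious-selector weight, and the function names are hypothetical; only the alias names and the score of 15 come from these notes. Text signatures stand in for 4-byte selectors to keep the sketch dependency-free.

```python
from typing import Dict, Iterable

FEE_LIMIT_ALIASES = frozenset({
    "setMaxSellAmount(uint256)",
    "setWalletLimit(uint256)",
})

# Assumed fallback watchlist; the fee/limit aliases are deliberately included
# to show why the suppression step below is needed.
ORPHAN_WATCHLIST = frozenset({
    "pause()",
    "blacklist(address)",
    "addToBlacklist(address)",
}) | FEE_LIMIT_ALIASES

FEE_MANIPULATION_POINTS = 15     # from the notes
SUSPICIOUS_SELECTOR_POINTS = 5   # hypothetical weight


def is_fee_limit_alias(signature: str) -> bool:
    # The single matcher shared by both code paths.
    return signature in FEE_LIMIT_ALIASES


def score_signatures(signatures: Iterable[str]) -> Dict[str, int]:
    sigs = set(signatures)
    findings: Dict[str, int] = {}
    if any(is_fee_limit_alias(s) for s in sigs):
        findings["fee_manipulation"] = FEE_MANIPULATION_POINTS
    # Suppress anything the detector already surfaced, so a limit control
    # is not also counted as a suspicious orphan selector.
    orphans = {s for s in sigs if s in ORPHAN_WATCHLIST and not is_fee_limit_alias(s)}
    if orphans:
        findings["suspicious_selector"] = SUSPICIOUS_SELECTOR_POINTS
    return findings
```

If the two paths kept separate alias lists, adding an alias to only one of them would either double-count it or let it pass clean.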
Committed, pushed, and deployed the third hidden-holdout batch:
- commit: 71a394c (Warn on fee-limit selector aliases)
- change: setMaxSellAmount(uint256) and setWalletLimit(uint256) now surface as fee_manipulation rather than silently allowing
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
- live checks passed for https://augurrisk.com/health and https://augurrisk.com/openapi.json
Ran a fourth hidden holdout pass after deploying 71a394c:
- found a follow-on transaction-limit alias gap where raw setMaxBuyAmount(uint256), setTxLimit(uint256), and setMaxTxnAmount(uint256) selectors still returned clean allow
- extended the shared fee/limit alias family so these common anti-whale selectors now surface as fee_manipulation
- expanded the private local candidate corpus with the new transaction-limit cases
- python auto/loop.py --allow-failures is green again at 32/32 checks
Committed, pushed, and deployed the fourth hidden-holdout batch:
- commit: 09a75f6 (Warn on tx-limit selector aliases)
- change: setMaxBuyAmount(uint256), setTxLimit(uint256), and setMaxTxnAmount(uint256) now warn through the fee-manipulation path instead of silently allowing
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
- live checks passed for https://augurrisk.com/health and https://augurrisk.com/openapi.json
Tightened first-pass policy precedence from the autoresearch findings:
- hidden_mint and honeypot now block even when the numeric score is only low
- SELFDESTRUCT now forces at least manual_review even when the numeric score is only low
- safe results with residual non-blocking reason codes now warn instead of auto-allowing
- added targeted regressions in tests/test_policy.py, tests/test_engine.py, and tests/test_app.py
Added a bounded local autoresearch harness for detector and API-contract regressions:
- entrypoint: python auto/bench.py
- latest known-good run: python auto/bench.py --json-out auto/runs/latest.json
- tracked starter corpus: auto/corpus/public_cases.json
- agent prompt: auto/program.md
- reusable benchmark logic: src/risk_api/auto_bench.py
- local holdouts and candidate discoveries live in ignored *.local.json files under auto/corpus/ and auto/candidates/
- built-in checks cover policy regressions, serializer/doc drift, OpenAPI examples, machine docs, and proof-report shape
Closed a proof-report semantic drift gap after independent review:
- src/risk_api/proof_reports.py now aligns the embedded WETH and USDC snapshot policy output with current derive_policy() semantics
- src/risk_api/auto_bench.py now fails if any proof-report snapshot embeds stale decision / recommended_policy values relative to current policy logic
- tests/test_app.py now asserts proof-report snapshots still match live policy semantics
- tests/test_auto_bench.py now proves the bench catches stale proof-report policy examples
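The drift check described above amounts to re-deriving each snapshot's decision and comparing it with the embedded value. A minimal sketch, assuming snapshots are plain dicts keyed by slug and that the current policy logic is passed in as a callable; the real bench in src/risk_api/auto_bench.py may be structured differently.

```python
from typing import Callable, Dict, List, Mapping


def find_stale_snapshots(
    snapshots: Mapping[str, Dict[str, object]],
    derive_decision: Callable[[Dict[str, object]], str],
) -> List[str]:
    """Return slugs whose embedded decision disagrees with current policy."""
    stale = []
    for slug, snapshot in snapshots.items():
        expected = derive_decision(snapshot)
        if snapshot.get("decision") != expected:
            stale.append(slug)
    # Sorted output keeps failure summaries stable between runs.
    return sorted(stale)
```

A bench or test can then fail whenever the returned list is non-empty, which keeps dated snapshots honest without forcing a live rerun.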
Wired the tracked public autoresearch corpus into GitHub Actions:
- .github/workflows/typecheck.yml now runs python auto/bench.py auto/corpus/public_cases.json in CI
- local verification passed with 11/11 public checks green
Added a thin autoresearch loop runner for day-to-day use:
- new wrapper: python auto/loop.py
- implementation: src/risk_api/auto_loop.py
- writes auto/runs/latest.json by default and prints a compact failure summary grouped by blind spot
- supports --allow-failures, --skip-app-contract-checks, optional custom case paths, and optional --json-out
- covered by tests/test_auto_loop.py
Split resolved-proxy eth_getCode == 0x from transport failures:
- ProxyResolutionStatus now includes no_code
- PolicyReasonCode now includes proxy_logic_no_code
- resolved implementation addresses with no deployed bytecode still map to manual_review, but no longer collapse into fetch_failed
- engine/app/policy tests now cover the "Proxy implementation has no bytecode" path
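The split between no_code and fetch_failed can be sketched with an enum and a small classifier. The status values and the proxy_logic_no_code reason code come from these notes; the classifier function, its convention of None for a transport failure, and the reason_code_for() helper are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class ProxyResolutionStatus(str, Enum):
    RESOLVED = "resolved"
    UNRESOLVED = "unresolved"
    FETCH_FAILED = "fetch_failed"
    NESTED_PROXY = "nested_proxy"
    NO_CODE = "no_code"


def classify_implementation_code(code_hex: Optional[str]) -> ProxyResolutionStatus:
    """Classify an eth_getCode result for a resolved implementation address.

    Assumed convention: None means the RPC call itself failed.
    """
    if code_hex is None:
        return ProxyResolutionStatus.FETCH_FAILED
    if code_hex in ("", "0x"):
        # The lookup worked; there is simply nothing deployed there.
        return ProxyResolutionStatus.NO_CODE
    return ProxyResolutionStatus.RESOLVED


def reason_code_for(status: ProxyResolutionStatus) -> Optional[str]:
    # Only the no_code mapping is named in the notes; other reason codes are
    # left out here instead of being guessed.
    if status is ProxyResolutionStatus.NO_CODE:
        return "proxy_logic_no_code"
    return None
```

Keeping the two states distinct lets a caller retry a fetch_failed result while treating no_code as a stable fact about the chain.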
Promoted the four highest-signal local policy blind spots into the tracked public corpus:
- promoted to auto/corpus/public_cases.json: hidden mint -> block, honeypot -> block, selfdestruct -> manual_review, fee manipulation -> warn
- intentionally left the low-score resolved-proxy warn case in local holdouts because the tracked corpus already covers unresolved and nested proxy semantics and should stay compact
Committed, pushed, and deployed the autoresearch/policy-hardening batch:
- commit: 71ba517 (Add autoresearch harness and tighten policy semantics)
- pushed to origin/master
- flyctl deploy --remote-only succeeded for augurrisk
Committed and deployed the policy-output pass:
- commit: 9cf5e0f (Add first-pass policy decisions to analyze)
- verified live https://augurrisk.com/skill.md
- verified live https://augurrisk.com/llms-full.txt
- verified live https://augurrisk.com/openapi.json
- verified live 402 discovery output from GET /analyze
Checked the live dashboard/stats surfaces after deployment:
- /stats and /dashboard are still instance-local operational views, not canonical analytics
- the most recent visible 402 row can be polluted by our own verification probes
- curl/... user agents are a useful intentional CLI/script signal, but they do not prove a human was manually at the keyboard

Current Read

Current product-scope rule:
- keep all 8 existing detectors inside the same narrow admission-gate product
- do not narrow scope by removing detectors like honeypot, proxy, or deployer reputation
- do not broaden scope into simulation, generalized runtime monitoring, or wallet/session protection
Current wording rule:
- keep Base contract admission control for agents as internal strategy language
- prefer clearer public phrasing such as Deterministic Base contract risk screening for agents
- use straightforward user-facing copy like Screen Base contracts before your agent buys, routes funds, approves, or interacts
ChatGPT discoverability is currently weak:
- Augur did not appear unprompted in the 12 blind runs
- after direct comparison, the model consistently classifies Augur as a serious but narrow Base-only deterministic prefilter
- the repeated perceived gap is transaction simulation plus broader runtime/interactions coverage
Treat the LLM result as a distribution/messaging signal first, not as proof that Augur should pivot into a full execution-security platform.
Follow-up review of the LLM research sharpened the interpretation:
- the problem is partly entity resolution (Augur often resolves to unrelated products) as well as generic discoverability
- at least a couple of blind runs were methodologically contaminated or ambiguous, so the 0/12 headline is directionally useful but not a clean benchmark
- the stronger strategic takeaway is still category ownership and retrievability for a narrow wedge, not feature expansion toward simulation
MCP wrapper behavior is now cleaner for demos and onboarding:
- startup/read-only introspection works without CLIENT_PRIVATE_KEY
- paid analyze calls still require the key at tool invocation time
Public-facing product/discovery surface is now in good shape for promotion:
- root skill doc is live
- homepage wording no longer confuses public entry pages with full detector coverage
- proof page is live
- report OG card is fixed
- payment explainer is live
- MCP setup page is live
- npm MCP package is live
- buyer-intent pages are live
Current positioning rule: Augur stays agent-first. Prefer machine-readable docs, direct integration paths, and MCP/x402 clarity over social-proof or human-marketing sections.
Current messaging rule: keep one plain public headline plus one plain trigger-moment sentence across homepage, README, machine docs, and registration metadata; add brief use-case examples where they clarify why an agent would call Augur.
Current discovery/docs rule:
- /skill.md is the shortest agent quickstart/discovery doc, not a separate product
- keep core machine surfaces (/skill.md, OpenAPI, llms*.txt, .well-known/*, MCP page) unless there is a clear reason to retire one
- use-case pages are optional support surfaces; keep them only if they improve clarity or qualified traffic
Current product-output rule:
- Augur now returns explicit first-pass policy outputs: decision and recommended_policy
- recommended_policy currently includes action, summary, and reason_codes
- allow should be reserved for clean safe outputs with no reason codes
- honeypot should still block even at low
- high-score managed upgradeable assets with mint/admin-control surfaces but no clearer hard-stop signal should default to manual_review, not auto-block
- hidden_mint should now force at least manual_review, not automatic block, when it is the main signal
- raw non-proxy delegatecall and SELFDESTRUCT should never auto-allow or stay at plain warn just because the numeric score is low
- unresolved proxy logic should be carried as structured engine state and stable reason codes, not inferred from human-readable finding titles
- treat fetch_failed (RPC/lookup failure) separately from no_code (implementation address resolved but has no deployed bytecode)
- machine-facing examples should be produced through the same serializer as the live /analyze route
- this is a default first-pass recommendation layer, not a replacement for caller-specific policy logic
Current detector-research rule:
- use auto/bench.py as the bounded local harness for adversarial bytecode, policy edge cases, and API-contract drift
- run hidden discovery batches serially; let each batch land, deploy, and become the new baseline before starting the next one
- prefer adding a reproducible case before changing implementation
- keep local holdout corpora untracked so the loop cannot merely overfit the visible tracked corpus
- current tracked corpus is intentionally small; the next useful work is adding real hidden holdout cases under auto/corpus/*.local.json
- keep fee/limit selector alias matching shared between detector surfacing and orphan-selector filtering so limit controls warn at 15 instead of double-counting as suspicious_selector
- keep transaction-limit aliases like setMaxBuyAmount, setTxLimit, and setMaxTxnAmount in that same shared fee/limit family, along with broader limit-control aliases like setMaxWalletAmount, setMaxHoldAmount, and setMaxTransferAmount
- keep trading-gate aliases like setTradingEnabled(bool) and enableTrading() on the suspicious-selector warning path alongside other admin toggles like setSwapEnabled(bool)
- keep selective fee-bypass aliases like excludeFromFees(address,bool) and setIsExcludedFromFee(address,bool) on the suspicious-selector warning path alongside excludeFromFee(address)
- keep whitelist/cooldown toggles like setWhitelistEnabled(bool), setTxCooldownEnabled(bool), and setCooldownEnabled(bool) on that same suspicious-selector warning path when they surface owner-controlled trading restrictions
- keep pause() on the suspicious-selector path for now; it should warn instead of silently allowing, but it does not yet justify a dedicated public detector or automatic block
- if a known malicious selector is present but no concrete detector surfaces it, prefer warning through the suspicious-selector path over silently allowing it
- proof-report snapshots are allowed to stay dated, but their embedded decision / recommended_policy should still agree with current policy semantics unless you intentionally choose to preserve a historical policy layer and update the drift checks accordingly
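The shared fee/limit alias rule above can be sketched as one lookup that both the detector and the orphan-selector filter consult, so a limit control is counted once. This is a minimal illustration, not the repo's implementation: it matches on function names rather than 4-byte selectors, and FEE_LIMIT_ALIASES, SUSPICIOUS_ALIASES, classify_selectors, and the "fee_limit" bucket name are all hypothetical. Only the alias names themselves come from the rules above.

```python
# Hypothetical sketch: a single alias table shared by detection and filtering.
FEE_LIMIT_ALIASES = frozenset({
    "setMaxBuyAmount", "setTxLimit", "setMaxTxnAmount",
    "setMaxWalletAmount", "setMaxHoldAmount", "setMaxTransferAmount",
})

SUSPICIOUS_ALIASES = frozenset({
    "setTradingEnabled", "enableTrading", "setSwapEnabled",
    "excludeFromFees", "setIsExcludedFromFee", "excludeFromFee",
    "setWhitelistEnabled", "setTxCooldownEnabled", "setCooldownEnabled",
    "pause",
})


def classify_selectors(signatures):
    """Place each signature in exactly one bucket so nothing is double-counted."""
    result = {"fee_limit": [], "suspicious_selector": [], "other": []}
    for signature in signatures:
        name = signature.split("(", 1)[0]  # drop the argument list
        if name in FEE_LIMIT_ALIASES:
            result["fee_limit"].append(signature)
        elif name in SUSPICIOUS_ALIASES:
            result["suspicious_selector"].append(signature)
        else:
            result["other"].append(signature)
    return result
```

Because the fee/limit check runs first and the branches are exclusive, a limit alias can never also land in the suspicious-selector bucket.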
Current detector weakness read:
- observed hidden-batch misses have been concentrated in fee_manipulation alias coverage and suspicious_selector fallback coverage, not repeated core honeypot misses
- the generic honeypot control-flow heuristic was producing obvious false positives on standard dispatcher/default-REVERT patterns in blue-chip contracts, so the detector is now intentionally narrowed to blacklist-style transfer-control signals until a stronger transfer-path heuristic exists
- reentrancy is also structurally narrow (CALL then nearby SSTORE) and should be treated as heuristic coverage, not deep semantic analysis
- deployer_reputation is the weakest detector operationally because it depends on explorer APIs; failures can erase signal even when bytecode analysis is healthy
- after the recent selector/proxy-wrapper/metadata passes, the next hidden-batch marginal value is no longer in more alias churn; it is in under-covered families like deployer_reputation, proxy no_code vs fetch_failed, and reentrancy edge cases
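The structural narrowness of the reentrancy heuristic is easier to see in code. The sketch below assumes the bytecode has already been disassembled into a list of opcode names; the function name and the lookahead default of 8 are hypothetical, not values taken from the repo.

```python
def has_call_then_sstore(opcodes, lookahead=8):
    """Return True if any CALL is followed by an SSTORE within `lookahead` opcodes.

    This is purely positional: it does not follow jumps or track which
    storage slot is written, which is why it is heuristic coverage only.
    """
    for index, opcode in enumerate(opcodes):
        if opcode != "CALL":
            continue
        window = opcodes[index + 1 : index + 1 + lookahead]
        if "SSTORE" in window:
            return True
    return False
```

A lookahead boundary case is then a sequence where the SSTORE sits exactly one position past the window, which is the kind of edge case the hidden holdouts are meant to pin down.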
Current deployer-reputation fix read:
- the recommended path is now the landed Blockscout implementation:
  - Base Blockscout public endpoints returned real creator lookup and tx-history data in local smoke tests with no key
  - the detector still preserves the repo rule that external API failure stays distinct from true NOT_FOUND
  - request throttling/retry remains in place for explorer-side failures
  - BLOCKSCOUT_API_KEY is optional for higher limits; ETHERSCAN_API_KEY / BASESCAN_API_KEY are legacy fallbacks only
  - product call for now: keep deployer_reputation in the detector set, but treat it as supporting context rather than a pillar detector
  - practical next step is optional, not blocking: verify one real paid /analyze flow if you want end-to-end proof that deployer-reputation is now showing up again on production traffic
- the Etherscan result is now background context only:
  - current local key against Etherscan V2 on Base returns "Free API access is not supported for this chain..."
  - current BaseScan V1 path also returns a deprecation error
- optional later improvement: evaluate a richer wallet-provenance signal only if real users prove deployer reputation matters enough to justify more dependency or spend
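The rule that external API failure stays distinct from a true NOT_FOUND, with retry in front of it, can be sketched as below. The explorer call is injected as a plain callable so no real Blockscout endpoint or response shape is assumed; LookupStatus, lookup_creator, and the retry defaults are hypothetical names and values.

```python
import time
from enum import Enum


class LookupStatus(Enum):
    FOUND = "found"
    NOT_FOUND = "not_found"  # explorer answered: no creator record
    ERROR = "error"          # explorer did not answer usefully


def lookup_creator(address, fetch, retries=2, delay_s=0.5):
    """Call fetch(address); retry on exceptions, never fold them into NOT_FOUND.

    `fetch` is assumed to return a creator address, or None when the
    explorer positively reports that no record exists.
    """
    for attempt in range(retries + 1):
        try:
            creator = fetch(address)
        except Exception:
            if attempt < retries:
                time.sleep(delay_s)  # simple fixed backoff between attempts
                continue
            return LookupStatus.ERROR, None
        if creator is None:
            return LookupStatus.NOT_FOUND, None
        return LookupStatus.FOUND, creator
    return LookupStatus.ERROR, None  # only reached if retries is negative
```

Keeping ERROR as its own status lets policy treat a missing signal differently from a negative signal when the explorer is down.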
Current analytics read:
- durable production read for 2026-03-16 through 2026-03-26 UTC:
  - 3563 total logged events
  - 95 /analyze requests
  - 7 paid /analyze successes
  - no app-level 500 or 502 rows in the analytics DB for that window
- the only confirmed Fly OOM in that review window was 2026-03-26 07:21:00 UTC
  - it caused one proxy-side dropped request and a worker restart about a second later
  - it did not overlap with the paid /analyze burst on 2026-03-25
- the important production concern is result quality, not sustained downtime
  - real paid testers checked Base WETH, AERO, and 0x1f98...F984
  - local analysis at the time of that read still blocked those contracts; the later fix is recorded under Tomorrow Start Here
- keep using the Fly volume DB plus Fly logs together for traffic forensics
  - /stats and /dashboard stay useful hints, but not the source of truth
  - treat curl/... and similar agents as intent signals, not proof of a human at the keyboard
coinbase/x402 PR #1515 is merged into main.
coinbase/x402 follow-up PR #1869 delivered the wording refresh that is now visible on x402.org/ecosystem.
Current execution priority:
- first: keep the new local hidden batch in place and only change implementation if future python auto/loop.py runs expose a real disagreement in those analysis holdouts
- second: return to the Coinbase public discovery feed check or paid /analyze smoke evidence
- third: keep x402list.fun treated as external stale state unless the directory itself updates
OpenClaw looks relevant for agent-builder reach, but it should stay behind Base/x402-first distribution.
Treat x402.org/ecosystem and the CDP discovery/resources feed as separate surfaces; being live on the former does not imply the latter is queryable.
Existing upstream follow-up:
- determine whether Augur eventually appears in the CDP public discovery feed or whether Coinbase support clarification is needed
Separate local side-task status:
- QMD vault retrieval on this laptop is usable now
- default strong mode on the 8 GB Intel iGPU machine is structured hybrid lex+vec, not blind reliance on plain auto-expanded qmd query "..."
- the failure mode observed was intermittent Vulkan GPU out-of-memory on the heaviest local query path, not a broken QMD index
- C:\Users\justi\Obsidian Vault\Outputs\2026-03-08-qmd-reference.md was updated with current QMD 2.0.1 status, retrieval workflow, and CPU fallback guidance
- local vault-synth now exists at C:\Users\justi\dev\vault-synth
- it retrieves notes with QMD, synthesizes with OpenAI, prints answer plus sources, and only saves when --save is passed
- current Windows implementation uses fused qmd search + qmd vsearch for the default lex+vec path because multiline structured qmd query arguments were brittle through qmd.cmd
- it can fall back to C:\Users\justi\dev\risk-api\.env for OPENAI_API_KEY if no local vault-synth\.env exists
- vault-synth now auto-runs qmd --index vault-core update before retrieval by default; use --no-refresh only when you explicitly want speed over freshness
- this fixed a real stale-index mismatch where QMD served an older QMD reference note than the file on disk
- vault-synth now excludes its own saved notes from default retrieval so synthesis output does not become a self-referential source on later runs

Recommended Next Steps

Autoresearch Todo

Objective 1: put the public autoresearch bench in CI with python auto/bench.py auto/corpus/public_cases.json
Objective 2: add a thin auto/loop.py runner that writes auto/runs/latest.json and prints a compact failure summary
Objective 3: decide whether proxy slot resolved + implementation bytecode = 0x should stay fetch_failed or get its own proxy-resolution status
Objective 4: review local holdout/candidate cases and promote only durable representative regressions into auto/corpus/public_cases.json
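For Objective 2, the core of a thin runner could look like the sketch below. It assumes each case result is a dict with id, expected, and actual keys, which is a hypothetical shape; the real bench output may differ, and summarize and write_latest are illustrative names.

```python
import json
from pathlib import Path


def summarize(results):
    """Build a compact, human-readable failure summary."""
    failures = [r for r in results if r["actual"] != r["expected"]]
    lines = [f"{len(results) - len(failures)}/{len(results)} cases passed"]
    lines += [
        f"FAIL {r['id']}: expected {r['expected']}, got {r['actual']}"
        for r in failures
    ]
    return "\n".join(lines)


def write_latest(results, path="auto/runs/latest.json"):
    """Persist the run so the next session can diff against it."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    failed = [r for r in results if r["actual"] != r["expected"]]
    payload = {"total": len(results), "failed": failed}
    out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out
```

A caller would run the bench, pass the collected results to write_latest, and print summarize(results) so a red run is readable without opening the JSON.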

Add the next hidden local cases under auto/corpus/*.local.json or auto/candidates/*.local.json, targeting:
- deployer_reputation NOT_FOUND vs explorer ERROR behavior
- proxy no_code vs fetch_failed semantics
- one or two reentrancy edge cases
Run python auto/loop.py and only change implementation if those new cases fail reproducibly.
Re-check the Coinbase/CDP public discovery feed with the refreshed evidence set:
- real paid production smoke succeeded on 2026-04-03
- real paid production smoke also succeeded on 2026-04-06 on the live action-aware approve request shape
- public feed recheck on 2026-04-03 local time still returned NOT_FOUND over the first 5 pages / 500 items
- if it is still absent after the newer smoke evidence, treat that as the support-escalation packet
Keep x402list.fun classified as stale external state unless the directory itself updates.
If you want stronger evidence before support escalation, do a broader but rate-limited feed scan beyond the first 500 items.
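A broader but rate-limited scan could be structured as below. The page fetcher is injected because the feed's real endpoint and parameters are not specified here; scan_feed, fetch_page(offset, limit), and the defaults are hypothetical, with the page size of 100 inferred only from "5 pages / 500 items".

```python
import time


def scan_feed(fetch_page, needle, page_size=100, max_pages=20,
              delay_s=1.0, sleep=time.sleep):
    """Return the absolute index of the first item mentioning `needle`, else None.

    `fetch_page(offset, limit)` is assumed to return a list of items and an
    empty list once the feed is exhausted.
    """
    needle = needle.lower()
    for page in range(max_pages):
        offset = page * page_size
        items = fetch_page(offset, page_size)
        if not items:
            return None  # feed exhausted without a match
        for position, item in enumerate(items):
            if needle in str(item).lower():
                return offset + position
        sleep(delay_s)  # fixed pause between pages keeps the scan rate-limited
    return None
```

A None result after a scan well past 500 items would strengthen the escalation packet, since it rules out the listing simply sitting deeper in the feed.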
Work through the 2026-03-11 outreach queue in docs/outreach.md, with OpenClaw after the tighter Base/x402 targets.
Revise the LLM discoverability artifacts on the next pass:
- separate clean runs from contaminated runs
- capture entity-resolution failures explicitly
- fill missing rank/provenance fields in the filled CSV
Use the LLM memo to tighten both category wording and entity disambiguation around Augur Risk, augurrisk.com, and Base-first deterministic contract gating before broader promotion.
Do one real paid end-to-end MCP test with a wallet configured before any broader MCP push or npm patch release.
Watch:
- proof_report_view
- top_referers
- /how-payment-works visits
- unpaid 402 attempts
- paid requests
If CDP support has not been contacted yet, use the 2026-04-03 plus 2026-04-06 paid-smoke successes together with the current NOT_FOUND public-feed scan as the escalation packet.
Only build more proof/demo surfaces if distribution shows confusion or weak conversion.
If more public-page polish happens, keep checking that /skill.md, OpenAPI, and the paid /analyze path remain the dominant integration cues above the fold.
Use real paid-call observations, not only /stats, to decide whether the current allow / warn / manual_review / block mapping matches actual evaluator behavior.
The next highest-value move is distribution, not more product surface:
- use the live first-party approve example in outreach and demos
- watch whether it produces qualified action-aware 402 attempts, paid calls, or direct user questions
- only reopen A-003 or a second action if that usage evidence justifies it
In the next session, start a fresh hidden holdout discovery batch:
- use python auto/loop.py as the default runner
- run one batch at a time; do not queue multiple hidden discovery batches before you know what the previous one changed
- add the next batch of real hidden holdouts under auto/corpus/*.local.json or auto/candidates/*.local.json
- the most recent local additions already cover dispatcher/default-REVERT, mint-capability-only manual_review, clone-wrapper, and Solidity-metadata behavior
- the current local hidden batch now also covers analysis-path deployer NOT_FOUND, fresh/low-tx, partial explorer failures, proxy NO_CODE, proxy FETCH_FAILED, and a reentrancy lookahead boundary
- next target families should now move past that set unless a new real failure points back there
- prioritize unseen detector/policy edge cases over widening the tracked public corpus immediately
- only promote a new case into auto/corpus/public_cases.json if it is durable and representative
Automation follow-up:
- keep serial hidden-batch runs manual for now while the fixes are still shaping the research workflow
- later, build a guarded local orchestrator that can run N serial batches end-to-end: hidden batch -> validation -> commit -> push -> deploy -> live verify -> next batch
- first version should stay constrained to narrow selector/policy research surfaces and fail closed on ambiguous results
In the next session, tune C:\Users\justi\dev\vault-synth retrieval quality:
- compare fused search + vsearch against plain qmd query on questions that should hit outputs/
- decide whether the lexical branch should stay acronym-first, use a broader distilled keyword query, or use collection-aware hints
- if vault-synth becomes a regular tool, add its own local .env or move OPENAI_API_KEY to a user-level secret store instead of relying on the risk-api fallback

Tomorrow Start Here

Confirm the deployed app is still healthy and the repo still matches the current code baseline:
- https://augurrisk.com/health
- https://augurrisk.com/openapi.json
- live deployed state now includes the first-party approve example docs follow-up on machine version 104
- if the next flyctl deploy --remote-only times out during health polling again, check flyctl status --app augurrisk and the live public routes immediately before assuming the deploy failed; on 2026-04-06 the deploy still landed and the machine recovered to healthy at version 103, while the later docs-only follow-up deployed cleanly to version 104
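The route check above can be scripted so that a deploy that timed out during health polling is verified against the live routes rather than assumed failed. This is a sketch using only the standard library; the two paths come from the list above, while http_status and check_routes are hypothetical helper names.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the request never completed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, just not with 2xx
    except (urllib.error.URLError, TimeoutError):
        return None  # DNS failure, refused connection, or timeout


def check_routes(base_url, paths=("/health", "/openapi.json"), get_status=http_status):
    """Map each path to its observed status so a partial outage is visible."""
    base = base_url.rstrip("/")
    return {path: get_status(base + path) for path in paths}
```

Calling check_routes("https://augurrisk.com") and expecting 200 for both paths gives a quick pass/fail read before looking at flyctl status --app augurrisk.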
The current narrow approve refinement is already landed on production:
- optional APPROVE_SPENDER_ALLOWLIST support is live
- action-aware request observability is live in /stats via action_spender_trust and action_decision
- a real paid action-aware approve smoke succeeded on 2026-04-06
- the first-party docs now also show one exact approve request/response example on /, skill.md, llms.txt, and llms-full.txt
- next product decision is whether live evidence justifies adding an explicit public spender-trust response field (A-003)
The paid-result problem and the wording/deploy work are already landed:
- Base WETH (0x4200000000000000000000000000000000000006) now returns allow locally
- AERO (0x940181a94A35A4569E4529A3CDfB74e38FD98631) now returns manual_review locally
- 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984 now returns allow locally
- internal live routes now reflect the admission-control wording
Treat the March 26 OOM as secondary unless it repeats:
- it caused one brief dropped request during crawler traffic on /
- it did not overlap with the paid /analyze burst
- if it happens again, consider a memory bump or more direct memory profiling
The latest hidden discovery rerun is already green:
- python auto/loop.py passed at 59/59 on 2026-03-30
- python -m pytest -q passed at 363
- do not start a new hidden batch until you first add a new local candidate or holdout
The endpoint method-contract follow-up is already landed locally:
- /analyze now leaves OPTIONS and unsupported methods to Flask instead of masking them with 422
- POST 422 OpenAPI examples now explicitly cover conflicting query/body addresses plus malformed and non-object JSON bodies
- python -m pytest tests/test_app.py -q passed at 151
- next step is the refreshed Coinbase/CDP feed recheck or support escalation, then x402list.fun
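The three POST 422 cases named above (conflicting query/body addresses, malformed JSON, non-object JSON) can be expressed as one framework-independent check. This is only a sketch: resolve_address, the "address" body field, and the reason strings are hypothetical and are not taken from the app's actual handler.

```python
import json


def resolve_address(query_address, raw_body):
    """Return (address, None) on success or (None, reason) for a 422 response."""
    body_address = None
    if raw_body:
        try:
            body = json.loads(raw_body)
        except json.JSONDecodeError:
            return None, "malformed_json"
        if not isinstance(body, dict):
            return None, "non_object_json"
        body_address = body.get("address")
        if body_address is not None and not isinstance(body_address, str):
            return None, "invalid_address"
    if (
        query_address
        and body_address
        and query_address.lower() != body_address.lower()
    ):
        return None, "conflicting_addresses"
    address = query_address or body_address
    if not address:
        return None, "missing_address"
    return address, None
```

Method handling is deliberately absent here, matching the note that OPTIONS and unsupported methods are left to Flask rather than masked with a 422.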
After that, audit the live third-party surfaces instead of assuming the repo updates propagated:
- 8004scan
- x402.jobs
- MoltMart
- Work402 if applicable
- x402.org/ecosystem
- Coinbase public discovery feed
- x402list.fun as an external stale-state check
Runtime proof is now present from both 2026-04-03 and 2026-04-06:
- a real paid plain /analyze smoke succeeded from the Conway wallet to the live agent wallet
- a real paid action-aware approve smoke also succeeded on the live app
- use those successes when reasoning about CDP/discovery visibility versus app-route health
Keep the public copy generic (explorer-backed) unless there is a reason to advertise Blockscout specifically.
For the next research step, start a fresh hidden discovery probe only after adding a new local candidate or holdout:
- use python auto/loop.py
- keep batches serial
- add a new hidden holdout/candidate before changing detector logic again
- target deployer_reputation, proxy no_code, and reentrancy before spending another batch on selector aliases
