Handover

Snapshot

  • Date: 2026-04-10
  • Repo root: C:\Users\justi\dev\risk-api
  • Branch: master
  • Repo baseline: d6e11f8 (Update handover after dashboard proof deploy; handover-only)
  • Repo app-code baseline: 0ab058c (Update handover for dashboard proof follow-up; app code from c97098c)
  • Deployed app baseline: 0ab058c (Update handover for dashboard proof follow-up; app code from c97098c)
  • Status: deployed production is green on 0ab058c. The live app is healthy, Fly is on machine version 110 with 1 passing health check, and the public first-party surfaces now include the dashboard Traffic Quality Classes panel, the /reports/base-weth-before-after proof artifact, the concrete action-aware approve example, the canonical first successful paid-call Base WETH path, and the Call before pay / approve / interact positioning copy across /, /skill.md, /llms.txt, /llms-full.txt, /how-payment-works, /.well-known/x402, OpenAPI examples, and the sitemap. Live /stats exposes traffic_classes for health checks, evaluator bots, malformed probes, unpaid conversion attempts, paid requests, and other traffic. A real paid production smoke on 2026-04-06 succeeded on the live action-aware approve request shape (402 -> PAYMENT-SIGNATURE -> 200) against Base WETH with:
    • top-level decision: allow
    • action-level decision: warn
    • action-level reason codes: action_approve_requested
  • Live /stats now also shows the new request-log observability fields for that paid approve request:
    • action_spender_trust: unchecked
    • action_decision: warn
  • Deploy: the deploy quirk (flyctl deploy --remote-only timing out while polling health) still matters operationally, but the 2026-04-10 flyctl deploy --remote-only --app augurrisk run from clean detached worktree C:\Users\justi\AppData\Local\Temp\risk-api-deploy-0ab058c completed cleanly and cleared its lease.
  • Live verification immediately afterward returned:
    • /health 200
    • /dashboard containing Traffic Quality Classes
    • /reports/base-weth-before-after containing both before/after scores
    • homepage and llms.txt containing "Call before pay. Call before approve. Call before interact."
    • sitemap containing the new proof route
    • /stats with populated traffic_classes
  • Main product question: whether the clearer first-call path plus proof artifact reduces malformed /analyze probes, and whether real unpaid conversion attempts turn into repeated paid calls, before widening the action-aware API.
  • External registry copy was intentionally left alone because the product positioning did not change.
  • Local leftovers: the only remaining ones are unrelated autoresearch files (auto/README.md, src/risk_api/auto_bench.py, tests/test_auto_bench.py) plus scratch dirs/files such as .claude/, .codex/live_db/, .codex/research.local/, .codex/tmp/, and .playwright-mcp/.
  • Traffic read on 2026-04-09: pulled the live durable SQLite analytics store from Fly machine 287d341f3e0ed8 (/data/analytics.sqlite3 plus WAL state) and analyzed the last 10 local days (2026-03-31 through 2026-04-09 Central). Main judgment: the product direction is still correct, but conversion is weak. The traffic validates the existing agent-first admission-control wedge rather than suggesting a pivot. Key counts in that 10-day window:
    • 4,363 total requests
    • 132 /analyze requests
    • 110 422 /analyze responses
    • 17 unpaid 402 /analyze attempts
    • 2 paid /analyze requests
    • 1 paid action-aware approve request, which is most likely the known smoke path
  • Traffic interpretation: the strongest recurring actors were machine evaluators and ecosystem probers checking /.well-known/x402, /.well-known/agent-card.json, openapi.json, llms.txt, and then /analyze, not retail users. The March 29 false-positive fix clearly helped: paid Base WETH checks moved from score=25 / level=low before dee071e to score=0 / level=safe on 2026-04-04 and 2026-04-06. The April 3 method-contract fix is also visible in successful OPTIONS /analyze probes from ScoutScore-FidelityCheck/1.0. Product implication: no strategy change, but next work should prioritize first-call conversion on /analyze and analytics segmentation for evaluator traffic versus real demand before widening the action-aware API.

What Changed

  • Continued and deployed the conversion execution plan on 2026-04-10:
    • added a /dashboard Traffic Quality Classes section that renders traffic_classes for:
      • real unpaid conversion attempts
      • paid requests
      • malformed probes
      • known directory/evaluator bots
      • known health checks
      • other traffic
    • added a new registry-backed proof page at /reports/base-weth-before-after showing:
      • exact Base WETH request
      • before output: score=25, level=low, decision=block, reason_codes=["honeypot_signal"]
      • after output: score=0, level=safe, decision=allow, reason_codes=[]
    • added the proof link to the homepage Proof of Work section and sitemap through the existing REPORT_PAGES registry path
    • added the positioning copy Call before pay. Call before approve. Call before interact. Augur is deterministic preflight for Base contract actions. to first-party public/machine docs
    • local verification passed: python -m pytest -q -> 401 passed
    • deployment verification passed:
      • pushed c97098c and 0ab058c to origin/master
      • deployed from clean detached worktree C:\Users\justi\AppData\Local\Temp\risk-api-deploy-0ab058c
      • flyctl status --app augurrisk -> machine version 110, state started, 1 passing health check
      • https://augurrisk.com/health -> 200 {"status":"ok"}
      • live /dashboard, /reports/base-weth-before-after, homepage, llms.txt, sitemap, and /stats.traffic_classes verified
  • Started and deployed the conversion-focused follow-up on 2026-04-09:
    • tightened the canonical first successful paid-call path around Base WETH (GET /analyze?address=0x4200000000000000000000000000000000000006) across homepage copy, llms.txt, llms-full.txt, skill.md, /how-payment-works, /.well-known/x402, and OpenAPI examples
    • added first-class analytics traffic_class labels and /stats.traffic_classes counts for:
      • known_health_check
      • known_directory_evaluator_bot
      • malformed_probe
      • real_unpaid_conversion_attempt
      • paid_request
      • other_traffic
    • /health is now request-logged as funnel_stage=health_check / traffic_class=known_health_check, so total analytics volume will include health checks but the traffic-class breakdown separates them from conversion signals
    • local verification passed: python -m pytest -q -> 400 passed
    • deployment verification passed:
      • pushed commit e552f78 to origin/master
      • deployed from clean detached worktree C:\Users\justi\AppData\Local\Temp\risk-api-deploy-e552f78
      • flyctl status --app augurrisk -> machine version 106, state started, 1 passing health check
      • https://augurrisk.com/health -> 200 {"status":"ok"}
      • live llms.txt contains First Successful Paid Call and Action-Aware Example: Approve
      • live openapi.json exposes the canonical Base WETH address example and the first-call 402 description
      • live /stats exposes traffic_classes
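A minimal sketch of how the six traffic_class labels above could be assigned to one request-log row. The real classifier is not shown in this handover: the function name, the rule ordering, and the user-agent marker lists are assumptions, with the agent names borrowed from the traffic read.

```python
from collections import Counter

# Assumed marker lists: which agents count as evaluators vs health checks
# is a guess for illustration, not the repo's actual rule set.
HEALTH_MARKERS = ("scoutscore-healthcheck", "x402-healthcheck")
EVALUATOR_MARKERS = (
    "scoutscore-fidelitycheck",
    "x402audit",
    "agentscore-enrichment",
    "satring-scraper",
)


def classify_traffic(path: str, status: int, user_agent: str = "", paid: bool = False) -> str:
    """Return one of the six traffic_class labels for a logged request."""
    ua = user_agent.lower()
    if paid and status == 200:
        return "paid_request"
    if path == "/health" or any(marker in ua for marker in HEALTH_MARKERS):
        return "known_health_check"
    if any(marker in ua for marker in EVALUATOR_MARKERS):
        return "known_directory_evaluator_bot"
    if path == "/analyze" and status == 422:
        return "malformed_probe"
    if path == "/analyze" and status == 402:
        return "real_unpaid_conversion_attempt"
    return "other_traffic"


# Tally a few rows the way /stats.traffic_classes reports counts.
rows = [
    ("/health", 200, "", False),
    ("/analyze", 422, "python-httpx/0.28.1", False),
    ("/analyze", 402, "node", False),
    ("/analyze", 200, "node", True),
]
traffic_classes = Counter(classify_traffic(*row) for row in rows)
```

Checking paid requests first keeps a paid call from a known evaluator in the paid bucket; whether the production classifier makes the same choice is not stated here.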
  • Analyzed live production traffic on 2026-04-09 using the Fly volume-backed SQLite store:
    • source of truth:
      • copied /data/analytics.sqlite3 from Fly machine 287d341f3e0ed8
      • also pulled the SQLite -wal / -shm files so the latest rows were included
      • confirmed live /stats was ahead of the older local snapshot and matched the live durable store shape (storage_backend=sqlite, storage_durable=true)
    • 10-day window analyzed:
      • 2026-03-31 through 2026-04-09 local time (America/Chicago)
    • key findings:
      • product direction remains aligned with the repo's current positioning: agent-first deterministic admission control, not a retail destination screener
      • evaluator and crawler traffic dominate the machine-facing surfaces; repeated fetches of /.well-known/x402, /.well-known/agent-card.json, openapi.json, llms.txt, skill.md, and the intent pages look like ecosystem evaluation rather than end-user demand
      • /analyze conversion is still weak:
        • 132 total /analyze requests in the window
        • 110 422 responses
        • 17 unpaid 402 attempts
        • 2 paid requests
      • most recent /analyze failures are mixed probe traffic and brittle integrations, not one clean buyer cohort:
        • python-httpx/0.28.1 alone produced 52 invalid-address 422s
        • most 422s were missing_address
        • malformed examples like 0x4200000000000000000000000000000000000006/openapi.json show URL-construction bugs or crawler misuse
        • repeated node, python-httpx, Go-http-client, Satring-Scraper, ScoutScore-*, X402-HealthCheck, and Thinkbot traffic means raw request counts overstate real product demand
      • the strongest current hidden batch / product-use signal is infrastructure evaluation:
        • repeated paid and unpaid checks of canonical Base contracts, especially Base WETH
        • recurring evaluator names like ScoutScore-FidelityCheck/1.0, ScoutScore-HealthCheck/1.0, x402audit/1.0, and AgentScore-Enrichment/1.0
        • practical read: Augur is being tested as a machine trust primitive inside agent systems, routers, or listings, which is consistent with the existing product wedge
    • commit / product read:
      • dee071e (Fix false positives and align admission-control metadata) appears validated by live demand evidence:
        • paid Base WETH responses before the fix were still score=25 / level=low
        • paid Base WETH responses after the fix were score=0 / level=safe
      • 9e1d59f (Fix analyze method contract behavior) also appears validated:
        • successful OPTIONS /analyze rows from ScoutScore-FidelityCheck/1.0 now appear before unpaid probes, which is the intended compatibility path
      • 93ba6f0 / 1af2be0 action-aware approve work is live and observable, but still not externally validated by demand:
        • only 1 paid action-aware approve request is visible in the analyzed window
        • it used Base WETH plus spender 0x1111111111111111111111111111111111111111
        • action_decision=warn
        • action_spender_trust=unchecked
        • current read: this is most likely the production smoke, not broad market usage
    • recommended next move:
      • no pivot
      • make the first successful /analyze call path even more explicit on every machine-facing surface
      • segment evaluator / health-check / crawler traffic from true conversion in analytics before using request counts as traction evidence
      • keep action-aware scope narrow until repeated non-smoke paid usage appears
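To make "conversion is still weak" concrete, the /analyze counts from the 10-day window can be turned into shares. This is plain arithmetic on the numbers reported above; the variable names are illustrative only.

```python
# Counts copied from the 2026-03-31 through 2026-04-09 window above.
analyze_total = 132
status_422 = 110
unpaid_402 = 17
paid = 2


def share_pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


malformed_share = share_pct(status_422, analyze_total)   # 83.3
unpaid_share = share_pct(unpaid_402, analyze_total)      # 12.9
paid_share = share_pct(paid, analyze_total)              # 1.5

# Of the requests that reached the payment step (402 or paid), how many paid?
paid_of_payment_stage = share_pct(paid, unpaid_402 + paid)  # 10.5

# Requests not covered by the three reported buckets.
not_broken_out = analyze_total - (status_422 + unpaid_402 + paid)  # 3
```

Roughly five in six /analyze requests fail validation before the paywall, which is why the recommendation targets the first-call path rather than pricing. The 3 requests outside the three buckets are not broken out in this handover.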
  • Added and deployed a concrete first-party action-aware approve example on 2026-04-06 / 2026-04-07 UTC:
    • intent:
      • make the current narrow action-aware product shape legible without widening the API
      • show exactly how top-level contract policy and action-level policy can differ on approve
      • keep the change on first-party surfaces instead of churning external registry copy again
    • updated local surfaces:
      • homepage (/)
      • skill.md
      • llms.txt
      • llms-full.txt
    • message shape:
      • exact GET /analyze?...&action=approve&spender=...&chain=base request example
      • example JSON with action_context and action_evaluation
      • explicit note that V1 currently supports only approve on Base
      • explicit note that spender trust remains unchecked when no allowlist is configured
    • external/discovery follow-up:
      • intentionally not yet applied to x402.jobs / MoltMart / Work402 / ERC-8004 / x402.org because this is supporting evidence for the current product message, not a new positioning change
    • local verification passed:
      • python -m pytest tests/test_app.py -k "llms_txt or llms_full_txt or skill_md or landing_documents_action_aware_approve_example or landing_links_llms_txt" -q -> 15 passed
      • python -m pytest tests/test_app.py -q -> 168 passed
    • deployment status:
      • deployed from a clean detached worktree so unrelated local auto_bench changes were not shipped
    • live verification after deploy:
      • flyctl status --app augurrisk -> machine version 104, state started, 1 passing health check
      • https://augurrisk.com/health returned {"status":"ok"}
      • homepage now contains Action-Aware Example: Approve
      • live skill.md, llms.txt, and llms-full.txt now all contain the concrete approve example and action_evaluation output
  • Deployed the narrow approve refinement set on 2026-04-06:
    • commit: 1af2be0 (Refine action-aware approve policy)
    • git push origin master succeeded
    • flyctl deploy --remote-only --app augurrisk timed out during health polling again, but the new image still landed and the app recovered to healthy on machine version 103
    • live verification after deploy:
      • flyctl status --app augurrisk -> machine version 103, state started, 1 passing health check
      • https://augurrisk.com/health returned {"status":"ok"}
      • https://augurrisk.com/openapi.json now includes action_approve_spender_allowlisted and action_approve_spender_not_allowlisted
    • real paid production smoke on the live action-aware request shape succeeded:
      • endpoint: https://augurrisk.com/analyze?address=0x4200000000000000000000000000000000000006&action=approve&spender=0x1111111111111111111111111111111111111111&chain=base
      • observed flow: 402 -> signed PAYMENT-SIGNATURE -> 200
      • live result:
        • decision: allow
        • action_evaluation.decision: warn
        • action_evaluation.recommended_policy.reason_codes: ["action_approve_requested"]
        • score: 0
        • level: safe
    • live observability proof:
      • /stats durable recent-entry view now shows the paid approve request with:
        • action_spender_trust: unchecked
        • action_decision: warn
        • funnel_stage: paid_request
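The observed 402 -> signed PAYMENT-SIGNATURE -> 200 flow can be sketched as a two-step retry. This is not the code in scripts/test_x402_client.py: the transport and signer are injected as hypothetical callables, the fake server exists only to exercise the flow, and real x402 signing is out of scope.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    status: int
    body: dict = field(default_factory=dict)


Send = Callable[[str, Dict[str, str]], Response]


def paid_get(url: str, send: Send, sign: Callable[[Response], str]) -> Response:
    """Call once unpaid; on 402, sign the challenge and retry with the header."""
    first = send(url, {})
    if first.status != 402:
        return first
    signature = sign(first)
    return send(url, {"PAYMENT-SIGNATURE": signature})


seen_statuses = []


def fake_send(url: str, headers: Dict[str, str]) -> Response:
    # Stand-in for the live endpoint: 402 without the header, 200 with it.
    if "PAYMENT-SIGNATURE" not in headers:
        response = Response(402, {"error": "payment required"})
    else:
        response = Response(200, {"decision": "allow", "score": 0, "level": "safe"})
    seen_statuses.append(response.status)
    return response


result = paid_get(
    "https://augurrisk.com/analyze?address=0x4200000000000000000000000000000000000006",
    fake_send,
    lambda challenge: "signed-payload-placeholder",
)
```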
  • Added action-aware request observability locally on 2026-04-06:
    • /analyze request logging now records action-aware approve context more explicitly when present
    • new structured request-log fields:
      • action_spender_trust
        • unchecked
        • allowlisted
        • not_allowlisted
      • action_decision
        • populated from action_evaluation.decision on successful 200 responses
    • current purpose:
      • observe whether approve requests are actually using configured allowlists
      • see what action-level decision the current narrow policy is producing in practice
      • avoid freezing extra public API surface before seeing live evidence
    • implementation shape:
      • src/risk_api/app.py computes/logs spender trust for validated action-aware requests
      • src/risk_api/analysis/action_policy.py now exports the spender-trust classifier used by both policy derivation and logging
    • local verification passed:
      • python -m pytest tests/test_config.py tests/test_logging.py tests/test_action_policy.py tests/test_app.py -q -> 196 passed
      • python -m pytest -q -> 395 passed
    • deployment status:
      • deployed in 1af2be0
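A minimal sketch of the two request-log fields described above. The classifier exported from src/risk_api/analysis/action_policy.py is not reproduced in this handover, so the function names and signatures here are assumptions; only the three trust values and the "populate action_decision on 200" rule come from the notes.

```python
from typing import Dict, FrozenSet


def classify_spender_trust(spender: str, allowlist: FrozenSet[str]) -> str:
    """Map a validated spender to unchecked / allowlisted / not_allowlisted."""
    if not allowlist:
        # No allowlist configured, so no trust claim can be made.
        return "unchecked"
    return "allowlisted" if spender.lower() in allowlist else "not_allowlisted"


def action_log_fields(spender: str, allowlist: FrozenSet[str], status: int, response: dict) -> Dict[str, str]:
    """Build the extra structured log fields for an action-aware request."""
    fields = {"action_spender_trust": classify_spender_trust(spender, allowlist)}
    evaluation = response.get("action_evaluation")
    if status == 200 and evaluation:
        fields["action_decision"] = evaluation["decision"]
    return fields


# Shape of the paid production smoke: no allowlist, action-level warn.
smoke_fields = action_log_fields(
    "0x1111111111111111111111111111111111111111",
    frozenset(),
    200,
    {"decision": "allow", "action_evaluation": {"decision": "warn"}},
)
```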
  • Refined the action-aware approve layer locally on 2026-04-06 with an opt-in spender allowlist path:
    • new config env: APPROVE_SPENDER_ALLOWLIST
      • comma-separated Base spender addresses
      • validated at startup and normalized to lowercase
      • no behavior change when unset
    • action-policy behavior is now narrower and more useful when the allowlist is configured:
      • clean base-policy allow + allowlisted spender stays action-level allow
      • clean base-policy allow + spender not on the allowlist escalates to action-level manual_review
      • warn/manual-review/block contracts still do not downgrade below the base contract policy
    • new action-level reason codes:
      • action_approve_spender_allowlisted
      • action_approve_spender_not_allowlisted
    • route wiring now passes the configured spender allowlist into derive_action_evaluation()
    • local verification passed:
      • python -m pytest tests/test_config.py tests/test_action_policy.py tests/test_app.py -q -> 181 passed
      • python -m pytest -q -> 393 passed
    • deployment status:
      • deployed in 1af2be0
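The allowlist rules above can be sketched as a parser plus a small decision function. Names and signatures are illustrative, not the repo's. One assumption is flagged in the code: the notes do not say how an allowlisted spender interacts with a contract-level warn, so the sketch keeps the V1 escalation there.

```python
import re
from typing import FrozenSet, Optional

ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")


def parse_spender_allowlist(raw: Optional[str]) -> FrozenSet[str]:
    """Parse APPROVE_SPENDER_ALLOWLIST: comma-separated, validated, lowercased."""
    if raw is None or not raw.strip():
        return frozenset()  # unset means no behavior change
    entries = [part.strip() for part in raw.split(",") if part.strip()]
    invalid = [entry for entry in entries if not ADDRESS_RE.match(entry)]
    if invalid:
        raise ValueError(f"invalid APPROVE_SPENDER_ALLOWLIST entries: {invalid}")
    return frozenset(entry.lower() for entry in entries)


def approve_decision(base_decision: str, spender: str, allowlist: FrozenSet[str]) -> str:
    """Action-level decision for approve, never below the base contract policy."""
    if base_decision == "allow":
        if not allowlist:
            return "warn"  # V1 behavior when no allowlist is configured
        return "allow" if spender.lower() in allowlist else "manual_review"
    if base_decision == "warn":
        # Assumption: allowlisting does not soften a contract-level warn.
        return "manual_review"
    return base_decision  # manual_review and block stay at least as severe
```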
  • Implemented a narrow action-aware admission-control V1 locally on 2026-04-06:
    • GET and POST /analyze now accept optional action context fields:
      • action
      • spender
      • chain
    • V1 is intentionally narrow:
      • only action=approve is accepted
      • only chain=base is accepted when chain is supplied
      • spender is required for approve
      • unsupported action values, unsupported chain values, missing spender, malformed spender, and query/body conflicts on the new fields all return 422 before the x402 paywall
    • the contract-level decision and recommended_policy remain unchanged when no action context is provided
    • valid action context now adds:
      • action_context
      • action_evaluation
    • action_evaluation is additive rather than replacing the core contract engine:
      • clean allow contracts escalate to action-level warn for approve
      • contract-level warn escalates to action-level manual_review
      • manual_review and block remain at least as severe
      • new reason code: action_approve_requested
    • implementation shape:
      • new module src/risk_api/analysis/action_policy.py
      • shared request-field parser / validator in src/risk_api/app.py
      • additive wire serialization in src/risk_api/api_contract.py
      • OpenAPI and Bazaar discovery schemas now expose the optional action-aware fields and response objects
    • coverage added in:
      • tests/test_action_policy.py
      • tests/test_app.py
      • tests/conftest.py fake Bazaar schema updated to match the new input contract
    • verification passed:
      • python -m pytest tests/test_action_policy.py tests/test_app.py -k "approve or spender or action_evaluation or action_aware or unsupported_action or unsupported_chain or action_context or bazaar or openapi" -q -> 26 passed
      • python -m pytest tests/test_app.py -q -> 163 passed
      • python -m pytest -q -> 387 passed
    • shipped result:
      • committed as 93ba6f0 (Add action-aware approve policy layer)
      • git push origin master succeeded
      • flyctl deploy --remote-only --app augurrisk succeeded cleanly
      • live verification after deploy:
        • flyctl status --app augurrisk -> machine version 100, state started, 1 passing health check
        • https://augurrisk.com/health returned {"status":"ok"}
        • https://augurrisk.com/openapi.json now includes ActionContext
    • follow-up caveat:
      • if public machine/discovery docs beyond openapi.json start describing the action-aware layer more explicitly, review the duplicated machine-facing copy outside src/risk_api/app.py for alignment before the next messaging/discovery push
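The V1 validation rules above, all of which reject with 422 before the x402 paywall, can be sketched as one parser. This is not the shared parser in src/risk_api/app.py: the exception type, the return shape, the lowercasing of spender, the default chain, and the handling of spender/chain without action are assumptions.

```python
import re
from typing import Dict, Optional

ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")


class ActionContextError(ValueError):
    """Would map to a 422 response before the paywall."""


def parse_action_context(query: Dict[str, str], body: Dict[str, str]) -> Optional[Dict[str, str]]:
    resolved: Dict[str, str] = {}
    for name in ("action", "spender", "chain"):
        from_query, from_body = query.get(name), body.get(name)
        if from_query is not None and from_body is not None and from_query != from_body:
            raise ActionContextError(f"conflicting {name} in query and body")
        value = from_query if from_query is not None else from_body
        if value is not None:
            resolved[name] = value
    if "action" not in resolved:
        # Assumption: without action, the contract-level path runs unchanged.
        return None
    if resolved["action"] != "approve":
        raise ActionContextError("unsupported action")
    if resolved.get("chain", "base") != "base":
        raise ActionContextError("unsupported chain")
    spender = resolved.get("spender")
    if spender is None:
        raise ActionContextError("spender is required for approve")
    if not ADDRESS_RE.match(spender):
        raise ActionContextError("malformed spender")
    return {"action": "approve", "spender": spender.lower(), "chain": "base"}
```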
  • Hardened /analyze method-contract behavior and docs locally on 2026-04-03:
    • src/risk_api/app.py now skips address validation and x402 gating for methods outside the real /analyze contract, so Flask handles OPTIONS and unsupported methods normally instead of returning misleading 422 errors
    • /analyze now responds with the default Flask OPTIONS behavior and returns 405 Method Not Allowed for unsupported methods like PUT, PATCH, and DELETE
    • the POST 422 OpenAPI examples now explicitly document conflicting query/body addresses, malformed JSON bodies, and non-object JSON bodies
    • test coverage added in tests/test_app.py for:
      • ungated OPTIONS /analyze
      • unsupported-method 405 behavior with and without x402 enabled
      • POST 422 OpenAPI body-error examples
    • the fake x402 test gate in tests/conftest.py now mirrors the real method contract instead of intercepting unsupported methods
    • verification passed:
      • python -m pytest tests/test_app.py -q -> 151 passed
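The method contract above reduces to a small routing decision made before any address validation or x402 gating. The sketch below is framework-free and the names are hypothetical; in the app itself, Flask produces the OPTIONS and 405 responses.

```python
# Methods that belong to the real /analyze contract.
ANALYZE_CONTRACT_METHODS = frozenset({"GET", "POST"})


def analyze_preflight(method: str) -> str:
    """Decide which path a request to /analyze takes, by HTTP method.

    HEAD is not discussed in the handover and is not modeled here.
    """
    method = method.upper()
    if method in ANALYZE_CONTRACT_METHODS:
        return "validate_then_gate"      # address validation, then x402
    if method == "OPTIONS":
        return "framework_options"       # ungated default OPTIONS response
    return "method_not_allowed_405"      # e.g. PUT, PATCH, DELETE


paths = {m: analyze_preflight(m) for m in ("GET", "POST", "OPTIONS", "PUT", "PATCH", "DELETE")}
```

The point of the ordering is that only contract methods can ever produce a 422 or a 402, so evaluator OPTIONS probes no longer look like malformed requests in the logs.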
  • Confirmed a real paid production smoke test on 2026-04-03 using the Conway wallet against the live deployed agent wallet:
    • command path used the existing local flow from scripts/test_x402_client.py
    • payer wallet: 0x79301Cf19Aaea29fbe40F0F5B78F73e2c3b0a2b8
    • payee / agent wallet: 0x13580b9C6A9AfBfE4C739e74136C1dA174dB9891
    • target endpoint: https://augurrisk.com/analyze?address=0x4200000000000000000000000000000000000006
    • observed flow: 402 -> signed PAYMENT-SIGNATURE -> 200
    • live result:
      • decision: allow
      • score: 0
      • level: safe
      • findings: 0
    • practical read:
      • the x402 payment path is currently working end to end on production
      • the current live policy for Base WETH now returns the clean result, not the older deployer-reputation warning result captured in older notes
  • Re-checked the Coinbase public discovery feed on 2026-04-03 local time (2026-04-04T04:06:55Z script timestamp) after the successful paid smoke:
    • command: python scripts/check_cdp_discovery.py --max-pages 5 --limit 100 --page-delay 0.75 --max-retries 4 --retry-delay 5
    • result: status=NOT_FOUND
    • coverage scanned: 5 pages / 500 items
    • keyword matches: 0
    • practical read:
      • successful paid settlement plus correct live x402 metadata still do not imply public-feed visibility
      • this remains CDP feed/indexing or support-escalation territory, not a repo/runtime bug
  • Hardened /analyze input handling and endpoint resilience locally on 2026-03-30:
    • src/risk_api/app.py now rejects malformed POST JSON bodies and conflicting address values between query params and JSON body before the x402 paywall instead of silently choosing one
    • fixed request logging to keep recording only the resolved address after the parser signature change
    • added app regressions in tests/test_app.py for:
      • matching query/body POSTs still succeeding
      • conflicting query/body POSTs returning 422
      • malformed JSON bodies returning 422
      • non-object JSON bodies returning 422
      • malformed JSON still returning 422 before x402 payment
    • added /stats resilience coverage in tests/test_logging.py so malformed JSONL lines are ignored instead of breaking the endpoint
    • extended the ignored local hidden holdout batch with two deployer-reputation partial-failure cases:
      • age lookup fails but low-tx warning survives
      • tx-count lookup fails but young-wallet warning survives
    • verification passed:
      • python -m pytest tests/test_app.py tests/test_logging.py -q -> 155 passed
      • python auto/loop.py -> 59/59
      • python -m pytest -q -> 363 passed
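A minimal sketch of the address-resolution rules covered by those regressions: malformed JSON, non-object JSON, and conflicting query/body addresses are all rejected rather than silently resolved. The function and exception names are hypothetical, and exact string equality for the conflict check is an assumption.

```python
import json
from typing import Optional


class AnalyzeInputError(ValueError):
    """Would map to a 422 response before the x402 paywall."""


def resolve_address(query_address: Optional[str], raw_body: Optional[bytes]) -> str:
    body_address = None
    if raw_body:
        try:
            payload = json.loads(raw_body)
        except ValueError as exc:  # covers JSONDecodeError and bad encodings
            raise AnalyzeInputError("malformed JSON body") from exc
        if not isinstance(payload, dict):
            raise AnalyzeInputError("JSON body must be an object")
        body_address = payload.get("address")
    if query_address and body_address and query_address != body_address:
        # Refuse to pick one side when query and body disagree.
        raise AnalyzeInputError("conflicting address in query and body")
    address = query_address or body_address
    if not address:
        raise AnalyzeInputError("missing_address")
    return address
```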
  • Extended the local autoresearch harness on 2026-03-30 so hidden cases can run a fully mocked analyze_contract() pass instead of only pure bytecode/policy checks:
    • src/risk_api/auto_bench.py now supports an analysis case kind with mocked RPC and Blockscout responses
    • auto/README.md documents the new case kind for explorer-backed and proxy-runtime coverage
    • tests/test_auto_bench.py now verifies the mocked analysis path
    • local hidden batch added under ignored auto/corpus/*.local.json now pressures:
      • deployer_reputation creator NOT_FOUND
      • deployer_reputation fresh wallet + low tx count
      • proxy implementation NO_CODE
      • proxy FETCH_FAILED vs NO_CODE reason-code separation
      • a reentrancy lookahead-boundary sanity case
    • verification passed:
      • python -m pytest tests/test_auto_bench.py -q -> 4 passed
      • python auto/loop.py -> 57/57
      • python -m pytest -q -> 357 passed
  • Fixed the paid blue-chip false-positive path locally on 2026-03-29:
    • narrowed detect_honeypot_patterns() so ordinary dispatcher/default-REVERT flow no longer counts as a honeypot; the detector now only fires on blacklist-style transfer controls until a stronger transfer-path heuristic exists
    • changed hidden_mint from automatic block to manual_review when it is the main signal, so governed mint-capability tokens do not hard-block by default
    • added/updated reproducible regressions in auto/corpus/public_cases.json, tests/test_patterns.py, tests/test_policy.py, tests/test_engine.py, and tests/test_app.py
    • local live re-check against the paid examples now returns:
      • Base WETH (0x4200000000000000000000000000000000000006) -> allow
      • AERO (0x940181a94A35A4569E4529A3CDfB74e38FD98631) -> manual_review
      • 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984 -> allow
    • verification passed:
      • python -m pytest -q -> 340 passed
      • python auto/loop.py -> 43/43
  • Committed, deployed, and pushed the combined false-positive + metadata-alignment release on 2026-03-29:
    • commit: dee071e (Fix false positives and align admission-control metadata)
    • flyctl deploy --remote-only succeeded for augurrisk
    • git push origin master succeeded
    • live internal route verification passed for:
      • https://augurrisk.com/
      • https://augurrisk.com/openapi.json
      • https://augurrisk.com/skill.md
      • https://augurrisk.com/llms.txt
      • https://augurrisk.com/llms-full.txt
      • https://augurrisk.com/.well-known/agent-card.json
      • https://augurrisk.com/agent-metadata.json
      • https://augurrisk.com/.well-known/x402
      • https://augurrisk.com/health
    • those live surfaces now reflect the admission control / decision / recommended_policy wording that was still missing before deploy
    • stale registration-script test expectations were also updated so the current metadata payloads and examples validate cleanly
  • Ran the external alignment pass on 2026-03-29:
    • pinned new IPFS metadata CID: QmfCBvB5wdBCTeT1XUiXyXY3z2TmUm1rUnQsqrW58reL6S
    • verified the pinned metadata via https://gateway.pinata.cloud/ipfs/QmfCBvB5wdBCTeT1XUiXyXY3z2TmUm1rUnQsqrW58reL6S
    • updated ERC-8004 agent 19074 to ipfs://QmfCBvB5wdBCTeT1XUiXyXY3z2TmUm1rUnQsqrW58reL6S
      • tx: 24cc2388aa6a2b714f783ffb4f24888c35ed9c761a50854156e01adcce79d733
    • updated x402.jobs resource 4964c164-c748-4cd6-a7a5-0ac33e118b6a
      • public API path: https://api.x402.jobs/api/v1/resources/augurrisk-com/augur-base
    • updated MoltMart service 984bf985-8f69-4237-b0e4-cd5452f1c489
      • authenticated API now shows the admission-control / decision / recommended_policy wording
    • Work402 seller alias Augur already exists as did:erc8004:37906
      • rerunning onboarding returns 409 alias conflict, which confirms the existing DID rather than indicating a fresh failure
    • current external audit status after the update pass and browser recheck:
      • x402.jobs public page shows the new admission-control wording
      • MoltMart public page shows the new admission-control wording
      • Work402 public page shows the new admission-control wording for did:erc8004:37906
      • 8004scan now shows the refreshed Augur page and current admission-control wording on the public agent profile
      • x402.org/ecosystem now shows the refreshed Augur admission-control wording on the public card
      • Coinbase public discovery feed still returned NOT_FOUND over the first 5 pages / 500 items when checked with python scripts/check_cdp_discovery.py --max-pages 5
      • x402list.fun is still stale on risk-api.life.conway.tech and still does not mention augurrisk.com
  • Opened the follow-up curated-listing PR on 2026-03-29:
    • coinbase/x402 PR #1869 (Refresh Augur ecosystem listing copy)
    • updates typescript/site/app/ecosystem/partners-data/augur/metadata.json in the upstream site repo
    • purpose: move the Augur ecosystem card from the old score-first wording to the current admission-control wording
    • follow-up status as of 2026-03-30: the public x402.org/ecosystem card is now showing the refreshed copy, so this no longer blocks the external audit pass
  • Landed a policy-quality follow-up locally on 2026-03-29:
    • commit: f248e80 (Refine managed proxy policy decisions)
    • added a new tracked autoresearch regression for high-score managed proxy admin surfaces
    • policy now returns manual_review instead of auto-block when the signal set is:
      • upgradeable proxy
      • hidden mint / admin-control surface
      • delegatecall and optional suspicious selector signals
      • no harder stop condition like honeypot, selfdestruct, fee manipulation, or unresolved proxy logic
    • local real-contract recheck from the current code:
      • Base USDC (0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913) -> manual_review
      • Base cbBTC (0xcbb7c0000ab88b473b1f5afd9ef808440eed33bf) -> manual_review
    • validation passed:
      • python auto/loop.py -> 48/48
      • python -m pytest tests/test_policy.py tests/test_engine.py -q -> 43 passed
      • python -m pytest -q -> 347 passed
  • Landed the next hidden-batch follow-up locally on 2026-03-29:
    • standard 45-byte EIP-1167 clone wrappers are now recognized as proxies instead of raw DELEGATECALL shells
    • resolve_implementation() now extracts clone targets from runtime bytecode, so wrapper contracts can surface their shared implementation analysis without depending on storage-slot proxies
    • added/promoted regressions in:
      • auto/corpus/public_cases.json
      • tests/test_patterns.py
      • tests/test_scoring.py
      • tests/test_engine.py
    • local holdout that originally failed was minimal_proxy_clone_raw_delegatecall
    • real spot-check after the fix:
      • Beefy/Aerodrome-style Base wrapper 0x09139A80454609B69700836a9eE12Db4b5DBB15f now resolves implementation 0x9818df1bdce8d0e79b982e2c3a93ac821b3c17e0
      • the wrapper shell now reports delegatecall as expected minimal-proxy behavior plus proxy detection, instead of tiny_bytecode + raw-shell framing
      • resulting score on that wrapper moved from a misleading shell-only 25 to an implementation-aware 45; decision stays manual_review because the shared implementation still has real delegatecall/suspicious-selector surface
    • validation passed:
      • python auto/loop.py -> 50/50
      • python -m pytest tests/test_patterns.py tests/test_scoring.py tests/test_engine.py -q -> 70 passed
      • python -m pytest -q -> 353 passed
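For reference, the standard EIP-1167 runtime is a fixed 10-byte prefix, the 20-byte implementation address, and a fixed 15-byte suffix (45 bytes total), so the clone target can be read straight out of the bytecode. The sketch below shows that extraction; it is not the repo's resolve_implementation(), and the function name is hypothetical.

```python
from typing import Optional

# Fixed parts of the standard EIP-1167 minimal proxy runtime bytecode.
EIP1167_PREFIX = bytes.fromhex("363d3d373d3d3d363d73")
EIP1167_SUFFIX = bytes.fromhex("5af43d82803e903d91602b57fd5bf3")
EIP1167_LENGTH = len(EIP1167_PREFIX) + 20 + len(EIP1167_SUFFIX)  # 45 bytes


def extract_clone_target(runtime: bytes) -> Optional[str]:
    """Return the implementation address of a standard 45-byte clone, else None."""
    if len(runtime) != EIP1167_LENGTH:
        return None
    if not runtime.startswith(EIP1167_PREFIX) or not runtime.endswith(EIP1167_SUFFIX):
        return None
    start = len(EIP1167_PREFIX)
    return "0x" + runtime[start:start + 20].hex()


# Constructed example pointing at the shared implementation named above.
implementation = "9818df1bdce8d0e79b982e2c3a93ac821b3c17e0"
wrapper_runtime = EIP1167_PREFIX + bytes.fromhex(implementation) + EIP1167_SUFFIX
target = extract_clone_target(wrapper_runtime)
```

Matching on the exact length and both fixed segments keeps the check strict; clone variants with a different layout would need their own patterns.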
  • Landed the next hidden-batch follow-up locally on 2026-03-30:
    • Solidity CBOR metadata trailers are now stripped before disassembly, so metadata bytes no longer create fake opcode findings like raw DELEGATECALL
    • added/promoted regressions in:
      • auto/corpus/public_cases.json
      • tests/test_disassembler.py
      • tests/test_engine.py
    • local holdout that originally failed was solidity_metadata_false_delegatecall
    • real effect on the shared wrapper family behind the previous clone-proxy pass:
      • shared implementation 0x9818df1bdce8d0e79b982e2c3a93ac821b3c17e0 now moves from manual_review at 25 down to warn at 10
      • wrapper 0x09139A80454609B69700836a9eE12Db4b5DBB15f now moves from manual_review at 45 down to warn at 30
      • sampled Base clone-wrapper family result after the fix: 80/80 wrappers in the sample now land on the same warn cluster with reason codes upgradeable_proxy, delegatecall_surface, and suspicious_selector_signal
    • validation passed:
      • python auto/loop.py -> 52/52
      • python -m pytest tests/test_disassembler.py tests/test_engine.py tests/test_patterns.py tests/test_scoring.py -q -> 80 passed
      • python -m pytest -q -> 356 passed
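Solidity appends CBOR-encoded metadata to runtime bytecode, followed by a two-byte big-endian length of that metadata. A hedged sketch of stripping it before disassembly follows; the repo's actual implementation is not shown here, the function name is hypothetical, and the sanity check (the trailer must start with a CBOR map byte) is one possible guard.

```python
def strip_solidity_metadata(runtime: bytes) -> bytes:
    """Drop a trailing CBOR metadata section if one is plausibly present."""
    if len(runtime) < 2:
        return runtime
    metadata_length = int.from_bytes(runtime[-2:], "big")
    total = metadata_length + 2  # metadata plus its 2-byte length field
    if metadata_length == 0 or total > len(runtime):
        return runtime
    trailer = runtime[-total:-2]
    # CBOR major type 5 (map) has 0b101 in the top three bits of the first byte.
    if trailer[0] >> 5 != 5:
        return runtime
    return runtime[:-total]


# Shortened, made-up metadata whose payload happens to contain 0xf4,
# the DELEGATECALL opcode byte, to show how a fake finding can arise.
code = bytes.fromhex("6080604052")
metadata = bytes.fromhex("a164") + b"ipfs" + bytes.fromhex("42f4f4")
runtime = code + metadata + len(metadata).to_bytes(2, "big")

stripped = strip_solidity_metadata(runtime)
```

Without the strip, a linear disassembler reading past the real code would see the 0xf4 bytes inside the hash and report a raw DELEGATECALL that the contract never executes.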
  • Committed, pushed, and deployed the Solidity-metadata follow-up on 2026-03-30:
    • commit: fef6a10 (Ignore Solidity metadata during disassembly)
    • git push origin master succeeded
    • flyctl deploy --remote-only --app augurrisk updated the machine image but timed out while polling health because the Machines API hit lease / rate-limit errors
    • flyctl machine start 287d341f3e0ed8 --app augurrisk recovered the app cleanly
    • final live verification:
      • flyctl status --app augurrisk -> machine version 93, state started, 1 passing health check
      • https://augurrisk.com/openapi.json returned the live API document after recovery
  • Found an ops-script safety bug during the external pass:
    • scripts/register_erc8004.py --help did not behave like help: the script ignored --help, executed the default register() path, and created a second ERC-8004 agent
    • accidental tx: 0d09b847ae49c28dfba251485076170ae0ea45aa3eefe4a131a560c3d3fc45b2
    • decoded minted agent id: 37905
    • fix landed locally the same session:
      • scripts/pin_metadata_ipfs.py, scripts/register_erc8004.py, scripts/register_moltmart.py, and scripts/register_work402.py now use safe argparse entrypoints
      • targeted script tests plus full python -m pytest -q passed afterward
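One way a safe argparse entrypoint can look, as a sketch only: parsing happens before any side effect, so `--help` exits early, and the side-effecting path needs an explicit opt-in. The `--execute` flag and the stand-in `register()` body are assumptions for illustration, not the scripts' actual interface.

```python
import argparse
from typing import Optional, Sequence


def register() -> str:
    """Stand-in for the side-effecting registration call (hypothetical, no network)."""
    return "registered"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Register the agent (sketch).")
    # Side effects require an explicit flag instead of being the default path.
    parser.add_argument(
        "--execute",
        action="store_true",
        help="actually send the registration transaction",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    # argparse handles --help here and exits before register() can run.
    args = build_parser().parse_args(argv)
    if not args.execute:
        return "dry-run: pass --execute to register"
    return register()
```

In a real script, `main()` would sit behind the usual `if __name__ == "__main__":` guard so importing the module in tests never triggers a registration.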
  • Finished the duplicated registration/discovery metadata wording pass locally on 2026-03-29:
    • updated stale registration surfaces in:
      • scripts/pin_metadata_ipfs.py
      • scripts/register_erc8004.py
      • scripts/register_x402jobs.py
      • scripts/register_moltmart.py
      • scripts/register_work402.py
    • aligned those scripts around admission control language and the current output shape (decision, recommended_policy, supporting findings, score)
    • scripts/register_moltmart.py and scripts/register_x402jobs.py no longer advertise the stale risk_level-style response shape
    • syntax check passed:
      • python -m py_compile scripts/pin_metadata_ipfs.py scripts/register_erc8004.py scripts/register_x402jobs.py scripts/register_moltmart.py scripts/register_work402.py
  • Added a concrete registration/discovery alignment checklist on 2026-03-29:
    • docs/REGISTRATIONS.md now explicitly separates:
      • internal app surfaces that update on deploy
      • script-driven third-party listings that require rerunning update flows
      • manual/curated or external-indexed surfaces that only support verification or escalation
    • practical consequence:
      • repo alignment is now deployed on augurrisk.com
      • external listing alignment is still not proven until we rerun the script-driven listing updates and manually audit the live surfaces
    • currently tracked external surfaces that matter for this pass:
      • ERC-8004 / 8004scan
      • x402.jobs
      • MoltMart
      • Work402 (testnet, if still worth maintaining)
      • x402.org/ecosystem
      • Coinbase public discovery feed
      • x402list.fun as an external stale-state check, not a repo-controlled surface
  • Captured the 2026-03-26 production traffic and outcome review so the next session does not over-prioritize copy/distribution cleanup:
    • pulled the durable production analytics store from Fly volume data (/data/analytics.sqlite3) and checked Fly logs around the same window
    • analyzed 2026-03-16 through 2026-03-26 UTC:
      • 3563 total logged events
      • 95 /analyze requests
      • 7 paid /analyze successes, all on 2026-03-25
      • top traffic was mostly discovery/crawler activity on /, /.well-known/agent-card.json, robots.txt, and sitemap.xml
    • confirmed the Fly OOM alert was real but brief:
      • 2026-03-26 07:21:00 UTC: Fly logged Out of memory: Killed process 786 (gunicorn)
      • one proxy-side "connection closed before message completed" error appeared at the same moment
      • a new worker booted at 2026-03-26 07:21:01 UTC
      • the OOM happened during a burst of / requests from x402audit/1.0, not during the paid /analyze burst
    • operational read from the production window:
      • no app-level 500 or 502 rows appeared in the durable analytics DB for the analyzed window
      • current live route checks returned 200 for /, /openapi.json, /skill.md, /llms.txt, /llms-full.txt, /.well-known/agent-card.json, /agent-metadata.json, and /.well-known/x402
      • paid /analyze calls averaged about 1.25s; unpaid 402 and invalid 422 paths stayed fast
    • product-quality read from the same evidence:
      • the meaningful issue is not uptime; it is that real paid testers hit major contracts and received incorrect block decisions
      • the paid contracts in the traffic review included Base WETH (0x4200000000000000000000000000000000000006), AERO (0x940181a94A35A4569E4529A3CDfB74e38FD98631), and 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984
      • local re-check from the current code still blocks all three, so this is not just a historical artifact in old logs
    • plan adjustment from this review:
      • keep the full admission-control wording/metadata alignment pass as valid cleanup
      • but move result quality ahead of metadata alignment and ahead of broader distribution work
      • treat the OOM as a secondary ops follow-up unless it repeats
  • Captured the 2026-03-26 growth/distribution cleanup so the next session starts with the right framing:
    • docs/GrowthExecutionPlan.md now points at docs/PRODUCT_DIRECTION_UPDATE.md as the current strategy source instead of treating the older wedge memo as the newest call
    • the growth plan now separates three channels that should not be conflated:
      • registry/directory maintenance
      • AI-answer visibility / citation work
      • operator ecosystems such as OpenClaw and installable workflow surfaces
    • the old checklist is now framed as baseline history rather than the current sprint, and the next explicit growth items include measured AI-answer visibility work plus one operator-ecosystem experiment
    • bookmark review takeaways from this session:
      • CrowdReply belongs in the LLM-discoverability / AI-answer-visibility bucket, not the registry-cleanup bucket
      • "Larry something" was almost certainly Larrybrain / Larry marketing in the OpenClaw ecosystem
      • Clawdmarket was not found by that exact name in the repo or the current bookmark export
    • the temporary scratch note docs/X_BOOKMARK_DISTRIBUTION_NOTES.md was merged into docs/GrowthExecutionPlan.md and deleted to avoid doc sprawl
  • Added a product-direction follow-up memo and aligned the public/docs wording with it on 2026-03-19:
    • new memo: docs/PRODUCT_DIRECTION_UPDATE.md
    • supporting market-read memo: docs/SELLING_TO_AGENTS_MEMO.md
    • docs/agent-economy-primer.md now points to the new memo for market/product implications so the primer stays focused on payment/discovery stack layers
    • keeps the engine/wedge, but makes the current direction explicit: Augur should be sold as pre-transaction contract admission control for agents on Base, not mainly as a generic risk-score/report API
    • README.md now leads with admission control and treats the score as supporting output instead of the main product
    • src/risk_api/app.py now aligns the homepage, OpenAPI description, skill.md, llms.txt, llms-full.txt, agent card, plugin manifest, ERC-8004 metadata, and x402 discovery copy around decision / recommended_policy first and the 0-100 score second
    • docs/PRODUCT_WEDGE_MEMO.md and docs/llm_discoverability_synthesis.md now point readers at the newer direction update so older strategy notes do not compete with the current call
    • the new memo captures a durable market test: Augur needs to be a service agents are rational to call instead of computing around, so roadmap work should favor speed, reliability, clearer policy output, and maintained judgment over generic breadth
    • the strategy docs now make the next narrow extension more explicit: destination-aware preflight for actions like deposit, approve, route, or pay can fit the wedge when it validates claimed protocol + chain + recipient consistency, but Augur should still avoid drifting into a generic phishing browser, wallet shield, or broad anti-scam suite
    • docs/SELLING_TO_AGENTS_MEMO.md now also captures the article's concrete trust checklist: publish uptime history, latency percentiles, and accuracy evidence, expose provenance where useful, and return confidence metadata when uncertainty is real
    • docs/X402_ECOSYSTEM_SUBMISSION.md and docs/submissions/x402-ecosystem/metadata.json now describe Augur as admission control rather than primarily as risk scoring
    • docs/outreach.md now uses admission-control language for reusable post angles instead of calling Augur a generic risk screen
    • verification for this wording pass: python -m pytest tests/test_app.py -q passed at 137 passed
    • not yet committed or deployed
  • The previous committed/deployed baseline was the first Etherscan V2 pass:
    • commit: a0547a4 (Move deployer reputation to Etherscan V2)
    • docs-only follow-up commit: aef6f28 (Refresh handover after reputation deploy)
  • Landed the Blockscout follow-up:
    • commit: 936f00b (Switch deployer reputation to Blockscout)
    • src/risk_api/analysis/reputation.py now uses public Base Blockscout endpoints for creator lookup and transaction-history probes
    • get_tx_count() no longer depends on explorer proxy RPC; it now uses a cheap tx-history probe that is exact for low-count wallets and clamped at the threshold for busy wallets
    • missing explorer keys no longer disable deployer reputation; BLOCKSCOUT_API_KEY is optional and only used for higher-rate access
    • src/risk_api/config.py now prefers BLOCKSCOUT_API_KEY and only falls back to legacy ETHERSCAN_API_KEY / BASESCAN_API_KEY
    • public wording in src/risk_api/app.py is now generic explorer-backed copy instead of vendor-specific copy
    • local verification passed:
      • python -m pytest -q
      • python auto/loop.py --allow-failures
      • real no-key smoke check: creator lookup, first-tx lookup, and low-tx probe all worked against Base Blockscout for a live Base contract
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
    • live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, https://augurrisk.com/, and https://augurrisk.com/deployer-reputation-api
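The clamped tx-history probe and the key fallback described in this entry can be sketched as follows. `fetch_page` is a hypothetical callable standing in for one explorer history request, and the relative order of the two legacy keys is an assumption; neither function is the code in src/risk_api/analysis/reputation.py or src/risk_api/config.py.

```python
from typing import Callable, Mapping, Optional, Sequence


def probe_tx_count(fetch_page: Callable[[int], Sequence[object]], threshold: int) -> int:
    """Count transactions exactly below `threshold` and clamp at it otherwise.

    `fetch_page(limit)` should return at most `limit` transaction records.
    """
    page = fetch_page(threshold)
    return min(len(page), threshold)


def resolve_explorer_key(env: Mapping[str, str]) -> Optional[str]:
    """Prefer BLOCKSCOUT_API_KEY, then legacy keys; None means keyless access."""
    for name in ("BLOCKSCOUT_API_KEY", "ETHERSCAN_API_KEY", "BASESCAN_API_KEY"):
        value = env.get(name)
        if value:
            return value
    return None
```

The probe needs only one bounded request per wallet, which is why it can be exact for low-count wallets and still cheap for busy ones.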
  • Re-ran the hidden discovery loop from the current baseline on 2026-03-18:
    • python auto/loop.py
    • result stayed green at 32/32
    • no blind spots, holdout disagreements, policy regressions, or serializer/doc drifts surfaced
    • next action remains: add new hidden candidate or holdout cases before changing implementation again
  • Landed the next hidden-batch follow-up on 2026-03-18:
    • commit: c92d499 (Cover more limit-control selector aliases)
    • new local hidden candidates surfaced a fresh alias gap where setMaxWalletAmount(uint256), setMaxHoldAmount(uint256), and setMaxTransferAmount(uint256) still returned clean allow
    • src/risk_api/analysis/selectors.py now treats those selectors as the same fee/limit manipulation family
    • expanded tracked regressions in tests/test_patterns.py, tests/test_scoring.py, and tests/test_engine.py
    • local hidden corpus is now green again at 35/35
    • full local test suite passed at 333 passed
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
    • live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, and https://augurrisk.com/
  • Landed the next hidden-batch follow-up on 2026-03-18:
    • commit: 23c1657 (Warn on trading toggle selector aliases)
    • new local hidden candidates surfaced a fresh suspicious-selector gap where setTradingEnabled(bool) and enableTrading() still returned clean allow
    • src/risk_api/analysis/selectors.py now routes those selectors through the existing suspicious-selector warning path alongside other admin trading toggles
    • expanded tracked regressions in tests/test_selectors.py, tests/test_scoring.py, tests/test_engine.py, and tests/test_app.py
    • local hidden corpus is now green again at 37/37
    • full local test suite passed at 336 passed
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
    • live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, and https://augurrisk.com/
  • Landed the next hidden-batch follow-up on 2026-03-18:
    • commit: 3cefda6 (Warn on fee bypass selector aliases)
    • new local hidden candidates surfaced a fresh suspicious-selector gap where excludeFromFees(address,bool) and setIsExcludedFromFee(address,bool) still returned clean allow
    • src/risk_api/analysis/selectors.py now routes those selectors through the existing suspicious-selector warning path alongside other selective fee-bypass controls
    • expanded tracked regressions in tests/test_selectors.py, tests/test_scoring.py, tests/test_engine.py, and tests/test_app.py
    • local hidden corpus is now green again at 39/39
    • full local test suite passed at 339 passed
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
    • live checks passed for https://augurrisk.com/health, https://augurrisk.com/openapi.json, and https://augurrisk.com/
  • Landed the next hidden-batch follow-up on 2026-03-18:
    • commit: 1da269f (Warn on whitelist and cooldown toggle aliases)
    • new local hidden candidates surfaced a fresh suspicious-selector gap where setWhitelistEnabled(bool), setTxCooldownEnabled(bool), and setCooldownEnabled(bool) still returned clean allow
    • src/risk_api/analysis/selectors.py now routes those selectors through the existing suspicious-selector warning path alongside other admin trading controls
    • expanded tracked regressions in tests/test_selectors.py, tests/test_scoring.py, tests/test_engine.py, and tests/test_app.py
    • local hidden corpus is now green again at 42/42
    • full local test suite passed at 342 passed
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
  • Landed an operational follow-up on 2026-03-18:
    • commit: be812ee (Keep Fly machine running for production)
    • root cause observed after deploy: the only Fly machine fell into the stopped state, and proxy wake-up attempts hit "machines API returned an error: rate limit exceeded"
    • immediate recovery was flyctl machine start 78469e3f419218 --app augurrisk
    • permanent repo-side fix: fly.toml now sets auto_stop_machines = 'off' so the single production machine stays up instead of relying on the flaky auto-start path
    • after redeploy, flyctl status --app augurrisk stayed started with checks passing, and public health, openapi.json, and homepage checks returned 200
  • Added a strategy memo that locks the current wedge:
    • docs/PRODUCT_WEDGE_MEMO.md
    • frames Augur as Base contract admission control for agents
    • keeps the product narrow rather than broadening into a full execution-security platform
  • Updated the public copy in README.md, homepage/skill.md/llms.txt/llms-full.txt generation in src/risk_api/app.py, and the growth-plan pointer so the same wedge appears across repo docs and public machine-readable surfaces.
  • Tightened the public wording pass across the main docs and discovery surfaces:
    • standardizes the public headline around Deterministic Base contract risk screening for agents
    • standardizes the short explainer around Screen Base contracts before your agent buys, routes funds, approves, or interacts
    • adds compact use-case education on the homepage and README so the product need is clearer at a glance
  • Did a second public-copy pass after review findings:
    • homepage setup language is plainer
    • use-case pages no longer use Buyer Intent framing
    • llms-full.txt now describes Augur as a paid HTTP API instead of an agent-to-agent API
    • examples/javascript/augur-mcp/README.md is more customer-facing
  • Patched the in-repo MCP wrapper so startup and tool discovery no longer hard-fail when CLIENT_PRIVATE_KEY is unset:
    • examples/javascript/augur-mcp/index.mjs now requires the key only when analyze_base_contract_risk is actually called
    • npm run smoke now passes locally on the read-only path without a wallet key
    • examples/javascript/augur-mcp/README.md now documents the split between read-only startup and paid analyze calls
  • Ran a 12-chat ChatGPT discoverability check for Augur and distilled the results into:
    • docs/llm_discoverability_synthesis.md
    • docs/llm_discoverability_runs_filled.csv
    • docs/llm_discoverability_summary_filled.csv
  • Moved the raw LLM transcript dumps out of docs/ and into the local archive:
    • .codex/research.local/llm-discoverability/
  • Shipped the live proof-of-work report:
    • https://augurrisk.com/reports/base-bluechip-bytecode-snapshot
  • Added a report-specific Open Graph card for the proof page:
    • image route: https://augurrisk.com/og/base-bluechip-bytecode-snapshot.png
    • report pages now use that asset instead of the generic /avatar.png
  • The proof report now:
    • uses the live /analyze response shape in its embedded snapshot JSON
    • includes nested implementation output for proxy examples
    • clearly labels the JSON as a dated snapshot, not a live rerun
  • Added registry-backed report routing in src/risk_api/app.py via /reports/<slug>.
  • Added a public MCP discovery/install surface:
    • live page: https://augurrisk.com/mcp
    • linked from the homepage, llms.txt, and llms-full.txt
  • Added a root agent-facing skill document:
    • live doc: https://augurrisk.com/skill.md
    • linked from the homepage, sitemap, robots, llms.txt, and llms-full.txt
  • Tightened the homepage visual hierarchy:
    • added a stronger brand lockup, hero stats, and denser section intros
    • kept the same public routes and machine-readable entrypoints
  • Clarified homepage wording around capability vs entry pages:
    • renamed the misleading "Use Augur For" block to "Public Entry Pages"
    • explicitly states that those pages are task-specific fronts for the same full 8-detector /analyze pass
  • Brought /mcp into the same visual system as the homepage without adding human-first promo sections:
    • keeps the page focused on local stdio setup, client-side x402, and canonical machine docs
  • Deployed the latest public-surface pass to Fly from master:
    • live commit: 572b206
    • verified live https://augurrisk.com/, https://augurrisk.com/skill.md, and https://augurrisk.com/mcp
  • Packaged and published the MCP wrapper as augurrisk-mcp:
    • npm: https://www.npmjs.com/package/augurrisk-mcp
    • current version: 1.0.1
    • public install path: npx -y augurrisk-mcp
  • Updated the homepage, MCP page, README.md, llms.txt, and llms-full.txt to surface the MCP package directly.
  • Recorded the first Coinbase x402 Discord post in docs/outreach.md.
  • Added OpenClaw (r/OpenClaw / OpenClaw Discord) as a secondary outreach target; avoid treating the AI-only OpenClaw forum as the primary posting surface.
  • Re-verified Coinbase discovery surfaces: x402.org/ecosystem now lists Augur, while the CDP Bazaar feed still does not reliably show an Augur match in public queries.
  • Verified the live deploys on augurrisk.com.
  • Deployed the latest copy pass to Fly from the local worktree and verified live:
    • homepage hero and Explore by Use Case
    • https://augurrisk.com/skill.md
    • https://augurrisk.com/llms-full.txt
    • https://augurrisk.com/honeypot-detection-api
  • Added a real first-pass policy layer to the live /analyze response:
    • new top-level fields: decision and recommended_policy
    • default mapping is now rule-based rather than score-band-only:
      • allow only for clean safe results with no reason codes
      • warn for low results and safe results that still carry non-blocking signals
      • manual_review for medium, unresolved proxy logic, raw DELEGATECALL, or SELFDESTRUCT
      • block for high / critical or honeypot signals
    • recommended_policy now returns action, summary, and stable reason_codes
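The rule-based mapping in this entry can be sketched as a base decision per risk level plus floors that only ever make the decision stricter. The signal names are placeholders chosen for illustration, and this is not the repo's derive_policy(); other entries in these notes adjust the precedence further.

```python
from typing import Iterable

# Strictness order for the four decisions named in the notes.
ORDER = ["allow", "warn", "manual_review", "block"]

BASE_DECISION = {
    "safe": "allow",
    "low": "warn",
    "medium": "manual_review",
    "high": "block",
    "critical": "block",
}

# Hypothetical signal names that force at least manual_review.
REVIEW_SIGNALS = {"unresolved_proxy", "raw_delegatecall", "selfdestruct"}


def at_least(current: str, floor: str) -> str:
    """Return the stricter of two decisions."""
    return max(current, floor, key=ORDER.index)


def first_pass_decision(level: str, signals: Iterable[str]) -> str:
    found = set(signals)
    decision = BASE_DECISION[level]
    if found:
        # Any residual signal rules out a clean allow.
        decision = at_least(decision, "warn")
    if found & REVIEW_SIGNALS:
        decision = at_least(decision, "manual_review")
    if "honeypot" in found:
        decision = "block"
    return decision
```

Expressing overrides as floors keeps the mapping monotonic: adding a signal can never relax a decision.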
  • Updated all machine-readable surfaces and examples to reflect the real policy output:
    • OpenAPI examples and AnalysisResult schema in src/risk_api/app.py
    • x402 Bazaar discovery examples
    • README.md, skill.md, llms.txt, and llms-full.txt
    • proof report snapshots in src/risk_api/proof_reports.py
  • Added coverage for the policy layer:
    • new unit tests in tests/test_policy.py
    • engine and app tests now verify decision / recommended_policy
  • Tightened the first-pass policy and example contract after review:
    • raw singleton delegatecall now forces at least manual_review via a policy override instead of slipping through as allow
    • proxy handling now carries structured resolution status (resolved, unresolved, fetch_failed, nested_proxy) from the engine into policy/reason-code derivation
    • OpenAPI examples, machine docs, and proof-report snapshot JSON now round-trip through the live serializer so implementation omission and nested implementation shapes stay aligned
    • OpenAPI now publishes a PolicyReasonCode enum for recommended_policy.reason_codes
  • Put the new autoresearch harness to work with hidden local corpora:
    • added local ignored files auto/corpus/holdout.local.json and auto/candidates/discovered-2026-03-16.local.json
    • first run surfaced four real policy blind spots: hidden_mint_permissive_policy, honeypot_permissive_policy, selfdestruct_warn_regression, and fee_manipulation_safe_allow
    • after tightening derive_policy(), python auto/bench.py --json-out auto/runs/latest.json is green again with those local holdouts loaded
  • Ran the next hidden holdout discovery batch locally:
    • found a new selector gap where pause() silently returned allow because it never reached a detector or policy signal
    • moved pause() onto the existing suspicious-selector path so it now warns with suspicious_selector_signal instead of passing clean
    • expanded the private local corpora with fresh pause(), reentrancy, and proxy fetch_failed holdouts/candidates
    • python auto/loop.py --allow-failures is green again at 26/26 checks
  • Ran a second hidden holdout pass after deploying fccbbb0:
    • found a follow-on selector gap where raw blacklist(address) / addToBlacklist(address) selectors still returned allow if no transfer path was visible
    • kept full honeypot blocking unchanged when transfer selectors are present, but now routes orphan blacklist controls through the existing suspicious-selector warning path
    • expanded the private local corpora again with blacklist-without-transfer holdouts/candidates
    • python auto/loop.py --allow-failures is green again at 28/28 checks
  • Committed, pushed, and deployed two more hidden-holdout batches:
    • commit: fccbbb0 (Warn on pause selectors in autoresearch batch)
    • change: pause() now warns via suspicious_selector_signal instead of silently allowing
    • commit: a3fb26d (Warn on orphan blacklist selectors)
    • change: orphan blacklist(address) / addToBlacklist(address) selectors now warn via the suspicious-selector path when no concrete detector surfaces them
    • both commits were pushed to origin/master
    • both flyctl deploy --remote-only runs succeeded for augurrisk
    • live checks passed for https://augurrisk.com/health and https://augurrisk.com/openapi.json
  • Ran a third hidden holdout pass after deploying a3fb26d:
    • found a fee/limit alias gap where raw setMaxSellAmount(uint256) and setWalletLimit(uint256) selectors still returned clean allow
    • added those aliases to the fee-manipulation family and shared the label matcher between detector surfacing and orphan-selector suppression so they warn at score 15 without extra suspicious-selector points
    • expanded the private local candidate corpus with the new limit-control cases
    • python auto/loop.py --allow-failures is green again at 30/30 checks
  • Committed, pushed, and deployed the third hidden-holdout batch:
    • commit: 71a394c (Warn on fee-limit selector aliases)
    • change: setMaxSellAmount(uint256) and setWalletLimit(uint256) now surface as fee_manipulation rather than silently allowing
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
    • live checks passed for https://augurrisk.com/health and https://augurrisk.com/openapi.json
  • Ran a fourth hidden holdout pass after deploying 71a394c:
    • found a follow-on transaction-limit alias gap where raw setMaxBuyAmount(uint256), setTxLimit(uint256), and setMaxTxnAmount(uint256) selectors still returned clean allow
    • extended the shared fee/limit alias family so these common anti-whale selectors now surface as fee_manipulation
    • expanded the private local candidate corpus with the new transaction-limit cases
    • python auto/loop.py --allow-failures is green again at 32/32 checks
  • Committed, pushed, and deployed the fourth hidden-holdout batch:
    • commit: 09a75f6 (Warn on tx-limit selector aliases)
    • change: setMaxBuyAmount(uint256), setTxLimit(uint256), and setMaxTxnAmount(uint256) now warn through the fee-manipulation path instead of silently allowing
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
    • live checks passed for https://augurrisk.com/health and https://augurrisk.com/openapi.json
  • Tightened first-pass policy precedence from the autoresearch findings:
    • hidden_mint and honeypot now block even when the numeric score is only low
    • SELFDESTRUCT now forces at least manual_review even when the numeric score is only low
    • safe results with residual non-blocking reason codes now warn instead of auto-allowing
    • added targeted regressions in tests/test_policy.py, tests/test_engine.py, and tests/test_app.py
  • Added a bounded local autoresearch harness for detector and API-contract regressions:
    • entrypoint: python auto/bench.py
    • latest known-good run: python auto/bench.py --json-out auto/runs/latest.json
    • tracked starter corpus: auto/corpus/public_cases.json
    • agent prompt: auto/program.md
    • reusable benchmark logic: src/risk_api/auto_bench.py
    • local holdouts and candidate discoveries live in ignored *.local.json files under auto/corpus/ and auto/candidates/
    • built-in checks cover policy regressions, serializer/doc drift, OpenAPI examples, machine docs, and proof-report shape
  • Closed a proof-report semantic drift gap after independent review:
    • src/risk_api/proof_reports.py now aligns the embedded WETH and USDC snapshot policy output with current derive_policy() semantics
    • src/risk_api/auto_bench.py now fails if any proof-report snapshot embeds stale decision / recommended_policy values relative to current policy logic
    • tests/test_app.py now asserts proof-report snapshots still match live policy semantics
    • tests/test_auto_bench.py now proves the bench catches stale proof-report policy examples
  • Wired the tracked public autoresearch corpus into GitHub Actions:
    • .github/workflows/typecheck.yml now runs python auto/bench.py auto/corpus/public_cases.json in CI
    • local verification passed with 11/11 public checks green
  • Added a thin autoresearch loop runner for day-to-day use:
    • new wrapper: python auto/loop.py
    • implementation: src/risk_api/auto_loop.py
    • writes auto/runs/latest.json by default and prints a compact failure summary grouped by blind spot
    • supports --allow-failures, --skip-app-contract-checks, optional custom case paths, and optional --json-out
    • covered by tests/test_auto_loop.py
  • Split resolved-proxy eth_getCode == 0x from transport failures:
    • ProxyResolutionStatus now includes no_code
    • PolicyReasonCode now includes proxy_logic_no_code
    • resolved implementation addresses with no deployed bytecode still map to manual_review, but no longer collapse into fetch_failed
    • engine/app/policy tests now cover the "Proxy implementation has no bytecode" path
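A sketch of the fetch_failed / no_code split, assuming a hypothetical `get_code` callable that wraps eth_getCode and raises on transport errors. The enum is a local stand-in that reuses the status values named in these notes; it is not the engine's own class.

```python
from enum import Enum
from typing import Callable


class ProxyResolutionStatus(str, Enum):
    RESOLVED = "resolved"
    UNRESOLVED = "unresolved"
    FETCH_FAILED = "fetch_failed"
    NESTED_PROXY = "nested_proxy"
    NO_CODE = "no_code"


def classify_implementation(
    get_code: Callable[[str], str], address: str
) -> ProxyResolutionStatus:
    """Classify a resolved implementation address by its eth_getCode result."""
    try:
        code = get_code(address)
    except Exception:
        # Transport or lookup failure: nothing is known about the bytecode.
        return ProxyResolutionStatus.FETCH_FAILED
    if code in ("", "0x"):
        # The lookup succeeded and the address has no deployed bytecode.
        return ProxyResolutionStatus.NO_CODE
    return ProxyResolutionStatus.RESOLVED
```

Keeping the two outcomes distinct lets a caller retry a failed lookup while treating an empty implementation as a finding in its own right.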
  • Promoted the four highest-signal local policy blind spots into the tracked public corpus:
    • promoted to auto/corpus/public_cases.json: hidden mint -> block, honeypot -> block, selfdestruct -> manual_review, fee manipulation -> warn
    • intentionally left the low-score resolved-proxy warn case in local holdouts because the tracked corpus already covers unresolved and nested proxy semantics and should stay compact
  • Committed, pushed, and deployed the autoresearch/policy-hardening batch:
    • commit: 71ba517 (Add autoresearch harness and tighten policy semantics)
    • pushed to origin/master
    • flyctl deploy --remote-only succeeded for augurrisk
  • Committed and deployed the policy-output pass:
    • commit: 9cf5e0f (Add first-pass policy decisions to analyze)
    • verified live https://augurrisk.com/skill.md
    • verified live https://augurrisk.com/llms-full.txt
    • verified live https://augurrisk.com/openapi.json
    • verified live 402 discovery output from GET /analyze
  • Checked the live dashboard/stats surfaces after deployment:
    • /stats and /dashboard are still instance-local operational views, not canonical analytics
    • the most recent visible 402 row can be polluted by our own verification probes
    • curl/... user agents are a useful intentional CLI/script signal, but they do not prove a human was manually at the keyboard

Current Read

  • Current product-scope rule:
    • keep all 8 existing detectors inside the same narrow admission-gate product
    • do not narrow scope by removing detectors like honeypot, proxy, or deployer reputation
    • do not broaden scope into simulation, generalized runtime monitoring, or wallet/session protection
  • Current wording rule:
    • keep Base contract admission control for agents as internal strategy language
    • prefer clearer public phrasing such as Deterministic Base contract risk screening for agents
    • use straightforward user-facing copy like Screen Base contracts before your agent buys, routes funds, approves, or interacts
  • ChatGPT discoverability is currently weak:
    • Augur did not appear unprompted in the 12 blind runs
    • after direct comparison, the model consistently classifies Augur as a serious but narrow Base-only deterministic prefilter
    • the repeated perceived gap is transaction simulation plus broader runtime/interaction coverage
  • Treat the LLM result as a distribution/messaging signal first, not as proof that Augur should pivot into a full execution-security platform.
  • Follow-up review of the LLM research sharpened the interpretation:
    • the problem is partly entity resolution (Augur often resolves to unrelated products), not only generic discoverability
    • at least a couple of blind runs were methodologically contaminated or ambiguous, so the 0/12 headline is directionally useful but not a clean benchmark
    • stronger strategic takeaway is still category ownership and retrievability for a narrow wedge, not feature expansion toward simulation
  • MCP wrapper behavior is now cleaner for demos and onboarding:
    • startup/read-only introspection works without CLIENT_PRIVATE_KEY
    • paid analyze calls still require the key at tool invocation time
  • Public-facing product/discovery surface is now in good shape for promotion:
    • root skill doc is live
    • homepage wording no longer confuses public entry pages with full detector coverage
    • proof page is live
    • report OG card is fixed
    • payment explainer is live
    • MCP setup page is live
    • npm MCP package is live
    • buyer-intent pages are live
  • Current positioning rule: Augur stays agent-first. Prefer machine-readable docs, direct integration paths, and MCP/x402 clarity over social-proof or human-marketing sections.
  • Current messaging rule: keep one plain public headline plus one plain trigger-moment sentence across homepage, README, machine docs, and registration metadata; add brief use-case examples where they clarify why an agent would call Augur.
  • Current discovery/docs rule:
    • /skill.md is the shortest agent quickstart/discovery doc, not a separate product
    • keep core machine surfaces (/skill.md, OpenAPI, llms*.txt, .well-known/*, MCP page) unless there is a clear reason to retire one
    • use-case pages are optional support surfaces; keep them only if they improve clarity or qualified traffic
  • Current product-output rule:
    • Augur now returns explicit first-pass policy outputs: decision and recommended_policy
    • recommended_policy currently includes action, summary, and reason_codes
    • allow should be reserved for clean safe outputs with no reason codes
    • honeypot should still block even at a low numeric score
    • high-score managed upgradeable assets with mint/admin-control surfaces but no clearer hard-stop signal should default to manual_review, not auto-block
    • hidden_mint should now force at least manual_review, not automatic block, when it is the main signal
    • raw non-proxy delegatecall and SELFDESTRUCT should never auto-allow or stay at plain warn just because the numeric score is low
    • unresolved proxy logic should be carried as structured engine state and stable reason codes, not inferred from human-readable finding titles
    • treat fetch_failed (RPC/lookup failure) separately from no_code (implementation address resolved but has no deployed bytecode)
    • machine-facing examples should be produced through the same serializer as the live /analyze route
    • this is a default first-pass recommendation layer, not a replacement for caller-specific policy logic
  • Current detector-research rule:
    • use auto/bench.py as the bounded local harness for adversarial bytecode, policy edge cases, and API-contract drift
    • run hidden discovery batches serially; let each batch land, deploy, and become the new baseline before starting the next one
    • prefer adding a reproducible case before changing implementation
    • keep local holdout corpora untracked so the loop cannot merely overfit the visible tracked corpus
    • current tracked corpus is intentionally small; the next useful work is adding real hidden holdout cases under auto/corpus/*.local.json
    • keep fee/limit selector alias matching shared between detector surfacing and orphan-selector filtering so limit controls warn at 15 instead of double-counting as suspicious_selector
    • keep transaction-limit aliases like setMaxBuyAmount, setTxLimit, and setMaxTxnAmount in that same shared fee/limit family, along with broader limit-control aliases like setMaxWalletAmount, setMaxHoldAmount, and setMaxTransferAmount
    • keep trading-gate aliases like setTradingEnabled(bool) and enableTrading() on the suspicious-selector warning path alongside other admin toggles like setSwapEnabled(bool)
    • keep selective fee-bypass aliases like excludeFromFees(address,bool) and setIsExcludedFromFee(address,bool) on the suspicious-selector warning path alongside excludeFromFee(address)
    • keep whitelist/cooldown toggles like setWhitelistEnabled(bool), setTxCooldownEnabled(bool), and setCooldownEnabled(bool) on that same suspicious-selector warning path when they surface owner-controlled trading restrictions
    • keep pause() on the suspicious-selector path for now; it should warn instead of silently allowing, but it does not yet justify a dedicated public detector or automatic block
    • if a known malicious selector is present but no concrete detector surfaces it, prefer warning through the suspicious-selector path over silently allowing it
    • proof-report snapshots are allowed to stay dated, but their embedded decision / recommended_policy should still agree with current policy semantics unless you intentionally choose to preserve a historical policy layer and update the drift checks accordingly
  • Current detector weakness read:
    • observed hidden-batch misses have been concentrated in fee_manipulation alias coverage and suspicious_selector fallback coverage, not repeated core honeypot misses
    • the generic honeypot control-flow heuristic was producing obvious false positives on standard dispatcher/default-REVERT patterns in blue-chip contracts, so the detector is now intentionally narrowed to blacklist-style transfer-control signals until a stronger transfer-path heuristic exists
    • reentrancy is also structurally narrow (CALL then nearby SSTORE) and should be treated as heuristic coverage, not deep semantic analysis
    • deployer_reputation is the weakest detector operationally because it depends on explorer APIs; failures can erase signal even when bytecode analysis is healthy
    • after the recent selector/proxy-wrapper/metadata passes, the next hidden-batch marginal value is no longer in more alias churn; it is in under-covered families like deployer_reputation, proxy no_code vs fetch_failed, and reentrancy edge cases
  • Current deployer-reputation fix read:
    • the recommended path is now the landed Blockscout implementation:
      • Base Blockscout public endpoints returned real creator lookup and tx-history data in local smoke tests with no key
      • the detector still preserves the repo rule that external API failure stays distinct from true NOT_FOUND
      • request throttling/retry remains in place for explorer-side failures
      • BLOCKSCOUT_API_KEY is optional for higher limits; ETHERSCAN_API_KEY / BASESCAN_API_KEY are legacy fallbacks only
      • product call for now: keep deployer_reputation in the detector set, but treat it as supporting context rather than a pillar detector
      • practical next step is optional, not blocking: verify one real paid /analyze flow if you want end-to-end proof that deployer-reputation is now showing up again on production traffic
    • the Etherscan result is now background context only:
      • current local key against Etherscan V2 on Base returns "Free API access is not supported for this chain..."
      • current BaseScan V1 path also returns a deprecation error
    • optional later improvement: evaluate a richer wallet-provenance signal only if real users prove deployer reputation matters enough to justify more dependency or spend
  • Current analytics read:
    • durable production read for 2026-03-16 through 2026-03-26 UTC:
      • 3563 total logged events
      • 95 /analyze requests
      • 7 paid /analyze successes
      • no app-level 500 or 502 rows in the analytics DB for that window
    • the only confirmed Fly OOM in that review window was 2026-03-26 07:21:00 UTC
      • it caused one proxy-side dropped request and a worker restart about a second later
      • it did not overlap with the paid /analyze burst on 2026-03-25
    • the important production concern is result quality, not sustained downtime
      • real paid testers checked Base WETH, AERO, and 0x1f98...F984
      • at the time of that review, local analysis still blocked those contracts; the current local results are listed under Tomorrow Start Here, item 3
    • keep using the Fly volume DB plus Fly logs together for traffic forensics
      • /stats and /dashboard stay useful hints, but not the source of truth
      • treat curl/... and similar agents as intent signals, not proof of a human at the keyboard
  • coinbase/x402 PR #1515 is merged into main.
  • coinbase/x402 follow-up PR #1869 delivered the wording refresh that is now visible on x402.org/ecosystem.
  • Current execution priority:
    • first: keep the new local hidden batch in place and only change implementation if future python auto/loop.py runs expose a real disagreement in those analysis holdouts
    • second: return to the Coinbase public discovery feed check or paid /analyze smoke evidence
    • third: keep x402list.fun treated as external stale state unless the directory itself updates
  • OpenClaw looks relevant for agent-builder reach, but it should stay behind Base/x402-first distribution.
  • Treat x402.org/ecosystem and the CDP discovery/resources feed as separate surfaces; being live on the former does not imply the latter is queryable.
  • Existing upstream follow-up:
    • determine whether Augur eventually appears in the CDP public discovery feed or whether Coinbase support clarification is needed
  • Separate local side-task status:
    • QMD vault retrieval on this laptop is usable now
    • default strong mode on the 8 GB Intel iGPU machine is structured hybrid lex+vec, not blind reliance on plain auto-expanded qmd query "..."
    • the failure mode observed was intermittent Vulkan GPU out-of-memory on the heaviest local query path, not a broken QMD index
    • C:\Users\justi\Obsidian Vault\Outputs\2026-03-08-qmd-reference.md was updated with current QMD 2.0.1 status, retrieval workflow, and CPU fallback guidance
    • local vault-synth now exists at C:\Users\justi\dev\vault-synth
    • it retrieves notes with QMD, synthesizes with OpenAI, prints answer plus sources, and only saves when --save is passed
    • current Windows implementation uses fused qmd search + qmd vsearch for the default lex+vec path because multiline structured qmd query arguments were brittle through qmd.cmd
    • it can fall back to C:\Users\justi\dev\risk-api\.env for OPENAI_API_KEY if no local vault-synth\.env exists
    • vault-synth now auto-runs qmd --index vault-core update before retrieval by default; use --no-refresh only when you explicitly want speed over freshness
    • this fixed a real stale-index mismatch where QMD served an older QMD reference note than the file on disk
    • vault-synth now excludes its own saved notes from default retrieval so synthesis output does not become a self-referential source on later runs
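
The product-output rule earlier in this list can be read as a severity ladder with floors. The following is a minimal sketch of that reading, not the live serializer. `Decision`, `first_pass_decision`, and the reason-code strings `raw_delegatecall` and `selfdestruct` are hypothetical names used for illustration. The sketch ignores the numeric score on purpose and does not cover the high-score managed-upgradeable case.

```python
from enum import IntEnum


class Decision(IntEnum):
    """Ordered so that max() always picks the stricter outcome."""
    ALLOW = 0
    WARN = 1
    MANUAL_REVIEW = 2
    BLOCK = 3


# Assumption: reason codes are plain strings. Only "honeypot" and
# "hidden_mint" are names taken from the notes; the rest are placeholders.
HARD_BLOCK = frozenset({"honeypot"})
REVIEW_FLOOR = frozenset({"hidden_mint", "raw_delegatecall", "selfdestruct"})


def first_pass_decision(reason_codes: set) -> Decision:
    """Map reason codes to a default decision without looking at the score."""
    if not reason_codes:
        # allow is reserved for clean outputs with no reason codes
        return Decision.ALLOW
    # Any reason code at all rules out allow
    decision = Decision.WARN
    if reason_codes & REVIEW_FLOOR:
        # These signals may not stay at plain warn, but do not auto-block
        decision = max(decision, Decision.MANUAL_REVIEW)
    if reason_codes & HARD_BLOCK:
        # honeypot blocks even when the numeric score is low
        decision = max(decision, Decision.BLOCK)
    return decision
```

Using floors with `max()` means that adding a new reason code can only tighten the outcome, which matches the intent that a low score never downgrades a hard signal.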

Recommended Next Steps

Autoresearch Todo

  • Objective 1: put the public autoresearch bench in CI with python auto/bench.py auto/corpus/public_cases.json
  • Objective 2: add a thin auto/loop.py runner that writes auto/runs/latest.json and prints a compact failure summary
  • Objective 3: decide whether proxy slot resolved + implementation bytecode = 0x should stay fetch_failed or get its own proxy-resolution status
  • Objective 4: review local holdout/candidate cases and promote only durable representative regressions into auto/corpus/public_cases.json
  1. Add the next hidden local cases under auto/corpus/*.local.json or auto/candidates/*.local.json, targeting:
    • deployer_reputation NOT_FOUND vs explorer ERROR behavior
    • proxy no_code vs fetch_failed semantics
    • one or two reentrancy edge cases
  2. Run python auto/loop.py and only change implementation if those new cases fail reproducibly.
  3. Re-check the Coinbase/CDP public discovery feed with the refreshed evidence set:
    • real paid production smoke succeeded on 2026-04-03
    • real paid production smoke also succeeded on 2026-04-06 on the live action-aware approve request shape
    • public feed recheck on 2026-04-03 local time still returned NOT_FOUND over the first 5 pages / 500 items
    • if it is still absent after the newer smoke evidence, treat that as the support-escalation packet
  4. Keep x402list.fun classified as stale external state unless the directory itself updates.
  5. If you want stronger evidence before support escalation, do a broader but rate-limited feed scan beyond the first 500 items.
  6. Work through the 2026-03-11 outreach queue in docs/outreach.md, with OpenClaw after the tighter Base/x402 targets.
  7. Revise the LLM discoverability artifacts on the next pass:
    • separate clean runs from contaminated runs
    • capture entity-resolution failures explicitly
    • fill missing rank/provenance fields in the filled CSV
  8. Use the LLM memo to tighten both category wording and entity disambiguation around Augur Risk, augurrisk.com, and Base-first deterministic contract gating before broader promotion.
  9. Do one real paid end-to-end MCP test with a wallet configured before any broader MCP push or npm patch release.
  10. Watch:
    • proof_report_view
    • top_referers
    • /how-payment-works visits
    • unpaid 402 attempts
    • paid requests
  11. If CDP support is not contacted yet, use the 2026-04-03 plus 2026-04-06 paid-smoke successes together with the current NOT_FOUND public-feed scan as the escalation packet.
  12. Only build more proof/demo surfaces if distribution shows confusion or weak conversion.
  13. If more public-page polish happens, keep checking that /skill.md, OpenAPI, and the paid /analyze path remain the dominant integration cues above the fold.
  14. Use real paid-call observations, not only /stats, to decide whether the current allow / warn / manual_review / block mapping matches actual evaluator behavior.
  15. The next highest-value move is distribution, not more product surface:
    • use the live first-party approve example in outreach and demos
    • watch whether it produces qualified action-aware 402 attempts, paid calls, or direct user questions
    • only reopen A-003 or a second action if that usage evidence justifies it
  16. In the next session, start a fresh hidden holdout discovery batch:
    • use python auto/loop.py as the default runner
    • run one batch at a time; do not queue multiple hidden discovery batches before you know what the previous one changed
    • add the next batch of real hidden holdouts under auto/corpus/*.local.json or auto/candidates/*.local.json
    • the most recent local additions already cover dispatcher/default-REVERT, mint-capability-only manual_review, clone-wrapper, and Solidity-metadata behavior
    • the current local hidden batch now also covers analysis-path deployer NOT_FOUND, fresh/low-tx, partial explorer failures, proxy NO_CODE, proxy FETCH_FAILED, and a reentrancy lookahead boundary
    • next target families should now move past that set unless a new real failure points back there
    • prioritize unseen detector/policy edge cases over widening the tracked public corpus immediately
    • only promote a new case into auto/corpus/public_cases.json if it is durable and representative
  17. Automation follow-up:
    • keep serial hidden-batch runs manual for now while the fixes are still shaping the research workflow
    • later, build a guarded local orchestrator that can run N serial batches end-to-end: hidden batch -> validation -> commit -> push -> deploy -> live verify -> next batch
    • first version should stay constrained to narrow selector/policy research surfaces and fail closed on ambiguous results
  18. In the next session, tune C:\Users\justi\dev\vault-synth retrieval quality:
    • compare fused search + vsearch against plain qmd query on questions that should hit outputs/
    • decide whether the lexical branch should stay acronym-first, use a broader distilled keyword query, or use collection-aware hints
    • if vault-synth becomes a regular tool, add its own local .env or move OPENAI_API_KEY to a user-level secret store instead of relying on the risk-api fallback
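
The proxy no_code vs fetch_failed target in step 1 is easier to write holdout cases for once the distinction is pinned down as a small classifier. This is a minimal sketch, assuming the lookup layer signals an RPC failure by returning `None`. `ImplStatus`, `classify_implementation`, and the `resolved` status are hypothetical names, and Objective 3 above may still change how empty bytecode is labelled.

```python
from enum import Enum
from typing import Optional


class ImplStatus(Enum):
    RESOLVED = "resolved"          # hypothetical name for the healthy case
    NO_CODE = "no_code"            # address resolved, nothing deployed there
    FETCH_FAILED = "fetch_failed"  # RPC/lookup failure, state unknown


def classify_implementation(bytecode_hex: Optional[str]) -> ImplStatus:
    """Classify the result of fetching a proxy's implementation bytecode.

    Assumption: None means the lookup itself failed, while "0x" (or an
    empty string) means the node answered and reported no deployed code.
    """
    if bytecode_hex is None:
        return ImplStatus.FETCH_FAILED
    if bytecode_hex.lower() in ("", "0x"):
        return ImplStatus.NO_CODE
    return ImplStatus.RESOLVED
```

Keeping the failure signal out of band (`None` rather than an empty string) is what stops a lookup error from being read as "no code".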

Tomorrow Start Here

  1. Confirm the deployed app is still healthy and the repo still matches the current code baseline:
    • https://augurrisk.com/health
    • https://augurrisk.com/openapi.json
    • the first-party approve example docs follow-up went live on machine version 104; per the Snapshot, production has since moved to machine version 110
    • if the next flyctl deploy --remote-only times out during health polling again, check flyctl status --app augurrisk and the live public routes before assuming the deploy failed. On 2026-04-06 the timed-out deploy still landed and the machine recovered to healthy at version 103; the later docs-only follow-up deployed cleanly to version 104
  2. The current narrow approve refinement is already landed on production:
    • optional APPROVE_SPENDER_ALLOWLIST support is live
    • action-aware request observability is live in /stats via action_spender_trust and action_decision
    • a real paid action-aware approve smoke succeeded on 2026-04-06
    • the first-party docs now also show one exact approve request/response example on /, skill.md, llms.txt, and llms-full.txt
    • next product decision is whether live evidence justifies adding an explicit public spender-trust response field (A-003)
  3. The paid-result problem and the wording/deploy work are already landed:
    • Base WETH (0x4200000000000000000000000000000000000006) now returns allow locally
    • AERO (0x940181a94A35A4569E4529A3CDfB74e38FD98631) now returns manual_review locally
    • 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984 now returns allow locally
    • internal live routes now reflect the admission-control wording
  4. Treat the March 26 OOM as secondary unless it repeats:
    • it caused one brief dropped request during crawler traffic on /
    • it did not overlap with the paid /analyze burst
    • if it happens again, consider a memory bump or more direct memory profiling
  5. The latest hidden discovery rerun is already green:
    • python auto/loop.py passed at 59/59 on 2026-03-30
    • python -m pytest -q passed at 363
    • do not start a new hidden batch until you first add a new local candidate or holdout
  6. The endpoint method-contract follow-up is already landed locally:
    • /analyze now leaves OPTIONS and unsupported methods to Flask instead of masking them with 422
    • POST 422 OpenAPI examples now explicitly cover conflicting query/body addresses plus malformed and non-object JSON bodies
    • python -m pytest tests/test_app.py -q passed at 151
    • next step is the refreshed Coinbase/CDP feed recheck or support escalation, then x402list.fun
  7. After that, audit the live third-party surfaces instead of assuming the repo updates propagated:
    • 8004scan
    • x402.jobs
    • MoltMart
    • Work402 if applicable
    • x402.org/ecosystem
    • Coinbase public discovery feed
    • x402list.fun as an external stale-state check
  8. Runtime proof is now present from both 2026-04-03 and 2026-04-06:
    • a real paid plain /analyze smoke succeeded from the Conway wallet to the live agent wallet
    • a real paid action-aware approve smoke also succeeded on the live app
    • use those successes when reasoning about CDP/discovery visibility versus app-route health
  9. Keep the public copy generic (explorer-backed) unless there is a reason to advertise Blockscout specifically.
  10. For the next research step, start a fresh hidden discovery probe only after adding a new local candidate or holdout:
    • use python auto/loop.py
    • keep batches serial
    • add a new hidden holdout/candidate before changing detector logic again
    • target deployer_reputation, proxy no_code, and reentrancy before spending another batch on selector aliases
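
The health confirmation in item 1 can be scripted so it runs the same way after every deploy. This is a minimal sketch using only the standard library. The two routes come from item 1; `http_status`, `check_routes`, and the 10-second timeout are assumptions for illustration, and a 200 here says nothing about paid /analyze behavior.

```python
from typing import Callable, Dict
from urllib.error import HTTPError
from urllib.request import urlopen

BASE_URL = "https://augurrisk.com"
ROUTES = ("/health", "/openapi.json")


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET, including 4xx/5xx codes."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises on error statuses; surface the code instead
        return err.code


def check_routes(fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each route to True when it answers 200, False otherwise."""
    results = {}
    for route in ROUTES:
        try:
            results[route] = fetch(BASE_URL + route) == 200
        except OSError:
            # URLError and socket timeouts are OSError subclasses
            results[route] = False
    return results
```

Calling `check_routes()` with no arguments hits the live site. Passing a stub `fetch` keeps the check testable offline.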