2026-04-28-harbor by joyemang33 · Pull Request #30 · FrontierCS/frontier-cs-org

joyemang33 · 2026-04-30T03:15:29Z

OpenReview Submission Thread

Checklist before opening a PR

I am opening a pull request against the main branch of the 2025 repo.

My post and all associated references to it are all lowercase, i.e.

  2025-04-28-Sample-Test.md             -> 2025-04-28-sample-test.md
  assets/img/2025-04-28-Sample-Test/    -> assets/img/2025-04-28-sample-test/

The title of my PR is exactly the name of my markdown file
- i.e. _posts/2025-04-28-[submission-name].md would require a PR name 2025-04-28-[submission-name]
I have anonymized my post: my author list is Anonymous, and there is no
content that could reveal my or my collaborators' identities.
My post matches the formatting requirements, including (but not limited to):
- I have ONLY MODIFIED files in the following locations (failure to do so will result in
  your PR automatically being closed!):
  - a Markdown (or HTML) file in _posts/ with the format _posts/2025-04-28-[submission-name].md (or .html)
  - static image assets added to assets/img/2025-04-28-[submission-name]/
  - interactive HTML figures added to assets/html/2025-04-28-[submission-name]/
  - citations in a bibtex file in assets/bibliography/2025-04-28-[submission-name].bib
- I have a short 2-3 sentence abstract in the description field of my front-matter
- I have a table of contents, formatted using the toc field of my front-matter
- My bibliography is correctly formatted, using a BibTeX (.bib) file as per the sample post
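
The path rules in the checklist above can be checked mechanically before opening a PR. The following is a minimal sketch, not the repository's actual CI check: the `classify` helper, the `DATE` constant, and the slug pattern `[a-z0-9-]+` are assumptions made for illustration, and the date prefix is copied from the checklist text.

```python
import re

# Date prefix as written in the checklist; adjust it for the actual submission.
DATE = "2025-04-28"
# Assumed slug shape: lowercase letters, digits, and hyphens.
SLUG = rf"{DATE}-[a-z0-9-]+"

# The four allowed locations, transcribed from the checklist.
ALLOWED = [
    re.compile(rf"_posts/{SLUG}\.(md|html)"),
    re.compile(rf"assets/img/{SLUG}/.+"),
    re.compile(rf"assets/html/{SLUG}/.+"),
    re.compile(rf"assets/bibliography/{SLUG}\.bib"),
]


def classify(path: str) -> str:
    """Return 'ok', 'not lowercase', or 'outside allowed locations'."""
    if any(p.fullmatch(path) for p in ALLOWED):
        return "ok"
    # The path would be fine if lowercased, so the casing is the problem.
    if any(p.fullmatch(path.lower()) for p in ALLOWED):
        return "not lowercase"
    return "outside allowed locations"


if __name__ == "__main__":
    for changed in [
        "_posts/2025-04-28-sample-test.md",
        "_posts/2025-04-28-Sample-Test.md",
        "README.md",
    ]:
        print(changed, "->", classify(changed))
```

Running `classify` over the list of changed files (for example, the output of `git diff --name-only main`) flags any path that would trip the checklist.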

Any other comments

…tion
- Replace em-dashes with colons or sentence breaks throughout body and front-matter
- Remove prose parentheses (Plug in, Two strategies callout, Harbor benchmarks list)
- Sync opening callout to table: up to 456 turns / 405 tool calls / 531K tokens
- Update front-matter description with current data
- Drop the body opener that repeated the lead callout's first two sentences
- Normalize Pointers list bullets to colon-separated descriptions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- After the design-principle callout, add three short paragraphs covering: the structural isolation (read-only mounts, single-problem judge mount, HTTP-only channel), the five-step trial flow, and the three judge image modes (per-trial build, locally cached, published frozen).
- Add new section "Parity With the Native Eval" with the 10-problem, 3-run-per-side comparison: 68.92% native vs 53.37% Harbor on claude-code@2.1.112 / claude-opus-4-6, plus the oracle-sweep 70.23% baseline on every problem with a shipped reference. Notes which pieces are intentionally aligned (prompt, CLI, chk.cc) and what changes (transport: in-process vs HTTP).
- Update TOC to include the new Parity section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The empirical evidence (105 runs, sustained hundreds of turns) now lands right after the Polyomino example, while "172 Problems, One Command" + "Parity With the Native Eval" move to the end and read as one continuous "here's the adapter, here's parity, here's the command" block leading into "Try It". TOC reordered to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the placeholder Kimi section with a trace-grounded analysis:
- All 17 Opus crashes share the same 96K-output-token-per-call cap (verified across all trajectory.json files); none of Kimi's 2 crashes do.
- Walk frontier-cs-220 (Opus 0 / Kimi 1.000): Opus's trajectory shows 4 sequential thinking-only assistant chunks, 0 tool calls, empty artifacts dir; Kimi runs 207 turns with explicit context compaction.
- Add frontier-cs-0 as a second mode: Opus self-validates its solver on random cases, judge TLEs 70/70 (returncode=-9); Kimi ships conservative algo for 0.74.
- Drop the 17 zeros from Opus's average and its mean rises 35.1 -> 41.8.

Header renamed from "In-depth Study of How Kimi Achieves Similar Scores" because the trace says Opus hits a self-imposed ceiling, not that Kimi catches up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
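
The "drop the zeros" adjustment in the commit message above is plain arithmetic: zero scores add nothing to the sum, so removing them only shrinks the denominator. The sketch below illustrates this with made-up toy scores. The number of Opus runs behind the 35.1 figure is not stated in this thread, so the real numbers are not reproduced here, and both helper names are hypothetical.

```python
def mean_excluding_zeros(scores):
    """Mean over the nonzero scores only (here, crashed runs score 0)."""
    nonzero = [s for s in scores if s != 0]
    if not nonzero:
        raise ValueError("no nonzero scores to average")
    return sum(nonzero) / len(nonzero)


def mean_after_dropping(mean_all, n_runs, n_zeros):
    """Same adjustment from summary statistics.

    Zeros contribute nothing to the total, so total = mean_all * n_runs
    and the adjusted mean divides that total by the remaining run count.
    """
    if n_zeros >= n_runs:
        raise ValueError("cannot drop every run")
    return mean_all * n_runs / (n_runs - n_zeros)


if __name__ == "__main__":
    # Toy data, NOT taken from the post: two crashes and two scored runs.
    toy_scores = [0.0, 0.0, 50.0, 70.0]
    overall = sum(toy_scores) / len(toy_scores)
    print(overall)                              # mean over all four runs
    print(mean_excluding_zeros(toy_scores))     # mean over the two scored runs
    print(mean_after_dropping(overall, 4, 2))   # same value via the identity
```

Both routes agree on the toy data, which is the property the commit message relies on when it restates Opus's mean without the crashed runs.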

joyemang33 and others added 2 commits April 29, 2026 20:15

1

e3a6ac3

wenhaochai changed the title from "1" to "2026-04-28-harbor" Apr 30, 2026

wenhaochai and others added 5 commits April 30, 2026 16:11

update

561c7f1

update

44e4e89

joyemang33 wants to merge 7 commits into FrontierCS:main from joyemang33:calico-blog

3 participants
