2026-04-28-harbor#30
Open
joyemang33 wants to merge 7 commits into
…tion
- Replace em-dashes with colons or sentence breaks throughout body and front-matter
- Remove prose parentheses (Plug in, Two strategies callout, Harbor benchmarks list)
- Sync opening callout to table: up to 456 turns / 405 tool calls / 531K tokens
- Update front-matter description with current data
- Drop the body opener that repeated the lead callout's first two sentences
- Normalize Pointers list bullets to colon-separated descriptions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- After the design-principle callout, add three short paragraphs covering: the structural isolation (read-only mounts, single-problem judge mount, HTTP-only channel), the five-step trial flow, and the three judge image modes (per-trial build, locally cached, published frozen).
- Add new section "Parity With the Native Eval" with the 10-problem, 3-run-per-side comparison: 68.92% native vs 53.37% Harbor on claude-code@2.1.112 / claude-opus-4-6, plus the oracle-sweep 70.23% baseline on every problem with a shipped reference. Notes which pieces are intentionally aligned (prompt, CLI, chk.cc) and what changes (transport: in-process vs HTTP).
- Update TOC to include the new Parity section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The empirical evidence (105 runs, sustained hundreds of turns) now lands right after the Polyomino example, while "172 Problems, One Command" + "Parity With the Native Eval" move to the end and read as one continuous "here's the adapter, here's parity, here's the command" block leading into "Try It". TOC reordered to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the placeholder Kimi section with a trace-grounded analysis:
- All 17 Opus crashes share the same 96K-output-token-per-call cap (verified across all trajectory.json files); none of Kimi's 2 crashes do.
- Walk frontier-cs-220 (Opus 0 / Kimi 1.000): Opus's trajectory shows 4 sequential thinking-only assistant chunks, 0 tool calls, empty artifacts dir; Kimi runs 207 turns with explicit context compaction.
- Add frontier-cs-0 as a second mode: Opus self-validates its solver on random cases, judge TLEs 70/70 (returncode=-9); Kimi ships conservative algo for 0.74.
- Drop the 17 zeros from Opus's average and its mean rises 35.1 -> 41.8.

Header renamed from "In-depth Study of How Kimi Achieves Similar Scores" because the trace says Opus hits a self-imposed ceiling, not that Kimi catches up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
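The last bullet's arithmetic works like this. Removing zero-score trials leaves the sum of scores unchanged and only shrinks the count, so the mean scales by n / (n - z). The sketch below uses hypothetical helpers that are not part of Harbor or the post. It assumes each crashed trial is recorded as a score of exactly 0. This PR does not state the trial count behind the 35.1 figure, so the count is left as a parameter, and the scores in the example are toy values.

```python
from statistics import mean


def mean_with_and_without_zeros(scores):
    """Return (mean over all trials, mean over trials scoring above zero).

    Hypothetical helper. Assumes a crashed trial is logged as exactly 0.0,
    so filtering out zeros approximates filtering out crashes; a trial that
    ran to completion and genuinely scored 0.0 would be dropped too.
    """
    nonzero = [s for s in scores if s != 0]
    if not nonzero:
        raise ValueError("every trial scored zero; the filtered mean is undefined")
    return mean(scores), mean(nonzero)


def implied_nonzero_mean(overall_mean, n_trials, n_zeros):
    """Mean after dropping n_zeros zero-score trials, from summary stats only.

    Zeros add nothing to the sum, so the sum stays overall_mean * n_trials
    while the count falls to n_trials - n_zeros.
    """
    if n_zeros >= n_trials:
        raise ValueError("n_zeros must be smaller than n_trials")
    return overall_mean * n_trials / (n_trials - n_zeros)


# Toy scores for illustration, not data from the post.
all_mean, nonzero_mean = mean_with_and_without_zeros([0.0, 0.5, 1.0, 0.0, 0.74])
print(all_mean, nonzero_mean)
print(implied_nonzero_mean(all_mean, 5, 2))
```

Both helpers give the same filtered mean. The second needs only the reported average, the trial count, and the number of zeros, so readers could check a summary figure without the per-trial scores.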
OpenReview Submission Thread
Checklist before opening a PR
- I am opening a pull request against the main branch of the 2025 repo.
- My post and all associated references to it are all lowercase, i.e.
- The title of my PR is exactly the name of my markdown file: _posts/2025-04-28-[submission-name].md would require a PR name 2025-04-28-[submission-name]
- I have anonymized my post: my author's list is Anonymous, and there is no potential content which can reveal my or my collaborators' identities.
- My post matches the formatting requirements, including (but not limited to):
your PR automatically being closed!):
  - _posts/ with the format _posts/2025-04-28-[submission-name].md (or .html)
  - assets/img/2025-04-28-[submission-name]/
  - assets/html/2025-04-28-[submission-name]/
  - assets/bibliography/2025-04-28-[submission-name].bib
- description field of my front-matter
- toc field of my front-matter
- .bibtex file as per the sample post

Any other comments