NVIDIA · rapids-bot · Jun 11, 2026 · Jun 10, 2026
@@ -7,13 +7,14 @@ This benchmark summarizes 3-Tier Evaluation from NVSkills-Eval results for the s
 ## Evaluation Summary
 
 - Skill: `cuopt-numerical-optimization-api-python`
-- Evaluation date: 2026-05-29
+- Evaluation date: 2026-06-10
 - NVSkills-Eval profile: `external`
-- Environment: `local`
-- Dataset: 1 evaluation tasks
-- Attempts per task: 2
+- Environment: `astra-sandbox`
+- Dataset: 4 evaluation tasks
+- Attempts per task: 1
 - Pass threshold: 50%
 - Overall verdict: FAIL
+The skill should be reviewed before NVSkills-Eval publication. **Skill owners should address the applicable findings below and rerun NVSkills-Eval to refresh this benchmark.**
 
 ## Agents Used
 
@@ -42,9 +43,9 @@ Underlying evaluation signals used in this run:
 
 ## Test Tasks
 
-The benchmark dataset contained 1 evaluation tasks:
+The benchmark dataset contained 4 evaluation tasks:
 
-- Positive tasks: 1 tasks where the skill was expected to activate.
+- Positive tasks: 4 tasks where the skill was expected to activate.
 - Negative tasks: 0 tasks where no skill was expected.
 - Unlabeled tasks: 0 tasks where positive/negative intent could not be inferred.
 
@@ -54,39 +55,39 @@ Task composition is derived from the evaluation dataset when possible. Entries w
 
 | Dimension | Num | `claude-code` | `codex` |
 |---|---:|---:|---:|
-| Security | 2 | 100% (+0%) | 100% (+0%) |
-| Correctness | 2 | 100% (+0%) | 82% (+5%) |
-| Discoverability | 2 | 100% (+0%) | 84% (+5%) |
-| Effectiveness | 2 | 79% (-1%) | 40% (-9%) |
-| Efficiency | 2 | 93% (-0%) | 77% (+1%) |
+| Security | 4 | 100% (+0%) | 100% (+0%) |
+| Correctness | 4 | 65% (+29%) | 64% (+8%) |
+| Discoverability | 4 | 50% (+44%) | 44% (+25%) |
+| Effectiveness | 4 | 66% (+17%) | 56% (+3%) |
+| Efficiency | 4 | 61% (+37%) | 44% (+17%) |
 
 Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
 
 ## Tier 1: Static Validation Summary
 
-Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 15 total findings.
+Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 10 total findings.
 
 Top findings:
 
-- MEDIUM PII/gps_coordinates: GPS coordinates (location information) (`references/qp_examples.md:162`)
-- MEDIUM PII/gps_coordinates: GPS coordinates (location information) (`references/qp_examples.md:163`)
-- MEDIUM PII/gps_coordinates: GPS coordinates (location information) (`references/qp_examples.md:164`)
 - MEDIUM PII/phone_numbers: International phone number (`assets/mps_solver/results.md:48`)
 - MEDIUM PII/phone_numbers: International phone number (`assets/mps_solver/results.md:69`)
+- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/cuopt-numerical-optimization-api-python/SKILL.md`)
+- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Examples' (`skills/cuopt-numerical-optimization-api-python/SKILL.md`)
+- LOW QUALITY/quality_discoverability: Description doesn't mention WHEN to use this skill (`skills/cuopt-numerical-optimization-api-python/SKILL.md`)
 
 ## Tier 2: Deduplication Summary
 
 Tier 2 validation reported findings. NVSkills-Eval ran 2 checks and found 9 total findings.
 
 Top findings:
 
+- HIGH DUPLICATE/duplicate: Duplicate content found across assets/lp_warmstart/README.md and assets/lp_warmstart/model.py:
+  "# LP PDLP Warmstart" in assets/lp_warmstart/README.md (lines 1-5)
+  vs "(module docstring)" in assets/lp_warmstart/model.py (lines 1-4) (`assets/lp_warmstart/README.md:1`)
 - HIGH DUPLICATE/duplicate: Duplicate content found across SKILL.md and assets/mps_solver/README.md and references/qp_examples.md:
   "# Solve" in SKILL.md (lines 63-67)
   vs "# Configure and solve" in assets/mps_solver/README.md (lines 76-80)
   vs "# Solve" in references/qp_examples.md (lines 47-51) (`SKILL.md:63`)
-- HIGH DUPLICATE/duplicate: Duplicate content found across assets/lp_warmstart/README.md and assets/lp_warmstart/model.py:
-  "# LP PDLP Warmstart" in assets/lp_warmstart/README.md (lines 1-5)
-  vs "(module docstring)" in assets/lp_warmstart/model.py (lines 1-4) (`assets/lp_warmstart/README.md:1`)
 - HIGH DUPLICATE/duplicate: Duplicate content found across assets/milp_basic/README.md and assets/milp_basic/model.py:
   "# Minimal MILP" in assets/milp_basic/README.md (lines 1-10)
   vs "(module docstring)" in assets/milp_basic/model.py (lines 1-6) (`assets/milp_basic/README.md:1`)
@@ -97,7 +98,3 @@ Top findings:
   "# Check status (CRITICAL: use PascalCase!)" in SKILL.md (lines 68-74)
   vs "# ✅ CORRECT" in SKILL.md (lines 148-151)
   vs "# Check solution" in assets/mps_solver/README.md (lines 81-85) (`SKILL.md:68`)
-
-## Publication Recommendation
-
-The skill should be reviewed before NVSkills-Eval publication. Skill owners should address the findings above and rerun NVSkills-Eval to refresh this benchmark.
@@ -1,10 +1,10 @@
 [
   {
     "id": "numopt-py-eval-001-lp-api-call-sequence",
-    "question": "I want to solve a small LP (continuous variables only, maximize a linear objective with linear constraints) using the cuOpt Python API. List the API calls in order — name each method, one line per method, no full runnable script.",
+    "question": "I want to solve a small LP (continuous variables only, maximize a linear objective with linear constraints) using the cuOpt Python API. List the API calls in order \u2014 name each method, one line per method, no full runnable script.",
     "expected_skill": "cuopt-numerical-optimization-api-python",
     "expected_script": null,
-    "ground_truth": "The agent produces an ordered list of API calls without a runnable script. The list, in order: (1) Import Problem, CONTINUOUS, and MAXIMIZE from cuopt.linear_programming.problem, and SolverSettings from cuopt.linear_programming.solver_settings. (2) Construct Problem('name'). (3) For each decision variable, call problem.addVariable(lb=..., vtype=CONTINUOUS, name=...). (4) For each constraint, call problem.addConstraint(<linear expression> <= or >= or == <rhs>, name=...). (5) Call problem.setObjective(<linear expression>, sense=MAXIMIZE). (6) Construct SolverSettings(); call set_parameter('time_limit', ...) for time budget. (7) Call problem.solve(settings). (8) Check problem.Status.name in ['Optimal', 'PrimalFeasible'] (PascalCase status names — case-sensitive). (9) Read problem.ObjValue for the objective, and each variable's .getValue() for its optimal value. The agent uses LP (not MILP / QP) because all variables are continuous and the objective is linear. Mentions that status names are PascalCase (Optimal, not OPTIMAL or optimal) — case sensitivity matters.",
+    "ground_truth": "The agent produces an ordered list of API calls without a runnable script. The list, in order: (1) Import Problem, CONTINUOUS, and MAXIMIZE from cuopt.linear_programming.problem, and SolverSettings from cuopt.linear_programming.solver_settings. (2) Construct Problem('name'). (3) For each decision variable, call problem.addVariable(lb=..., vtype=CONTINUOUS, name=...). (4) For each constraint, call problem.addConstraint(<linear expression> <= or >= or == <rhs>, name=...). (5) Call problem.setObjective(<linear expression>, sense=MAXIMIZE). (6) Construct SolverSettings(); call set_parameter('time_limit', ...) for time budget. (7) Call problem.solve(settings). (8) Check problem.Status.name in ['Optimal', 'PrimalFeasible'] (PascalCase status names \u2014 case-sensitive). (9) Read problem.ObjValue for the objective, and each variable's .getValue() for its optimal value. The agent uses LP (not MILP / QP) because all variables are continuous and the objective is linear. Mentions that status names are PascalCase (Optimal, not OPTIMAL or optimal) \u2014 case sensitivity matters.",
     "expected_behavior": [
       "Selects LP (not MILP or QP) given continuous variables and a linear objective",
       "Lists the API calls in order without producing a full runnable script",
@@ -15,5 +15,48 @@
       "Mentions that status names are case-sensitive (PascalCase)",
       "Does not invent method names that are not in the skill"
     ]
+  },
+  {
+    "id": "numopt-py-eval-002-status-case-sensitivity",
+    "question": "My cuOpt Python LP solve runs without error but the result block never executes. Here is the check I wrote: if problem.Status.name == 'OPTIMAL': print(problem.ObjValue). What is wrong and how do I fix it?",
+    "expected_skill": "cuopt-numerical-optimization-api-python",
+    "expected_script": null,
+    "ground_truth": "The check silently fails because cuOpt status names use PascalCase, not ALL_CAPS. The string 'OPTIMAL' never matches. The correct LP status values to check are 'Optimal' and 'PrimalFeasible'. The fixed check is: if problem.Status.name in ['Optimal', 'PrimalFeasible']: print(problem.ObjValue). For MILP the correct values are 'Optimal' and 'FeasibleFound'. This is a common silent bug \u2014 the solve completes successfully but the code path that reads results is skipped because the string comparison always returns False.",
+    "expected_behavior": [
+      "Identifies the bug as a case mismatch \u2014 'OPTIMAL' is wrong, 'Optimal' is correct",
+      "States that cuOpt status names are PascalCase, not ALL_CAPS",
+      "Gives the correct LP check: problem.Status.name in ['Optimal', 'PrimalFeasible']",
+      "Notes that for MILP the passing statuses are 'Optimal' and 'FeasibleFound', not 'FEASIBLE_FOUND' or 'FEASIBLEFOUND'",
+      "Explains why this is a silent failure \u2014 no exception is raised, the block just never executes"
+    ]
+  },
+  {
+    "id": "numopt-py-eval-003-integer-vs-continuous-workers",
+    "question": "I am modeling a staffing problem where I need to decide how many nurses to assign to each ward. Should the nurse count variables be INTEGER or CONTINUOUS in the cuOpt Python API, and what vtype constant do I use for each?",
+    "expected_skill": "cuopt-numerical-optimization-api-python",
+    "expected_script": null,
+    "ground_truth": "Nurse counts should be INTEGER because nurses are discrete countable entities \u2014 you cannot assign 2.7 nurses to a ward. The vtype constant is INTEGER (imported from cuopt.linear_programming.problem). The addVariable call would be: problem.addVariable(lb=0, vtype=INTEGER, name='ward_a_nurses'). This makes the problem a MILP, not an LP. CONTINUOUS would be wrong here because it allows fractional values, which are meaningless for headcounts. The rule is: 'how many things' (people, vehicles, machines) \u2192 INTEGER; 'how much of something' (hours, tonnes, dollars) \u2192 CONTINUOUS.",
+    "expected_behavior": [
+      "States nurse counts must be INTEGER because nurses are discrete countable entities",
+      "Names the correct vtype constant: INTEGER (imported from cuopt.linear_programming.problem)",
+      "Shows or describes the addVariable call with vtype=INTEGER",
+      "States this makes the problem MILP, not LP",
+      "Explains why CONTINUOUS is wrong \u2014 it allows fractional nurse counts",
+      "States the rule: countable things \u2192 INTEGER, measurable amounts \u2192 CONTINUOUS"
+    ]
+  },
+  {
+    "id": "numopt-py-eval-004-qp-maximize-workaround",
+    "question": "I want to maximize a quadratic objective using the cuOpt Python QP API. When I pass sense=MAXIMIZE to setObjective, I get an error. What is the correct approach?",
+    "expected_skill": "cuopt-numerical-optimization-api-python",
+    "expected_script": null,
+    "ground_truth": "The cuOpt QP solver only supports MINIMIZE \u2014 MAXIMIZE is rejected for quadratic objectives. The correct workaround is to negate all coefficients in the objective and minimize the negated expression. For example, to maximize -0.04*x1*x1 - 0.02*x2*x2 (a concave quadratic with NSD Q), minimize 0.04*x1*x1 + 0.02*x2*x2 with sense=MINIMIZE. The resulting problem.ObjValue will be the negated maximum; multiply by -1 to recover the true maximum. All variables must remain CONTINUOUS \u2014 integer QP is not supported. The Q matrix of the original maximization problem must be negative semi-definite (NSD) for the problem to be concave and have a finite maximum; after negation it becomes PSD, which is what the solver expects. Maximizing a convex quadratic (positive coefficients) is unbounded and not a meaningful use case.",
+    "expected_behavior": [
+      "States QP only supports MINIMIZE \u2014 MAXIMIZE is rejected",
+      "Gives the correct workaround: negate all objective coefficients and use sense=MINIMIZE",
+      "Notes that problem.ObjValue will be negated and must be multiplied by -1 to get the true maximum",
+      "Reminds that all variables must be CONTINUOUS \u2014 integer QP is not supported",
+      "Does not suggest a non-existent MAXIMIZE_QP or similar invented API"
+    ]
   }
 ]
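
Two of the behaviors the tasks above test can be sketched in plain Python without importing cuOpt: the case-sensitive status check (tasks 001 and 002) and the negate-then-minimize workaround for QP maximization (task 004). The status strings and the example coefficients come from the ground-truth text. Everything else is an assumption made for illustration: the names `ACCEPTED_STATUS`, `solve_succeeded`, `negate_objective`, and `recover_maximum` are hypothetical and not part of the cuOpt API, and the coefficient dictionary is only a stand-in for a real objective expression.

```python
# Illustrative helpers only; none of these names exist in the cuOpt API.

# Status names treated as success, copied from the ground-truth text.
ACCEPTED_STATUS = {
    "LP": {"Optimal", "PrimalFeasible"},
    "MILP": {"Optimal", "FeasibleFound"},
}


def solve_succeeded(status_name: str, problem_type: str = "LP") -> bool:
    """Case-sensitive membership test against the PascalCase status names."""
    return status_name in ACCEPTED_STATUS[problem_type]


def negate_objective(coeffs: dict[str, float]) -> dict[str, float]:
    """Flip the sign of every coefficient: maximizing f equals minimizing -f."""
    return {term: -value for term, value in coeffs.items()}


def recover_maximum(minimized_obj_value: float) -> float:
    """Undo the negation on the objective value reported by the minimization."""
    return -minimized_obj_value


if __name__ == "__main__":
    print(solve_succeeded("Optimal"))                # True
    print(solve_succeeded("OPTIMAL"))                # False: silent mismatch
    print(solve_succeeded("FeasibleFound", "MILP"))  # True

    # Concave objective from task 004: maximize -0.04*x1*x1 - 0.02*x2*x2.
    to_maximize = {"x1*x1": -0.04, "x2*x2": -0.02}
    print(negate_objective(to_maximize))  # {'x1*x1': 0.04, 'x2*x2': 0.02}
```

In a real script, the string passed to `solve_succeeded` would be `problem.Status.name`, and the value passed to `recover_maximum` would be `problem.ObjValue` from the negated, minimized problem.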
@@ -7,9 +7,9 @@ This skill is ready for commercial/non-commercial use. <br>
 NVIDIA <br>
 
 ### License/Terms of Use: <br>
-Apache-2.0 <br>
+Apache 2.0 <br>
 ## Use Case: <br>
-Developers and engineers use this skill to formulate and solve linear programming (LP), mixed-integer linear programming (MILP), and quadratic programming (QP) optimization problems using NVIDIA cuOpt's GPU-accelerated Python API. <br>
+Developers and engineers use this skill to solve linear, mixed-integer, and quadratic programming problems with NVIDIA cuOpt’s GPU-accelerated Python API, for tasks such as scheduling, portfolio optimization, production planning, and least-squares fitting. <br>
 
 ### Deployment Geography for Use: <br>
 Global <br>
@@ -19,25 +19,25 @@ Risk: Review before execution as proposals could introduce incorrect or misleadi
 Mitigation: Review and scan skill before deployment. <br>
 
 ## Reference(s): <br>
+- [QP Examples (least-squares, maximization workaround, matrix form)](references/qp_examples.md) <br>
 - [cuOpt User Guide](https://docs.nvidia.com/cuopt/user-guide/latest/introduction.html) <br>
-- [cuOpt Examples](https://github.com/NVIDIA/cuopt-examples) <br>
-- [QP Examples Reference](references/qp_examples.md) <br>
+- [cuOpt Examples Repository](https://github.com/NVIDIA/cuopt-examples) <br>
 
 
 ## Skill Output: <br>
-**Output Type(s):** [Code, API Calls, Analysis] <br>
-**Output Format:** [Python code with inline solver output] <br>
+**Output Type(s):** [Code, API Calls] <br>
+**Output Format:** [Python code with inline solver configuration] <br>
 **Output Parameters:** [1D] <br>
 **Other Properties Related to Output:** [None] <br>
 
 ## Evaluation Agents Used: <br>
-- claude-code <br>
-- codex <br>
+- `claude-code` <br>
+- `codex` <br>
 
 
 
 ## Evaluation Tasks: <br>
-Evaluated against 1 task with 2 attempts per agent; pass threshold 50%. NVSkills-Eval profile: external. <br>
+Evaluated against 4 tasks with 1 attempt per task (NVSkills-Eval external profile, astra-sandbox environment); pass threshold 50%. <br>
 
 ## Evaluation Metrics Used: <br>
 Reported benchmark dimensions: <br>
@@ -61,11 +61,11 @@ Underlying evaluation signals used in this run: <br>
 ## Evaluation Results: <br>
 | Dimension | Num | `claude-code` | `codex` |
 |---|---:|---:|---:|
-| Security | 2 | 100% (+0%) | 100% (+0%) |
-| Correctness | 2 | 100% (+0%) | 82% (+5%) |
-| Discoverability | 2 | 100% (+0%) | 84% (+5%) |
-| Effectiveness | 2 | 79% (-1%) | 40% (-9%) |
-| Efficiency | 2 | 93% (-0%) | 77% (+1%) |
+| Security | 4 | 100% (+0%) | 100% (+0%) |
+| Correctness | 4 | 65% (+29%) | 64% (+8%) |
+| Discoverability | 4 | 50% (+44%) | 44% (+25%) |
+| Effectiveness | 4 | 66% (+17%) | 56% (+3%) |
+| Efficiency | 4 | 61% (+37%) | 44% (+17%) |
 
 ## Skill Version(s): <br>
 26.08.00 (source: frontmatter) <br>