Skip to content

chore(skills): import DP-GEN simplify agent skill#1879

Merged
njzjz merged 4 commits into
deepmodeling:masterfrom
njzjz-bot:chore/import-agent-skills
May 8, 2026
Merged

chore(skills): import DP-GEN simplify agent skill#1879
njzjz merged 4 commits into
deepmodeling:masterfrom
njzjz-bot:chore/import-agent-skills

Conversation

@njzjz-bot
Copy link
Copy Markdown
Contributor

@njzjz-bot njzjz-bot commented May 7, 2026

Problem

  • The DP-GEN simplify agent skill currently lives in jinzhezenggroup/computational-chemistry-agent-skills rather than beside the DP-GEN source tree.
  • We want to preserve the upstream skill history during migration instead of copying only the final snapshot.

Change

  • Import the dpgen-simplify skill into skills/.
  • Replay all upstream commits touching either the original simplify/dpgen-simplify path or the renamed machine-learning-potentials/dpgen-simplify path as separate commits.
  • Preserve original authorship/date and record the source commit in each commit body.

Notes

  • Imported paths: simplify/dpgen-simplify, machine-learning-potentials/dpgen-simplify.
  • Destination path: skills/dpgen-simplify.
  • Validation: git diff --check origin/master...HEAD.
  • One final style-only commit reapplies JSON formatting equivalent to pre-commit.ci after rebuilding the branch.

Authored by OpenClaw (model: gpt-5.5)

Summary by CodeRabbit

  • Documentation

    • Added comprehensive skill documentation and reference guides for the dpgen-simplify workflow, including field specifications, validation procedures, and workflow best practices.
  • New Features

    • Added configuration templates for various execution environments (local, server-based, and remote).
    • Added parameter templates and example configurations for workflow setup.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This pull request introduces a new dpgen-simplify GitHub skill module for the DP-GEN workflow. It defines baseline JSON templates, machine-specific execution profiles (local shell, local Slurm, SSH remote Slurm), a QM7 parameter example, and comprehensive documentation covering skill definition, parameter field reference, machine configuration reference, and execution workflow guidance with validation checklists.

Changes

DP-GEN Simplify Skill Module

Layer / File(s) Summary
Parameter & Machine Templates
skills/dpgen-simplify/assets/param.template.json, skills/dpgen-simplify/assets/machine.template.json
Baseline JSON templates define configuration schema with null/default values for versioning, stage separation (train, model_devi, fp), and resource fields.
Machine Configuration Profiles
skills/dpgen-simplify/assets/machine.template.local-shell.json, machine.template.server-local-slurm.json, machine.template.ssh-remote-slurm.json
Three execution environment templates: local shell with LazyLocalContext, local Slurm with LocalContext, and SSH-based remote Slurm with SSHContext and remote profile settings.
Example Parameter Configuration
skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
Concrete QM7 dataset example demonstrating type/mass maps, dataset paths, training hyperparameters, descriptor/fitting networks, learning schedules, loss weights, fingerprint settings, and model deviation thresholds.
Skill Definition & Usage Rules
skills/dpgen-simplify/SKILL.md
Main specification document defining skill purpose, two-file configuration requirement (param.json/machine.json), agent responsibilities, working policy rules, runtime boundaries between launcher and stage jobs, minimum required inputs, param/machine construction guidance, pre-run validation steps, and output contract.
Machine Configuration Reference
skills/dpgen-simplify/references/machine-fields.md
Field guidance for machine.json covering stage separation, runtime profile selection (context_type/batch_type mapping), environment boundary between outer and inner stages, per-stage configuration concerns, and practical construction advice avoiding invented scheduler/environment names.
Parameter Configuration Reference
skills/dpgen-simplify/references/param-fields.md
Field guidance for param.json organized by role: dataset inputs, simplify selection windows, training config, FP setup, and a checklist emphasizing consistency for type mappings, descriptor family, FP backend, and threshold changes.
Execution Workflow & Validation
skills/dpgen-simplify/references/workflow-notes.md
Step-by-step workflow from task confirmation through post-run reporting, including standard command syntax, validation checklist (version/JSON/paths/environment/scheduler settings), directory structure for repeated experiments, and post-run summary checklist.

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'chore(skills): import DP-GEN simplify agent skill' accurately summarizes the main change—importing the dpgen-simplify skill into the repository from an upstream source.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
.github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json (1)

2-8: ⚡ Quick win

Consider documenting or eliminating the type_map duplication.

The type_map array appears at both the root level (lines 2-8) and within default_training_param.model.type_map (lines 26-31). While the values currently match, maintaining identical arrays in two locations creates a maintenance hazard—if one is updated without updating the other, silent inconsistencies could occur that violate the "keep type_map consistent" guidance from the field reference documentation.

Consider one of these approaches:

  • Add a comment warning that both arrays must be kept synchronized
  • Investigate whether dpgen can accept a reference to avoid duplication
  • Add validation in the workflow to ensure both arrays match

Also applies to: 26-31

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
@.github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
around lines 2 - 8, The root-level "type_map" and
"default_training_param.model.type_map" are duplicated; add a config validation
step when loading/parsing this JSON that explicitly compares these two arrays
(root "type_map" vs "default_training_param.model.type_map") and fails fast with
a clear error/log message showing both values if they differ, or collapse the
duplication by removing the root "type_map" and only keeping
"default_training_param.model.type_map" (or vice versa) if dpgen supports a
single location—implement the equality check in your config-loader/validation
routine so mismatches cannot silently occur.
.github/skills/dpgen-simplify/SKILL.md (2)

341-341: ⚡ Quick win

Add a trailing newline at end of file.

The file should end with a newline character for POSIX compliance and to avoid warnings from some text processing tools.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/skills/dpgen-simplify/SKILL.md at line 341, Add a trailing newline
to the end of SKILL.md so the file terminates with a newline character
(POSIX-compliant); open the file (.github/skills/dpgen-simplify/SKILL.md), go to
the end of the file (after the last line containing the simplify machine
definitions link) and insert a single newline character so the file ends with a
newline.

84-122: ⚡ Quick win

Clarify the section numbering hierarchy.

The current numbering structure is ambiguous:

  • Section "4. Do not invent environment activation commands" (line 84) has standalone content
  • Then subsections "4.1 Outer launcher policy" (line 100) and "4.2 Outer vs inner runtime boundaries" (line 110) appear
  • It's unclear whether 4.1 and 4.2 are nested under section 4 or if they're related follow-ups

Consider either:

  1. Renumbering 4.1 and 4.2 as standalone sections (5 and 6), and shifting "Prefer reproducible output layout" to section 7
  2. Restructuring section 4 to clearly indicate it contains subsections, perhaps with a different heading like "### 4. Environment activation policies" that encompasses all three pieces

This will improve documentation clarity and help readers understand the logical grouping.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/skills/dpgen-simplify/SKILL.md around lines 84 - 122, The section
numbering is inconsistent: "4. Do not invent environment activation commands" is
presented as a main section while "4.1 Outer launcher policy" and "4.2 Outer vs
inner runtime boundaries (critical)" appear to be ambiguous subsections; update
the headings so the hierarchy is clear by either (A) renumbering "4.1" and "4.2"
to standalone sections "5" and "6" and bumping "Prefer reproducible output
layout" to "7", or (B) renaming the main heading to "4. Environment activation
policies" (or similar) and ensure "4.1 Outer launcher policy" and "4.2 Outer vs
inner runtime boundaries" are nested under it, keeping the exact text for the
policies unchanged; adjust the heading markers for "4. Do not invent environment
activation commands", "4.1 Outer launcher policy", "4.2 Outer vs inner runtime
boundaries (critical)", and "Prefer reproducible output layout" accordingly so
the document structure is unambiguous.
.github/skills/dpgen-simplify/references/machine-fields.md (1)

97-97: ⚡ Quick win

Add a trailing newline at end of file.

The file should end with a newline character for POSIX compliance and to avoid warnings from some text processing tools.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/skills/dpgen-simplify/references/machine-fields.md at line 97, Add a
POSIX-compliant trailing newline to the end of the file by ensuring the last
line ("if the user already has a working template, patch it instead of rewriting
everything") is terminated with a newline character; update the file so its
final byte is a newline to avoid tooling warnings.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
@.github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json:
- Around line 2-8: The root-level "type_map" and
"default_training_param.model.type_map" are duplicated; add a config validation
step when loading/parsing this JSON that explicitly compares these two arrays
(root "type_map" vs "default_training_param.model.type_map") and fails fast with
a clear error/log message showing both values if they differ, or collapse the
duplication by removing the root "type_map" and only keeping
"default_training_param.model.type_map" (or vice versa) if dpgen supports a
single location—implement the equality check in your config-loader/validation
routine so mismatches cannot silently occur.

In @.github/skills/dpgen-simplify/references/machine-fields.md:
- Line 97: Add a POSIX-compliant trailing newline to the end of the file by
ensuring the last line ("if the user already has a working template, patch it
instead of rewriting everything") is terminated with a newline character; update
the file so its final byte is a newline to avoid tooling warnings.

In @.github/skills/dpgen-simplify/SKILL.md:
- Line 341: Add a trailing newline to the end of SKILL.md so the file terminates
with a newline character (POSIX-compliant); open the file
(.github/skills/dpgen-simplify/SKILL.md), go to the end of the file (after the
last line containing the simplify machine definitions link) and insert a single
newline character so the file ends with a newline.
- Around line 84-122: The section numbering is inconsistent: "4. Do not invent
environment activation commands" is presented as a main section while "4.1 Outer
launcher policy" and "4.2 Outer vs inner runtime boundaries (critical)" appear
to be ambiguous subsections; update the headings so the hierarchy is clear by
either (A) renumbering "4.1" and "4.2" to standalone sections "5" and "6" and
bumping "Prefer reproducible output layout" to "7", or (B) renaming the main
heading to "4. Environment activation policies" (or similar) and ensure "4.1
Outer launcher policy" and "4.2 Outer vs inner runtime boundaries" are nested
under it, keeping the exact text for the policies unchanged; adjust the heading
markers for "4. Do not invent environment activation commands", "4.1 Outer
launcher policy", "4.2 Outer vs inner runtime boundaries (critical)", and
"Prefer reproducible output layout" accordingly so the document structure is
unambiguous.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 867352cf-1a0b-4a82-ba4a-7dae01aa2874

📥 Commits

Reviewing files that changed from the base of the PR and between f4f74a2 and 4630b3c.

📒 Files selected for processing (10)
  • .github/skills/dpgen-simplify/SKILL.md
  • .github/skills/dpgen-simplify/assets/machine.template.json
  • .github/skills/dpgen-simplify/assets/machine.template.local-shell.json
  • .github/skills/dpgen-simplify/assets/machine.template.server-local-slurm.json
  • .github/skills/dpgen-simplify/assets/machine.template.ssh-remote-slurm.json
  • .github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
  • .github/skills/dpgen-simplify/assets/param.template.json
  • .github/skills/dpgen-simplify/references/machine-fields.md
  • .github/skills/dpgen-simplify/references/param-fields.md
  • .github/skills/dpgen-simplify/references/workflow-notes.md

@njzjz-bot njzjz-bot force-pushed the chore/import-agent-skills branch from 4630b3c to 18b6360 Compare May 7, 2026 19:02
hyb1109 and others added 4 commits May 8, 2026 00:43
Imported from jinzhezenggroup/computational-chemistry-agent-skills.
Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@885861f
Upstream-Paths:
- simplify/dpgen-simplify
- machine-learning-potentials/dpgen-simplify
…ng-potentials (deepmodeling#65)

Imported from jinzhezenggroup/computational-chemistry-agent-skills.
Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@a019524
Upstream-Paths:
- simplify/dpgen-simplify
- machine-learning-potentials/dpgen-simplify
…eepmodeling#67)

Imported from jinzhezenggroup/computational-chemistry-agent-skills.
Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@0475acb
Upstream-Paths:
- simplify/dpgen-simplify
- machine-learning-potentials/dpgen-simplify
Apply the same JSON indentation as pre-commit.ci after rebuilding the history-preserving migration branch.

Authored by OpenClaw (model: gpt-5.5)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/dpgen-simplify/assets/machine.template.json`:
- Around line 12-17: The stage templates' "resources" objects (e.g., the block
showing "number_node", "cpu_per_node", "gpu_per_node", "group_size") are missing
the required "source_list" field; update every stage template resources object
in machine.template.json to include "source_list" (an array or null as used
convention in this file) so generated scheduler configs include inner-stage
environment activation per SKILL.md—ensure you add "source_list" alongside the
existing keys in every resources block referenced (also at the other occurrences
around lines 27-32 and 42-47).

In `@skills/dpgen-simplify/references/param-fields.md`:
- Line 31: The note for init_data_sys is a sentence fragment ("Can be empty when
starting fully from `pick_data`"); rewrite it as a complete sentence mentioning
the subject and context — e.g. "The init_data_sys field can be empty when
starting fully from `pick_data`." — by editing the line in
skills/dpgen-simplify/references/param-fields.md where `init_data_sys` is
described so the note reads as a full sentence and remains clear and concise.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f196057c-5ad8-4011-906b-59a1f7dd9bc7

📥 Commits

Reviewing files that changed from the base of the PR and between 4630b3c and a8d60ba.

📒 Files selected for processing (10)
  • skills/dpgen-simplify/SKILL.md
  • skills/dpgen-simplify/assets/machine.template.json
  • skills/dpgen-simplify/assets/machine.template.local-shell.json
  • skills/dpgen-simplify/assets/machine.template.server-local-slurm.json
  • skills/dpgen-simplify/assets/machine.template.ssh-remote-slurm.json
  • skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
  • skills/dpgen-simplify/assets/param.template.json
  • skills/dpgen-simplify/references/machine-fields.md
  • skills/dpgen-simplify/references/param-fields.md
  • skills/dpgen-simplify/references/workflow-notes.md
✅ Files skipped from review due to trivial changes (7)
  • skills/dpgen-simplify/assets/param.template.json
  • skills/dpgen-simplify/assets/machine.template.local-shell.json
  • skills/dpgen-simplify/assets/machine.template.server-local-slurm.json
  • skills/dpgen-simplify/assets/machine.template.ssh-remote-slurm.json
  • skills/dpgen-simplify/references/workflow-notes.md
  • skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
  • skills/dpgen-simplify/references/machine-fields.md

Comment on lines +12 to +17
"resources": {
"number_node": null,
"cpu_per_node": null,
"gpu_per_node": null,
"group_size": null
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add resources.source_list to all stage templates for scheduler compatibility.

SKILL.md requires explicit inner-stage environment activation via resources.source_list, but this base template omits the field in every stage. That mismatch can lead to invalid scheduler configs being generated from this template.

💡 Suggested patch
     "train": {
         "command": "dp",
         "machine": {
             "batch_type": null,
             "context_type": null,
             "local_root": "./",
             "remote_root": null
         },
         "resources": {
             "number_node": null,
             "cpu_per_node": null,
             "gpu_per_node": null,
-            "group_size": null
+            "group_size": null,
+            "source_list": null
         }
     },
@@
         "resources": {
             "number_node": null,
             "cpu_per_node": null,
             "gpu_per_node": null,
-            "group_size": null
+            "group_size": null,
+            "source_list": null
         }
     },
@@
         "resources": {
             "number_node": null,
             "cpu_per_node": null,
             "gpu_per_node": null,
-            "group_size": null
+            "group_size": null,
+            "source_list": null
         }
     }

Also applies to: 27-32, 42-47

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/dpgen-simplify/assets/machine.template.json` around lines 12 - 17, The
stage templates' "resources" objects (e.g., the block showing "number_node",
"cpu_per_node", "gpu_per_node", "group_size") are missing the required
"source_list" field; update every stage template resources object in
machine.template.json to include "source_list" (an array or null as used
convention in this file) so generated scheduler configs include inner-stage
environment activation per SKILL.md—ensure you add "source_list" alongside the
existing keys in every resources block referenced (also at the other occurrences
around lines 27-32 and 42-47).


List of initial system indices for training.

Can be empty when starting fully from `pick_data`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix sentence fragment for init_data_sys note.

This line is missing a subject and reads as a fragment. Make it a complete sentence for clarity.

✏️ Suggested patch
-Can be empty when starting fully from `pick_data`.
+It can be empty when starting fully from `pick_data`.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Can be empty when starting fully from `pick_data`.
It can be empty when starting fully from `pick_data`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/dpgen-simplify/references/param-fields.md` at line 31, The note for
init_data_sys is a sentence fragment ("Can be empty when starting fully from
`pick_data`"); rewrite it as a complete sentence mentioning the subject and
context — e.g. "The init_data_sys field can be empty when starting fully from
`pick_data`." — by editing the line in
skills/dpgen-simplify/references/param-fields.md where `init_data_sys` is
described so the note reads as a full sentence and remains clear and concise.

@njzjz njzjz merged commit c618be3 into deepmodeling:master May 8, 2026
3 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants