chore(skills): import DP-GEN simplify agent skill by njzjz-bot · Pull Request #1879 · deepmodeling/dpgen

njzjz-bot · 2026-05-07T18:52:01Z

Problem

- The DP-GEN simplify agent skill currently lives in jinzhezenggroup/computational-chemistry-agent-skills rather than beside the DP-GEN source tree.
- We want to preserve the upstream skill history during migration instead of copying only the final snapshot.

Change

- Import the `dpgen-simplify` skill into `skills/`.
- Replay all upstream commits touching either the original `simplify/dpgen-simplify` path or the renamed `machine-learning-potentials/dpgen-simplify` path as separate commits.
- Preserve original authorship/date and record the source commit in each commit body.
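
As an illustration of the replay step, here is a minimal sketch of how a commit body and authorship environment could be assembled for each replayed commit. The function names are hypothetical and the exact line layout of the body is an assumption; only the `Upstream-Commit` and `Upstream-Paths` labels and git's standard `GIT_AUTHOR_*` environment variables are taken as given.

```python
def build_import_body(upstream_repo, upstream_sha, upstream_paths):
    """Return a commit body recording where a replayed commit came from.

    The line layout is an assumption; the PR only shows the labels.
    """
    lines = [
        f"Imported from {upstream_repo}.",
        f"Upstream-Commit: {upstream_repo}@{upstream_sha}",
        "Upstream-Paths:",
    ]
    lines.extend(f"- {path}" for path in upstream_paths)
    return "\n".join(lines)


def author_env(name, email, date):
    """Environment variables git reads to set the author of a new commit."""
    return {
        "GIT_AUTHOR_NAME": name,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_AUTHOR_DATE": date,
    }


body = build_import_body(
    "jinzhezenggroup/computational-chemistry-agent-skills",
    "885861f",
    ["simplify/dpgen-simplify", "machine-learning-potentials/dpgen-simplify"],
)
```

The returned dictionary could be merged into the environment of a `git commit` subprocess so that the original author and date survive the replay.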

Notes

- Imported paths: `simplify/dpgen-simplify`, `machine-learning-potentials/dpgen-simplify`.
- Destination path: `skills/dpgen-simplify`.
- Validation: `git diff --check origin/master...HEAD`.
- One final style-only commit reapplies JSON formatting equivalent to pre-commit.ci after rebuilding the branch.

Authored by OpenClaw (model: gpt-5.5)

Summary by CodeRabbit

Documentation
- Added comprehensive skill documentation and reference guides for the dpgen-simplify workflow, including field specifications, validation procedures, and workflow best practices.
New Features
- Added configuration templates for various execution environments (local, server-based, and remote).
- Added parameter templates and example configurations for workflow setup.

coderabbitai · 2026-05-07T18:52:15Z

📝 Walkthrough

This pull request introduces a new dpgen-simplify GitHub skill module for the DP-GEN workflow. It defines baseline JSON templates, machine-specific execution profiles (local shell, local Slurm, SSH remote Slurm), a QM7 parameter example, and comprehensive documentation covering skill definition, parameter field reference, machine configuration reference, and execution workflow guidance with validation checklists.

Changes

DP-GEN Simplify Skill Module

| Layer / File(s) | Summary |
|---|---|
| Parameter & Machine Templates `skills/dpgen-simplify/assets/param.template.json`, `skills/dpgen-simplify/assets/machine.template.json` | Baseline JSON templates define configuration schema with null/default values for versioning, stage separation (`train`, `model_devi`, `fp`), and resource fields. |
| Machine Configuration Profiles `skills/dpgen-simplify/assets/machine.template.local-shell.json`, `machine.template.server-local-slurm.json`, `machine.template.ssh-remote-slurm.json` | Three execution environment templates: local shell with `LazyLocalContext`, local Slurm with `LocalContext`, and SSH-based remote Slurm with `SSHContext` and remote profile settings. |
| Example Parameter Configuration `skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json` | Concrete QM7 dataset example demonstrating type/mass maps, dataset paths, training hyperparameters, descriptor/fitting networks, learning schedules, loss weights, fingerprint settings, and model deviation thresholds. |
| Skill Definition & Usage Rules `skills/dpgen-simplify/SKILL.md` | Main specification document defining skill purpose, two-file configuration requirement (`param.json`/`machine.json`), agent responsibilities, working policy rules, runtime boundaries between launcher and stage jobs, minimum required inputs, param/machine construction guidance, pre-run validation steps, and output contract. |
| Machine Configuration Reference `skills/dpgen-simplify/references/machine-fields.md` | Field guidance for `machine.json` covering stage separation, runtime profile selection (`context_type`/`batch_type` mapping), environment boundary between outer and inner stages, per-stage configuration concerns, and practical construction advice avoiding invented scheduler/environment names. |
| Parameter Configuration Reference `skills/dpgen-simplify/references/param-fields.md` | Field guidance for `param.json` organized by role: dataset inputs, simplify selection windows, training config, FP setup, and a checklist emphasizing consistency for type mappings, descriptor family, FP backend, and threshold changes. |
| Execution Workflow & Validation `skills/dpgen-simplify/references/workflow-notes.md` | Step-by-step workflow from task confirmation through post-run reporting, including standard command syntax, validation checklist (version/JSON/paths/environment/scheduler settings), directory structure for repeated experiments, and post-run summary checklist. |
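
To make the profile selection described above more concrete, the following is a rough sketch of a lookup from execution profile to `context_type`/`batch_type`. The profile keys mirror the template file names, the `context_type` values come from the summary, and the `batch_type` values (`Shell`, `Slurm`) are assumptions, as is the helper name `machine_block`.

```python
# Hypothetical lookup table: profile name -> machine settings.
# context_type values follow the walkthrough; batch_type values are assumed.
PROFILES = {
    "local-shell": {"context_type": "LazyLocalContext", "batch_type": "Shell"},
    "server-local-slurm": {"context_type": "LocalContext", "batch_type": "Slurm"},
    "ssh-remote-slurm": {"context_type": "SSHContext", "batch_type": "Slurm"},
}


def machine_block(profile):
    """Build a per-stage "machine" block for the given profile name."""
    try:
        base = PROFILES[profile]
    except KeyError:
        known = ", ".join(sorted(PROFILES))
        raise ValueError(f"unknown profile {profile!r}; expected one of: {known}") from None
    # local_root/remote_root defaults follow the base template shown in the review.
    return {**base, "local_root": "./", "remote_root": None}
```

Failing on an unknown profile, rather than guessing, matches the skill's stated preference for not inventing scheduler or environment names.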

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'chore(skills): import DP-GEN simplify agent skill' accurately summarizes the main change: importing the dpgen-simplify skill into the repository from an upstream source. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (4)

.github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json (1)
2-8: ⚡ Quick win

Consider documenting or eliminating the type_map duplication.

The type_map array appears at both the root level (lines 2-8) and within default_training_param.model.type_map (lines 26-31). While the values currently match, maintaining identical arrays in two locations creates a maintenance hazard—if one is updated without updating the other, silent inconsistencies could occur that violate the "keep type_map consistent" guidance from the field reference documentation.

Consider one of these approaches:

- Add a comment warning that both arrays must be kept synchronized
- Investigate whether dpgen can accept a reference to avoid duplication
- Add validation in the workflow to ensure both arrays match

Also applies to: 26-31
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
@.github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
around lines 2 - 8, The root-level "type_map" and
"default_training_param.model.type_map" are duplicated; add a config validation
step when loading/parsing this JSON that explicitly compares these two arrays
(root "type_map" vs "default_training_param.model.type_map") and fails fast with
a clear error/log message showing both values if they differ, or collapse the
duplication by removing the root "type_map" and only keeping
"default_training_param.model.type_map" (or vice versa) if dpgen supports a
single location—implement the equality check in your config-loader/validation
routine so mismatches cannot silently occur.
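
The equality check suggested above could look roughly like the following sketch. The function name `check_type_map` is hypothetical, and the sketch assumes the file has already been parsed into a dictionary (for example with `json.load`); it is not part of dpgen itself.

```python
def check_type_map(param):
    """Raise ValueError if the two type_map arrays in a parsed param.json differ.

    Compares the root "type_map" with
    "default_training_param.model.type_map" and fails fast on mismatch.
    """
    root = param.get("type_map")
    model = param.get("default_training_param", {}).get("model", {})
    nested = model.get("type_map")
    if nested is None:
        # Only one location is present, so there is nothing to compare.
        return
    if root != nested:
        raise ValueError(
            "type_map mismatch: "
            f"root={root!r} vs default_training_param.model.type_map={nested!r}"
        )


# Example with illustrative element symbols.
check_type_map({
    "type_map": ["C", "H", "N", "O", "S"],
    "default_training_param": {"model": {"type_map": ["C", "H", "N", "O", "S"]}},
})
```

Order matters in this comparison, which is intentional: a reordered `type_map` would change which index maps to which element.
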
.github/skills/dpgen-simplify/SKILL.md (2)
341-341: ⚡ Quick win

Add a trailing newline at end of file.

The file should end with a newline character for POSIX compliance and to avoid warnings from some text processing tools.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/skills/dpgen-simplify/SKILL.md at line 341, Add a trailing newline
to the end of SKILL.md so the file terminates with a newline character
(POSIX-compliant); open the file (.github/skills/dpgen-simplify/SKILL.md), go to
the end of the file (after the last line containing the simplify machine
definitions link) and insert a single newline character so the file ends with a
newline.
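
For reference, a trailing-newline fix like this one can be checked or applied with a few lines of Python. This is only a sketch with a hypothetical helper name, not something the repository ships.

```python
from pathlib import Path


def ensure_trailing_newline(path):
    """Append a newline if a non-empty file does not end with one.

    Returns True when the file was modified, False otherwise.
    """
    target = Path(path)
    data = target.read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    target.write_bytes(data + b"\n")
    return True
```

Working on bytes avoids any newline translation or encoding guesswork when the file is rewritten.
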
84-122: ⚡ Quick win

Clarify the section numbering hierarchy.

The current numbering structure is ambiguous:

- Section "4. Do not invent environment activation commands" (line 84) has standalone content
- Then subsections "4.1 Outer launcher policy" (line 100) and "4.2 Outer vs inner runtime boundaries" (line 110) appear
- It's unclear whether 4.1 and 4.2 are nested under section 4 or if they're related follow-ups

Consider either:

- Renumbering 4.1 and 4.2 as standalone sections (5 and 6), and shifting "Prefer reproducible output layout" to section 7
- Restructuring section 4 to clearly indicate it contains subsections, perhaps with a different heading like "### 4. Environment activation policies" that encompasses all three pieces

This will improve documentation clarity and help readers understand the logical grouping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/skills/dpgen-simplify/SKILL.md around lines 84 - 122, The section
numbering is inconsistent: "4. Do not invent environment activation commands" is
presented as a main section while "4.1 Outer launcher policy" and "4.2 Outer vs
inner runtime boundaries (critical)" appear to be ambiguous subsections; update
the headings so the hierarchy is clear by either (A) renumbering "4.1" and "4.2"
to standalone sections "5" and "6" and bumping "Prefer reproducible output
layout" to "7", or (B) renaming the main heading to "4. Environment activation
policies" (or similar) and ensure "4.1 Outer launcher policy" and "4.2 Outer vs
inner runtime boundaries" are nested under it, keeping the exact text for the
policies unchanged; adjust the heading markers for "4. Do not invent environment
activation commands", "4.1 Outer launcher policy", "4.2 Outer vs inner runtime
boundaries (critical)", and "Prefer reproducible output layout" accordingly so
the document structure is unambiguous.
.github/skills/dpgen-simplify/references/machine-fields.md (1)
97-97: ⚡ Quick win

Add a trailing newline at end of file.

The file should end with a newline character for POSIX compliance and to avoid warnings from some text processing tools.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/skills/dpgen-simplify/references/machine-fields.md at line 97, Add a
POSIX-compliant trailing newline to the end of the file by ensuring the last
line ("if the user already has a working template, patch it instead of rewriting
everything") is terminated with a newline character; update the file so its
final byte is a newline to avoid tooling warnings.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
@.github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json:
- Around line 2-8: The root-level "type_map" and
"default_training_param.model.type_map" are duplicated; add a config validation
step when loading/parsing this JSON that explicitly compares these two arrays
(root "type_map" vs "default_training_param.model.type_map") and fails fast with
a clear error/log message showing both values if they differ, or collapse the
duplication by removing the root "type_map" and only keeping
"default_training_param.model.type_map" (or vice versa) if dpgen supports a
single location—implement the equality check in your config-loader/validation
routine so mismatches cannot silently occur.

In @.github/skills/dpgen-simplify/references/machine-fields.md:
- Line 97: Add a POSIX-compliant trailing newline to the end of the file by
ensuring the last line ("if the user already has a working template, patch it
instead of rewriting everything") is terminated with a newline character; update
the file so its final byte is a newline to avoid tooling warnings.

In @.github/skills/dpgen-simplify/SKILL.md:
- Line 341: Add a trailing newline to the end of SKILL.md so the file terminates
with a newline character (POSIX-compliant); open the file
(.github/skills/dpgen-simplify/SKILL.md), go to the end of the file (after the
last line containing the simplify machine definitions link) and insert a single
newline character so the file ends with a newline.
- Around line 84-122: The section numbering is inconsistent: "4. Do not invent
environment activation commands" is presented as a main section while "4.1 Outer
launcher policy" and "4.2 Outer vs inner runtime boundaries (critical)" appear
to be ambiguous subsections; update the headings so the hierarchy is clear by
either (A) renumbering "4.1" and "4.2" to standalone sections "5" and "6" and
bumping "Prefer reproducible output layout" to "7", or (B) renaming the main
heading to "4. Environment activation policies" (or similar) and ensure "4.1
Outer launcher policy" and "4.2 Outer vs inner runtime boundaries" are nested
under it, keeping the exact text for the policies unchanged; adjust the heading
markers for "4. Do not invent environment activation commands", "4.1 Outer
launcher policy", "4.2 Outer vs inner runtime boundaries (critical)", and
"Prefer reproducible output layout" accordingly so the document structure is
unambiguous.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 867352cf-1a0b-4a82-ba4a-7dae01aa2874

📥 Commits

Reviewing files that changed from the base of the PR and between f4f74a2 and 4630b3c.

📒 Files selected for processing (10)

.github/skills/dpgen-simplify/SKILL.md
.github/skills/dpgen-simplify/assets/machine.template.json
.github/skills/dpgen-simplify/assets/machine.template.local-shell.json
.github/skills/dpgen-simplify/assets/machine.template.server-local-slurm.json
.github/skills/dpgen-simplify/assets/machine.template.ssh-remote-slurm.json
.github/skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
.github/skills/dpgen-simplify/assets/param.template.json
.github/skills/dpgen-simplify/references/machine-fields.md
.github/skills/dpgen-simplify/references/param-fields.md
.github/skills/dpgen-simplify/references/workflow-notes.md

Imported from jinzhezenggroup/computational-chemistry-agent-skills.
Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@885861f
Upstream-Paths:
- simplify/dpgen-simplify
- machine-learning-potentials/dpgen-simplify

…ng-potentials (deepmodeling#65)
Imported from jinzhezenggroup/computational-chemistry-agent-skills.
Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@a019524
Upstream-Paths:
- simplify/dpgen-simplify
- machine-learning-potentials/dpgen-simplify

…eepmodeling#67)
Imported from jinzhezenggroup/computational-chemistry-agent-skills.
Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@0475acb
Upstream-Paths:
- simplify/dpgen-simplify
- machine-learning-potentials/dpgen-simplify

Apply the same JSON indentation as pre-commit.ci after rebuilding the history-preserving migration branch. Authored by OpenClaw (model: gpt-5.5)

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/dpgen-simplify/assets/machine.template.json`:
- Around line 12-17: The stage templates' "resources" objects (e.g., the block
showing "number_node", "cpu_per_node", "gpu_per_node", "group_size") are missing
the required "source_list" field; update every stage template resources object
in machine.template.json to include "source_list" (an array or null as used
convention in this file) so generated scheduler configs include inner-stage
environment activation per SKILL.md—ensure you add "source_list" alongside the
existing keys in every resources block referenced (also at the other occurrences
around lines 27-32 and 42-47).

In `@skills/dpgen-simplify/references/param-fields.md`:
- Line 31: The note for init_data_sys is a sentence fragment ("Can be empty when
starting fully from `pick_data`"); rewrite it as a complete sentence mentioning
the subject and context — e.g. "The init_data_sys field can be empty when
starting fully from `pick_data`." — by editing the line in
skills/dpgen-simplify/references/param-fields.md where `init_data_sys` is
described so the note reads as a full sentence and remains clear and concise.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f196057c-5ad8-4011-906b-59a1f7dd9bc7

📥 Commits

Reviewing files that changed from the base of the PR and between 4630b3c and a8d60ba.

📒 Files selected for processing (10)

skills/dpgen-simplify/SKILL.md
skills/dpgen-simplify/assets/machine.template.json
skills/dpgen-simplify/assets/machine.template.local-shell.json
skills/dpgen-simplify/assets/machine.template.server-local-slurm.json
skills/dpgen-simplify/assets/machine.template.ssh-remote-slurm.json
skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
skills/dpgen-simplify/assets/param.template.json
skills/dpgen-simplify/references/machine-fields.md
skills/dpgen-simplify/references/param-fields.md
skills/dpgen-simplify/references/workflow-notes.md

✅ Files skipped from review due to trivial changes (7)

skills/dpgen-simplify/assets/param.template.json
skills/dpgen-simplify/assets/machine.template.local-shell.json
skills/dpgen-simplify/assets/machine.template.server-local-slurm.json
skills/dpgen-simplify/assets/machine.template.ssh-remote-slurm.json
skills/dpgen-simplify/references/workflow-notes.md
skills/dpgen-simplify/assets/param.example.qm7.from-official-docs.json
skills/dpgen-simplify/references/machine-fields.md

coderabbitai · 2026-05-08T00:46:58Z

+        "resources": {
+            "number_node": null,
+            "cpu_per_node": null,
+            "gpu_per_node": null,
+            "group_size": null
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add resources.source_list to all stage templates for scheduler compatibility.

SKILL.md requires explicit inner-stage environment activation via resources.source_list, but this base template omits the field in every stage. That mismatch can lead to invalid scheduler configs being generated from this template.

💡 Suggested patch

     "train": {
         "command": "dp",
         "machine": {
             "batch_type": null,
             "context_type": null,
             "local_root": "./",
             "remote_root": null
         },
         "resources": {
             "number_node": null,
             "cpu_per_node": null,
             "gpu_per_node": null,
-            "group_size": null
+            "group_size": null,
+            "source_list": null
         }
     },
@@
         "resources": {
             "number_node": null,
             "cpu_per_node": null,
             "gpu_per_node": null,
-            "group_size": null
+            "group_size": null,
+            "source_list": null
         }
     },
@@
         "resources": {
             "number_node": null,
             "cpu_per_node": null,
             "gpu_per_node": null,
-            "group_size": null
+            "group_size": null,
+            "source_list": null
         }
     }

Also applies to: 27-32, 42-47

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/dpgen-simplify/assets/machine.template.json` around lines 12 - 17, The stage templates' "resources" objects (e.g., the block showing "number_node", "cpu_per_node", "gpu_per_node", "group_size") are missing the required "source_list" field; update every stage template resources object in machine.template.json to include "source_list" (an array or null as used convention in this file) so generated scheduler configs include inner-stage environment activation per SKILL.md—ensure you add "source_list" alongside the existing keys in every resources block referenced (also at the other occurrences around lines 27-32 and 42-47).
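
The same change can be expressed programmatically. The sketch below, with a hypothetical helper name, adds a `source_list` key to every stage's `resources` block of a parsed `machine.json`; the stage names come from the walkthrough, and using `None` (JSON `null`) as the placeholder is an assumption that follows the template's existing convention.

```python
import copy

# Stage names as listed in the walkthrough summary.
STAGES = ("train", "model_devi", "fp")


def add_source_list(machine, default=None):
    """Return a copy of the machine config with resources.source_list set per stage.

    Existing source_list values are left untouched.
    """
    patched = copy.deepcopy(machine)
    for stage in STAGES:
        resources = patched.get(stage, {}).get("resources")
        if isinstance(resources, dict):
            resources.setdefault("source_list", default)
    return patched


template = {
    stage: {"resources": {"number_node": None, "group_size": None}}
    for stage in STAGES
}
patched = add_source_list(template)
```

Because the function works on a deep copy, the original template dictionary is not modified, which keeps the check easy to run before writing anything back to disk.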

coderabbitai · 2026-05-08T00:46:58Z

+
+List of initial system indices for training.
+
+Can be empty when starting fully from `pick_data`.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix sentence fragment for init_data_sys note.

This line is missing a subject and reads as a fragment. Make it a complete sentence for clarity.

✏️ Suggested patch

-Can be empty when starting fully from `pick_data`.
+It can be empty when starting fully from `pick_data`.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/dpgen-simplify/references/param-fields.md` at line 31, The note for init_data_sys is a sentence fragment ("Can be empty when starting fully from `pick_data`"); rewrite it as a complete sentence mentioning the subject and context — e.g. "The init_data_sys field can be empty when starting fully from `pick_data`." — by editing the line in skills/dpgen-simplify/references/param-fields.md where `init_data_sys` is described so the note reads as a full sentence and remains clear and concise.

coderabbitai Bot reviewed May 7, 2026

njzjz-bot force-pushed the chore/import-agent-skills branch from 4630b3c to 18b6360 Compare May 7, 2026 19:02

hyb1109 and others added 4 commits May 8, 2026 00:43

style(skills): format dpgen JSON templates

a8d60ba

Apply the same JSON indentation as pre-commit.ci after rebuilding the history-preserving migration branch. Authored by OpenClaw (model: gpt-5.5)

njzjz-bot force-pushed the chore/import-agent-skills branch from 18b6360 to a8d60ba Compare May 8, 2026 00:44

njzjz-bot mentioned this pull request May 8, 2026

chore(sync): track DP-GEN skill jinzhezenggroup/computational-chemistry-agent-skills#76

Merged

coderabbitai Bot reviewed May 8, 2026

njzjz merged commit c618be3 into deepmodeling:master May 8, 2026
3 of 6 checks passed

njzjz merged 4 commits into deepmodeling:master from njzjz-bot:chore/import-agent-skills
