refactor: rename thread_count to files_per_rank by ronaldw07 · Pull Request #67 · google/ml-flashpoint

ronaldw07 · 2026-03-03T00:51:24Z

What

Renames all thread_count parameters (and write_thread_count at the user-facing API level) to files_per_rank / write_files_per_rank for transparency.

Why

The parameter controls how many files each rank writes to during checkpointing — not a concurrency thread count. There is a separate, unrelated thread count concept in the codebase (torch_thread_count), which made the old name confusing. The new name makes the intent immediately clear.

Changes

thread_count → files_per_rank in MemoryStorageWriter, DefaultMLFlashpointCheckpointSaver.write_data(), and MLFlashpointCheckpointSaver.write_data() (abstract)
write_thread_count → write_files_per_rank in wrapper_util.py user-facing API
self._thread_count → self._files_per_rank instance variable in MemoryStorageWriter
Updated all tests and docs accordingly

Closes #66

gemini-code-assist · 2026-03-03T00:51:54Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the checkpointing mechanism by renaming the thread_count parameter to files_per_rank. This change addresses potential confusion with other thread-related concepts in the codebase, such as torch_thread_count, by providing a more precise name that clearly indicates its role in managing the number of files written per rank during the checkpointing process. The refactoring improves code readability and maintainability by ensuring that parameter names accurately convey their intended functionality.

Highlights

Parameter Renaming: The thread_count parameter has been renamed to files_per_rank across the codebase to accurately reflect its function of controlling the number of files each rank writes during checkpointing, rather than a concurrency thread count.
User-Facing API Update: The user-facing API parameter write_thread_count in wrapper_util.py has been updated to write_files_per_rank for improved clarity and consistency.
Internal Variable Renaming: The internal instance variable self._thread_count in MemoryStorageWriter has been renamed to self._files_per_rank.
Documentation and Test Updates: All relevant documentation (changelog, troubleshooting, user guide) and test cases have been updated to reflect the new parameter names.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

docs/changelog.md
- Updated changelog entry to reflect files_per_rank instead of thread_count for writer configuration.
- Updated changelog entry to reflect write_files_per_rank instead of write_thread_count for user exposure.
docs/troubleshooting.md
- Updated troubleshooting guide to refer to write_files_per_rank for increasing the number of files per rank.
docs/user-guide.md
- Updated user guide example to use write_files_per_rank instead of write_thread_count.
src/ml_flashpoint/adapter/megatron/save_strategies.py
- Updated MemoryStorageWriter initialization to pass files_per_rank instead of thread_count.
src/ml_flashpoint/adapter/nemo/wrapper_util.py
- Renamed write_thread_count parameter to write_files_per_rank in wrap_trainer_and_auto_resume_with_mlflashpoint function signature.
- Updated docstring for write_files_per_rank to clarify its meaning as files per rank.
- Passed write_files_per_rank to internal components instead of write_thread_count.
- Renamed write_thread_count parameter to write_files_per_rank in wrap_trainer_checkpoint_io_with_mlflashpoint function signature.
- Updated docstring for write_files_per_rank to clarify its meaning as files per rank.
- Updated validation logic to check write_files_per_rank instead of write_thread_count.
- Passed files_per_rank to MemoryStorageWriter initialization instead of thread_count.
src/ml_flashpoint/adapter/pytorch/memory_storage_writer.py
- Renamed thread_count parameter to files_per_rank in MemoryStorageWriter constructor.
- Updated docstring for files_per_rank to clarify its meaning.
- Renamed internal instance variable _thread_count to _files_per_rank.
- Updated warning message and assignment to use files_per_rank.
- Updated prepare_write_data_buckets to pass self._files_per_rank as bucket_count.
- Updated write_staged_data_buckets to pass self._files_per_rank as files_per_rank.
src/ml_flashpoint/core/checkpoint_saver.py
- Renamed thread_count parameter to files_per_rank in write_data method signature.
- Updated docstring for files_per_rank to clarify its meaning.
- Updated debug log message to use files_per_rank instead of thread_count.
- Updated comment to refer to files_per_rank instead of thread_count.
- Renamed thread_count parameter to files_per_rank in write_data method implementation.
- Updated calculation of torch_thread_count to use files_per_rank.
- Updated debug log message to use files_per_rank instead of thread_count.
- Updated debug log message for spawning writer threads to use files_per_rank.
tests/adapter/megatron/test_save_strategies.py
- Updated MemoryStorageWriter assertion to check files_per_rank instead of thread_count.
- Renamed expected_thread_count parameter to expected_files_per_rank in test function.
- Updated test function name to test_async_save_reinitializes_storage_writer_with_files_per_rank.
- Updated test docstring to refer to files_per_rank.
- Updated assignment of storage_writer attribute to _files_per_rank.
- Updated MemoryStorageWriter assertion to check files_per_rank instead of thread_count.
tests/adapter/nemo/test_wrapper_util.py
- Updated wrap_trainer_checkpoint_io_with_mlflashpoint call to use write_files_per_rank.
- Renamed thread_count_kwarg and expected_thread_count to files_per_rank_kwarg and expected_files_per_rank in test parameters.
- Updated test function name to test_write_files_per_rank_forwarding.
- Updated test docstring to refer to write_files_per_rank.
- Updated function call to use files_per_rank_kwarg.
- Updated assertion to check files_per_rank instead of thread_count.
- Updated test function name to test_validation_invalid_write_files_per_rank.
- Updated error message match to write_files_per_rank.
- Updated function call to use write_files_per_rank.
- Renamed thread_count_kwarg and expected_thread_count to files_per_rank_kwarg and expected_files_per_rank in test parameters.
- Updated test function name to test_write_files_per_rank_forwarding.
- Updated test docstring to refer to write_files_per_rank.
- Updated function call to use files_per_rank_kwarg.
- Updated assertion to check files_per_rank instead of thread_count.
tests/adapter/pytorch/test_memory_storage_writer.py
- Updated assertion to check _files_per_rank instead of _thread_count.
- Renamed thread_count and expected_thread_count to files_per_rank and expected_files_per_rank in test parameters.
- Updated test function name to test_init_files_per_rank.
- Updated test docstring to refer to _files_per_rank.
- Updated MemoryStorageWriter initialization to use files_per_rank.
- Updated assertion to check _files_per_rank instead of _thread_count.
- Renamed thread_count to files_per_rank in test parameters.
- Updated test function name to test_prepare_write_data_buckets_with_files_per_rank.
- Updated test docstring to refer to files_per_rank.
- Updated MemoryStorageWriter initialization to use files_per_rank.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data assertion to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated test function name to test_write_staged_data_buckets_with_explicit_files_per_rank.
- Updated test docstring to refer to files_per_rank.
- Updated MemoryStorageWriter initialization to use files_per_rank.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated write_data assertion to use files_per_rank.
- Updated write_data assertion to use files_per_rank.
- Updated write_data assertion to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated test function name to test_write_data_files_per_rank_less_than_1.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.
tests/core/test_checkpoint_saver.py
- Updated write_data call to use files_per_rank instead of thread_count.
- Updated write_data call to use files_per_rank instead of thread_count.
- Renamed thread_count to files_per_rank in test parameters.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.
- Renamed thread_count to files_per_rank in test parameters.
- Updated CheckpointContainerId path to reflect files_per_rank.
- Updated prepare_write_data call to use files_per_rank.
- Updated write_data call to use files_per_rank.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request is a great refactoring that renames thread_count to files_per_rank. The new name is much more descriptive and transparent about what the parameter controls, avoiding confusion with other thread-related concepts in the codebase. The changes are applied consistently across documentation, source code, and tests. I've found one minor issue in a test docstring, which I've commented on.

tests/adapter/nemo/test_wrapper_util.py

Renames all occurrences of thread_count (and write_thread_count at the user-facing API level) to files_per_rank / write_files_per_rank for transparency. The parameter controls how many files each rank writes to, not a concurrency thread count, which is a separate concept in the codebase. Closes google#66 # Conflicts: # src/ml_flashpoint/adapter/megatron/save_strategies.py # src/ml_flashpoint/adapter/nemo/wrapper_util.py # src/ml_flashpoint/adapter/pytorch/memory_storage_writer.py # tests/adapter/megatron/test_save_strategies.py

- Update outdated docstring in test_validation_invalid_write_files_per_rank - Fix E501 line-too-long errors introduced by the thread_count -> files_per_rank rename

ronaldw07 · 2026-03-03T22:43:26Z

Fixed the outdated docstring flagged in the review — updated it to refer to write_files_per_rank. Also fixed four E501 line-too-long lint errors in the test files that were introduced by the rename (the longer parameter name pushed some lines past the 120-char limit). Full ruff check now matches main (only pre-existing errors unrelated to this PR remain).

g-husam

Thanks for this! Make sure to resolve the conflicts

docs/changelog.md

…ename-thread-count-to-files-per-rank

ronaldw07 · 2026-03-08T20:26:44Z

Resolved the merge conflict in wrapper_util.py (from the use_cached_ckpt_structure feature added in main) and merged upstream main into the branch. Also reverted the two historical commit message lines in changelog.md back to their original wording per the inline feedback.

…ename-thread-count-to-files-per-rank

src/ml_flashpoint/adapter/nemo/wrapper_util.py

github-actions · 2026-03-13T01:10:12Z

Python Code Coverage Summary

Package	Line Rate	Branch Rate	Health
src.ml_flashpoint	100%	100%	✔
src.ml_flashpoint.adapter	100%	100%	✔
src.ml_flashpoint.adapter.megatron	98%	95%	✔
src.ml_flashpoint.adapter.nemo	98%	94%	✔
src.ml_flashpoint.adapter.pytorch	99%	92%	✔
src.ml_flashpoint.checkpoint_object_manager	92%	91%	➖
src.ml_flashpoint.core	96%	92%	✔
src.ml_flashpoint.replication	81%	81%	❌
Summary	95% (2098 / 2210)	91% (484 / 530)	➖

Minimum allowed line rate is 90%

github-actions · 2026-03-13T01:10:15Z

C++ Code Coverage Summary

Package	Line Rate	Branch Rate	Health
src.ml_flashpoint.checkpoint_object_manager.buffer_object	93%	54%	✔
src.ml_flashpoint.checkpoint_object_manager.object_manager	70%	37%	❌
src.ml_flashpoint.replication.transfer_service	79%	40%	❌
Summary	81% (916 / 1126)	43% (688 / 1604)	➖

Minimum allowed line rate is 80%

…-rank' into rename-thread-count-to-files-per-rank

g-husam · 2026-03-27T00:30:13Z

please make sure to always run the linter locally and the tests you are writing/modifying. right now looks like ruff check is failing

- Fix missed write_thread_count reference in wrapper_util.py - Rename thread_count property to files_per_rank in save_strategies.py - Update _thread_count to _files_per_rank in test_wrapper_util.py - Update thread_count to files_per_rank in test_checkpoint_io.py - Merge upstream/main to resolve conflict in wrapper_util.py

gemini-code-assist bot reviewed Mar 3, 2026

View reviewed changes

tests/adapter/nemo/test_wrapper_util.py Outdated Show resolved Hide resolved

ronaldw07 force-pushed the rename-thread-count-to-files-per-rank branch from 128e11a to f291c11 Compare March 3, 2026 21:32

fix(tests): update stale docstring and fix lint errors

bf5ffd7

- Update outdated docstring in test_validation_invalid_write_files_per_rank - Fix E501 line-too-long errors introduced by the thread_count -> files_per_rank rename

g-husam reviewed Mar 6, 2026

View reviewed changes

docs/changelog.md Outdated Show resolved Hide resolved

ronaldw07 added 2 commits March 8, 2026 13:00

revert(docs): restore historical commit messages in changelog

d7a4e3a

Merge branch 'main' of https://github.com/google/ml-flashpoint into r…

353a891

…ename-thread-count-to-files-per-rank

Merge branch 'main' of https://github.com/google/ml-flashpoint into r…

bce6888

…ename-thread-count-to-files-per-rank

ronaldw07 force-pushed the rename-thread-count-to-files-per-rank branch from 8eeb937 to bce6888 Compare March 10, 2026 22:09

ronaldw07 and others added 2 commits March 10, 2026 15:12

fix(tests): wrap long function signature to satisfy E501 lint rule

84f54df

Merge branch 'main' into rename-thread-count-to-files-per-rank

814a55e

g-husam reviewed Mar 13, 2026

View reviewed changes

src/ml_flashpoint/adapter/nemo/wrapper_util.py Outdated Show resolved Hide resolved

src/ml_flashpoint/adapter/nemo/wrapper_util.py Outdated Show resolved Hide resolved

ronaldw07 and others added 6 commits March 12, 2026 22:35

Merge branch 'main' into rename-thread-count-to-files-per-rank

c7d0ee7

docs: clarify write_files_per_rank docstring in wrapper_util

4dc67a6

Merge remote-tracking branch 'origin/rename-thread-count-to-files-per…

fb6e578

…-rank' into rename-thread-count-to-files-per-rank

Merge branch 'main' into rename-thread-count-to-files-per-rank

0745c9b

Merge branch 'main' into rename-thread-count-to-files-per-rank

eb5a166

Merge branch 'main' into rename-thread-count-to-files-per-rank

ca50ff8

ronaldw07 and others added 2 commits March 31, 2026 23:44

Merge branch 'main' into rename-thread-count-to-files-per-rank

4ee835a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: rename thread_count to files_per_rank#67

refactor: rename thread_count to files_per_rank#67
ronaldw07 wants to merge 15 commits intogoogle:mainfrom
ronaldw07:rename-thread-count-to-files-per-rank

ronaldw07 commented Mar 3, 2026

Uh oh!

gemini-code-assist bot commented Mar 3, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

ronaldw07 commented Mar 3, 2026

Uh oh!

g-husam left a comment •

edited

Loading

Uh oh!

Uh oh!

ronaldw07 commented Mar 8, 2026

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 13, 2026

Uh oh!

github-actions bot commented Mar 13, 2026

Uh oh!

g-husam commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ronaldw07 commented Mar 3, 2026

What

Why

Changes

Uh oh!

gemini-code-assist bot commented Mar 3, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

ronaldw07 commented Mar 3, 2026

Uh oh!

g-husam left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ronaldw07 commented Mar 8, 2026

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 13, 2026

Python Code Coverage Summary

Uh oh!

github-actions bot commented Mar 13, 2026

C++ Code Coverage Summary

Uh oh!

g-husam commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

g-husam left a comment •

edited

Loading