Skip to content

refactor: rename thread_count to files_per_rank#67

Open
ronaldw07 wants to merge 15 commits intogoogle:mainfrom
ronaldw07:rename-thread-count-to-files-per-rank
Open

refactor: rename thread_count to files_per_rank#67
ronaldw07 wants to merge 15 commits intogoogle:mainfrom
ronaldw07:rename-thread-count-to-files-per-rank

Conversation

@ronaldw07
Copy link
Copy Markdown
Contributor

What

Renames all thread_count parameters (and write_thread_count at the user-facing API level) to files_per_rank / write_files_per_rank for transparency.

Why

The parameter controls how many files each rank writes to during checkpointing — not a concurrency thread count. There is a separate, unrelated thread count concept in the codebase (torch_thread_count), which made the old name confusing. The new name makes the intent immediately clear.

Changes

  • thread_countfiles_per_rank in MemoryStorageWriter, DefaultMLFlashpointCheckpointSaver.write_data(), and MLFlashpointCheckpointSaver.write_data() (abstract)
  • write_thread_countwrite_files_per_rank in wrapper_util.py user-facing API
  • self._thread_countself._files_per_rank instance variable in MemoryStorageWriter
  • Updated all tests and docs accordingly

Closes #66

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the checkpointing mechanism by renaming the thread_count parameter to files_per_rank. This change addresses potential confusion with other thread-related concepts in the codebase, such as torch_thread_count, by providing a more precise name that clearly indicates its role in managing the number of files written per rank during the checkpointing process. The refactoring improves code readability and maintainability by ensuring that parameter names accurately convey their intended functionality.

Highlights

  • Parameter Renaming: The thread_count parameter has been renamed to files_per_rank across the codebase to accurately reflect its function of controlling the number of files each rank writes during checkpointing, rather than a concurrency thread count.
  • User-Facing API Update: The user-facing API parameter write_thread_count in wrapper_util.py has been updated to write_files_per_rank for improved clarity and consistency.
  • Internal Variable Renaming: The internal instance variable self._thread_count in MemoryStorageWriter has been renamed to self._files_per_rank.
  • Documentation and Test Updates: All relevant documentation (changelog, troubleshooting, user guide) and test cases have been updated to reflect the new parameter names.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • docs/changelog.md
    • Updated changelog entry to reflect files_per_rank instead of thread_count for writer configuration.
    • Updated changelog entry to reflect write_files_per_rank instead of write_thread_count for user exposure.
  • docs/troubleshooting.md
    • Updated troubleshooting guide to refer to write_files_per_rank for increasing the number of files per rank.
  • docs/user-guide.md
    • Updated user guide example to use write_files_per_rank instead of write_thread_count.
  • src/ml_flashpoint/adapter/megatron/save_strategies.py
    • Updated MemoryStorageWriter initialization to pass files_per_rank instead of thread_count.
  • src/ml_flashpoint/adapter/nemo/wrapper_util.py
    • Renamed write_thread_count parameter to write_files_per_rank in wrap_trainer_and_auto_resume_with_mlflashpoint function signature.
    • Updated docstring for write_files_per_rank to clarify its meaning as files per rank.
    • Passed write_files_per_rank to internal components instead of write_thread_count.
    • Renamed write_thread_count parameter to write_files_per_rank in wrap_trainer_checkpoint_io_with_mlflashpoint function signature.
    • Updated docstring for write_files_per_rank to clarify its meaning as files per rank.
    • Updated validation logic to check write_files_per_rank instead of write_thread_count.
    • Passed files_per_rank to MemoryStorageWriter initialization instead of thread_count.
  • src/ml_flashpoint/adapter/pytorch/memory_storage_writer.py
    • Renamed thread_count parameter to files_per_rank in MemoryStorageWriter constructor.
    • Updated docstring for files_per_rank to clarify its meaning.
    • Renamed internal instance variable _thread_count to _files_per_rank.
    • Updated warning message and assignment to use files_per_rank.
    • Updated prepare_write_data_buckets to pass self._files_per_rank as bucket_count.
    • Updated write_staged_data_buckets to pass self._files_per_rank as files_per_rank.
  • src/ml_flashpoint/core/checkpoint_saver.py
    • Renamed thread_count parameter to files_per_rank in write_data method signature.
    • Updated docstring for files_per_rank to clarify its meaning.
    • Updated debug log message to use files_per_rank instead of thread_count.
    • Updated comment to refer to files_per_rank instead of thread_count.
    • Renamed thread_count parameter to files_per_rank in write_data method implementation.
    • Updated calculation of torch_thread_count to use files_per_rank.
    • Updated debug log message to use files_per_rank instead of thread_count.
    • Updated debug log message for spawning writer threads to use files_per_rank.
  • tests/adapter/megatron/test_save_strategies.py
    • Updated MemoryStorageWriter assertion to check files_per_rank instead of thread_count.
    • Renamed expected_thread_count parameter to expected_files_per_rank in test function.
    • Updated test function name to test_async_save_reinitializes_storage_writer_with_files_per_rank.
    • Updated test docstring to refer to files_per_rank.
    • Updated assignment of storage_writer attribute to _files_per_rank.
    • Updated MemoryStorageWriter assertion to check files_per_rank instead of thread_count.
  • tests/adapter/nemo/test_wrapper_util.py
    • Updated wrap_trainer_checkpoint_io_with_mlflashpoint call to use write_files_per_rank.
    • Renamed thread_count_kwarg and expected_thread_count to files_per_rank_kwarg and expected_files_per_rank in test parameters.
    • Updated test function name to test_write_files_per_rank_forwarding.
    • Updated test docstring to refer to write_files_per_rank.
    • Updated function call to use files_per_rank_kwarg.
    • Updated assertion to check files_per_rank instead of thread_count.
    • Updated test function name to test_validation_invalid_write_files_per_rank.
    • Updated error message match to write_files_per_rank.
    • Updated function call to use write_files_per_rank.
    • Renamed thread_count_kwarg and expected_thread_count to files_per_rank_kwarg and expected_files_per_rank in test parameters.
    • Updated test function name to test_write_files_per_rank_forwarding.
    • Updated test docstring to refer to write_files_per_rank.
    • Updated function call to use files_per_rank_kwarg.
    • Updated assertion to check files_per_rank instead of thread_count.
  • tests/adapter/pytorch/test_memory_storage_writer.py
    • Updated assertion to check _files_per_rank instead of _thread_count.
    • Renamed thread_count and expected_thread_count to files_per_rank and expected_files_per_rank in test parameters.
    • Updated test function name to test_init_files_per_rank.
    • Updated test docstring to refer to _files_per_rank.
    • Updated MemoryStorageWriter initialization to use files_per_rank.
    • Updated assertion to check _files_per_rank instead of _thread_count.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated test function name to test_prepare_write_data_buckets_with_files_per_rank.
    • Updated test docstring to refer to files_per_rank.
    • Updated MemoryStorageWriter initialization to use files_per_rank.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data assertion to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated test function name to test_write_staged_data_buckets_with_explicit_files_per_rank.
    • Updated test docstring to refer to files_per_rank.
    • Updated MemoryStorageWriter initialization to use files_per_rank.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated write_data assertion to use files_per_rank.
    • Updated write_data assertion to use files_per_rank.
    • Updated write_data assertion to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated test function name to test_write_data_files_per_rank_less_than_1.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
  • tests/core/test_checkpoint_saver.py
    • Updated write_data call to use files_per_rank instead of thread_count.
    • Updated write_data call to use files_per_rank instead of thread_count.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
    • Renamed thread_count to files_per_rank in test parameters.
    • Updated CheckpointContainerId path to reflect files_per_rank.
    • Updated prepare_write_data call to use files_per_rank.
    • Updated write_data call to use files_per_rank.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a great refactoring that renames thread_count to files_per_rank. The new name is much more descriptive and transparent about what the parameter controls, avoiding confusion with other thread-related concepts in the codebase. The changes are applied consistently across documentation, source code, and tests. I've found one minor issue in a test docstring, which I've commented on.

Renames all occurrences of thread_count (and write_thread_count at the
user-facing API level) to files_per_rank / write_files_per_rank for
transparency. The parameter controls how many files each rank writes to,
not a concurrency thread count, which is a separate concept in the
codebase.

Closes google#66

# Conflicts:
#	src/ml_flashpoint/adapter/megatron/save_strategies.py
#	src/ml_flashpoint/adapter/nemo/wrapper_util.py
#	src/ml_flashpoint/adapter/pytorch/memory_storage_writer.py
#	tests/adapter/megatron/test_save_strategies.py
@ronaldw07 ronaldw07 force-pushed the rename-thread-count-to-files-per-rank branch from 128e11a to f291c11 Compare March 3, 2026 21:32
- Update outdated docstring in test_validation_invalid_write_files_per_rank
- Fix E501 line-too-long errors introduced by the thread_count -> files_per_rank rename
@ronaldw07
Copy link
Copy Markdown
Contributor Author

Fixed the outdated docstring flagged in the review — updated it to refer to write_files_per_rank. Also fixed four E501 line-too-long lint errors in the test files that were introduced by the rename (the longer parameter name pushed some lines past the 120-char limit). Full ruff check now matches main (only pre-existing errors unrelated to this PR remain).

Copy link
Copy Markdown
Collaborator

@g-husam g-husam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Make sure to resolve the conflicts

@ronaldw07
Copy link
Copy Markdown
Contributor Author

Resolved the merge conflict in wrapper_util.py (from the use_cached_ckpt_structure feature added in main) and merged upstream main into the branch. Also reverted the two historical commit message lines in changelog.md back to their original wording per the inline feedback.

@ronaldw07 ronaldw07 force-pushed the rename-thread-count-to-files-per-rank branch from 8eeb937 to bce6888 Compare March 10, 2026 22:09
@github-actions
Copy link
Copy Markdown

Python Code Coverage Summary

Code Coverage

Package Line Rate Branch Rate Health
src.ml_flashpoint 100% 100%
src.ml_flashpoint.adapter 100% 100%
src.ml_flashpoint.adapter.megatron 98% 95%
src.ml_flashpoint.adapter.nemo 98% 94%
src.ml_flashpoint.adapter.pytorch 99% 92%
src.ml_flashpoint.checkpoint_object_manager 92% 91%
src.ml_flashpoint.core 96% 92%
src.ml_flashpoint.replication 81% 81%
Summary 95% (2098 / 2210) 91% (484 / 530)

Minimum allowed line rate is 90%

@github-actions
Copy link
Copy Markdown

C++ Code Coverage Summary

Code Coverage

Package Line Rate Branch Rate Health
src.ml_flashpoint.checkpoint_object_manager.buffer_object 93% 54%
src.ml_flashpoint.checkpoint_object_manager.object_manager 70% 37%
src.ml_flashpoint.replication.transfer_service 79% 40%
Summary 81% (916 / 1126) 43% (688 / 1604)

Minimum allowed line rate is 80%

@g-husam
Copy link
Copy Markdown
Collaborator

g-husam commented Mar 27, 2026

please make sure to always run the linter locally and the tests you are writing/modifying. right now looks like ruff check is failing

ronaldw07 and others added 2 commits March 31, 2026 23:44
- Fix missed write_thread_count reference in wrapper_util.py
- Rename thread_count property to files_per_rank in save_strategies.py
- Update _thread_count to _files_per_rank in test_wrapper_util.py
- Update thread_count to files_per_rank in test_checkpoint_io.py
- Merge upstream/main to resolve conflict in wrapper_util.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rename thread_count to files_per_rank

2 participants