Fix DeepCompile ZeRO-3 release parameter lifetime#8032
The gathered buffer is registered in |
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Thank you for your review, @eternalNight! The actual issue is that, after #7489, the release op does not release the underlying gathered buffer storage. It is not an early-release issue. I updated the PR description to clarify this. This PR is much simpler now. |
Thanks for the clarification! That makes the picture much clearer. I still have one doubt, though. The You may capture a torch memory history which records exactly when (by recording the call stack) each storage is finally released. There may be more hints in the call stack about the residual reference. |
PR #7489 made ZeRO-3 all-gather allocate a padded base buffer for uneven shards and return a true-shape view into that buffer. That means the registry tensor and the tensor returned to the compiled graph no longer necessarily share the same `TensorImpl`, although they still share the same underlying storage.

The existing release path only did `set_data(empty)` on the registry tensor before unregistering it. With the new base/view relationship, that clears the registry-side tensor metadata but does not resize the shared `StorageImpl` still referenced by returned views. As a result, the padded gathered allocation can remain live after the final `release_param`.

This patch keeps the release graph ordering unchanged and makes final non-persistent release resize the registered gathered storage to 0 bytes before unregistering it.
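
The difference between the two release paths can be illustrated with a toy model in plain Python. This is only a sketch of the aliasing behavior described above, not DeepSpeed or PyTorch code: the `Storage` and `Tensor` classes, the `gather_padded`, `release_old`, and `release_new` helpers, and the 12-element/10-element sizes are all hypothetical names and values chosen for illustration.

```python
ITEM_SIZE = 4  # assumed bytes per element (e.g. float32), illustration only


class Storage:
    """Toy stand-in for a shared allocation (hypothetical, not StorageImpl)."""

    def __init__(self, nbytes):
        self.buf = bytearray(nbytes)

    def nbytes(self):
        return len(self.buf)

    def resize_(self, nbytes):
        # Resizes the allocation itself, so every handle that references
        # this Storage object observes the new size.
        self.buf = self.buf[:nbytes] + bytearray(max(0, nbytes - len(self.buf)))


class Tensor:
    """Toy tensor handle: its own metadata plus a reference to a Storage."""

    def __init__(self, storage, numel):
        self.storage = storage
        self.numel = numel

    def view(self, numel):
        # A view is a separate handle that shares the same Storage.
        return Tensor(self.storage, numel)

    def set_data(self, other):
        # Rebinds only this handle; other handles keep the old Storage.
        self.storage = other.storage
        self.numel = other.numel


def gather_padded():
    # Padded base buffer (12 elements) and a true-shape view (10 elements).
    base = Tensor(Storage(12 * ITEM_SIZE), 12)
    return base, base.view(10)


def release_old(registry_tensor):
    # Old path: swap the registry tensor's data for an empty tensor.
    registry_tensor.set_data(Tensor(Storage(0), 0))


def release_new(registry_tensor):
    # New path: shrink the shared storage to 0 bytes.
    registry_tensor.storage.resize_(0)


base, view = gather_padded()
release_old(base)
bytes_live_after_old = view.storage.nbytes()  # view still holds the padded buffer

base, view = gather_padded()
release_new(base)
bytes_live_after_new = view.storage.nbytes()  # shared buffer is now empty
```

In this model, the old path leaves the full padded allocation reachable through the returned view, while resizing the shared storage empties it for every handle at once. That is the behavior the patch relies on when it resizes the registered gathered storage before unregistering it.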