Fix NaN gradients in Gaussian backend probability backward #165

Open
Hugh-888 wants to merge 2 commits into TuringQ:main from Hugh-888:hotfix/hafnian_grad

Conversation

@Hugh-888

Collaborator

Summary

This PR adds thresholding to the polynomial product term in the hafnian coefficient calculation to improve numerical stability for very small intermediate values.

Motivation

In Gaussian probability calculations, the hafnian polynomial expansion can produce product terms whose factors are extremely small. These terms may underflow or create unstable gradients in downstream autograd paths, especially when computing full probability dictionaries.

Changes

  • Adds a threshold argument to poly_lambda.
  • Masks polynomial factors whose absolute value is below the threshold before computing the product.
  • Keeps the change localized to the hafnian coefficient calculation.
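
For illustration, the per-factor masking described in the list above might look roughly like the following. This is a minimal sketch, not the PR's actual code: `masked_prod` is a hypothetical helper, and the default threshold of `1e-30` is an assumption taken from the value mentioned in the review discussion.

```python
import torch


def masked_prod(factors: torch.Tensor, threshold: float = 1e-30) -> torch.Tensor:
    """Hypothetical helper: zero out factors with |f| < threshold, then multiply.

    The name and the default threshold are assumptions made for this sketch.
    """
    # The mask is computed on detached values so it does not enter the graph.
    too_small = factors.detach().abs() < threshold
    pruned = torch.where(too_small, torch.zeros_like(factors), factors)
    return pruned.prod()


# Ordinary factors are multiplied unchanged.
regular = masked_prod(torch.tensor([2.0, 3.0]))
# A single factor below the threshold prunes the whole product to zero.
pruned_out = masked_prod(torch.tensor([2.0, 1e-35]))
```

Because a masked factor is replaced by a constant zero, no gradient flows back to that factor.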

Notes

This thresholding acts as a numerical pruning step, so very small polynomial product contributions may be set to zero.

@Hugh-888 Hugh-888 requested review from Jooyuza and sansiro77 June 10, 2026 02:51
@Hugh-888 Hugh-888 added the bugfix Fix bugs label Jun 10, 2026

@sansiro77 sansiro77 left a comment

Contributor

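I suggest pruning here based on the magnitude of the final product, rather than testing each individual factor in poly_list.

The current approach misses cases like this: every factor is greater than 1e-30, but the product of several factors lands in the extremely small range of complex64, e.g. ~1e-40, and the backward pass can still produce NaN. Consider doing the check in log space: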

real_dtype = poly_list.real.dtype if poly_list.is_complex() else poly_list.dtype
threshold = 1 / torch.finfo(real_dtype).max
keep = torch.log(poly_list.detach().abs()).sum() > math.log(threshold)
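
To make the gap concrete, here is a small sketch comparing a per-factor check against the log-space check on five factors of 1e-10. The helper name `product_is_representable` is hypothetical, and the per-factor threshold of `1e-30` is assumed from the comment above; the log-space logic follows the snippet as suggested.

```python
import math

import torch


def product_is_representable(factors: torch.Tensor) -> bool:
    """Hypothetical helper: log-space test of the product's magnitude."""
    real_dtype = factors.real.dtype if factors.is_complex() else factors.dtype
    threshold = 1 / torch.finfo(real_dtype).max
    # Summing logs avoids forming the (possibly underflowing) product itself.
    log_magnitude = torch.log(factors.detach().abs()).sum()
    return bool(log_magnitude > math.log(threshold))


# Five factors of 1e-10: each is far above 1e-30, but the product is ~1e-50.
factors = torch.full((5,), 1e-10, dtype=torch.cfloat)
every_factor_passes = bool((factors.abs() > 1e-30).all())  # per-factor check
product_passes = product_is_representable(factors)         # log-space check
```

Here the per-factor check accepts every element, while the log-space check rejects the product.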

Also, in the pure-state path, I suggest changing abs(hafnian(...)) ** 2 to haf.real.square() + haf.imag.square(). The two are mathematically equivalent, but the latter avoids the complex AbsBackward:

import torch

z = torch.tensor(1e-40 + 1e-41j, dtype=torch.cfloat, requires_grad=True)
loss = z.abs() ** 2
loss.backward()
print(z.grad)  # tensor(nan+nanj)

z = torch.tensor(1e-40 + 1e-41j, dtype=torch.cfloat, requires_grad=True)
loss = z.real.square() + z.imag.square()
loss.backward()
print(z.grad)  # finite
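
The replacement could be wrapped in a small helper, sketched below. The name `abs_squared` is hypothetical and not part of the codebase; the sketch only checks that the value matches |z|^2 for an ordinary input and that the gradient stays finite for a tiny one.

```python
import torch


def abs_squared(z: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: |z|^2 computed without calling abs() on a complex tensor."""
    return z.real.square() + z.imag.square()


# One ordinary entry and one entry with tiny real and imaginary parts.
z = torch.tensor([3.0 + 4.0j, 1e-40 + 1e-41j], dtype=torch.cfloat, requires_grad=True)
abs_squared(z).sum().backward()

# view_as_real exposes the real and imaginary parts of the gradient as floats.
grad_is_finite = bool(torch.isfinite(torch.view_as_real(z.grad)).all())
```

The backward pass here only involves multiplications by the real and imaginary parts, so there is no division by a near-zero magnitude.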



def poly_lambda(submat: torch.Tensor, int_partition: list, power: int, loop: bool = False) -> torch.Tensor:
def poly_lambda(

Contributor

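Thresholding either the individual elements of prod or the final product does not solve the problem correctly (for example, elements below the threshold get multiplied by 0, which forces their gradients to 0). I think the most convenient fix here is to cast submat to cdouble first and then cast back to the original dtype on return.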
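
A minimal sketch of the upcast-then-downcast idea follows, applied to a bare product rather than to the real poly_lambda body. The helper `prod_in_double` is hypothetical, and the input values are chosen only so that the intermediate result (about 1e-50) is too small for single precision while the final result (about 1e-20) is not.

```python
import torch


def prod_in_double(factors: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: multiply in double precision, return the original dtype."""
    orig_dtype = factors.dtype
    wide_dtype = torch.cdouble if factors.is_complex() else torch.double
    # .to() is differentiable, so gradients flow back through both casts.
    return factors.to(wide_dtype).prod().to(orig_dtype)


# 1e-25 * 1e-25 is ~1e-50 (below the float32 range), but the full product is ~1e-20.
x = torch.tensor([1e-25, 1e-25, 1e30], dtype=torch.cfloat, requires_grad=True)
out = prod_in_double(x)
out.real.backward()

grad_is_finite = bool(torch.isfinite(torch.view_as_real(x.grad)).all())
```

Unlike masking, this keeps every factor in the autograd graph; only the final cast back to the original dtype can lose range.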

@sansiro77 sansiro77 changed the title Solve NaN gradients in Gaussian backend probability backward Fix NaN gradients in Gaussian backend probability backward Jun 16, 2026

Labels

bugfix Fix bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NaN gradients in Gaussian backend probability backward for mixed init_state

2 participants