Skip to content

[agent][ssd] Fix NameError: nand_endurance undefined in parse_micron_info()#646

Open
rustiqly wants to merge 5 commits into
sonic-net:masterfrom
rustiqly:fix/micron-nand-endurance
Open

[agent][ssd] Fix NameError: nand_endurance undefined in parse_micron_info()#646
rustiqly wants to merge 5 commits into
sonic-net:masterfrom
rustiqly:fix/micron-nand-endurance

Conversation

@rustiqly
Copy link
Copy Markdown
Contributor

What I did

Fix NameError in parse_micron_info() where nand_endurance is referenced but never defined.

How I did it

  • Parse nand_endurance from SMART attributes using _parse_re() with the same NAND_Endurance regex pattern used in parse_virtium_info()
  • Guard the health calculation with nand_endurance != NOT_AVAILABLE check
  • Remove unused erase_fail_count variable (was fetched but never used in the calculation)

How to verify it

Run on a Micron SSD where Percent_Lifetime_Used and Percent_Lifetime_Remain are both unavailable — should no longer crash with NameError.

Which release branch to backport

master

Description for the changelog

Fix NameError crash in Micron SSD health parsing due to undefined nand_endurance variable.

Fixes: #644

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rustiqly rustiqly requested review from StormLiangMS and yxieca April 6, 2026 16:17
@yxieca
Copy link
Copy Markdown
Contributor

yxieca commented Apr 10, 2026

The nand_endurance fix is correct — it was indeed undefined and would crash. However, I am not sure about removing the erase_fail_count check from the guard condition.

The original code checked both average_erase_count and erase_fail_count before computing health. While erase_fail_count is not used in the formula, it may have been an intentional sanity gate — i.e., do not report health percentage if we cannot read erase failure data from the drive.

Suggestion: keep erase_fail_count in the guard and add nand_endurance:

if average_erase_count != NOT_AVAILABLE and erase_fail_count != NOT_AVAILABLE and nand_endurance != NOT_AVAILABLE:

This way we fix the NameError without changing the original gating logic. Also, erase_fail_count should be kept as a parsed variable even if unused in the formula — removing it changes the behavior silently.


This comment was posted by an AI agent on behalf of @yxieca.

@yxieca
Copy link
Copy Markdown
Contributor

yxieca commented Apr 10, 2026

@rustiqly please consider adding unit test.

@yxieca yxieca requested a review from judyjoseph April 10, 2026 16:45
@rustiqly
Copy link
Copy Markdown
Contributor Author

@yxieca acknowledged, I will work on adding the unit test coverage.

@rustiqly rustiqly force-pushed the fix/micron-nand-endurance branch from 865174d to 69fa5f7 Compare May 11, 2026 09:17
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rustiqly
Copy link
Copy Markdown
Contributor Author

Added Micron SSD unit coverage in 69fa5f7 for the NAND endurance fallback path, covering the case where Percent_Lifetime_Used is unavailable and health is computed from average erase count + NAND endurance.\n\nLocal test note: pytest -q -o addopts='' tests/test_ssd.py -k micron_ssd is blocked on this host by missing sonic_py_common dependency during test collection.

@rustiqly
Copy link
Copy Markdown
Contributor Author

Pushed follow-up c589d68 for #646 to fix the Azure unit-test failure.

Root cause from CI:

test_micron_ssd_health_from_nand_endurance
AssertionError: assert 'N/A' == 75.0

The new fallback path was passing the full parsed SMART attribute tail into float() instead of extracting the raw value column. The fix now uses average_erase_count.split()[-1] and nand_endurance.split()[-1] before computing health.

Validation on this host:

python3 -m py_compile sonic_platform_base/sonic_storage/ssd.py

The targeted pytest collection is still blocked locally by missing sonic_py_common, but this addresses the exact assertion reported by Azure.

@rustiqly
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rustiqly
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rustiqly
Copy link
Copy Markdown
Contributor Author

Pushed follow-up c4bfc51 for #646 to fix the Azure unit failure.

Root cause: NAND_Endurance in the SMART attribute table appears as a normal SMART row (999 NAND_Endurance ... 3028). The previous regex only handled a bare label/value form and did not skip the SMART row fields, so the fallback left health as N/A.

Change: parse the final numeric field from the SMART row for NAND_Endurance (and the analogous Virtium Average_Erase_Count path).

Validation:

  • python3 -m py_compile sonic_platform_base/sonic_storage/ssd.py
  • Standalone regex verification against SMART rows for NAND_Endurance=3028 and Average_Erase_Count=757

Local pytest is blocked on this host because sonic_py_common is not installed in the system Python environment; Azure has the repo test environment and should now exercise tests/test_ssd.py::TestSsd::test_micron_ssd_health_from_nand_endurance.

/azp run

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@rustiqly
Copy link
Copy Markdown
Contributor Author

Pushed follow-up 2ff5458 for #646 to fix the new Azure Python unit failure.

Root cause: the previous NAND_Endurance regex change accidentally altered the existing Virtium parser to read the last column (0) instead of the endurance raw value (20000), causing the health fallback to return N/A.

Fix: restored the Virtium-specific raw-value parsing and kept the Micron NAND_Endurance fallback as a separate SMART-table parse.

Validation: confirmed the regex extraction for both Virtium (20000) and Micron (3028) sample rows locally. Full pytest could not run on this host because the local environment is missing sonic_py_common; Azure will run the complete test suite.

@rustiqly
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines failed to run 1 pipeline(s).

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rustiqly rustiqly force-pushed the fix/micron-nand-endurance branch from 2ff5458 to 27365fa Compare May 14, 2026 14:07
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

rustiqly added 5 commits May 18, 2026 07:10
…info()

parse_micron_info() references nand_endurance which was never defined
in this method (only exists in parse_virtium_info). Parse it from
SMART attributes using the same regex pattern as the Virtium parser.
Also remove the unused erase_fail_count variable.

Fixes: sonic-net#644

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
@rustiqly rustiqly force-pushed the fix/micron-nand-endurance branch from 27365fa to 70efdf5 Compare May 18, 2026 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ssd] NameError: nand_endurance undefined in parse_micron_info()

3 participants