
Create 202511_azd branch on sonic-utilities.msft#327

Closed
weimingx wants to merge 615 commits into Azure:master from weimingx:202511_azd

Conversation

@weimingx

Create 202511_azd branch on sonic-utilities.msft

noaOrMlnx and others added 30 commits January 14, 2025 15:30
…U validators (#3658)

[Mellanox] Add Mellanox-SN5610N-C256S2, Mellanox-SN5610N-C224O8 to GCU validators
Add Arista 7060X6-64 to gcu validator file.

What I did
Added the Arista-7060X6-64DE and Arista-7060X6-64PE hwsku platforms to the th5 entry in the gcu_field_operation_validators.conf.json file.

How to verify it
Run the sonic-mgmt gcu tests and confirm that apply-patch now works for hwskus under these platforms.

Applicable Backport Branches
202405
202411
* [show][interfaces] Add proposal for show interfaces history

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* change name

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* corrections

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add changes

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* correct syntax

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix comments

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

---------

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>
What I did
Add strict YANG validation for full config command.

How I did it
Fail if a table with no YANG support is found.
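A minimal sketch of that check, assuming the set of YANG-backed table names is already available. Both function names are hypothetical and are not the sonic-utilities API.

```python
def find_tables_without_yang(config, yang_tables):
    """Return the sorted names of config tables that have no YANG model."""
    return sorted(table for table in config if table not in yang_tables)


def validate_full_config(config, yang_tables):
    """Reject a full config that contains any table without YANG support."""
    missing = find_tables_without_yang(config, yang_tables)
    if missing:
        # Strict mode: one unsupported table fails the whole config.
        raise ValueError("Tables without YANG support: " + ", ".join(missing))
    return True
```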

How to verify it
UT
… (#3693)

- What I did
Some platforms may have multiple primary disks, which makes it ambiguous which device to report when no device is specified in the ssdhealth command.

- How I did it
Update the show platform command to check platform.json

- How to verify it
UTs
vkarri@85964d14e169:/sonic/src/sonic-utilities$ pytest-3 tests/show_platform_test.py -k "ssdhealth" -v
collected 8 items / 6 deselected / 2 selected

tests/show_platform_test.py::TestShowPlatformSsdhealth::test_ssdhealth PASSED [ 50%]
tests/show_platform_test.py::TestShowPlatformSsdhealth::test_ssdhealth_default_device PASSED [100%]
Verified the CLI is printing the health output for expected device

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
- What I did
Remove the debug dump import from the default import path. Starting with 202411, debug dump imports libdashapi to parse DASH objects.

Keeping this import in the default path increases CLI execution time.

- How I did it
Make the import on-demand
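The on-demand pattern moves the import inside the function that needs it, so only code paths that actually parse DASH objects pay the import cost. In this sketch `json` stands in for the heavier libdashapi module, and `parse_dash_object` is a hypothetical name.

```python
def parse_dash_object(raw):
    # Deferred import: the module is loaded only when a DASH object is parsed,
    # so CLI commands that never reach this path skip the import cost.
    import json  # stand-in for the heavier libdashapi import
    return json.loads(raw)
```

Python caches modules in `sys.modules`, so repeated calls do not re-import.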

- How to verify it
UT and verified on the image
[config] Exit with a non-zero code when qos reload fails
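A minimal sketch of propagating a failed reload step as the process exit status, assuming the reload is driven by an external command. `run_qos_reload` and the command it runs are hypothetical, not the actual `config qos reload` implementation.

```python
import subprocess
import sys


def run_qos_reload(cmd):
    """Run a reload command and propagate its failure as a non-zero exit."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print("qos reload failed with exit code {}".format(result.returncode), file=sys.stderr)
        # Exit with the child's status instead of silently returning 0.
        sys.exit(result.returncode)
```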
* [show][interface] Add changes for interface errors command

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* remove redundant lines

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add correction

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* back reverted

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add ch

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix static

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add phg

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add prest

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add fgr

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add all

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* remove all redundant spaces

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add key

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add cgre

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

---------

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>
* [show][interface] Add changes for show interface flap command

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix files

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add ch

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add ch

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* pep8

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix lines

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix test

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add test vals

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* wrap lines

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix tests

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add fixes

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix all

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix tests

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add tests

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix nit

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* fix

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add alll

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

* add fd

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>

---------

Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>
* [Acl] Display rule and table info written to APP DB

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
* CLI support for SmartSwitch PMON

* imad minor fixes

* Did some cleanup for backward compatibility

* removed the column wrapping

* Made it backward compatible and removed textwrap and added ut to PR

* 1. There was a duplication of part of a function and that has been
   addressed. 2. The DPU reboot-cause data is fetched directly from the
chassis_state_db now

* reboot_cause and system_health are obtained directly from chassisStateDB
now

* The expected and result are the same but the test is throwing an error,
temporarily bypassing the check

* Let us get the build going and then look into the test mockup

* Implemented as per the pmon hld, also made some improvements in the
implementation

* Fixed the key for CHASSIS_MODULE_INFO_TABLE entries

* Fixed "show reboot-cause all" and "show reboot-cause history all"

* Addressing review comments

* Checking if the test issue still exists

* Resolving SA errors triggered due to reboot_cause_test

* Resolved pre-commit issues

* Resolved pre-commit issues

* Improving coverage

* Fixed SA related warnings

* Did some cleanup

* Minor improvements and fixes

* Adding tests for system health

* Adding more system health related tests

* Fixed a minor issue

* Fixed long line SA issue

* Trying to please SA

* Trying to improve coverage

* import mock

* Fixed a typo

* mocking DB

* Fixed syntax issues

* DB mock fix

* removed unused import

* creating ut for dpu state

* Improving coverage

* Fixed a typo

* Adjusted the reboot-cause key as per the updated hld

* Added fix to gracefully handle system health DB keys not present case

* Addressed minor review comments

* Addressed review comments.  Commented out system-health support until
phase:2

* Resolved minor issues and SA failures

* Added role to PORT table in config_db.  Using role to differentiate
npu-dpu data plane connection in SmartSwitch with Dpc being the role.
Did a minor cleanup.

* Resolving pre-commit check error related to line > 120

* Trying to avoid pre-commit issues

* Testing SA and precommit checks

* Making it backward compatible

* Resolving column size and whitespace issue

* Working on SA issue

* Testing SA and UT

* Added 2 spaces before inline comment

* Enabling "show system-health dpu" cli alone.  The rest of the dpu health
is deferred for now.

* Fixed SA issues

* Added new line at EOF

* Enabling the UT for the CLI "show system-health dpu"

* Resolved SA issues

* Resolved a SA issue

* Added smartswitch specific "reboot-cause" and "reboot-cause history" CLI
extensions

* Removed the phase:2 related system-health cli extensions as a separate
PR will be raised eventually for phase:2

* Using smartswitch qualifier for the CLI extensions

* Fixed SA issues

* mocking device_info for test cases

* import patch in tests

* Debugging test failure

* Fixing SA issues

* fixing sa issues

* Debugging sa issues

* trying to resolve sa issues

* fixed indentation

* debugging

* debugging

* debugging

* debugging

* Debugging

* debugging

* debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Removing the test to build an image

* Removed mock import

* Improving coverage

* pleasing SA

* Fixing tests for design changes as per review comments

* Resolving test failure

* fixed indentation

* cleaned up the test case

* Addressed review comments in Command-Reference.md and trying to improve
coverage

* Improving coverage

* Fixed a test issue

* Addressed review comments

* Addressed review comment.  Reading DPUs list from config_db.json

* Improving coverage

* Resolved SA error

* Trying to improve coverage. Also, reading from platform.json

* adding json import in the test

* Fixed a test failure

* Fixed SA error

* Exercising the new function in test

* Removed a blank line

* fixing mock issue

* Trying a different approach

* working on coverage

* debugging

* debugging

* Debugging

* Increasing coverage

* improving coverage

* Adjusting the show cli implementation to align with the reboot-cause
changes such as 1. STATE_DB vs CHASSIS_STATE_DB and the key info

* Fixing a minor issue

* Removed ID column from the "show system-health dpu DPUx" cli as per the new requirement

* Addressed default dpu admin status for dark-mode and seamless migration
to lightup mode

* Resolving SA issue

* Resolved a typo

* Added checks to see if module_name is valid in the "config chassis
modules startup DPUx" cli and also moved all the required utilities to
the common file

* Fixed white space issues

* Cleaned unwanted import

* Fixed build issues

* missedout the fixes in a couple of files

* With the recent code the app_db multi_asic.PORT_ROLE is Dpc for DPU
ports, earlier this was not the case. So removing the additional check.

* As the port role issue is no longer seen in smartswitch, cleaning up the
related changes.

* Using the verbose define for TYPE_DPC in the CLI; if there is a specific
requirement to keep "TYPE_DPC = Dpc", which is the role, then we will
revert it

* Reverting intfutil_test.py

* Using the common API to get_dpu_list

* Removed unused import json

* Addressed review comments

* Did some minor cleanup

* Fix: SA error

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Addressed review comments

* Added fix for issue:21372 - Device name column shows NPU instead of module name

* Added fix for issue:21372 - Fixing the device name column in the cli output

* Added a few review comments
Remove the partially installed image when image installation fails.

What I did
When image installation failed, the partially installed image was not removed, which could fill the disk on devices with small disks.

How I did it
Handle install image SystemExit exception and uninstall image.
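A minimal sketch of the cleanup-on-abort pattern, assuming the install and removal steps can be passed in as callables. `install_image`, `do_install`, and `remove_image` are hypothetical names, not the sonic-installer functions.

```python
def install_image(image_path, do_install, remove_image):
    """Install an image; if installation aborts, remove the partial install.

    do_install and remove_image are caller-supplied callables (hypothetical
    hooks standing in for the real install and uninstall steps).
    """
    try:
        do_install(image_path)
    except SystemExit:
        # Installation aborted part-way: clean up so the partially written
        # image does not keep consuming disk space, then re-raise.
        remove_image(image_path)
        raise
```

Re-raising keeps the original exit status visible to the caller.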

How to verify it
Verified manually.
All test cases pass.
* Recover hardware config when loading golden config.

* fix format

* Refactor to reuse an existing function.

* Fix format

* Remove deprecated tests

* fix asicid

* remove unrelated changes

* Remove unrelated changes.

* Recover the old implementation

* Remove unused test.

* Recover the old tests

* Adjust comments.
What I did
On a Chassis-Packet supervisor, show interface counters -d all returns no data.

How I did it
Fixed the condition in portstat.py to use the right path for the Chassis-Packet RP to collect counters.
Added condition to collect link state information.

How to verify it
Run 'show interface counters -d all' on Chassis-Packet Supervisor
Fix ssdhealth failure on VS platform
…ps (#3739)

* Make 'show ip bgp summary' work even when we don't have any peer groups configured.

* fixing the indentation issues
What I did
Fix the calls to spanning-tree commands in the dump script.
When generating techsupport, the spanning-tree commands fail:

.
Error: No such command "spanning_tree".
timeout --foreground 5m bash -c "dummy_cleanup_method ()
.
How I did it
Change show spanning_tree to the correct command, show spanning-tree.

How to verify it
Run show techsupport
- What I did
Add support for Mellanox SN5640 new platform and hwSKUs

- How I did it
Added new files and folders for the new platform and modified the relevant existing files

- How to verify it
Install image with the change on Mellanox SN5640 setup and run tests
* display proper message with proper errno for kvm.

* this change should cover all flags.

* add unit test.

* fix typo.

* fix unit test.

* whitespace EOF

* delete dead variable.
* Optimize lag_keepalive by crafting the LACPDU packet ourselves

Instead of waiting for a LACPDU packet to be sent and capturing that
(which involves waiting roughly 30 seconds), get the necessary
information from teamd and craft it ourselves. This means that the
60-second wait in making sure that a LACPDU packet is captured and the
keepalive script is ready can be largely eliminated (this has been
reduced to 10 seconds to make sure the script has a chance to craft the
packets and send some LACPDUs).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Fix pre-commit errors

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Keep a socket open, and reuse that for sending LACPDUs

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Add logic to fork into background after collecting information

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Start lag_keepalive before OA pause, and fork after building packets

Start lag_keepalive before pausing orchagent, so that there's less of a
delay between when orchagent is paused and when kexec happens, and so
that fewer events/changes go unhandled by orchagent.

Additionally, add an option into the lag_keepalive script to fork into
the background after generating the LACPDUs and opening sockets, but
before sending the actual packets. This serves as a sort of error check
to make sure that it is at least able to send LACPDU packets, and didn't
bail out early.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
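The script's own packet-building code is not shown here. The following standard-library sketch only illustrates the LACPDU frame layout defined by IEEE 802.1AX that such a script has to reproduce from the actor and partner values teamd reports. `LacpInfo` and `build_lacpdu` are hypothetical names.

```python
import struct
from typing import NamedTuple

SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")  # LACP destination MAC
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01
LACP_VERSION = 0x01


class LacpInfo(NamedTuple):
    """Actor or partner fields, e.g. as read from teamd state."""
    sys_priority: int
    sys_mac: bytes  # 6 bytes
    key: int
    port_priority: int
    port: int
    state: int


def _info_tlv(tlv_type, info):
    # Actor (1) / Partner (2) Information TLV: 20 bytes, last 3 reserved.
    return struct.pack("!BBH6sHHHB3x", tlv_type, 20, info.sys_priority,
                       info.sys_mac, info.key, info.port_priority,
                       info.port, info.state)


def build_lacpdu(src_mac, actor, partner, collector_max_delay=0):
    """Return a complete Ethernet frame carrying one LACPDU (124 bytes)."""
    eth = struct.pack("!6s6sH", SLOW_PROTOCOLS_MAC, src_mac, SLOW_PROTOCOLS_ETHERTYPE)
    header = struct.pack("!BB", LACP_SUBTYPE, LACP_VERSION)
    collector = struct.pack("!BBH12x", 3, 16, collector_max_delay)  # Collector TLV
    terminator = struct.pack("!BB50x", 0, 0)  # Terminator TLV + 50 reserved bytes
    return (eth + header + _info_tlv(1, actor) + _info_tlv(2, partner)
            + collector + terminator)
```

Because the frame is fully determined by values teamd already knows, it can be built up front and resent on a timer over one open socket, with no need to capture a live LACPDU.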
* Fixing 'show ip bgp neighbor <ip>' in frr unified config mode by making sure to get the neighbor key from config_db either in 'x.x.x.x' format or alternatively in 'default|x.x.x.x' format as is the case with unified mode

* fixing the neighbor table entry match to correctly show neighbor name in unified frr config mgmt mode
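A minimal sketch of the two-format lookup, assuming the BGP_NEIGHBOR table is available as a dict keyed either by the plain IP or by `vrf|ip`. Some CONFIG_DB readers may split multi-part keys into tuples, so the sketch also tries that form. `find_bgp_neighbor` is a hypothetical name.

```python
def find_bgp_neighbor(bgp_neighbor_table, ip, vrf="default"):
    """Look up a BGP neighbor entry keyed either as 'x.x.x.x' or 'vrf|x.x.x.x'."""
    candidates = (
        ip,                       # legacy key: 'x.x.x.x'
        "{}|{}".format(vrf, ip),  # unified FRR config mode key: 'default|x.x.x.x'
        (vrf, ip),                # same key if the reader splits keys into tuples
    )
    for key in candidates:
        entry = bgp_neighbor_table.get(key)
        if entry is not None:
            return entry
    return None
```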
* [QOS] Skip showing unnecessary warning message

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
… fs to allow remote user login. (#3700)

Improve SONiC disk checker to handle disk full case and mount overlay fs to allow remote user login.

This PR depends on DB schema change: sonic-net/sonic-buildimage#21351

What I did
Currently the disk checker only handles the read-only disk case, but when the disk has no free space, remote users also can't log in.
How I did it
Check disk free space and mount overlay fs to allow TACACS login.
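A minimal sketch of the free-space test only; the overlay mount is not shown. The 1 MiB threshold and the function name `disk_has_no_free_space` are hypothetical, not values taken from the disk checker.

```python
import shutil


def disk_has_no_free_space(path="/", min_free_bytes=1024 * 1024):
    """Return True when free space under `path` is below `min_free_bytes`."""
    # disk_usage() returns (total, used, free) in bytes for the filesystem
    # that contains `path`.
    return shutil.disk_usage(path).free < min_free_bytes
```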
How to verify it
Create a large file on the device to exhaust its free space, then log in as a remote user; the login should succeed.
* [FC] remove FC delay field

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Currently, on the Cisco 8800 chassis, PFCWD is enabled only for front-end ports, not for backplane ports. Since PFC is enabled on backplane ports, PFCWD needs to be enabled there too.

What I did
Enable PFCWD for backplane ports.

How I did it
Include backplane ports in the list of ports for which pfcwd is enabled.

How to verify it
Manually copied the file to the device and ran pfcwd start_default.
…tion (#3763)

What I did
Added the options -a and --all to scripts/vnet_route_check.py. Both options are equivalent. If neither is provided, then when finding the VNET routes that are in APP DB but not in ASIC DB, routes in APP DB that are not active are ignored.
Mock tests in tests/test_vnet_route_check.py are added to test this behavior.

How I did it
If -a and --all are not provided, we first filter routes in APP DB to find active routes and then check which active routes are not in ASIC DB. The status of each route is retrieved from STATE DB. If a route is not found in STATE DB, then it is considered to be active.
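The filtering step can be sketched as follows, assuming route keys are plain strings and STATE DB statuses have already been read into a dict. The status string `"active"` and the function name are assumptions, not taken from vnet_route_check.py.

```python
def routes_missing_in_asic(app_db_routes, asic_db_routes, state_db_status, check_all=False):
    """Return APP DB VNET routes that are absent from ASIC DB.

    state_db_status maps a route key to its STATE DB status string
    (assumed to be "active" for active routes).
    check_all mirrors the -a/--all option.
    """
    candidates = app_db_routes
    if not check_all:
        # Keep only active routes; a route missing from STATE DB counts as active.
        candidates = [r for r in app_db_routes
                      if state_db_status.get(r, "active") == "active"]
    asic = set(asic_db_routes)
    return [r for r in candidates if r not in asic]
```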

How to verify it
You can verify the behavior by running mock tests in tests/test_vnet_route_check.py, or by manually running the vnet_route_check.py script on a DUT.
* Add golden config check

* fix format

* Addressing comment.

* fix tests.

* fix format

* fix common function

* Fix UT

* fix test

* reset mock

* fix ut

* fix ut

* remove unnecessary test

* Add negative test

* fix format

* remove empty dict check

* Add none check

* fix condition

* fix invalid golden
mssonicbld and others added 28 commits March 17, 2026 04:13
…nt units (#4359)


Propagating sonic-net/sonic-host-services#320 for managing `generated/transient` units by SONiC Package Manager

#### What I did
Fixed application upgrade flow for Multi ASIC platforms

#### How I did it
Handled the `systemctl` `enable`/`disable` actions for `generated/transient` units
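The actual change is in sonic-net/sonic-host-services#320 and is not reproduced here. A minimal sketch, assuming the unit state is read with `systemctl is-enabled`, which reports `generated` or `transient` for such units. The function names are hypothetical, and `run`/`log` are injectable so the logic can be exercised without systemd.

```python
import subprocess

SKIP_STATES = {"generated", "transient"}


def unit_enable_state(unit, run=subprocess.run):
    # `systemctl is-enabled` prints the unit file state, e.g. "enabled",
    # "disabled", "generated" or "transient"; some states exit non-zero,
    # so the return code is deliberately not checked here.
    result = run(["systemctl", "is-enabled", unit], capture_output=True, text=True)
    return result.stdout.strip()


def systemctl_enable_or_disable(action, unit, run=subprocess.run, log=print):
    """Run `systemctl enable|disable <unit>`, skipping generated/transient units."""
    if unit_enable_state(unit, run) in SKIP_STATES:
        # enable/disable are rejected for these units, so skip with a warning.
        log("warning: Skipping systemctl {} for generated/transient unit {}".format(action, unit))
        return False
    run(["systemctl", action, unit], check=True)
    return True
```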

#### How to verify it

1. Add app repository
```
sudo sonic-package-manager repository add <app> <repo_url> --default-reference=1.0.0
```

2. Install app v1.0.0
```
sudo sonic-package-manager install <app>==1.0.0 -y
```

3. Enable the feature and wait for container to come up
```
sudo config feature state <app> enabled
```

4. Upgrade to v2.0.0 - THIS triggers the bug
```
sudo sonic-package-manager install -y <app>==2.0.0
```

#### Previous command output (if the output of a command-line utility has changed)

SHELL:
```
root@sonic:/home/admin# sudo sonic-package-manager install -y cpu-report==2.0.0
Execute systemctl action stop on cpu-report service
Execute systemctl action disable on cpu-report service
removed /usr/lib/systemd/system/cpu-report.service
removed /usr/local/bin/cpu-report.sh
removed /usr/bin/cpu-report.sh
removed /etc/sonic/cpu-report_reconcile
removed /etc/systemd/system/cpu-report.service.d
generated /usr/bin/cpu-report.sh
generated /usr/local/bin/cpu-report.sh
generated /usr/lib/systemd/system/cpu-report.service
cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
Execute systemctl action enable on cpu-report service
Failed to enable unit: Unit /run/systemd/generator/cpu-report.service is transient or generated
cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
removed /usr/lib/systemd/system/cpu-report.service
removed /usr/local/bin/cpu-report.sh
removed /usr/bin/cpu-report.sh
removed /etc/sonic/cpu-report_reconcile
removed /etc/systemd/system/cpu-report.service.d
generated /usr/bin/cpu-report.sh
generated /usr/local/bin/cpu-report.sh
generated /usr/lib/systemd/system/cpu-report.service
Execute systemctl action enable on cpu-report service
Failed to enable unit: Unit /run/systemd/generator/cpu-report.service is transient or generated
error: failed in rollback: Failed to execute "['systemctl', 'enable', 'cpu-report']"
Failed to install cpu-report==2.0.0: Failed to upgrade cpu-report: Failed to execute "['systemctl', 'enable', 'cpu-report']"
```

SYSLOG:
```
2026 Feb 10 14:31:39.193944 sonic WARNING systemctl enable for generated/transient unit cpu-report
```

#### New command output (if the output of a command-line utility has changed)

SHELL:
```
root@sonic:/home/admin# sonic-package-manager install cpu-report==2.0.0 -y
Execute systemctl action stop on cpu-report service
Execute systemctl action disable on cpu-report service
warning: Skipping systemctl disable for generated/transient unit cpu-report
removed /usr/lib/systemd/system/cpu-report.service
removed /usr/local/bin/cpu-report.sh
removed /usr/bin/cpu-report.sh
removed /etc/sonic/cpu-report_reconcile
removed /etc/systemd/system/cpu-report.service.d
generated /usr/bin/cpu-report.sh
generated /usr/local/bin/cpu-report.sh
generated /usr/lib/systemd/system/cpu-report.service
cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
Execute systemctl action enable on cpu-report service
warning: Skipping systemctl enable for generated/transient unit cpu-report
Execute systemctl action start on cpu-report service
```

SYSLOG:
```
2026 Feb 10 16:18:55.531288 sonic INFO sonic-package-manager: Execute systemctl action disable on cpu-report service
2026 Feb 10 16:18:55.541984 sonic WARNING sonic-package-manager: Skipping systemctl disable for generated/transient unit cpu-report
2026 Feb 10 16:18:55.542262 sonic INFO sonic-package-manager: removed /usr/lib/systemd/system/cpu-report.service
2026 Feb 10 16:18:55.542481 sonic INFO sonic-package-manager: removed /usr/local/bin/cpu-report.sh
2026 Feb 10 16:18:55.542676 sonic INFO sonic-package-manager: removed /usr/bin/cpu-report.sh
2026 Feb 10 16:18:55.542861 sonic INFO sonic-package-manager: removed /etc/sonic/cpu-report_reconcile
2026 Feb 10 16:18:55.543674 sonic INFO sonic-package-manager: removed /etc/systemd/system/cpu-report.service.d
2026 Feb 10 16:18:56.190582 sonic INFO sonic-package-manager: generated /usr/bin/cpu-report.sh
2026 Feb 10 16:18:56.201750 sonic INFO sonic-package-manager: generated /usr/local/bin/cpu-report.sh
2026 Feb 10 16:18:56.218809 sonic INFO sonic-package-manager: generated /usr/lib/systemd/system/cpu-report.service
2026 Feb 10 16:18:56.894541 sonic INFO sonic-package-manager: cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
2026 Feb 10 16:18:56.894689 sonic INFO sonic-package-manager: Execute systemctl action enable on cpu-report service
2026 Feb 10 16:18:56.905258 sonic WARNING sonic-package-manager: Skipping systemctl enable for generated/transient unit cpu-report
2026 Feb 10 16:18:56.905380 sonic INFO sonic-package-manager: Execute systemctl action start on cpu-report service
```

**Note:**
* The system logger fix also resolves the missing application tag and the missing first word of each message

#### A picture of a cute animal (not mandatory but encouraged)
```
 .---. .-----------
 / \ __ / ------
 / / \( )/ -----
 ////// ' \/ ` ---
 //// / // : : ---
 // / / /` '--
// //..\\
 ====UU====UU====
 '//||\\`
 ''``
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
## Description
Currently `generate_sysinfo()` unconditionally overwrites `mac`, `platform`, and `asic_id` in the golden config with values from the running config or hardware detection. This prevents users from intentionally overriding these fields via `config load_minigraph -o` or `config override-config-table`.

### Problem
When a user provides an explicit `mac` in their golden config (e.g., for MAC migration, virtual environments, or chassis scenarios), it gets silently overwritten before the override is applied:

1. `load_minigraph` calls `sonic-cfggen -H` which sets MAC from hardware
2. `override_config_by()` → `override_config_table()` → `generate_sysinfo()`
3. `generate_sysinfo()` reads MAC from running config (set by step 1) and **unconditionally overwrites** the golden config's MAC

### Fix
Change `generate_sysinfo()` to only backfill `mac`, `platform`, and `asic_id` when they are **not explicitly present** in the golden config. If the golden config provides these values, they are preserved.

This maintains backward compatibility — golden configs without these fields (the common case) still get auto-populated from hardware/running config.
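A minimal sketch of the backfill rule, assuming both configs are plain dicts and that hardware-detected values are already reflected in the running config. `backfill_sysinfo` is a hypothetical name, not the `generate_sysinfo()` implementation.

```python
def backfill_sysinfo(golden_config, running_config):
    """Fill mac/platform/asic_id only when the golden config omits them."""
    golden_meta = golden_config.setdefault("DEVICE_METADATA", {}).setdefault("localhost", {})
    running_meta = running_config.get("DEVICE_METADATA", {}).get("localhost", {})
    for field in ("mac", "platform", "asic_id"):
        # Explicit golden values win; only missing fields are backfilled.
        if field not in golden_meta and field in running_meta:
            golden_meta[field] = running_meta[field]
    return golden_config
```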

## Motivation and Context
Users cannot override `DEVICE_METADATA.localhost.mac` via golden config override, even when explicitly specified.

## How Has This Been Tested?
Code review and manual verification of the logic change.

## Type of change
- [x] Bug fix

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
Manual backport due to cherry-pick conflicts

Signed-off-by: Peter <peterbailey@arista.com>
What I did
The script already detects routes whose offload flag is False or missing. This catches queued routes, which have no offload flag, but not rejected routes, whose offload flag is True.
This change adds a check on the value of the failed key in each route entry, which covers both rejected and queued routes.
It helps detect rejected and queued routes on the device.

How I did it
Append each failed route prefix to the failed list; if the list is not empty, the script prints an error message to syslog.

        # Check for failed state
        if entry.get('failed', False):
            failed_rt.append(route_prefix)
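A self-contained sketch of the same check, assuming the input has the shape of FRR's `show ip route json` output (each prefix maps to a list of route entries). `find_failed_routes` is a hypothetical name.

```python
def find_failed_routes(frr_routes):
    """Return prefixes with at least one FRR route entry marked "failed": true."""
    failed_rt = []
    for route_prefix, entries in frr_routes.items():
        for entry in entries:
            # Check for failed state (covers both rejected and queued routes)
            if entry.get('failed', False):
                failed_rt.append(route_prefix)
                break  # report each prefix once
    return failed_rt
```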
How to verify it
For rejected route:
The output of route_check.py

Some routes have failed state in FRR : ['0.0.0.0/0', '192.168.128.0/25', '192.168.128.128/25', '192.168.136.0/25', '192.168.136.128/25', '192.168.144.0/25', '192.168.144.128/25', '192.168.152.0/25', '192.168.152.128/25', '192.168.160.0/25', '192.168.160.128/25', '192.168.168.0/25', '192.168.168.128/25', '192.168.176.0/25', '192.168.176.128/25', '192.168.184.0/25', '192.168.184.128/25', '192.168.192.0/25', '192.168.192.128/25', '192.168.200.0/25', '192.168.200.128/25', '192.168.208.0/25', '192.168.208.128/25', '192.168.216.0/25', '192.168.216.128/25', '192.168.224.0/25', '192.168.224.128/25', '192.168.232.0/25', '192.168.232.128/25', '192.168.240.0/25', '192.168.240.128/25', '192.168.248.0/25', '192.168.248.128/25', '192.169.0.0/25', '192.169.0.128/25', '192.169.104.0/25', '192.169.104.128/25', '192.169.112.0/25', '192.169.112.128/25', '192.169.120.0/25', '192.169.120.128/25', '192.169.128.0/25', '192.169.128.128/25', '192.169.136.0/25', '192.169.136.128/25', '192.169.144.0/25', '192.16
Failure results: {{
    "": {
        "failed_FRR_routes": [
            "0.0.0.0/0",
            "192.168.128.0/25",
            "192.168.128.128/25",
            "192.168.136.0/25",
            "192.168.136.128/25",
            "192.168.144.0/25",
            "192.168.144.128/25",
            "192.168.152.0/25",
            "192.168.152.128/25",
            "192.168.160.0/25",
            "192.168.160.128/25",
            "192.168.168.0/25",
            "192.168.168.128/25",
            "192.168.176.0/25",
            "192.168.176.128/25",
            "192.168.184.0/25",
            "192.168.184.128/25",
            "192.168.192.0/25",
            "192.168.192.128/25",
            "192.168.200.0/25",
            "192.168.200.128/25",
            "192.168.208.0/25",
            "192.168.208.128/25",
            "192.168.216.0/25",
            "192.168.216.128/25",
            "192.168.224.0/25",
            "192.168.224.128/25",
            "192.168.232.0/25",
            "192.168.232
Failed. Look at reported mismatches above
For rejected route:
the output of route_check.py

Some routes have failed state in FRR : ['0.0.0.0/0', '192.168.128.0/25', '192.168.128.128/25', '192.168.136.0/25', '192.168.136.128/25', '192.168.144.0/25', '192.168.144.128/25', '192.168.152.0/25', '192.168.152.128/25', '192.168.160.0/25', '192.168.160.128/25', '192.168.168.0/25', '192.168.168.128/25', '192.168.176.0/25', '192.168.176.128/25', '192.168.184.0/25', '192.168.184.128/25', '192.168.192.0/25', '192.168.192.128/25', '192.168.200.0/25', '192.168.200.128/25', '192.168.208.0/25', '192.168.208.128/25', '192.168.216.0/25', '192.168.216.128/25', '192.168.224.0/25', '192.168.224.128/25', '192.168.232.0/25', '192.168.232.128/25', '192.168.240.0/25', '192.168.240.128/25', '192.168.248.0/25', '192.168.248.128/25', '192.169.0.0/25', '192.169.0.128/25', '192.169.104.0/25', '192.169.104.128/25', '192.169.112.0/25', '192.169.112.128/25', '192.169.120.0/25', '192.169.120.128/25', '192.169.128.0/25', '192.169.128.128/25', '192.169.136.0/25', '192.169.136.128/25', '192.169.144.0/25', '192.16
Failure results: {{
    "": {
        "missed_FRR_routes": [
            {
                "destSelected": true,
                "distance": 20,
                "failed": true,
                "installedNexthopGroupId": 39146,
                "internalFlags": 8,
                "internalNextHopActiveNum": 4,
                "internalNextHopNum": 4,
                "internalStatus": 168,
                "metric": 0,
                "nexthopGroupId": 39146,
                "nexthops": [
                    {
                        "active": true,
                        "afi": "ipv4",
                        "fib": true,
                        "flags": 3,
                        "interfaceIndex": 6,
                        "interfaceName": "PortChannel102",
                        "ip": "10.0.0.1",
                        "rmapSource": "10.1.0.32",
                        "weight": 1
                    },
                    {
                        "active": true,

Failed. Look at reported mismatches above

Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
Co-authored-by: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
…IPv4 or IPv6 route (#4177) (#4363)

* remove early return

* add unit tests for IPv4-only and IPv6-only scenarios

* fix incorrect UT data format

* fix UT data

* add early return for non-JSON case

---------

Signed-off-by: BYGX-wcr <wcr@live.cn>
Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
Co-authored-by: Changrong Wu <wcr@live.cn>
…sessions (#4368)

## What I did

Fixed is_port_mirror_capability_supported() so that ERSPAN sessions (direction=None) are not blocked by the PORT_INGRESS_MIRROR_CAPABLE / PORT_EGRESS_MIRROR_CAPABLE capability check.

## Root cause

PR #4089 added a capability check that reads PORT_INGRESS_MIRROR_CAPABLE and PORT_EGRESS_MIRROR_CAPABLE from STATE_DB SWITCH_CAPABILITY|switch. For ERSPAN sessions, direction=None was treated as 'check both', but:
1. These capability flags only apply to SPAN (port mirror) sessions, not ERSPAN (which uses source/destination IPs, not ports)
2. Platforms that don't populate these STATE_DB keys return None, which != 'true', so the function incorrectly returns False (unsupported)

PR #4159 partially addressed the multi-ASIC namespace issue but did not fix the fundamental problem for ERSPAN sessions with no src/dst port specified.

## How I fixed it

- **For ERSPAN (direction=None)**: Return True immediately. PORT_INGRESS/EGRESS_MIRROR_CAPABLE does not apply to ERSPAN sessions.
- **For SPAN (direction != None)**: Treat absent STATE_DB key (None value) as 'supported' for backward compatibility with platforms that don't populate SWITCH_CAPABILITY table entries.
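The decision logic above can be sketched as follows, with the STATE_DB `SWITCH_CAPABILITY|switch` lookup replaced by a dict. The direction values `"rx"`, `"tx"` and `"both"` and the name `port_mirror_supported` are assumptions; the real `is_port_mirror_capability_supported()` signature may differ.

```python
def port_mirror_supported(direction, switch_capability):
    """Simplified capability decision for mirror sessions."""
    if direction is None:
        # ERSPAN: PORT_INGRESS/EGRESS_MIRROR_CAPABLE do not apply.
        return True

    def capable(field):
        value = switch_capability.get(field)
        # Absent key: assume supported for backward compatibility.
        return value is None or value == "true"

    if direction in ("rx", "both") and not capable("PORT_INGRESS_MIRROR_CAPABLE"):
        return False
    if direction in ("tx", "both") and not capable("PORT_EGRESS_MIRROR_CAPABLE"):
        return False
    return True
```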

## How to verify it

Unit tests updated in `tests/config_mirror_session_test.py`:
- Added Test 4 to verify behavior when STATE_DB keys are absent (all return True)
- Updated Test 2 and Test 3 assertions for direction=None to expect True

Fixes: sonic-net/sonic-mgmt#21690

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 ** Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did

Fixed `sonic-installer install` failing during `migrate_sonic_packages()` when `/etc/resolv.conf` in the new image is a symlink to `/run/resolvconf/resolv.conf`.

The failure occurs because the `cp` command at `main.py:386` follows the symlink through the overlay mount. Since the symlink target is an absolute path, it resolves to the **host's** `/run/resolvconf/resolv.conf` — the same file as the source. `cp` detects same source and destination inode and exits with:

```
cp: '/etc/resolv.conf' and '/tmp/image-<version>-fs/etc/resolv.conf' are the same file
```

This was introduced by the `build_debian.sh` change that replaced `touch` with `ln -sf /run/resolvconf/resolv.conf` for `/etc/resolv.conf` in the image filesystem.

#### How I did it

Check whether `/etc/resolv.conf` in the chroot is a symlink or a regular file, and handle each case appropriately:

- **Symlink** (images with `resolvconf` package installed): Read the symlink target via `readlink` (e.g. `/run/resolvconf/resolv.conf`), then create the target file inside the chroot with the host's DNS content. The symlink then resolves correctly inside the chroot. This avoids touching the symlink itself, so the overlay upper dir's `etc/resolv.conf` is never modified and the new image boots with the symlink intact. No cleanup is needed — the target lives under `/run`, which is a tmpfs recreated at every boot.

- **Regular file** (images without `resolvconf`, or where the build process explicitly creates a regular file via `touch`): Overwrite directly with the host's DNS content. No backup/restore is needed — the original file is empty (cleared during build), and after reboot the `resolv-config` service reconfigures DNS from CONFIG_DB.

The previous backup-overwrite-restore pattern has been removed since it is unnecessary in both cases.
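A rough pure-Python sketch of the symlink-vs-regular-file handling, assuming the new image's root filesystem is reachable at `chroot_dir`. The function name is hypothetical; per the description, the actual change runs shell commands (`readlink`, copy) from `sonic_installer/main.py`.

```python
import os
import shutil


def prepare_chroot_resolv_conf(chroot_dir: str,
                               host_resolv: str = "/etc/resolv.conf") -> str:
    """Copy host DNS settings into the chroot and return the path written."""
    resolv = os.path.join(chroot_dir, "etc", "resolv.conf")
    if os.path.islink(resolv):
        # Leave the symlink itself untouched; only read where it points.
        link_target = os.readlink(resolv)  # e.g. /run/resolvconf/resolv.conf
        if os.path.isabs(link_target):
            # Resolve an absolute target inside the chroot, not on the host,
            # so source and destination can never be the same file.
            dest = os.path.join(chroot_dir, link_target.lstrip("/"))
        else:
            dest = os.path.join(os.path.dirname(resolv), link_target)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
    else:
        # Regular file: overwrite it directly, no backup/restore needed.
        dest = resolv
    shutil.copyfile(host_resolv, dest)
    return dest
```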

#### How to verify it

1. Start with a switch running an image where `/etc/resolv.conf` is a symlink:
 ```bash
 # Confirm symlink exists
 ls -la /etc/resolv.conf
 # Expected: /etc/resolv.conf -> /run/resolvconf/resolv.conf

 # Ensure only one image is installed (clean state)
 sudo sonic-installer list
 # If the target image is already present, remove it:
 sudo sonic-installer remove <target-image> -y
 ```

2. Run `sonic-installer install` with an image that also has the symlink:
 ```bash
 sudo sonic-installer install <image-path> -y
 ```

3. Verify:
 - Installation completes without `cp: ... are the same file` error
 - `sonic-installer list` shows the new image as default
 - After reboot, `/etc/resolv.conf` is still a symlink to `/run/resolvconf/resolv.conf`
 - DNS resolution works

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
#### What I did

On DPU platforms (pensando, nvidia-bluefield), the `routeCheck` monit service is removed from the monit config (sonic-net/sonic-buildimage#26214) since route checking is not applicable on DPUs. However, `config reload` unconditionally tries to `monit unmonitor/monitor routeCheck`, causing:
```
Disabling container and routeCheck monitoring ...
There is no service named "routeCheck"
```

Added a `_monit_service_exists()` helper and guarded `routeCheck` monit calls in `_stop_services()` and `_restart_services()` so they only execute when the service is present in monit.

#### How I did it

- Added `_monit_service_exists(service)` helper that runs `sudo monit summary <service>` and returns `True`/`False` based on exit code.
- In `_stop_services()`: only call `monit unmonitor routeCheck` if `_monit_service_exists("routeCheck")` returns `True`.
- In `_restart_services()`: only call `monit monitor routeCheck` and `_wait_for_monit_service_monitored("routeCheck")` if the service exists.

#### How to verify it

1. On a DPU platform (nvidia-bluefield / pensando) with routeCheck removed from monit config:
 - Run `config reload -y` — should complete without `"There is no service named routeCheck"` error
2. On a non-DPU platform where routeCheck exists in monit:
 - Run `config reload -y` — routeCheck monitoring should still be disabled/re-enabled as before

#### Previous command output (if the output of a command-line utility has changed)

On DPU with routeCheck removed from monit:
```
root@sonic-dpu-3:/home/admin# config reload -y
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
There is no service named "routeCheck"
Released lock on /etc/sonic/reload.lock
```

#### New command output (if the output of a command-line utility has changed)

On DPU with routeCheck removed from monit:
```
root@sonic-dpu-3:/home/admin# config reload -y
Acquired lock on /etc/sonic/reload.lock
Disabling container monitoring ...
...
Enabling container monitoring ...
Released lock on /etc/sonic/reload.lock
```

Companion PR: sonic-net/sonic-buildimage#26214
Fixes: sonic-net/sonic-buildimage#26225

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did

Update `show bfd summary` to aggregate BFD sessions across all ASIC namespaces when no `-n <namespace>` is provided.
Extend multi-ASIC BFD tests and expected output for the all-ASIC summary.
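Conceptually, the all-ASIC summary concatenates the per-namespace session lists. A minimal sketch, assuming the sessions for each namespace have already been loaded into dicts (the real command reads them from each namespace's database); the function names are hypothetical:

```python
from typing import Dict, List, Optional


def collect_bfd_sessions(sessions_by_ns: Dict[str, List[dict]],
                         namespace: Optional[str] = None) -> List[dict]:
    # With -n <namespace>, only that namespace's sessions are shown.
    if namespace is not None:
        return list(sessions_by_ns.get(namespace, []))
    # Without -n, aggregate every ASIC namespace in a stable order.
    merged: List[dict] = []
    for ns in sorted(sessions_by_ns):
        merged.extend(sessions_by_ns[ns])
    return merged


def summary_header(sessions: List[dict]) -> str:
    return f"Total number of BFD sessions: {len(sessions)}"
```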

#### How I did it

#### How to verify it
With this change, `show bfd summary` lists all BFD sessions across all ASIC namespaces.
Running `sudo ip netns exec <namespace> show bfd summary` still shows only the BFD sessions in that namespace, so there is no regression:
```
admin@dut:~$ sudo ip netns exec asic1 show bfd summary
Total number of BFD sessions: 220
Peer Addr Interface Vrf State Type Local Addr TX Interval RX Interval Multiplier Multihop Local Discriminator
----------------------------- ----------- ------- ------- ------------ ----------------------------- ------------- ------------- ------------ ---------- ---------------------
2603:10b0:607:7:0:a:eb64:8700 default default Up async_active 2603:10b0:607:7:0:a:eb64:8b00 50 50 3 false 127
2603:10b0:607:7:0:a:eb62:8900 default default Up async_active 2603:10b0:607:7:0:a:eb62:8b00 50 50 3 false 104
2603:10b0:607:7:0:a:eb64:8300 default default Up async_active 2603:10b0:607:7:0:a:eb64:8b00 50 50 3 false 211
10.235.96.10 default default Up async_active 10.235.96.8 50 50 3 false 18
10.235.96.138 default default Up async_active 10.235.96.139 50 50 3 false 25
2603:10b0:607:7:0:a:eb62:c00 default default Up async_active 2603:10b0:607:7:0:a:eb62:800 50 50 3 false 108
2603:10b0:607:7:0:a:eb60:8700 default default Up async_active 2603:10b0:607:7:0:a:eb60:8b00 50 50 3 false 73
2603:10b0:607:7:0:a:eb61:8c00 default default Up async_active 2603:10b0:607:7:0:a:eb61:8b00 50 50 3 false 93
...
```

#### Previous command output (if the output of a command-line utility has changed)
```
admin@dut:~$ show bfd summary
Total number of BFD sessions: 0
Peer Addr Interface Vrf State Type Local Addr TX Interval RX Interval Multiplier Multihop Local Discriminator
----------- ----------- ----- ------- ------ ------------ ------------- ------------- ------------ ---------- ---------------------
```

#### New command output (if the output of a command-line utility has changed)
```
admin@dut:~$ show bfd summary
Total number of BFD sessions: 660
Peer Addr Interface Vrf State Type Local Addr TX Interval RX Interval Multiplier Multihop Local Discriminator
------------------- ----------- ------- ------- ------------ ------------------- ------------- ------------- ------------ ---------- ---------------------
20.0.13.6 default default Up async_active 20.0.13.1 50 50 3 false 27
2603:10e2:400:2::7 default default Up async_active 2603:10e2:400:2::1 50 50 3 false 106
20.0.12.6 default default Up async_active 20.0.12.1 50 50 3 false 136
2603:10e2:400:14::9 default default Up async_active 2603:10e2:400:14::1 50 50 3 false 94
2603:10e2:400:6::5 default default Up async_active 2603:10e2:400:6::1 50 50 3 false 159
2603:10e2:400:14::7 default default Up async_active 2603:10e2:400:14::1 50 50 3 false 92
2603:10e2:400:11::8 default default Up async_active 2603:10e2:400:11::1 50 50 3 false 73
20.0.10.9 default default Up async_active 20.0.10.1 50 50 3 false 11
20.0.10.3 default default Up async_active 20.0.10.1 50 50 3 false 132
20.0.5.2 default default Up async_active 20.0.5.1 50 50 3 false 143
20.0.11.7 default default Up async_active 20.0.11.1 50 50 3 false 15
2603:10e2:400:1::2 default default Up async_active 2603:10e2:400:1::1 50 50 3 false 155
20.0.13.3 default default Up async_active 20.0.13.1 50 50 3 false 137
2603:10e2:400:14::2 default default Up async_active 2603:10e2:400:14::1 50 50 3 false 89
20.0.1.9 default default Up async_active 20.0.1.1 50 50 3 false 5
20.0.5.5 default default Up async_active 20.0.5.1 50 50 3 false 145
2603:10e2:400:5::3 default default Up async_active 2603:10e2:400:5::1 50 50 3 false 158
20.0.14.6 default default Up async_active 20.0.14.1 50 50 3 false 33
...
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
What I did
The VRF pool is being increased to 4096 by PR sonic-net/sonic-swss#4168, so the management VRF table ID is being moved to 6000.
This change updates `show mgmt-vrf routes` accordingly.

Signed-off-by: ypcisco <ypcisco@gmail.com>
…ne (#4424)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did
When `config reload` is run, stale BGP peer entries in BGP_PEER_CONFIGURED_TABLE are not cleared. Added logic to clear them during config reload.
#### How I did it
Connected to STATE_DB and deleted all keys matching `BGP_PEER_CONFIGURED_TABLE|*`.
#### How to verify it
Delete a peer from config_db.json and run `config reload`. The peer should be removed from the STATE_DB table BGP_PEER_CONFIGURED_TABLE.
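A sketch of the cleanup, assuming a DB client that offers Redis-style `keys(pattern)` and `delete(key)` calls. `InMemoryStateDb` is a hypothetical stand-in used only to make the example runnable; it is not a SONiC class.

```python
import fnmatch
from typing import Dict, List


class InMemoryStateDb:
    """Hypothetical stand-in for a STATE_DB client exposing keys()/delete()."""

    def __init__(self, data: Dict[str, dict]):
        self.data = dict(data)

    def keys(self, pattern: str) -> List[str]:
        # Redis KEYS-style glob matching, approximated with fnmatch.
        return [k for k in self.data if fnmatch.fnmatchcase(k, pattern)]

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


def clear_bgp_peer_configured_table(db, pattern: str = "BGP_PEER_CONFIGURED_TABLE|*") -> int:
    """Delete every stale entry so config reload starts from a clean table."""
    stale = db.keys(pattern) or []
    for key in stale:
        db.delete(key)
    return len(stale)
```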

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did
sonic-net/sonic-buildimage#24829
Fixed a bug in the "show fabric isolation" command output.
When a fabric port sees CRC/FEC uncorrectable errors, the fabric monitor feature isolates the port and sets ISOLATED=1 and AUTO_ISOLATED=1 for that port in the FABRIC_PORT_TABLE in STATE_DB. The ISOLATED and CONFIG_ISOLATED fields are always present for every fabric port in FABRIC_PORT_TABLE, but the AUTO_ISOLATED field is present only when the port has been auto-isolated.
Because of a bug in the FabricIsolation CLI script, Auto Isolated is shown as 1 for every port printed after the port that is actually isolated. In the output below, port 165 is isolated, yet Auto Isolated is wrongly shown as 1 for all ports after 165.
<img width="415" height="245" alt="image" src="https://github.com/user-attachments/assets/26df22fd-615f-44c7-9b9a-3e7e44bef735" />

#### How I did it
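Initialized the auto-isolated variable inside the per-port loop, so a value set for an isolated port no longer carries over to the ports printed after it.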
#### How to verify it
Induced CRC errors on one port and, once the port was isolated, verified that `show fabric isolation` shows the correct output.

<img width="342" height="239" alt="image" src="https://github.com/user-attachments/assets/749d13c0-9892-4f2c-af9a-70a49d7ee2e5" />
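The class of bug described above (a variable initialized once, outside the loop, and only overwritten when an optional field is present) can be reproduced in a few lines. Both functions are hypothetical simplifications, not the actual FabricIsolation script:

```python
from typing import Dict, List, Tuple

Port = Tuple[str, Dict[str, str]]


def auto_isolated_column_buggy(ports: List[Port]) -> List[Tuple[str, str]]:
    rows = []
    auto_isolated = "0"  # initialized once, outside the loop (the bug)
    for name, fields in ports:
        if "AUTO_ISOLATED" in fields:
            auto_isolated = fields["AUTO_ISOLATED"]
        rows.append((name, auto_isolated))  # stale value leaks to later ports
    return rows


def auto_isolated_column_fixed(ports: List[Port]) -> List[Tuple[str, str]]:
    rows = []
    for name, fields in ports:
        auto_isolated = "0"  # reset for every port
        if "AUTO_ISOLATED" in fields:
            auto_isolated = fields["AUTO_ISOLATED"]
        rows.append((name, auto_isolated))
    return rows
```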

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did

When rebooting a Secure Boot (SB) enabled system, `dmesg` shows:

```
root@sonic/home/admin# dmesg -W
[33875.287197] ima: impossible to appraise a kernel image without a file descriptor; try using kexec_file_load syscall.
```

This was already fixed in the fast-reboot/warm-reboot scripts in sonic-net/sonic-utilities@317e649, but the fix was missed in the reboot script.

#### How I did it

Pass the `-a` argument to `kexec`:

```
 -a, --kexec-syscall-auto Use file based syscall for kexec and fall
 back to the compatibility syscall when file based
 syscall is not supported or the kernel did not
 understand the image
```
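For illustration, a Python sketch that builds a `kexec` load command with `-a`. The reboot script itself is a shell script, and the kernel, initrd and command-line values used here are placeholders rather than the script's actual variables:

```python
import shlex
from typing import List


def build_kexec_load_cmd(kernel: str, initrd: str, append: str) -> List[str]:
    return [
        "kexec",
        "-a",          # file-based syscall, falling back to the compatibility one
        "-l", kernel,  # load the kernel image to boot into
        f"--initrd={initrd}",
        f"--append={append}",
    ]


print(shlex.join(build_kexec_load_cmd("/boot/vmlinuz", "/boot/initrd.img", "root=/dev/sda1 ro")))
```

`shlex.join` quotes arguments that contain spaces, so the printed string can be pasted into a shell as-is.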

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>

#### How to verify it

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did
Reduced the time taken by the `fwutil show` command, which only displays the usage/help message, by lazily importing PlatformDataProvider. This cut the average run time by about 50% (1156 ms before vs. 558 ms after, as measured below).

#### How I did it
Used a singleton `PlatformDataProvider` in `fwutil/main.py` that is imported and created only when first needed.
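A minimal sketch of the lazy-singleton pattern, assuming `PlatformDataProvider` lives in `fwutil.lib` (the module path is an assumption). Nothing is imported until `get_platform_data_provider()` is first called, so a help-only invocation never pays for the import:

```python
import functools
from typing import Callable, TypeVar

T = TypeVar("T")


def lazy_singleton(factory: Callable[[], T]) -> Callable[[], T]:
    """Return a getter that builds the object on first call and caches it."""
    @functools.lru_cache(maxsize=None)
    def get() -> T:
        return factory()
    return get


def _make_platform_data_provider():
    # Deferred import: only paid for by subcommands that need platform data.
    from fwutil.lib import PlatformDataProvider  # assumed module path
    return PlatformDataProvider()


get_platform_data_provider = lazy_singleton(_make_platform_data_provider)
```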
#### How to verify it
Before the change
```
Running 'fwutil show' 10 times (gap 5s)...
Run 1: 972 ms
Run 2: 1058 ms
Run 3: 948 ms
Run 4: 1213 ms
Run 5: 1507 ms
Run 6: 1235 ms
Run 7: 1553 ms
Run 8: 1037 ms
Run 9: 1000 ms
Run 10: 1037 ms
---- fwutil show stats ----
Avg: 1156 ms
Min: 948 ms
Max: 1553 ms
```
After the change
```
Running 'fwutil show' 10 times (gap 5s)...
Run 1: 496 ms
Run 2: 482 ms
Run 3: 466 ms
Run 4: 445 ms
Run 5: 482 ms
Run 6: 463 ms
Run 7: 780 ms
Run 8: 662 ms
Run 9: 653 ms
Run 10: 659 ms
---- fwutil show stats ----
Avg: 558 ms
Min: 445 ms
Max: 780 ms
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
…ndle OID update (#4429)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did
- Updated port_speed_change_validator to restart the telemetry container when a port speed is changed via GCU.
- Added unit tests to verify the correct behavior for port speed changes and no-change scenarios.

Microsoft ADO: 36692456

This is a short-term fix that restarts telemetry from GCU.

The long-term plan is for telemetry to reload automatically. Microsoft ADO: 36496651
#### How I did it
- Modified the validator to detect any port speed change and call _service_restart("telemetry") when detected, ensuring telemetry restarts and picks up the new port OID.
- Implemented tests using mocks for subprocess calls to simulate and verify the restart logic.
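A sketch of the detection logic. It assumes the validator receives the old and updated config as dicts keyed by table name (the `keys` argument mirrors the assumed validator signature and is unused here) and that the restart helper is passed in as `restart` so it can be mocked:

```python
from typing import Callable, Dict


def port_speed_change_validator(old_config: Dict, upd_config: Dict, keys,
                                restart: Callable[[str], bool]) -> bool:
    old_ports = old_config.get("PORT", {})
    for port, fields in upd_config.get("PORT", {}).items():
        old_speed = old_ports.get(port, {}).get("speed")
        new_speed = fields.get("speed")
        if old_speed is not None and new_speed is not None and old_speed != new_speed:
            # Telemetry caches port OIDs; restart it so it picks up the new ones.
            return restart("telemetry")
    return True  # no speed change: nothing to restart
```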

#### How to verify it
- Run the unit tests in service_validator_test.py to confirm that telemetry is restarted only when a port speed changes, and not for unrelated changes.
- Check that all tests pass, indicating correct validator and restart behavior.
- Test bed
```shell
stli@STG02-0101-0400-02T2-lc03:~$ show int sta
 Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
--------------- --------------------------------------- ------- ----- ----- ----------------------- --------------- ------ ------- ------ ----------
...
 Ethernet64 264,265,266,267,268,269,270,271 400G 9100 rs FourHundredGigE0/3/0/8 routed down down N/A off

stli@STG02-0101-0400-02T2-lc03:~$ cat gcu_patch.json
[
  {
    "op": "add",
    "path": "/asic0/PORT/Ethernet64",
    "value": {
      "index": "8",
      "description": "STG02-0101-0110-17T1:etp12",
      "alias": "HundredGigE0/3/0/8",
      "pfc_asym": "off",
      "fec": "rs",
      "speed": "100000",
      "mtu": "9100",
      "tpid": "0x8100",
      "lanes": "264,265,266,267",
      "asic_port_name": "Eth64-ASIC0",
      "role": "Ext",
      "admin_status": "up"
    }
  }
]

stli@STG02-0101-0400-02T2-lc03:~$ sudo config apply-patch gcu_patch.json
Patch Applier: asic0: Patch application starting.
Patch Applier: asic0: Patch: [{"op": "add", "path": "/PORT/Ethernet64", "value": {"index": "8", "description": "STG02-0101-0110-17T1:etp12", "alias": "HundredGigE0/3/0/8", "pfc_asym": "off", "fec": "rs", "speed": "100000", "mtu": "9100", "tpid": "0x8100", "lanes": "264,265,266,267", "asic_port_name": "Eth64-ASIC0", "role": "Ext", "admin_status": "up"}}]
Patch Applier: asic0 getting current config db.
Patch Applier: asic0: simulating the target full config after applying the patch.
Patch Applier: asic0: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: asic0: validating target config does not have empty tables,
 since they do not show up in ConfigDb.
Patch Applier: asic0: sorting patch updates.
Patch Applier: The asic0 patch was converted into 6 changes:
Patch Applier: asic0: applying 6 changes in order:
Patch Applier:   * [{"op": "remove", "path": "/CABLE_LENGTH/AZURE/Ethernet64"}]
Patch Applier:   * [{"op": "replace", "path": "/PORT/Ethernet64/description", "value": "STG02-0101-0110-17T1:etp12"}]
Patch Applier:   * [{"op": "replace", "path": "/PORT/Ethernet64/speed", "value": "100000"}]
> /usr/local/lib/python3.11/dist-packages/generic_config_updater/services_validator.py(71)port_speed_change_validator()
-> return _service_restart("telemetry")
(Pdb) old_speed
'400000'
(Pdb) upd_speed
'100000'
(Pdb) c
Job for telemetry.service failed because start of the service was attempted too often.
See "systemctl status telemetry.service" and "journalctl -xeu telemetry.service" for details.
To force a start use "systemctl reset-failed telemetry.service"
followed by "systemctl start telemetry.service" again.
Patch Applier:   * [{"op": "remove", "path": "/PORT/Ethernet64"}]
Patch Applier:   * [{"op": "add", "path": "/PORT/Ethernet64", "value": {"index": "8", "description": "STG02-0101-0110-17T1:etp12", "alias": "HundredGigE0/3/0/8", "pfc_asym": "off", "fec": "rs", "speed": "100000", "mtu": "9100", "tpid": "0x8100", "lanes": "264,265,266,267", "asic_port_name": "Eth64-ASIC0", "role": "Ext", "admin_status": "up"}}]
Patch Applier:   * [{"op": "add", "path": "/CABLE_LENGTH/AZURE/Ethernet64", "value": "300m"}]
Patch Applier: asic0: verifying patch updates are reflected on ConfigDB.
Patch Applier: asic0 patch application completed.
Patch applied successfully.

stli@STG02-0101-0400-02T2-lc03:~$ show int sta
 Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
--------------- --------------------------------------- ------- ----- ----- ----------------------- --------------- ------ ------- ------ ----------
...
 Ethernet64 264,265,266,267 100G 9100 rs HundredGigE0/3/0/8 routed down up N/A off

stli@STG02-0101-0400-02T2-lc03:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
aec6bcdb096d soniccr1.azurecr.io/chassis-port-counter-monitor:20260122 "/usr/local/bin/dock…" 7 days ago Up 7 days chassis-port-counter-monitor
2bb42c2ef60a docker-sonic-telemetry:latest "/usr/local/bin/supe…" 7 weeks ago Up About a minute telemetry
84e4a349cd4f docker-snmp:latest "/usr/bin/docker-snm…" 7 weeks ago Up 7 weeks snmp

stli@STG02-0101-0400-02T2-lc03:~$ sudo grep -i "spawned: 'telemetry'" /var/log/syslog
2026 Feb 5 02:08:13.183970 STG02-0101-0400-02T2-lc03 INFO telemetry#supervisord 2026-02-05 02:08:13,183 INFO spawned: 'telemetry' with pid 24
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
#### Why I did it

Many tests for various packages use /tmp/tmp.XXXXXXXX or /tmp/tmpi_XXXXX as the temporary file or directory pattern for mktemp. Since the same slave container is used for multiple simultaneous builds, destroying an in-progress build's temporary file or directory will cause those builds to fail.

While this problem has existed for a year, the introduction of Trixie appears to have reordered the builds so that packages using the affected temp file patterns are now built simultaneously.

#### How I did it

It appears that /tmp/tmpx and /tmp/tmpy are the top-level directories created, so clear those.

#### How to verify it

Run a trixie build with high parallelism.

#### Which release branch to backport

- [x] 202511

Fixes sonic-net/sonic-buildimage#25424

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### Why I did it
[PR #25398](https://github.com/sonic-net/sonic-buildimage/pull/25398) changed the host's Docker config to use `-H fd://`. During image installation, sonic-installer copies these host options to run a temporary dockerd inside a chroot. Since systemd is not running in the chroot to create and hand over a socket, dockerd crashes, causing `sonic-package-manager migrate` to fail.

#### What I did
Sanitized the copied DOCKER_OPTS so the temporary dockerd can start safely without systemd socket activation.
#### How I did it
- In sonic_installer/main.py (get_docker_opts), replaced fd:// with unix://.
- Updated mock arguments in test_sonic_installer.py to verify the sanitization logic.
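The sanitization can be sketched as a single substitution. The regex also covers an explicit descriptor such as `fd://3`, which goes beyond the description and is an assumption; `unix://` lets the temporary dockerd create its own UNIX socket instead of expecting one from systemd:

```python
import re


def sanitize_docker_opts(docker_opts: str) -> str:
    # Replace systemd socket-activation hosts (fd:// or fd://<n>) with unix://.
    return re.sub(r"fd://\S*", "unix://", docker_opts)
```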

#### How to verify it

Run sudo sonic-installer install <image.bin> and verify the installation completes successfully without failing during the package migration step.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->
fixes #4292
#### What I did
The `generate_dump` script creates a symlink to itself in the dump directory before generating the initial tar archive. However, due to a typo in the cleanup step, the symlink was not correctly removed. As a result, when the dump directory was appended later using `save_to_tar`, the `generate_dump` entry was added a second time.

This patch corrects the cleanup target so that the `generate_dump` symlink is properly removed, preventing the duplicate entry.
#### How I did it
Updated the cleanup step in `generate_dump`:
From:
`$RM $V -f $TARDIR/sonic_dump`
To:
`$RM $V -f $TARDIR/generate_dump`
#### How to verify it
1. Run `show techsupport` on a SONiC target device.
2. Inspect the resulting tar archive in the path `/var/dump`. It should look like `sonic_dump_sonic_20260116_001237.tar.gz`.
3. Confirm that the dump directory contains only a single `generate_dump` entry.
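Step 3 can be checked with a short script. A sketch using Python's `tarfile` module; the helper name is hypothetical:

```python
import os
import tarfile


def count_entries(archive_path: str, basename: str = "generate_dump") -> int:
    """Count archive members whose final path component is `basename`."""
    with tarfile.open(archive_path, "r:*") as tar:
        return sum(1 for m in tar.getmembers()
                   if os.path.basename(m.name.rstrip("/")) == basename)
```

After the fix, `count_entries("/var/dump/sonic_dump_sonic_20260116_001237.tar.gz")` should return 1.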

**Which release branch to backport**

- [x] 202305
- [x] 202311
- [x] 202405
- [x] 202411
- [x] 202505
- [x] 202511
- [x] 202605

Signed-off-by: Yi Xu <Yi.Xu@lumentum.com>
Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
…ti-ASIC platforms (#4433)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did
Added multi-ASIC platform support for `show interfaces flap` command
#### How I did it
- Added `multi_asic_click_options` decorator to support `-n/--namespace` and `-d/--display` options
- Updated the command to iterate across all namespaces and aggregate flap data from each ASIC's APP_DB
- Added flap mock data to multi-ASIC mock tables (`asic0/appl_db.json`, `asic1/appl_db.json`)
- Added multi-ASIC test file `tests/flap_test.py`
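A minimal sketch of merging per-ASIC flap data, assuming each namespace's APP_DB entries are already loaded into dicts keyed by interface name (the field names are hypothetical). Interfaces are sorted in natural order so that `Ethernet8` precedes `Ethernet16`, matching the sample output below:

```python
import re
from typing import Dict, List, Optional, Tuple


def natural_key(name: str):
    # "Ethernet16" -> ["Ethernet", 16, ""], so Ethernet8 sorts before Ethernet16.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def collect_flap_rows(flap_by_ns: Dict[str, Dict[str, dict]],
                      namespace: Optional[str] = None) -> List[Tuple[str, dict]]:
    # -n <namespace> restricts output to one ASIC; otherwise use all of them.
    namespaces = [namespace] if namespace is not None else sorted(flap_by_ns)
    rows: List[Tuple[str, dict]] = []
    for ns in namespaces:
        rows.extend(flap_by_ns.get(ns, {}).items())
    return sorted(rows, key=lambda row: natural_key(row[0]))
```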
#### How to verify it
```bash
# Run multi-ASIC tests
pytest tests/flap_test.py -v

# On a multi-ASIC device
show interfaces flap
show interfaces flap -n asic0
```
#### Previous command output (if the output of a command-line utility has changed)
```bash
admin@sonic:~$ show interfaces flap
Interface Flap Count Admin Oper Link Down TimeStamp(UTC) Link Up TimeStamp(UTC)
----------- ------------ ------- ------- -------------------------- ------------------------
etp1 Never Unknown Unknown Never Never
etp2 Never Unknown Unknown Never Never
etp3 Never Unknown Unknown Never Never
etp4 Never Unknown Unknown Never Never
...

admin@sonic:~$ show interfaces flap -n asic0
Usage: show interfaces flap [OPTIONS] [INTERFACENAME]
Try "show interfaces flap -h" for help.

Error: no such option: -n
```
#### New command output (if the output of a command-line utility has changed)
```bash
admin@sonic:~$ show interfaces flap
Interface Flap Count Admin Oper Link Down TimeStamp(UTC) Link Up TimeStamp(UTC)
----------- ------------ ------- ------ -------------------------- ------------------------
Ethernet0 4097 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026
Ethernet8 4035 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026
Ethernet16 4015 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026
Ethernet24 4019 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026
...

admin@sonic:~$ show interfaces flap -n asic0
Interface Flap Count Admin Oper Link Down TimeStamp(UTC) Link Up TimeStamp(UTC)
----------- ------------ ------- ------ -------------------------- ------------------------
Ethernet0 4097 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026
Ethernet8 4035 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026
Ethernet16 4015 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026
Ethernet24 4019 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026
...
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
Fixes issue sonic-net/sonic-utilities#4375

Currently, when running `sudo ip netns exec <namespace> counterpoll <args>`, the command ignores the namespace and uses the default namespace instead. This patch fixes that behavior so that the command uses the namespace it is running in.

The command previously worked this way, but a regression broke this behaviour.

#### What I did
I changed the default value of the `-n` argument to the namespace the command is running in.
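
One way to discover which named namespace a process is running in is the approach `ip netns identify` takes: compare the device and inode of `/proc/self/ns/net` with those of the entries under `/var/run/netns`. The sketch below illustrates that technique under that assumption; it is not the code in this patch, the helper name `current_netns` is hypothetical, and an empty string stands for the default (host) namespace.

```python
import os


def current_netns(netns_dir: str = "/var/run/netns",
                  self_ns: str = "/proc/self/ns/net") -> str:
    """Return the name of the named network namespace this process runs in.

    Returns "" for the default namespace, or when the namespace cannot be
    determined (for example, no /proc or no netns directory).
    """
    try:
        own = os.stat(self_ns)
        entries = os.listdir(netns_dir)
    except OSError:
        return ""
    for name in sorted(entries):
        try:
            st = os.stat(os.path.join(netns_dir, name))
        except OSError:
            continue
        # Same device and inode means the same network namespace.
        if (st.st_dev, st.st_ino) == (own.st_dev, own.st_ino):
            return name
    return ""
```

A value like this could serve as the default for `-n`, so that `sudo ip netns exec asic0 counterpoll ...` targets `asic0` without an explicit option.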

#### How to verify it

I verified that, with this change, the namespace the command runs in is chosen as the default.

```
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'enable'}
$ sudo ip netns exec asic0 counterpoll pg-drop disable
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'disable'}
```

I also verified that the command exits gracefully if the user passes a conflicting namespace, for example:
```
sudo ip netns exec asic0 counterpoll pg-drop -n asic1 disable
Usage: counterpoll pg-drop [OPTIONS] COMMAND [ARGS]...
Try 'counterpoll pg-drop --help' for help.

Error: Invalid value for '-n' / '--namespace': 'asic1' is not 'asic0'.
```

#### Previous command output (if the output of a command-line utility has changed)
```
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'enable'}
$ sudo ip netns exec asic0 counterpoll pg-drop disable
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'enable'}
```

#### New command output (if the output of a command-line utility has changed)
See above.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
… match on interface name (#4435)

#### Why I did it
`config interface ip remove` falsely blocks IP removal when the target interface name is a substring of another interface that has a static route. For example, removing the last IP from Ethernet24 fails because `"Ethernet24" in "...Ethernet240..."` evaluates to `True`.
```
# Given: static route already exists via Ethernet240
admin@sonic:~$ show ip route vrf all static | grep Ethernet240
S>* 10.1.0.7/32 [1/0] via 18.2.202.1, Ethernet240, weight 1, 1d15h47m

# Remove IPv4 so IPv6 becomes the last IP on Ethernet24
admin@sonic:~$ sudo config interface ip remove Ethernet24 10.0.0.44/31

# Try to remove the last IP from Ethernet24
admin@sonic:~$ sudo config interface ip remove Ethernet24 FC00::59/126
Error: Cannot remove the last IP entry of interface Ethernet24. A static ip route is still bound to the RIF.
```

Despite the error above, no static route is bound to Ethernet24; the route is on Ethernet240.

#### What I did/How I did it
Replaced the Python `in` substring check with a regex word-boundary (`\b`) match when validating whether a static route references the interface being modified.
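
The difference between the two checks can be sketched as follows. This is an illustration, not the code in `config`: the function names are hypothetical, and the input is the `show ip route` line from the description above, whereas the real validation may inspect different route data.

```python
import re


def route_uses_interface_substring(route_line: str, ifname: str) -> bool:
    """Buggy check: plain substring test, so Ethernet24 matches Ethernet240."""
    return ifname in route_line


def route_uses_interface(route_line: str, ifname: str) -> bool:
    r"""Word-boundary check: \b stops Ethernet24 from matching inside Ethernet240.

    re.escape() guards against regex metacharacters in the interface name.
    """
    return re.search(r"\b" + re.escape(ifname) + r"\b", route_line) is not None


route = "S>* 10.1.0.7/32 [1/0] via 18.2.202.1, Ethernet240, weight 1, 1d15h47m"
print(route_uses_interface_substring(route, "Ethernet24"))  # True (false positive)
print(route_uses_interface(route, "Ethernet24"))            # False
print(route_uses_interface(route, "Ethernet240"))           # True
```

Note that `\b` treats `.` as a boundary, so a subinterface such as `Ethernet24.10` still counts as a match for `Ethernet24`; whether that is desirable depends on how routes over subinterfaces should be handled.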

#### How to verify it
Before fix
```
admin@sonic:~$ sudo config interface ip remove Ethernet24 10.0.0.44/31
admin@sonic:~$ sudo config interface ip remove Ethernet24 FC00::59/126
Usage: config interface ip remove [OPTIONS] <interface_name> <ip_addr>
Try 'config interface ip remove -h' for help.

Error: Cannot remove the last IP entry of interface Ethernet24. A static ip route is still bound to the RIF.
```

After Fix
```
admin@sonic:~$ sudo config interface ip remove Ethernet24 10.0.0.44/31
admin@sonic:~$ sudo config interface ip remove Ethernet24 FC00::59/126
admin@sonic:~$
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
The CLI does not accept the empty (`''`) global namespace, which causes MGMT test failures.

Requesting a backport to:
- [x] 202511

#### What I did
Added a new function that includes the global namespace as a valid namespace option on multi-ASIC platforms when the command is not run within a namespace.
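
A minimal sketch of how the valid `-n/--namespace` values could be computed, consistent with the behavior shown in this PR and in the earlier `counterpoll` namespace fix. The helper name and signature are hypothetical; in a Click-based CLI the returned list would typically feed a `click.Choice`.

```python
from typing import List


def valid_namespace_choices(current_ns: str, asic_namespaces: List[str]) -> List[str]:
    """Return the values accepted by -n/--namespace (hypothetical helper).

    Inside a namespace (e.g. `ip netns exec asic0 ...`) only that namespace
    is valid. From the host, the global namespace '' is accepted alongside
    every ASIC namespace, so the '' default no longer fails validation.
    """
    if current_ns:
        return [current_ns]
    return [""] + [ns for ns in asic_namespaces if ns != ""]
```

For example, `valid_namespace_choices("", ["asic0", "asic1"])` yields `["", "asic0", "asic1"]`, while `valid_namespace_choices("asic0", ["asic0", "asic1"])` yields only `["asic0"]`.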

#### Previous command output (if the output of a command-line utility has changed)
`counterpoll queue enable`
```
Usage: counterpoll queue [OPTIONS] COMMAND [ARGS]...
Try 'counterpoll queue --help' for help.

Error: Invalid value for '-n' / '--namespace': '' is not one of 'asic0', 'asic1'.
```

#### New command output (if the output of a command-line utility has changed)
(No output, success)

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
#### What I did
This PR introduces a new monitoring script to check synchronization of LAG (Link Aggregation Group) IDs between the chassis database and ASIC databases on VOQ chassis line cards. The script is designed to be run by Monit and will alert via syslog when mismatches are detected.

#### How I did it
Key changes:

- Added `chassis_lag_id_checker` script that retrieves and compares LAG IDs from chassis_db and asic_db, reporting mismatches per ASIC namespace
- Comprehensive test suite with fixtures for mocking Redis dumps and ASIC/device configurations
- Integration into setup.py for proper installation
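
The comparison at the heart of such a checker can be sketched as below. This is not the `chassis_lag_id_checker` source: it assumes the LAG name to LAG ID maps have already been read from chassis_db and from each namespace's asic_db (how they are read from Redis is out of scope), and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: LAG name -> LAG ID, one map for chassis_db and one
# per ASIC namespace for asic_db.
LagIds = Dict[str, str]


def find_lag_id_mismatches(
    chassis_lag_ids: LagIds,
    asic_lag_ids_by_ns: Dict[str, LagIds],
) -> Dict[str, List[Tuple[str, str, str]]]:
    """Return {namespace: [(lag_name, chassis_id, asic_id), ...]} for mismatches.

    Only LAGs present in an ASIC's map are checked; a LAG missing from
    chassis_db is reported with "<missing>" as its chassis ID.
    """
    missing = "<missing>"
    report = {}
    for ns, asic_ids in sorted(asic_lag_ids_by_ns.items()):
        problems = []
        for lag in sorted(asic_ids):
            chassis_id = chassis_lag_ids.get(lag, missing)
            if chassis_id != asic_ids[lag]:
                problems.append((lag, chassis_id, asic_ids[lag]))
        if problems:
            report[ns] = problems
    return report
```

A Monit-driven script could then write one syslog line per reported mismatch (for example with Python's `syslog.syslog`) and exit non-zero so that Monit raises an alert.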
#### How to verify it
Tested on a VOQ chassis and with unit tests.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
Fixes #4406
#### What I did
Enabled express boot support for the Marvell Teralynx platform.
#### How I did it
Updated the fast-reboot script (to which the express-reboot script is soft-linked) to allow express boot on the Marvell Teralynx platform.
#### How to verify it
Run the `express-reboot` command.
#### Previous command output (if the output of a command-line utility has changed)
```
root@sonic:/home/admin# show version

SONiC Software Version: SONiC.202511.77-dirty-20260224.183644
SONiC OS Version: 13
Distribution: Debian 13.3
Kernel: 6.12.41+deb13-sonic-amd64
Build commit: 1a2154e68
Build date: Tue Feb 24 19:15:37 UTC 2026
Built by: marvell@cpss-rdanda20-new

Platform: x86_64-wistron_sw_to3200k-r0
HwSKU: Wistron_sw_to3200k
ASIC: marvell-teralynx
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 13:51:48 up 8:37, 1 user, load average: 2.03, 2.43, 2.45
Date: Sun 22 Mar 2026 13:51:48

root@sonic:/home/admin# express-reboot
eXpress Boot is not supported
root@sonic:/home/admin#
```
#### New command output (if the output of a command-line utility has changed)
```
root@sonic:/home/admin# express-reboot -v
Sun Mar 22 02:01:07 PM UTC 2026 Starting express-reboot
Sun Mar 22 02:01:11 PM UTC 2026 Checking for active PFC storms...
Sun Mar 22 02:01:11 PM UTC 2026 No active PFC storms detected. Safe to proceed with warm-reboot...
Sun Mar 22 02:01:13 PM UTC 2026 Loading kernel without secure boot
Sun Mar 22 02:01:14 PM UTC 2026 Starting lag_keepalive to send LACPDUs ...
Sun Mar 22 02:01:21 PM UTC 2026 Pausing orchagent ...
Sun Mar 22 02:01:22 PM UTC 2026 Collecting logs to check ssd health before express-reboot...
Sun Mar 22 02:01:22 PM UTC 2026 Stopping aaastatsd.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopped aaastatsd.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopping featured.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopped featured.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopping hostcfgd.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopped hostcfgd.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopping tacacs-config.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopped tacacs-config.timer ...
Sun Mar 22 02:01:22 PM UTC 2026 Stopping lldp ...
Sun Mar 22 02:01:24 PM UTC 2026 Stopped lldp
Sun Mar 22 02:01:24 PM UTC 2026 Stopping radv ...
Sun Mar 22 02:01:24 PM UTC 2026 Stopped radv
Sun Mar 22 02:01:24 PM UTC 2026 Stopping bgp ...
Sun Mar 22 02:01:35 PM UTC 2026 Stopped bgp
Sun Mar 22 02:01:35 PM UTC 2026 Stopping swss ...
Sun Mar 22 02:01:36 PM UTC 2026 Stopped swss
Sun Mar 22 02:01:36 PM UTC 2026 Initialize pre-shutdown ...
Sun Mar 22 02:01:36 PM UTC 2026 Requesting express boot pre-shutdown ...
Sun Mar 22 02:01:36 PM UTC 2026 Waiting for pre-shutdown ...
Sun Mar 22 02:01:36 PM UTC 2026 Pre-shutdown succeeded, state: pre-shutdown-succeeded ...
Sun Mar 22 02:01:36 PM UTC 2026 Stopping dash-ha ...
Sun Mar 22 02:01:36 PM UTC 2026 Stopped dash-ha
Sun Mar 22 02:01:36 PM UTC 2026 Stopping teamd ...
Sun Mar 22 02:01:36 PM UTC 2026 Stopped teamd
Sun Mar 22 02:01:36 PM UTC 2026 Stopping syncd ...
Sun Mar 22 02:01:43 PM UTC 2026 Stopped syncd
Sun Mar 22 02:01:43 PM UTC 2026 Backing up database ...
Successfully copied 99.3kB to /host/warmboot
Sun Mar 22 02:01:49 PM UTC 2026 Enabling Watchdog before express-reboot
Sun Mar 22 02:01:50 PM UTC 2026 Rebooting with /sbin/kexec -e to SONiC-OS-202511.77-dirty-20260224.183644 ...
0

root@sonic:/home/admin# show reboot-cause
User issued 'express-reboot' command [User: root, Time: Sun Mar 22 02:04:38 PM UTC 2026]

root@sonic:/home/admin# show warm state
name restore_count state
------------- --------------- -----------------------
gearsyncd 1
coppmgrd 1
teamsyncd 1 reconciled
fdbsyncd 1 reconciled
syncd 1
intfmgrd 1 reconciled
teammgrd 1
vxlanmgrd 1 reconciled
portsyncd 1
rebootbackend 0
vrrpsyncd 1 reconciled
orchagent 1 reconciled
vrfmgrd 1 reconciled
tunnelmgrd 1 reconciled
vlanmgrd 1 reconciled
nbrmgrd 1
warm-shutdown 0 warm-shutdown-succeeded
neighsyncd 1 reconciled
bgp 1 reconciled

root@sonic:/home/admin# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1dbe05c1bc28 docker-snmp:latest "/usr/bin/docker-snm…" 11 days ago Up 19 minutes snmp
251bbf5ea0fe docker-platform-monitor:latest "/usr/bin/docker_ini…" 11 days ago Up 19 minutes pmon
5aa6fb90c2c0 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 11 days ago Up 19 minutes mgmt-framework
0297eb7d4a29 docker-lldp:latest "/usr/bin/docker-lld…" 11 days ago Up 19 minutes lldp
4518a126f278 docker-sonic-gnmi:latest "/usr/local/bin/supe…" 11 days ago Up 19 minutes gnmi
bdc58b171baf docker-eventd:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes eventd
16f2c5ac43e8 docker-router-advertiser:latest "/usr/bin/docker-ini…" 11 days ago Up 24 minutes radv
fad1c32aaa02 docker-syncd-mrvl-teralynx-rpc:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes syncd
d8c982fefad6 docker-teamd:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes teamd
22e97f1a5b45 docker-sysmgr:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes sysmgr
709a51254b33 docker-orchagent:latest "/usr/bin/docker-ini…" 11 days ago Up 24 minutes swss
7f18cb72c949 docker-database:latest "/usr/local/bin/dock…" 11 days ago Up 24 minutes database

root@sonic:/home/admin# date
Sun Mar 22 02:26:58 PM UTC 2026

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
When a single EEPROM page fails to read during `sfputil show eeprom-hexdump`, the command no longer aborts. It continues dumping the remaining pages and returns success (exit 0), with the failed page’s error message included in the output.

#### What I did
- **Bug**: On some modules, `sfputil show eeprom-hexdump` failed with "Failed to read EEPROM for page" and exited with an error, so no EEPROM data was shown and techsupport dumps did not contain `dump/interface.xcvrs.eeprom.RAW`.
- **Fix**: Treat a single-page read failure as non-fatal: append the error line to the output and continue with the next pages. The command still exits 0, so techsupport and scripts get a full or partial dump.

#### How I did it
- **sfputil/main.py**
  - `eeprom_hexdump_pages_general()`: Removed early `return return_code, output` on page read failure. Always append `output` and continue, then return `(0, '\n'.join(lines))`.
  - `eeprom_hexdump_pages_sff8472()`: Same behavior for SFF8472 (A0h / A2h pages): no early return on failure, append output and continue.
- **tests/sfputil_test.py**
  - `test_eeprom_hexdump_all_falure`: Updated to expect exit 0 and that both ports’ headers and error messages appear in output (continue-on-failure behavior).
  - Added `test_eeprom_hexdump_pages_general_continues_on_single_page_failure`: one page fails, others succeed, asserts rc == 0 and output contains both error and successful page content.
  - Added `test_eeprom_hexdump_pages_sff8472_continues_on_single_page_failure`: same for SFF8472 (A0h ok, A2h lower fail, A2h upper ok).
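
The continue-on-failure loop described above can be sketched as follows. This is a simplified illustration rather than the `sfputil/main.py` code: `hexdump_pages` and the `read_page` callback are hypothetical, the page labels are plain strings, and the error text is illustrative rather than the exact message `sfputil` prints.

```python
from typing import Callable, Iterable, Optional, Tuple


def hexdump_pages(
    pages: Iterable[str],
    read_page: Callable[[str], Optional[bytes]],
) -> Tuple[int, str]:
    """Dump every page; a failed read is reported inline instead of aborting.

    `read_page` returns the page bytes, or None when the read fails. The
    return code is always 0, so callers such as techsupport still receive
    whatever pages could be read.
    """
    lines = []
    for page in pages:
        lines.append(f"Upper page {page}")
        data = read_page(page)
        if data is None:
            # Previously the dump returned here; now record the error and continue.
            lines.append(f"Error: failed to read page {page}")
        else:
            lines.append(data.hex(" "))
        lines.append("")
    return 0, "\n".join(lines)
```

With a reader that fails only on pages `10h` and `11h`, the result still contains the content of every other page, plus one error line for each failed page.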

#### How to verify it
- Unit tests: new and updated tests in `tests/sfputil_test.py` for continue-on-single-page-failure behavior.
- Manual: `sfputil show eeprom-hexdump` and `show techsupport` on some modules to confirm partial EEPROM dump and presence of `interface.xcvrs.eeprom.RAW`.
On an EEPROM page read failure, the command logs which page failed and continues dumping the other pages instead of returning. This fixes modules where, for example, one page fails but the other pages are readable, and allows techsupport dumps to include `interface.xcvrs.eeprom.RAW`.

#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [x] 202511

##### **Before this change:**
Running `sfputil show eeprom-hexdump` printed the error "Failed to read EEPROM for page 10h" and stopped there.

Running `show techsupport`: the file `dump/interface.xcvrs.eeprom.RAW` did not exist in the generated dump.

##### **After this change:**
Running `sfputil show eeprom-hexdump`:
```bash
EEPROM hexdump for port Ethernet0
 Lower page 0h
 00000000 80 53 45 07 00 00 00 00 00 00 00 00 00 00 00 00 |.SE.............|
 00000010 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 |..........@.....|
 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00000030 00 00 00 00 00 00 00 00 03 00 00 00 00 00 80 00 |................|
 00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00000050 00 00 00 00 00 02 4c 14 11 ff 4b 14 11 ff 4e 1c |......L...K...N.|
 00000060 22 55 4d 1c 22 55 50 1c 44 11 4f 1c 44 11 ff 00 |"UM."UP.D.O.D...|
 00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

 Upper page 0h
 00000080 80 4e 56 49 44 49 41 20 20 20 20 20 20 20 20 20 |.NVIDIA         |
 00000090 20 00 00 00 56 4d 4f 44 5f 53 50 43 35 5f 43 50 | ...VMOD_SPC5_CP|
 000000a0 4f 20 20 20 00 00 56 4d 4f 44 5f 53 50 43 35 5f |O   ..VMOD_SPC5_|
 000000b0 43 50 4f 20 30 30 00 00 00 00 00 00 00 00 20 20 |CPO 00........  |
 000000c0 20 20 20 20 20 20 20 20 50 00 00 00 00 00 00 00 |        P.......|
 000000d0 00 00 00 00 04 00 00 00 00 00 00 00 00 00 d1 00 |................|
 000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

 Upper page 1h
 00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00000090 00 80 46 00 00 00 9d 00 00 00 00 00 00 00 00 03 |..F.............|
 000000a0 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 69 |...............i|

 Upper page 2h
 00000080 5f 00 f6 00 5a 00 05 00 88 b8 79 18 87 5a 7a 76 |_...Z.....y..Zzv|
 00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000b0 fc 35 27 10 ea 91 3d f9 00 00 00 00 00 00 00 00 |.5'...=.........|
 000000c0 62 20 09 d0 4d ee 0c 5a 00 00 00 00 00 00 00 00 |b ..M..Z........|
 000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b |...............k|

 Upper page 10h
Error: Failed to read EEPROM for page 10h, flat_offset 2176, page_offset 128, size 128!

 Upper page 11h
Error: Failed to read EEPROM for page 11h, flat_offset 2304, page_offset 128, size 128!
```
Reading upper page 10h fails, but the command continues on to upper page 11h; previously it returned immediately. Read failures on multiple pages (here 10h and 11h) are all reported.

Running `show techsupport`: the file `dump/interface.xcvrs.eeprom.RAW` now exists in the generated dump.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
* metric and vnetname code into 202511

Signed-off-by: alawing <alawing@gmail.com>

* fix merge issue

Signed-off-by: alawing <alawing@gmail.com>

---------

Signed-off-by: alawing <alawing@gmail.com>
…validators

* Add th6 broadcom ASIC entry with all Nokia IXR7220-H6 HWSKUs:
  Nokia-IXR7220-H6-64, Nokia-IXR7220-H6-P128, Nokia-IXR7220-H6-O256
* Add unit test for th6 ASIC name resolution
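
The th6 name-resolution test implies a lookup from HWSKU to ASIC name. A minimal sketch, assuming a mapping from ASIC name to a list of HWSKUs; the real layout of the GCU validators file is not shown here, so `BROADCOM_ASICS` and `resolve_asic_name` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt: ASIC name -> HWSKUs handled by that ASIC's validators.
BROADCOM_ASICS: Dict[str, List[str]] = {
    "th6": [
        "Nokia-IXR7220-H6-64",
        "Nokia-IXR7220-H6-P128",
        "Nokia-IXR7220-H6-O256",
    ],
}


def resolve_asic_name(hwsku: str, asics: Dict[str, List[str]]) -> Optional[str]:
    """Return the ASIC name whose HWSKU list contains `hwsku`, or None."""
    for asic_name, hwskus in asics.items():
        if hwsku in hwskus:
            return asic_name
    return None
```

For example, `resolve_asic_name("Nokia-IXR7220-H6-P128", BROADCOM_ASICS)` returns `"th6"`, and an unknown HWSKU returns `None`.
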

Signed-off-by: dygodwin <dylan.godwin@nokia.com>
[202512] Cherry-pick PR #4223: Add Nokia TH6 HWSKUs to GCU field operation validators
weimingx closed this May 18, 2026
weimingx deleted the 202511_azd branch May 18, 2026 00:08