Create 202511_azd branch on sonic-utilities.msft#327
Closed
weimingx wants to merge 615 commits into
…U validators (#3658) [Mellanox] Add Mellanox-SN5610N-C256S2, Mellanox-SN5610N-C224O8 to GCU validators
Add Arista 7060X6-64 to gcu validator file. What I did Added the Arista-7060X6-64DE and Arista-7060X6-64PE hwsku platforms to the th5 entry in the gcu_field_operation_validators.conf.json file. How to verify it Run the sonic-mgmt gcu tests and confirm that apply-patch now works for hwskus under these platforms. Applicable Backport Branches 202405 202411
* [show][interfaces] Add proposal for show interfaces history Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * change name Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * corrections Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add changes Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * correct syntax Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix comments Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> --------- Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>
What I did Add strict YANG validation for full config command. How I did it Fail if a table with no YANG support is found. How to verify it UT
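A minimal sketch of the strict check described above. The names `find_tables_without_yang`, `validate_strict` and the `yang_tables` argument are hypothetical; the actual command derives the supported tables from the loaded YANG models, which is not shown here.

```python
def find_tables_without_yang(config, yang_tables):
    """Return the sorted names of config tables that have no YANG model."""
    return sorted(table for table in config if table not in yang_tables)


def validate_strict(config, yang_tables):
    """Raise ValueError when any table in the config lacks YANG support."""
    missing = find_tables_without_yang(config, yang_tables)
    if missing:
        raise ValueError("Tables without YANG models: " + ", ".join(missing))
```

Failing on the first unsupported table set, instead of silently skipping it, is what makes the validation strict.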
… (#3693) - What I did Some platforms may have multiple primary disks, which makes it ambiguous which device to check when no device is specified in the ssdhealth command - How I did it Update the show platform command to check platform.json - How to verify it UT's vkarri@85964d14e169:/sonic/src/sonic-utilities$ pytest-3 tests/show_platform_test.py -k "ssdhealth" -v collected 8 items / 6 deselected / 2 selected tests/show_platform_test.py::TestShowPlatformSsdhealth::test_ssdhealth PASSED [ 50%] tests/show_platform_test.py::TestShowPlatformSsdhealth::test_ssdhealth_default_device PASSED [100%] Verified the CLI is printing the health output for the expected device Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
- What I did Remove the debug dump import from the default import path. Starting with 202411, debug dump imports libdashapi to parse DASH objects. Keeping this in the default import path increases CLI execution time. - How I did it Make the import on-demand - How to verify it UT and verified on the image
[config] Exit with non-zero when qos reload fails
* [show][interface] Add changes for interface errors command Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * remove redundant lines Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add correction Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * back reverted Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add ch Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix static Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add phg Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add prest Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add fgr Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add all Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * remove all redundant spaces Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add key Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add cgre Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> --------- Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>
* [show][interface] Add changes for show interface flap command Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix files Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add ch Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add ch Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * pep8 Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix lines Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix test Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add test vals Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * wrap lines Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix tests Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add fixes Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix all Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix tests Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add tests Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix nit Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add alll Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add fd Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> --------- Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>
* [Acl] Display rule and table info written to APP DB Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
* CLI support for SmartSwitch PMON * imad minor fixes * Did some cleanup for backward compatibility * removed the column wrapping * Made it backward compatible and removed textwrap and added ut to PR * 1. There was a duplication of part of a function and that has been addressed. 2. The DPU reboot-cause data is fetched directly fromn the chassis_state_db now * reboot_cause and system_health are obtained directly from chassisStateDB now * The expected and result are the same but the test is throwing an error, temporarily bypassing the check * Let us get the build going and then look into the test mockup * Implemented as per the pmon hld, also made some improvements in the implementation * Fixed the key for CHASSIS_MODULE_INFO_TABLE entries * Fixed "show reboot-cause all" and "show reboot-cause history all" * Addressing review comments * Checking if the test issue still exists * Resolving SA errors triggered due to reboot_cause_test * Resolved pre-commit issues * Resolved pre-commit issues * Improving coverage * Fixed SA related warnings * Did some cleanup * Minor improvements and fixes * Adding tests for system health * Adding more system health related tests * Fixed a minor issue * Fixed long line SA issue * Trying to please SA * Trying to improve coverage * import mock * Fixed a typo * mocking DB * Fixed syntax issues * DB mock fix * removed unused import * creating ut for dpu state * Improving coverage * Fixed a typo * Adjusted the reboot-cause key as per the updated hld * Added fix to gracefully handle sytem health DB keys not present case * Addressed minor review comments * Addressed review comments. Commented out system-health support until phase:2 * Resolved minor issues and SA failures * Added role to PORT table in config_db. Using role to differentiate npu-dpu data plane connection in SmartSwitch with Dpc being the role. Did a minor cleanup. 
* Resolving pre-commit check error related to line > 120 * Trying to avoid pre-commit issues * Testing SA and precommit checks * Making it backward compatible * Resolving column size and whitespace issue * Working on SA issue * Testing SA and UT * Added 2 spaces before inline comment * Enabling "show system-health dpu" cli alone. The rest of the dpu health is differed for now. * Fixed SA issues * Adde new line at EOF * Enabling the UT for the CLI "show system-health dpu" * Resolved SA issues * Resolved a SA issue * Added smartswitch specific "reboot-cause" and "reboot-cause history" CLI extensions * Removed the phase:2 related system-health cli extensions as a seperate PR will be raised eventually for phase:2 * Using smartswitch qualifier for the clie extensions * Fixed SA issues * mocking device_info for test cases * import patch in tests * Debugging test failure * Fixing SA issues * fixing sa issues * Debugging sa issues * trying to resolve sa issues * fixed indentation * debugging * debugging * debugging * debugging * Debugging * debugging * debugging * Debugging * Debugging * Debuggingg * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debuggingg * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Debugging * Removing the test to build an image * Removed mock import * Improving coverage * pleasing SA * Fixing tests for design changes as per review comments * Resolving test failure * fixed indentation * cleaned up the test case * Addressed review comments in Command-Reference.md and trying to improve coverage * Improving coverage * Fixed a test issue * Addressed review comments * Addressed review comment. Reading DPUs list from config_db.json * Improving coverage * Resolved SA error * Trying to improve coverage. 
Also, reading from platform.json * adding json import in the test * Fixed a test failure * Fixed SA error * Exercising the new function in test * Removed a blank line * fixing mock issue * Trying a different approach * working on coverage * debugging * debugging * Debugging * Increasing coverage * improving coverage * Adjusting the show cli implementation to align with the reboot-cause changes such as 1. STATE_DB vs CHASSIS_STATE_DB and the key info * Fixing a minor issue * Removed ID column from the "show system-health dpu DPUx" cli as per the new requirement * Addressed default dpu admin status for dark-mode and seamless migration to lightup mode * Resolving SA issue * Resolved a typo * Added checks to see if module_name is valid in the "config chassis modules startup DPUx" cli aand also moved all the required utilities to the common file * Fixed white space issues * Cleaned unwanted import * Fixed build issues * missedout the fixes in a couple of files * With the recent code the app_db multi_asic.PORT_ROLE is Dpc for DPU ports, earlier this was not the case. So removing the additional check. * As the port role issue is no longer seen in smartswitch, cleaning up the related chnages. 
* Using the verbose define for TYPE_DPC in the CLI, if there is a specific requirement to keep 'TYPE_DPC = Dpc", which is the role, then we will revert it * Reverting intfutil_test.py * Using the common API to get_dpu_list * Removed unused import json * Addressed review comments * Did some minor cleanp * Fix: SA error * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Addressed review comments * Added fix for issue:21372 - Device name column shows NPU instead of module name * Added fix for issue:21372 - Fixing the device name colum in the cli output * Added a few review comments
Remove partially installed image when image install fails. What I did When an image install fails, the partially installed image is not removed, which may cause a disk-full issue on devices with small disks. How I did it Handle the SystemExit exception raised by the image install and uninstall the image. How to verify it Manually verified. All test cases pass.
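A minimal sketch of the cleanup pattern, assuming hypothetical `install` and `uninstall` callables; the installer's real functions are not shown in this description.

```python
def install_with_cleanup(install, uninstall, image):
    """Run install(image); on SystemExit, remove the partial image and re-raise."""
    try:
        install(image)
    except SystemExit:
        # The install aborted part-way: free the disk space it consumed,
        # then propagate the original exit so the caller still sees failure.
        uninstall(image)
        raise
```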
* Add recover hardware config if load golden config. * fix format * Refactor with exist function. * Fix format * Remove deprecated tests * fix asicid * remove unrelated changes * Remove unrelated changes. * Recover the old implementation * Remove unused test. * Recover the old tests * Adjust comments.
What I did On Chassis-packet supervisor, show interface counter -d all returns no data How I did it Fixed the condition in portstat.py to use right path for Chassis Packet RP to collect counters. Added condition to collect link state information. How to verify it Run 'show interface counters -d all' on Chassis-Packet Supervisor
Fix ssdhealth failure on VS platform
…ps (#3739) * Make 'show ip bgp summary' work even when we don't have any peer groups configured. * fixing the indentation issues
What I did Fix the calls for spanning-tree commands in the dump script. While generating techsupport, the spanning tree commands fail: Error: No such command "spanning_tree". timeout --foreground 5m bash -c "dummy_cleanup_method () . How I did it Change from show spanning_tree to the actual command, show spanning-tree. How to verify it Run show techsupport
sonic-utilities: WRED stats feature changes
…chsupport (#3745)
- What I did Add support for Mellanox SN5640 new platform and hwSKUs - How I did it Add new files and folders to support the new platform and modified relevant existing files - How to verify it Install image with the change on Mellanox SN5640 setup and run tests
* display proper message with proper errno for kvm. * this change should cover all flags. * add unit test. * fix typo. * fix unit test. * whitespace EOF * delete dead variable.
* Optimize lag_keepalive by crafting the LACPDU packet ourselves Instead of waiting for a LACPDU packet to be sent and capturing that (which involves waiting roughly 30 seconds), get the necessary information from teamd and craft it ourselves. This means that the 60-second wait in making sure that a LACPDU packet is captured and the keepalive script is ready can be largely eliminated (this has been reduced to 10 seconds to make sure the script has a chance to craft the packets and send some LACPDUs). Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Fix pre-commit errors Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Keep a socket open, and reuse that for sending LACPDUs Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Add logic to fork into background after collecting information Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Start lag_keepalive before OA pause, and fork after building packets Start lag_keepalive before pausing orchagent, so that there's less of a delay between when orchagent is paused and when kexec happens, and so that fewer events/changes aren't handled by orchagent. Additionally, add an option into the lag_keepalive script to fork into the background after generating the LACPDUs and opening sockets, but before sending the actual packets. This serves as a sort-of error check to make sure that it is at least able to send LACPDU packets, and didn't bail out early. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> --------- Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Fixing 'show ip bgp neighbor <ip>' in frr unified config mode by making sure to get the neighbor key from config_db either in 'x.x.x.x' format or alternatively in 'default|x.x.x.x' format as is the case with unified mode * fixing the neighbor table entry match to correctly show neighbor name in unified frr config mgmt mode
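The two key formats can be handled with a lookup along these lines. This is a sketch under assumptions: `find_bgp_neighbor` is a hypothetical helper, and the table is modeled as a plain dict with string keys.

```python
def find_bgp_neighbor(bgp_neighbor_table, neighbor_ip, vrf="default"):
    """Look up a neighbor by 'x.x.x.x' first, then by '<vrf>|x.x.x.x'."""
    for key in (neighbor_ip, f"{vrf}|{neighbor_ip}"):
        if key in bgp_neighbor_table:
            return bgp_neighbor_table[key]
    return None
```

Trying the plain key first keeps the legacy behavior unchanged for configurations that do not use unified mode.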
* [QOS] Skip showing unnecessary warning message Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
… fs to allow remote user login. (#3700) Improve SONiC disk checker to handle the disk-full case and mount overlay fs to allow remote user login. This PR depends on DB schema change: sonic-net/sonic-buildimage#21351 What I did Currently the disk checker only handles the read-only (RO) disk case, but when the disk has no free space, remote users also can't log in. How I did it Check disk free space and mount overlay fs to allow TACACS login. How to verify it Create a big file on the device so that it has no free space, then log in with a remote user; the login should succeed.
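The free-space check could look roughly like this. The helper name `disk_is_full` and the 1 MiB threshold are assumptions for illustration; the overlay mount itself is not shown.

```python
import shutil


def disk_is_full(path="/", min_free_bytes=1024 * 1024, usage=shutil.disk_usage):
    """Return True when the filesystem holding path has too little free space."""
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    return usage(path).free < min_free_bytes
```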
* [FC] remove FC delay field Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Currently, in Cisco 8800 chassis, PFCWD is only enabled for front-end ports, not on backplane ports. As PFC is enabled for backplane ports, pfcwd needs to be enabled there too. What I did Enable PFCWD for backplane ports. How I did it Include backplane ports in the port list to be enabled for pfcwd How to verify it Manually copied the file to the device and ran pfcwd start_default.
This reverts commit 2866ccd.
…tion (#3763) What I did Added the options -a and --all to scripts/vnet_route_check.py. Both options are equivalent. If none of them is provided, then when finding the VNET routes that are in APP DB but not in ASIC DB, we will ignore routes in APP DB that are not active. Mock tests in tests/test_vnet_route_check.py are added to test this behavior. How I did it If -a and --all are not provided, we first filter routes in APP DB to find active routes and then check which active routes are not in ASIC DB. The status of each route is retrieved from STATE DB. If a route is not found in STATE DB, then it is considered to be active. How to verify it You can verify the behavior by running mock tests in tests/test_vnet_route_check.py, or by manually running the vnet_route_check.py script on a DUT.
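The filtering described above can be sketched as follows. The function names and the `"active"` status string are assumptions; only the `-a`/`--all` options and the default-to-active rule come from the description.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="VNET route check (sketch)")
    # -a and --all are equivalent spellings of the same flag.
    parser.add_argument("-a", "--all", action="store_true",
                        help="also check routes that are not active")
    return parser.parse_args(argv)


def routes_to_check(app_routes, state_status, check_all):
    """Keep every route with --all, otherwise only the active ones."""
    if check_all:
        return list(app_routes)
    # A route missing from STATE DB is treated as active.
    return [r for r in app_routes if state_status.get(r, "active") == "active"]


def missing_in_asic(app_routes, asic_routes, state_status, check_all=False):
    """Return routes present in APP DB but absent from ASIC DB."""
    asic = set(asic_routes)
    return [r for r in routes_to_check(app_routes, state_status, check_all)
            if r not in asic]
```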
* Add golden config check * fix format * Addressing comment. * fix tests. * fix format * fix common function * Fix UT * fix test * reset mock * fix ut * fix ut * remove unnecessary test * Add negative test * fix format * remove empty dict check * Add none check * fix condition * fix invalid golden
…nt units (#4359)

Propagating sonic-net/sonic-host-services#320 for managing `generated/transient` units by SONiC Package Manager

#### What I did
Fixed application upgrade flow for Multi ASIC platforms

#### How I did it
Handled `systemctl` `enabled/disabled` action for `generated/transient` units

#### How to verify it
1. Add app repository
```
sudo sonic-package-manager repository add <app> <repo_url> --default-reference=1.0.0
```
2. Install app v1.0.0
```
sudo sonic-package-manager install <app>==1.0.0 -y
```
3. Enable the feature and wait for container to come up
```
sudo config feature state <app> enabled
```
4. Upgrade to v2.0.0 - THIS triggers the bug
```
sudo sonic-package-manager install -y <app>==2.0.0
```

#### Previous command output (if the output of a command-line utility has changed)
SHELL:
```
root@sonic:/home/admin# sudo sonic-package-manager install -y cpu-report==2.0.0
Execute systemctl action stop on cpu-report service
Execute systemctl action disable on cpu-report service
removed /usr/lib/systemd/system/cpu-report.service
removed /usr/local/bin/cpu-report.sh
removed /usr/bin/cpu-report.sh
removed /etc/sonic/cpu-report_reconcile
removed /etc/systemd/system/cpu-report.service.d
generated /usr/bin/cpu-report.sh
generated /usr/local/bin/cpu-report.sh
generated /usr/lib/systemd/system/cpu-report.service
cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
Execute systemctl action enable on cpu-report service
Failed to enable unit: Unit /run/systemd/generator/cpu-report.service is transient or generated
cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
removed /usr/lib/systemd/system/cpu-report.service
removed /usr/local/bin/cpu-report.sh
removed /usr/bin/cpu-report.sh
removed /etc/sonic/cpu-report_reconcile
removed /etc/systemd/system/cpu-report.service.d
generated /usr/bin/cpu-report.sh
generated /usr/local/bin/cpu-report.sh
generated /usr/lib/systemd/system/cpu-report.service
Execute systemctl action enable on cpu-report service
Failed to enable unit: Unit /run/systemd/generator/cpu-report.service is transient or generated
error: failed in rollback: Failed to execute "['systemctl', 'enable', 'cpu-report']"
Failed to install cpu-report==2.0.0: Failed to upgrade cpu-report: Failed to execute "['systemctl', 'enable', 'cpu-report']"
```
SYSLOG:
```
2026 Feb 10 14:31:39.193944 sonic WARNING systemctl enable for generated/transient unit cpu-report
```

#### New command output (if the output of a command-line utility has changed)
SHELL:
```
root@sonic:/home/admin# sonic-package-manager install cpu-report==2.0.0 -y
Execute systemctl action stop on cpu-report service
Execute systemctl action disable on cpu-report service
warning: Skipping systemctl disable for generated/transient unit cpu-report
removed /usr/lib/systemd/system/cpu-report.service
removed /usr/local/bin/cpu-report.sh
removed /usr/bin/cpu-report.sh
removed /etc/sonic/cpu-report_reconcile
removed /etc/systemd/system/cpu-report.service.d
generated /usr/bin/cpu-report.sh
generated /usr/local/bin/cpu-report.sh
generated /usr/lib/systemd/system/cpu-report.service
cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
Execute systemctl action enable on cpu-report service
warning: Skipping systemctl enable for generated/transient unit cpu-report
Execute systemctl action start on cpu-report service
```
SYSLOG:
```
2026 Feb 10 16:18:55.531288 sonic INFO sonic-package-manager: Execute systemctl action disable on cpu-report service
2026 Feb 10 16:18:55.541984 sonic WARNING sonic-package-manager: Skipping systemctl disable for generated/transient unit cpu-report
2026 Feb 10 16:18:55.542262 sonic INFO sonic-package-manager: removed /usr/lib/systemd/system/cpu-report.service
2026 Feb 10 16:18:55.542481 sonic INFO sonic-package-manager: removed /usr/local/bin/cpu-report.sh
2026 Feb 10 16:18:55.542676 sonic INFO sonic-package-manager: removed /usr/bin/cpu-report.sh
2026 Feb 10 16:18:55.542861 sonic INFO sonic-package-manager: removed /etc/sonic/cpu-report_reconcile
2026 Feb 10 16:18:55.543674 sonic INFO sonic-package-manager: removed /etc/systemd/system/cpu-report.service.d
2026 Feb 10 16:18:56.190582 sonic INFO sonic-package-manager: generated /usr/bin/cpu-report.sh
2026 Feb 10 16:18:56.201750 sonic INFO sonic-package-manager: generated /usr/local/bin/cpu-report.sh
2026 Feb 10 16:18:56.218809 sonic INFO sonic-package-manager: generated /usr/lib/systemd/system/cpu-report.service
2026 Feb 10 16:18:56.894541 sonic INFO sonic-package-manager: cpu-report entry is added to AUTO_TECHSUPPORT_FEATURE table
2026 Feb 10 16:18:56.894689 sonic INFO sonic-package-manager: Execute systemctl action enable on cpu-report service
2026 Feb 10 16:18:56.905258 sonic WARNING sonic-package-manager: Skipping systemctl enable for generated/transient unit cpu-report
2026 Feb 10 16:18:56.905380 sonic INFO sonic-package-manager: Execute systemctl action start on cpu-report service
```

**Note:**
* system logger fix also resolves the issue of missing application tag and very first word of the message

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
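One way to skip `enable`/`disable` for such units is sketched below. It assumes the unit state is read with `systemctl is-enabled`, which reports `generated` or `transient` for these units; whether the actual change detects them this way is not stated above, and the function names are hypothetical.

```python
import subprocess

SKIP_STATES = {"generated", "transient"}


def unit_file_state(unit, run=subprocess.run):
    """Return the state printed by `systemctl is-enabled <unit>`."""
    result = run(["systemctl", "is-enabled", unit],
                 capture_output=True, text=True, check=False)
    return result.stdout.strip()


def systemctl_action(action, unit, run=subprocess.run):
    """Run a systemctl action, skipping enable/disable on generated units."""
    if action in ("enable", "disable") and unit_file_state(unit, run) in SKIP_STATES:
        print(f"warning: Skipping systemctl {action} for generated/transient unit {unit}")
        return False
    run(["systemctl", action, unit], check=True)
    return True
```

The `run` parameter exists only so the sketch can be exercised without touching systemd.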
## Description
Currently `generate_sysinfo()` unconditionally overwrites `mac`, `platform`, and `asic_id` in the golden config with values from the running config or hardware detection. This prevents users from intentionally overriding these fields via `config load_minigraph -o` or `config override-config-table`.

### Problem
When a user provides an explicit `mac` in their golden config (e.g., for MAC migration, virtual environments, or chassis scenarios), it gets silently overwritten before the override is applied:
1. `load_minigraph` calls `sonic-cfggen -H` which sets MAC from hardware
2. `override_config_by()` → `override_config_table()` → `generate_sysinfo()`
3. `generate_sysinfo()` reads MAC from running config (set by step 1) and **unconditionally overwrites** the golden config's MAC

### Fix
Change `generate_sysinfo()` to only backfill `mac`, `platform`, and `asic_id` when they are **not explicitly present** in the golden config. If the golden config provides these values, they are preserved. This maintains backward compatibility: golden configs without these fields (the common case) still get auto-populated from hardware/running config.

## Motivation and Context
Users cannot override `DEVICE_METADATA.localhost.mac` via golden config override, even when explicitly specified.

## How Has This Been Tested?
Code review and manual verification of the logic change.

## Type of change
- [x] Bug fix

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
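The backfill rule can be illustrated with a small sketch. `backfill_sysinfo` is a hypothetical stand-in for the relevant part of `generate_sysinfo()`, operating on plain dicts.

```python
SYSINFO_FIELDS = ("mac", "platform", "asic_id")


def backfill_sysinfo(golden_metadata, detected):
    """Fill mac/platform/asic_id only where the golden config omits them."""
    result = dict(golden_metadata)
    for field in SYSINFO_FIELDS:
        # Explicit golden-config values win; detected values only fill gaps.
        if field not in result and detected.get(field) is not None:
            result[field] = detected[field]
    return result
```

For example, a golden config that sets only `mac` keeps that MAC and still receives `platform` and `asic_id` from detection.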
Manual backport due to cherry-pick conflicts Signed-off-by: Peter <peterbailey@arista.com>
What I did
The existing check in route_check.py detects routes with offload set to False or with no offload field. This captures queued routes, because a queued route has no offload field. For a rejected route, however, offload is True, so the existing check cannot detect it.
So we add a new check on the value of the failed key in route entries, which covers both rejected and queued routes.
This helps detect rejected and queued routes on the device.
How I did it
Append each failed route prefix to the failed list; if the list is not empty, the script prints an error message to syslog.
# Check for failed state
if entry.get('failed', False):
    failed_rt.append(route_prefix)
How to verify it
For rejected route:
The output of route_check.py
Some routes have failed state in FRR : ['0.0.0.0/0', '192.168.128.0/25', '192.168.128.128/25', '192.168.136.0/25', '192.168.136.128/25', '192.168.144.0/25', '192.168.144.128/25', '192.168.152.0/25', '192.168.152.128/25', '192.168.160.0/25', '192.168.160.128/25', '192.168.168.0/25', '192.168.168.128/25', '192.168.176.0/25', '192.168.176.128/25', '192.168.184.0/25', '192.168.184.128/25', '192.168.192.0/25', '192.168.192.128/25', '192.168.200.0/25', '192.168.200.128/25', '192.168.208.0/25', '192.168.208.128/25', '192.168.216.0/25', '192.168.216.128/25', '192.168.224.0/25', '192.168.224.128/25', '192.168.232.0/25', '192.168.232.128/25', '192.168.240.0/25', '192.168.240.128/25', '192.168.248.0/25', '192.168.248.128/25', '192.169.0.0/25', '192.169.0.128/25', '192.169.104.0/25', '192.169.104.128/25', '192.169.112.0/25', '192.169.112.128/25', '192.169.120.0/25', '192.169.120.128/25', '192.169.128.0/25', '192.169.128.128/25', '192.169.136.0/25', '192.169.136.128/25', '192.169.144.0/25', '192.16
Failure results: {{
"": {
"failed_FRR_routes": [
"0.0.0.0/0",
"192.168.128.0/25",
"192.168.128.128/25",
"192.168.136.0/25",
"192.168.136.128/25",
"192.168.144.0/25",
"192.168.144.128/25",
"192.168.152.0/25",
"192.168.152.128/25",
"192.168.160.0/25",
"192.168.160.128/25",
"192.168.168.0/25",
"192.168.168.128/25",
"192.168.176.0/25",
"192.168.176.128/25",
"192.168.184.0/25",
"192.168.184.128/25",
"192.168.192.0/25",
"192.168.192.128/25",
"192.168.200.0/25",
"192.168.200.128/25",
"192.168.208.0/25",
"192.168.208.128/25",
"192.168.216.0/25",
"192.168.216.128/25",
"192.168.224.0/25",
"192.168.224.128/25",
"192.168.232.0/25",
"192.168.232
Failed. Look at reported mismatches above
For rejected route:
the output of route_check.py
Some routes have failed state in FRR : ['0.0.0.0/0', '192.168.128.0/25', '192.168.128.128/25', '192.168.136.0/25', '192.168.136.128/25', '192.168.144.0/25', '192.168.144.128/25', '192.168.152.0/25', '192.168.152.128/25', '192.168.160.0/25', '192.168.160.128/25', '192.168.168.0/25', '192.168.168.128/25', '192.168.176.0/25', '192.168.176.128/25', '192.168.184.0/25', '192.168.184.128/25', '192.168.192.0/25', '192.168.192.128/25', '192.168.200.0/25', '192.168.200.128/25', '192.168.208.0/25', '192.168.208.128/25', '192.168.216.0/25', '192.168.216.128/25', '192.168.224.0/25', '192.168.224.128/25', '192.168.232.0/25', '192.168.232.128/25', '192.168.240.0/25', '192.168.240.128/25', '192.168.248.0/25', '192.168.248.128/25', '192.169.0.0/25', '192.169.0.128/25', '192.169.104.0/25', '192.169.104.128/25', '192.169.112.0/25', '192.169.112.128/25', '192.169.120.0/25', '192.169.120.128/25', '192.169.128.0/25', '192.169.128.128/25', '192.169.136.0/25', '192.169.136.128/25', '192.169.144.0/25', '192.16
Failure results: {{
"": {
"missed_FRR_routes": [
{
"destSelected": true,
"distance": 20,
"failed": true,
"installedNexthopGroupId": 39146,
"internalFlags": 8,
"internalNextHopActiveNum": 4,
"internalNextHopNum": 4,
"internalStatus": 168,
"metric": 0,
"nexthopGroupId": 39146,
"nexthops": [
{
"active": true,
"afi": "ipv4",
"fib": true,
"flags": 3,
"interfaceIndex": 6,
"interfaceName": "PortChannel102",
"ip": "10.0.0.1",
"rmapSource": "10.1.0.32",
"weight": 1
},
{
"active": true,
Failed. Look at reported mismatches above
Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
Co-authored-by: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
…IPv4 or IPv6 route (#4177) (#4363) * remove early return * add unit tests for IP4 only and IP6 only scenarios * fix incorrect UT data format * fix UT data * add early return for non-json case --------- Signed-off-by: BYGX-wcr <wcr@live.cn> Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com> Co-authored-by: Changrong Wu <wcr@live.cn>
…sessions (#4368)

## What I did
Fixed is_port_mirror_capability_supported() so that ERSPAN sessions (direction=None) are not blocked by the PORT_INGRESS_MIRROR_CAPABLE / PORT_EGRESS_MIRROR_CAPABLE capability check.

## Root cause
PR #4089 added a capability check that reads PORT_INGRESS_MIRROR_CAPABLE and PORT_EGRESS_MIRROR_CAPABLE from STATE_DB SWITCH_CAPABILITY|switch. For ERSPAN sessions, direction=None was treated as 'check both', but:
1. These capability flags only apply to SPAN (port mirror) sessions, not ERSPAN (which uses source/destination IPs, not ports)
2. Platforms that don't populate these STATE_DB keys return None, which != 'true', so the function incorrectly returns False (unsupported)

PR #4159 partially addressed the multi-ASIC namespace issue but did not fix the fundamental problem for ERSPAN sessions with no src/dst port specified.

## How I fixed it
- **For ERSPAN (direction=None)**: Return True immediately. PORT_INGRESS/EGRESS_MIRROR_CAPABLE does not apply to ERSPAN sessions.
- **For SPAN (direction != None)**: Treat absent STATE_DB key (None value) as 'supported' for backward compatibility with platforms that don't populate SWITCH_CAPABILITY table entries.

## How to verify it
Unit tests updated in tests/config_mirror_session_test.py:
- Added Test 4 to verify behavior when STATE_DB keys are absent (all return True)
- Updated Test 2 and Test 3 assertions for direction=None to expect True

Fixes: sonic-net/sonic-mgmt#21690

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
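The decision logic can be sketched as below. This is not the real function: `mirror_capability_supported` is a hypothetical helper that takes the SWITCH_CAPABILITY fields as a dict instead of reading STATE_DB, and the direction values `rx`, `tx` and `both` are assumed.

```python
INGRESS_KEY = "PORT_INGRESS_MIRROR_CAPABLE"
EGRESS_KEY = "PORT_EGRESS_MIRROR_CAPABLE"


def mirror_capability_supported(direction, capabilities):
    """Return False only when a relevant capability is present and not 'true'."""
    if direction is None:
        # ERSPAN session: the port mirror capability flags do not apply.
        return True
    wanted = {
        "rx": [INGRESS_KEY],
        "tx": [EGRESS_KEY],
        "both": [INGRESS_KEY, EGRESS_KEY],
    }
    for key in wanted.get(direction.lower(), []):
        value = capabilities.get(key)
        # An absent key (None) is treated as supported for backward compatibility.
        if value is not None and value != "true":
            return False
    return True
```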
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

** Make sure all your commits include a signature generated with `git commit -s` **

If this is a bug fix, make sure your description includes "closes #xxxx", "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related issue when the PR is merged.

If you are adding/modifying/removing any command or utility script, please also make sure to add/modify/remove any unit tests from the tests directory as appropriate.

If you are modifying or removing an existing 'show', 'config' or 'sonic-clear' subcommand, or you are adding a new subcommand, please make sure you also update the Command Line Reference Guide (doc/Command-Reference.md) to reflect your changes.

Please provide the following information:
-->

#### What I did

Fixed `sonic-installer install` failing during `migrate_sonic_packages()` when `/etc/resolv.conf` in the new image is a symlink to `/run/resolvconf/resolv.conf`.

The failure occurs because the `cp` command at `main.py:386` follows the symlink through the overlay mount. Since the symlink target is an absolute path, it resolves to the **host's** `/run/resolvconf/resolv.conf`, which is the same file as the source. `cp` detects the same source and destination inode and exits with:

```
cp: '/etc/resolv.conf' and '/tmp/image-<version>-fs/etc/resolv.conf' are the same file
```

This was introduced by the `build_debian.sh` change that replaced `touch` with `ln -sf /run/resolvconf/resolv.conf` for `/etc/resolv.conf` in the image filesystem.

#### How I did it

Check whether `/etc/resolv.conf` in the chroot is a symlink or a regular file, and handle each case appropriately:

- **Symlink** (images with `resolvconf` package installed): Read the symlink target via `readlink` (e.g. `/run/resolvconf/resolv.conf`), then create the target file inside the chroot with the host's DNS content. The symlink then resolves correctly inside the chroot. This avoids touching the symlink itself, so the overlay upper dir's `etc/resolv.conf` is never modified and the new image boots with the symlink intact. No cleanup is needed: the target lives under `/run`, which is a tmpfs recreated at every boot.
- **Regular file** (images without `resolvconf`, or where the build process explicitly creates a regular file via `touch`): Overwrite directly with the host's DNS content. No backup/restore is needed: the original file is empty (cleared during build), and after reboot the `resolv-config` service reconfigures DNS from CONFIG_DB.

The previous backup-overwrite-restore pattern has been removed since it is unnecessary in both cases.

#### How to verify it

1. Start with a switch running an image where `/etc/resolv.conf` is a symlink:

```bash
# Confirm symlink exists
ls -la /etc/resolv.conf
# Expected: /etc/resolv.conf -> /run/resolvconf/resolv.conf

# Ensure only one image is installed (clean state)
sudo sonic-installer list
# If the target image is already present, remove it:
sudo sonic-installer remove <target-image> -y
```

2. Run `sonic-installer install` with an image that also has the symlink:

```bash
sudo sonic-installer install <image-path> -y
```

3. Verify:
- Installation completes without `cp: ... are the same file` error
- `sonic-installer list` shows the new image as default
- After reboot, `/etc/resolv.conf` is still a symlink to `/run/resolvconf/resolv.conf`
- DNS resolution works

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
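The symlink / regular-file handling described in this commit could look roughly like the sketch below. It is an illustration under assumptions: the function name `populate_chroot_resolv_conf` and its parameters are hypothetical, and it uses `os.readlink` and `shutil.copyfile` where the actual change may shell out to `readlink` and `cp`.

```python
import os
import shutil


def populate_chroot_resolv_conf(chroot_root, host_resolv="/etc/resolv.conf"):
    """Give the chroot working DNS without touching an existing symlink.

    Hypothetical helper; returns the path that was actually written.
    """
    dest = os.path.join(chroot_root, "etc/resolv.conf")

    if os.path.islink(dest):
        # Resolve the link target ourselves so it is interpreted relative
        # to the chroot rather than followed on the host.
        target = os.readlink(dest)
        if not os.path.isabs(target):
            target = os.path.normpath(os.path.join("/etc", target))
        real_path = os.path.join(chroot_root, target.lstrip("/"))
        os.makedirs(os.path.dirname(real_path), exist_ok=True)
        shutil.copyfile(host_resolv, real_path)
        return real_path

    # Regular (or missing) file: overwrite it directly.
    shutil.copyfile(host_resolv, dest)
    return dest
```

Writing to the link target inside the chroot, instead of copying over `etc/resolv.conf`, is what keeps the symlink intact in the new image.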
#### What I did

On DPU platforms (pensando, nvidia-bluefield), the `routeCheck` monit service is removed from the monit config (sonic-net/sonic-buildimage#26214) since route checking is not applicable on DPUs. However, `config reload` unconditionally tries to `monit unmonitor/monitor routeCheck`, causing:

```
Disabling container and routeCheck monitoring ...
There is no service named "routeCheck"
```

Added a `_monit_service_exists()` helper and guarded `routeCheck` monit calls in `_stop_services()` and `_restart_services()` so they only execute when the service is present in monit.

#### How I did it

- Added `_monit_service_exists(service)` helper that runs `sudo monit summary <service>` and returns `True`/`False` based on exit code.
- In `_stop_services()`: only call `monit unmonitor routeCheck` if `_monit_service_exists("routeCheck")` returns `True`.
- In `_restart_services()`: only call `monit monitor routeCheck` and `_wait_for_monit_service_monitored("routeCheck")` if the service exists.

#### How to verify it

1. On a DPU platform (nvidia-bluefield / pensando) with routeCheck removed from monit config:
   - Run `config reload -y`; it should complete without the `"There is no service named routeCheck"` error
2. On a non-DPU platform where routeCheck exists in monit:
   - Run `config reload -y`; routeCheck monitoring should still be disabled/re-enabled as before

#### Previous command output (if the output of a command-line utility has changed)

On DPU with routeCheck removed from monit:

```
root@sonic-dpu-3:/home/admin# config reload -y
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
There is no service named "routeCheck"
Released lock on /etc/sonic/reload.lock
```

#### New command output (if the output of a command-line utility has changed)

On DPU with routeCheck removed from monit:

```
root@sonic-dpu-3:/home/admin# config reload -y
Acquired lock on /etc/sonic/reload.lock
Disabling container monitoring ...
...
Enabling container monitoring ...
Released lock on /etc/sonic/reload.lock
```

Companion PR: sonic-net/sonic-buildimage#26214

Fixes: sonic-net/sonic-buildimage#26225

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
#### What I did

Update `show bfd summary` to aggregate BFD sessions across all ASIC namespaces when no `-n <namespace>` is provided. Extend multi-ASIC BFD tests and expected output for the all-ASIC summary.

#### How to verify it

With this change, running `show bfd summary` lists the BFD sessions from all ASIC namespaces.
BTW, running the classic `sudo ip netns exec <namespace> show bfd summary` will still give you the BFD sessions of that namespace, so there's no regression ``` admin@dut:~$ sudo ip netns exec asic1 show bfd summary Total number of BFD sessions: 220 Peer Addr Interface Vrf State Type Local Addr TX Interval RX Interval Multiplier Multihop Local Discriminator ----------------------------- ----------- ------- ------- ------------ ----------------------------- ------------- ------------- ------------ ---------- --------------------- 2603:10b0:607:7:0:a:eb64:8700 default default Up async_active 2603:10b0:607:7:0:a:eb64:8b00 50 50 3 false 127 2603:10b0:607:7:0:a:eb62:8900 default default Up async_active 2603:10b0:607:7:0:a:eb62:8b00 50 50 3 false 104 2603:10b0:607:7:0:a:eb64:8300 default default Up async_active 2603:10b0:607:7:0:a:eb64:8b00 50 50 3 false 211 10.235.96.10 default default Up async_active 10.235.96.8 50 50 3 false 18 10.235.96.138 default default Up async_active 10.235.96.139 50 50 3 false 25 2603:10b0:607:7:0:a:eb62:c00 default default Up async_active 2603:10b0:607:7:0:a:eb62:800 50 50 3 false 108 2603:10b0:607:7:0:a:eb60:8700 default default Up async_active 2603:10b0:607:7:0:a:eb60:8b00 50 50 3 false 73 2603:10b0:607:7:0:a:eb61:8c00 default default Up async_active 2603:10b0:607:7:0:a:eb61:8b00 50 50 3 false 93 ... 
``` #### Previous command output (if the output of a command-line utility has changed) ``` admin@dut:~$ show bfd summary Total number of BFD sessions: 0 Peer Addr Interface Vrf State Type Local Addr TX Interval RX Interval Multiplier Multihop Local Discriminator ----------- ----------- ----- ------- ------ ------------ ------------- ------------- ------------ ---------- --------------------- ``` #### New command output (if the output of a command-line utility has changed) ``` admin@dut:~$ show bfd summary Total number of BFD sessions: 660 Peer Addr Interface Vrf State Type Local Addr TX Interval RX Interval Multiplier Multihop Local Discriminator ------------------- ----------- ------- ------- ------------ ------------------- ------------- ------------- ------------ ---------- --------------------- 20.0.13.6 default default Up async_active 20.0.13.1 50 50 3 false 27 2603:10e2:400:2::7 default default Up async_active 2603:10e2:400:2::1 50 50 3 false 106 20.0.12.6 default default Up async_active 20.0.12.1 50 50 3 false 136 2603:10e2:400:14::9 default default Up async_active 2603:10e2:400:14::1 50 50 3 false 94 2603:10e2:400:6::5 default default Up async_active 2603:10e2:400:6::1 50 50 3 false 159 2603:10e2:400:14::7 default default Up async_active 2603:10e2:400:14::1 50 50 3 false 92 2603:10e2:400:11::8 default default Up async_active 2603:10e2:400:11::1 50 50 3 false 73 20.0.10.9 default default Up async_active 20.0.10.1 50 50 3 false 11 20.0.10.3 default default Up async_active 20.0.10.1 50 50 3 false 132 20.0.5.2 default default Up async_active 20.0.5.1 50 50 3 false 143 20.0.11.7 default default Up async_active 20.0.11.1 50 50 3 false 15 2603:10e2:400:1::2 default default Up async_active 2603:10e2:400:1::1 50 50 3 false 155 20.0.13.3 default default Up async_active 20.0.13.1 50 50 3 false 137 2603:10e2:400:14::2 default default Up async_active 2603:10e2:400:14::1 50 50 3 false 89 20.0.1.9 default default Up async_active 20.0.1.1 50 50 3 false 5 20.0.5.5 default 
default Up async_active 20.0.5.1 50 50 3 false 145 2603:10e2:400:5::3 default default Up async_active 2603:10e2:400:5::1 50 50 3 false 158 20.0.14.6 default default Up async_active 20.0.14.1 50 50 3 false 33 ... ``` Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
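The aggregation behavior described in this commit can be pictured with the small sketch below. It is not the code from the change: `collect_bfd_sessions` and the `fetch_sessions` callback are hypothetical names, and the callback stands in for whatever reads the BFD session table of one namespace.

```python
from typing import Callable, Iterable, List, Optional


def collect_bfd_sessions(
    namespaces: Iterable[str],
    fetch_sessions: Callable[[str], List[dict]],
    namespace: Optional[str] = None,
) -> List[dict]:
    """Return sessions for one namespace, or for all when none is given."""
    selected = [namespace] if namespace is not None else list(namespaces)
    sessions: List[dict] = []
    for ns in selected:
        # Each namespace contributes its own rows to the combined summary.
        sessions.extend(fetch_sessions(ns))
    return sessions
```

The total printed by the summary would then be the length of the combined list, which matches the jump from a per-namespace count to the all-namespace count in the outputs above.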
What I did

The VRF pool is being increased to 4096 by PR sonic-net/sonic-swss#4168, so the mgmt VRF table ID is being moved to 6000. Updated `show mgmt-vrf routes` to match.

Signed-off-by: ypcisco <ypcisco@gmail.com>
…ne (#4424)

#### What I did

When a config reload is done, BGP_PEER_CONFIGURED_TABLE is not cleared of previous BGP peer entries. Added logic to clean them up during config reload.

#### How I did it

Connected to STATE_DB and cleared all keys matching `BGP_PEER_CONFIGURED_TABLE|*`.

#### How to verify it

Delete a peer from config_db.json and perform a config reload. The peer should be deleted from the STATE_DB table BGP_PEER_CONFIGURED_TABLE.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
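The cleanup described in this commit amounts to a pattern delete on STATE_DB. The sketch below shows the idea against an in-memory stand-in; `FakeStateDb`, its `keys`/`delete` methods, and `clear_bgp_peer_configured_table` are hypothetical and are not the connector API used by the actual change.

```python
import fnmatch


class FakeStateDb:
    """In-memory stand-in for a STATE_DB client (hypothetical interface)."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self, pattern):
        return [k for k in self._data if fnmatch.fnmatchcase(k, pattern)]

    def delete(self, key):
        self._data.pop(key, None)


def clear_bgp_peer_configured_table(db):
    """Delete every BGP_PEER_CONFIGURED_TABLE entry; return how many."""
    removed = 0
    for key in db.keys("BGP_PEER_CONFIGURED_TABLE|*"):
        db.delete(key)
        removed += 1
    return removed
```

Clearing the table before services restart means a peer that was removed from config_db.json cannot survive the reload as a stale STATE_DB entry.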
#### What I did

sonic-net/sonic-buildimage#24829

Fixed a bug in the `show fabric isolation` command output. When a fabric port sees a CRC/FEC-uncorrectable error, the fabric monitor feature isolates the port and sets both ISOLATED=1 and AUTO_ISOLATED=1 for that port in the STATE_DB FABRIC_PORT_TABLE. The ISOLATED and CONFIG_ISOLATED fields are always present for every fabric port in FABRIC_PORT_TABLE. The AUTO_ISOLATED field, however, is not always present; it is added only when the port is auto-isolated.

Because of a bug in the FabricIsolation CLI script, Auto Isolated is shown as 1 for every port printed after the port that is actually isolated. In the output shown below, port 165 is isolated, but Auto Isolated is wrongly shown as 1 for all the ports after 165.

<img width="415" height="245" alt="image" src="https://github.com/user-attachments/assets/26df22fd-615f-44c7-9b9a-3e7e44bef735" />

#### How I did it

Initialized the variable correctly inside the loop.

#### How to verify it

Induced a CRC error on one of the ports and, after that port was isolated, verified that `show fabric isolation` shows the correct output.

<img width="342" height="239" alt="image" src="https://github.com/user-attachments/assets/749d13c0-9892-4f2c-af9a-70a49d7ee2e5" />

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
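The class of bug described in this commit, a per-port value that leaks into later iterations because it is only assigned when an optional field is present, can be illustrated as follows. The function name, the dictionary layout, and the `"0"` default for an absent AUTO_ISOLATED field are assumptions for the sketch, not the script's actual code.

```python
def fabric_isolation_rows(port_table):
    """Build (port, isolated, config_isolated, auto_isolated) rows."""
    rows = []
    for port_id in sorted(port_table):
        fields = port_table[port_id]
        # Evaluated fresh for every port: AUTO_ISOLATED is only written
        # when a port is auto-isolated, so an absent field must not reuse
        # the value left over from a previous port.
        auto_isolated = fields.get("AUTO_ISOLATED", "0")
        rows.append((
            port_id,
            fields.get("ISOLATED", "0"),
            fields.get("CONFIG_ISOLATED", "0"),
            auto_isolated,
        ))
    return rows
```

With the variable set inside the loop, only the port that carries AUTO_ISOLATED=1 reports it, regardless of where it sits in the iteration order.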
#### What I did

When rebooting an SB enabled system, dmesg shows:

```
root@sonic/home/admin# dmesg -W
[33875.287197] ima: impossible to appraise a kernel image without a file descriptor; try using kexec_file_load syscall.
```

This was fixed in the fast-reboot/warm-reboot script in sonic-net/sonic-utilities@317e649 but was missed in the reboot script.

#### How I did it

Use the `-a` argument with kexec:

```
-a, --kexec-syscall-auto
    Use file based syscall for kexec and fall back to the compatibility syscall when file based syscall is not supported or the kernel did not understand the image
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
#### What I did

For the `fwutil show` command, which displays the usage/help message, reduce the time taken by lazily importing PlatformDataProvider. This reduced the average time taken by ~50%.

#### How I did it

Use a singleton PlatformDataProvider in fwutil/main.py

#### How to verify it

Before the change

```
Running 'fwutil show' 10 times (gap 5s)...
Run 1: 972 ms
Run 2: 1058 ms
Run 3: 948 ms
Run 4: 1213 ms
Run 5: 1507 ms
Run 6: 1235 ms
Run 7: 1553 ms
Run 8: 1037 ms
Run 9: 1000 ms
Run 10: 1037 ms
---- fwutil show stats ----
Avg: 1156 ms
Min: 948 ms
Max: 1553 ms
```

After the change

```
Running 'fwutil show' 10 times (gap 5s)...
Run 1: 496 ms
Run 2: 482 ms
Run 3: 466 ms
Run 4: 445 ms
Run 5: 482 ms
Run 6: 463 ms
Run 7: 780 ms
Run 8: 662 ms
Run 9: 653 ms
Run 10: 659 ms
---- fwutil show stats ----
Avg: 558 ms
Min: 445 ms
Max: 780 ms
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
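The lazy-singleton idea behind this change can be sketched generically as below. This is not the code in fwutil/main.py: `LazySingleton` and `_build_provider` are hypothetical names, and the factory stands in for the expensive import and construction of PlatformDataProvider.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Build a value on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._built = False

    def get(self) -> T:
        if not self._built:
            # The expensive work happens here, only when first requested.
            self._instance = self._factory()
            self._built = True
        return self._instance


def _build_provider() -> dict:
    # Hypothetical stand-in for importing and constructing the real
    # PlatformDataProvider.
    return {"platform": "example"}


provider = LazySingleton(_build_provider)
```

A code path that only prints help text never calls `provider.get()`, so it never pays the construction cost; commands that do need the provider share one instance.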
…ndle OID update (#4429)

#### What I did

- Updated port_speed_change_validator to restart the telemetry container when a port speed is changed via GCU.
- Added unit tests to verify the correct behavior for port speed changes and no-change scenarios.

Microsoft ADO: 36692456

This is the short-term plan: restart telemetry from GCU. The long-term plan is for telemetry to reload automatically.

Microsoft ADO: 36496651

#### How I did it

- Modified the validator to detect any port speed change and call _service_restart("telemetry") when detected, ensuring telemetry restarts and picks up the new port OID.
- Implemented tests using mocks for subprocess calls to simulate and verify the restart logic.

#### How to verify it

- Run the unit tests in service_validator_test.py to confirm that telemetry is restarted only when a port speed changes, and not for unrelated changes.
- Check that all tests pass, indicating correct validator and restart behavior.
- Test bed ```shell stli@STG02-0101-0400-02T2-lc03:~$ show int sta Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC --------------- --------------------------------------- ------- ----- ----- ----------------------- --------------- ------ ------- ------ ---------- ... Ethernet64 264,265,266,267,268,269,270,271 400G 9100 rs FourHundredGigE0/3/0/8 routed down down N/A off stli@STG02-0101-0400-02T2-lc03:~$ cat gcu_patch.json [ { "op": "add", "path": "/asic0/PORT/Ethernet64", "value": { "index": "8", "description": "STG02-0101-0110-17T1:etp12", "alias": "HundredGigE0/3/0/8", "pfc_asym": "off", "fec": "rs", "speed": "100000", "mtu": "9100", "tpid": "0x8100", "lanes": "264,265,266,267", "asic_port_name": "Eth64-ASIC0", "role": "Ext", "admin_status": "up" } } ] stli@STG02-0101-0400-02T2-lc03:~$ sudo config apply-patch gcu_patch.json Patch Applier: asic0: Patch application starting. Patch Applier: asic0: Patch: [{"op": "add", "path": "/PORT/Ethernet64", "value": {"index": "8", "description": "STG02-0101-0110-17T1:etp12", "alias": "HundredGigE0/3/0/8", "pfc_asym": "off", "fec": "rs", "speed": "100000", "mtu": "9100", "tpid": "0x8100", "lanes": "264,265,266,267", "asic_port_name": "Eth64-ASIC0", "role": "Ext", "admin_status": "up"}}] Patch Applier: asic0 getting current config db. Patch Applier: asic0: simulating the target full config after applying the patch. Patch Applier: asic0: validating all JsonPatch operations are permitted on the specified fields Patch Applier: asic0: validating target config does not have empty tables, since they do not show up in ConfigDb. Patch Applier: asic0: sorting patch updates. 
Patch Applier: The asic0 patch was converted into 6 changes:
Patch Applier: asic0: applying 6 changes in order:
Patch Applier:   * [{"op": "remove", "path": "/CABLE_LENGTH/AZURE/Ethernet64"}]
Patch Applier:   * [{"op": "replace", "path": "/PORT/Ethernet64/description", "value": "STG02-0101-0110-17T1:etp12"}]
Patch Applier:   * [{"op": "replace", "path": "/PORT/Ethernet64/speed", "value": "100000"}]
> /usr/local/lib/python3.11/dist-packages/generic_config_updater/services_validator.py(71)port_speed_change_validator()
-> return _service_restart("telemetry")
(Pdb) old_speed
'400000'
(Pdb) upd_speed
'100000'
(Pdb) c
Job for telemetry.service failed because start of the service was attempted too often.
See "systemctl status telemetry.service" and "journalctl -xeu telemetry.service" for details.
To force a start use "systemctl reset-failed telemetry.service"
followed by "systemctl start telemetry.service" again.
Patch Applier:   * [{"op": "remove", "path": "/PORT/Ethernet64"}]
Patch Applier:   * [{"op": "add", "path": "/PORT/Ethernet64", "value": {"index": "8", "description": "STG02-0101-0110-17T1:etp12", "alias": "HundredGigE0/3/0/8", "pfc_asym": "off", "fec": "rs", "speed": "100000", "mtu": "9100", "tpid": "0x8100", "lanes": "264,265,266,267", "asic_port_name": "Eth64-ASIC0", "role": "Ext", "admin_status": "up"}}]
Patch Applier:   * [{"op": "add", "path": "/CABLE_LENGTH/AZURE/Ethernet64", "value": "300m"}]
Patch Applier: asic0: verifying patch updates are reflected on ConfigDB.
Patch Applier: asic0 patch application completed.
Patch applied successfully.

stli@STG02-0101-0400-02T2-lc03:~$ show int sta
      Interface                                    Lanes    Speed    MTU    FEC                    Alias             Vlan    Oper    Admin    Type    Asym PFC
---------------  ---------------------------------------  -------  -----  -----  -----------------------  ---------------  ------  -------  ------  ----------
...
     Ethernet64                          264,265,266,267     100G   9100     rs       HundredGigE0/3/0/8           routed    down       up     N/A         off

stli@STG02-0101-0400-02T2-lc03:~$ docker ps -a
CONTAINER ID   IMAGE                                                       COMMAND                  CREATED       STATUS              PORTS   NAMES
aec6bcdb096d   soniccr1.azurecr.io/chassis-port-counter-monitor:20260122   "/usr/local/bin/dock…"   7 days ago    Up 7 days                   chassis-port-counter-monitor
2bb42c2ef60a   docker-sonic-telemetry:latest                               "/usr/local/bin/supe…"   7 weeks ago   Up About a minute           telemetry
84e4a349cd4f   docker-snmp:latest                                          "/usr/bin/docker-snm…"   7 weeks ago   Up 7 weeks                  snmp

stli@STG02-0101-0400-02T2-lc03:~$ sudo grep -i "spawned: 'telemetry'" /var/log/syslog
2026 Feb 5 02:08:13.183970 STG02-0101-0400-02T2-lc03 INFO telemetry#supervisord 2026-02-05 02:08:13,183 INFO spawned: 'telemetry' with pid 24
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
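The validator behavior described in this commit (restart telemetry only when some port's speed differs between the old and updated config) can be sketched as follows. The function name, the injected `restart_service` callback, and the choice to compare only ports present in both configs are assumptions made for illustration; the real validator lives in generic_config_updater/services_validator.py and calls `_service_restart("telemetry")`.

```python
def restart_telemetry_on_speed_change(old_config, upd_config, restart_service):
    """Return the restart result if any port speed changed, else True."""
    old_ports = old_config.get("PORT", {})
    upd_ports = upd_config.get("PORT", {})

    # Assumption: only ports present on both sides are compared.
    for name in sorted(old_ports.keys() & upd_ports.keys()):
        old_speed = old_ports[name].get("speed")
        upd_speed = upd_ports[name].get("speed")
        if old_speed != upd_speed:
            # One restart is enough, however many ports changed speed.
            return restart_service("telemetry")

    # Nothing relevant changed, so there is nothing to restart.
    return True
```

Returning after the first detected change avoids restarting the container once per port when a patch touches several speeds at once.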
#### Why I did it Many tests for various packages use /tmp/tmp.XXXXXXXX or /tmp/tmpi_XXXXX as the temporary file or directory pattern for mktemp. Since the same slave container is used for multiple simultaneous builds, destroying an in-progress build's temporary file or directory will cause those builds to fail. While this has existed for a year, it appears the introduction of Trixie has reordered the builds a bit so that packages using the temp file patterns impacted are built simultaneously. #### How I did it It appears /tmp/tmpx and /tmp/tmpy are the toplevel directories created, clear those. #### How to verify it Run a trixie build with high parallelism. #### Which release branch to backport - [x] 202511 Fixes sonic-net/sonic-buildimage#25424 Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
#### Why I did it

[PR #25398](sonic-net/sonic-buildimage#25398) changed the host's Docker config to use `-H fd://`. During image installation, sonic-installer copies these host options to run a temporary dockerd inside a chroot. Since systemd isn't running in the chroot to create and hand over a socket, dockerd crashes, causing `sonic-package-manager migrate` to fail.

#### What I did

Sanitized the copied DOCKER_OPTS so the temporary dockerd can start safely without systemd socket activation.

#### How I did it

- In sonic_installer/main.py (get_docker_opts), replaced fd:// with unix://.
- Updated mock arguments in test_sonic_installer.py to verify the sanitization logic.

#### How to verify it

Run `sudo sonic-installer install <image.bin>` and verify the installation completes successfully without failing during the package migration step.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
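The sanitization described in this commit can be pictured with the sketch below. It assumes the options are available as a list of tokens and uses a hypothetical helper name, `sanitize_docker_opts`; the actual change is made inside `get_docker_opts` in sonic_installer/main.py and may process the options differently.

```python
def sanitize_docker_opts(opts):
    """Swap systemd socket activation (fd://) for a unix socket."""
    sanitized = []
    for opt in opts:
        if opt == "fd://":
            # Form used with a separate flag token: -H fd://
            sanitized.append("unix://")
        elif opt.endswith("=fd://"):
            # Form used with an inline value: --host=fd://
            sanitized.append(opt[: -len("fd://")] + "unix://")
        else:
            sanitized.append(opt)
    return sanitized
```

Only the host-socket value is rewritten, so the temporary dockerd keeps every other option copied from the host.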
fixes #4292

#### What I did

The `generate_dump` script creates a symlink to itself in the dump directory before generating the initial tar archive. However, due to a typo in the cleanup step, the symlink was not correctly removed. As a result, when the dump directory was appended later using `save_to_tar`, the `generate_dump` entry was added a second time.

This patch corrects the cleanup target so that the `generate_dump` symlink is properly removed, preventing the duplicate entry.

#### How I did it

Updated the cleanup step in `generate_dump`:

From: `$RM $V -f $TARDIR/sonic_dump`
To: `$RM $V -f $TARDIR/generate_dump`

#### How to verify it

1. Run `show techsupport` on a SONiC target device.
2. Inspect the resulting tar archive in the path `/var/dump`. It should look like `sonic_dump_sonic_20260116_001237.tar.gz`.
3. Confirm that the dump directory contains only a single `generate_dump` entry.
**Which release branch to backport** - [x] 202305 - [x] 202311 - [x] 202405 - [x] 202411 - [x] 202505 - [x] 202511 - [x] 202605 Signed-off-by: Yi Xu <Yi.Xu@lumentum.com> Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
…ti-ASIC platforms (#4433)

#### What I did

Added multi-ASIC platform support for `show interfaces flap` command

#### How I did it

- Added `multi_asic_click_options` decorator to support `-n/--namespace` and `-d/--display` options
- Updated the command to iterate across all namespaces and aggregate flap data from each ASIC's APP_DB
- Added flap mock data to multi-ASIC mock tables (`asic0/appl_db.json`, `asic1/appl_db.json`)
- Added multi-ASIC test file `tests/flap_test.py`

#### How to verify it

```bash
# Run multi-ASIC tests
pytest tests/flap_test.py -v

# On a multi-ASIC device
show interfaces flap
show interfaces flap -n asic0
```

#### Previous command output (if the output of a command-line utility has changed)

```bash
admin@sonic:~$ show interfaces flap
Interface    Flap Count    Admin    Oper     Link Down TimeStamp(UTC)    Link Up TimeStamp(UTC)
-----------  ------------  -------  -------  --------------------------  ------------------------
etp1         Never         Unknown  Unknown  Never                       Never
etp2         Never         Unknown  Unknown  Never                       Never
etp3         Never         Unknown  Unknown  Never                       Never
etp4         Never         Unknown  Unknown  Never                       Never
...
admin@sonic:~$ show interfaces flap -n asic0 Usage: show interfaces flap [OPTIONS] [INTERFACENAME] Try "show interfaces flap -h" for help. Error: no such option: -n ``` #### New command output (if the output of a command-line utility has changed) ```bash admin@sonic:~$ show interfaces flap Interface Flap Count Admin Oper Link Down TimeStamp(UTC) Link Up TimeStamp(UTC) ----------- ------------ ------- ------ -------------------------- ------------------------ Ethernet0 4097 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026 Ethernet8 4035 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026 Ethernet16 4015 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026 Ethernet24 4019 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026 ... admin@sonic:~$ show interfaces flap -n asic0 Interface Flap Count Admin Oper Link Down TimeStamp(UTC) Link Up TimeStamp(UTC) ----------- ------------ ------- ------ -------------------------- ------------------------ Ethernet0 4097 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026 Ethernet8 4035 Up Up Sat Feb 21 11:00:41 2026 Sat Feb 21 11:00:59 2026 Ethernet16 4015 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026 Ethernet24 4019 Up Up Sat Feb 21 11:01:23 2026 Sat Feb 21 11:01:41 2026 ... ``` Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
Fixes issue sonic-net/sonic-utilities#4375

Currently, `sudo ip netns exec <namespace> counterpoll <args>` ignores the namespace it is run in and uses the default namespace. This patch makes the command use the namespace it is running in. That is how the command worked previously, until a regression broke the behavior.

#### What I did
I changed the default value of the `-n` argument to be the namespace the command is running in.

#### How to verify it
I verified that, with my change, the namespace the command runs in is chosen as the default namespace.
```
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'enable'}
$ sudo ip netns exec asic0 counterpoll pg-drop disable
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'disable'}
```

I also verified that the command exits gracefully if the user attempts to confuse it with two namespaces, like so:
```
sudo ip netns exec asic0 counterpoll pg-drop -n asic1 disable
Usage: counterpoll pg-drop [OPTIONS] COMMAND [ARGS]...
Try 'counterpoll pg-drop --help' for help.

Error: Invalid value for '-n' / '--namespace': 'asic1' is not 'asic0'.
```

#### Previous command output (if the output of a command-line utility has changed)
```
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'enable'}
$ sudo ip netns exec asic0 counterpoll pg-drop disable
$ sonic-db-cli -n asic0 CONFIG_DB hgetall "FLEX_COUNTER_TABLE|PG_DROP"
{'FLEX_COUNTER_STATUS': 'enable'}
```

#### New command output (if the output of a command-line utility has changed)
Seen above

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
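The defaulting and conflict rule described in this commit can be sketched roughly as follows. This is only an illustration: `resolve_namespace` is a hypothetical helper, not the function used in `counterpoll`, and the rule that an explicit `-n` is accepted as-is when the command runs outside any namespace is an assumption.

```python
from typing import Optional


def resolve_namespace(requested: Optional[str], current: str) -> str:
    """Pick the namespace a command should operate on (hypothetical helper).

    requested: value passed with -n, or None when the option is omitted.
    current:   namespace the process runs in ('' for the global namespace).
    """
    if requested is None:
        # No -n given: default to the namespace we are running in.
        return current
    if current and requested != current:
        # Running inside a namespace while asking for a different one.
        raise ValueError(f"'{requested}' is not '{current}'")
    # Assumption: outside any namespace, an explicit -n is taken as-is.
    return requested


print(repr(resolve_namespace(None, "asic0")))
print(repr(resolve_namespace("asic0", "asic0")))
try:
    resolve_namespace("asic1", "asic0")
except ValueError as err:
    print("rejected:", err)
```

Raising on the conflicting case instead of silently preferring one value mirrors the "exits gracefully" behavior shown in the verification output.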
… match on interface name (#4435)

#### Why I did it
`config interface ip remove` falsely blocks IP removal when the target interface name is a substring of the name of another interface that has a static route. For example, removing the last IP from Ethernet24 fails because `"Ethernet24" in "...Ethernet240..."` evaluates to True.
```
# Given: static route already exists via Ethernet240
admin@sonic:~$ show ip route vrf all static | grep Ethernet240
S>* 10.1.0.7/32 [1/0] via 18.2.202.1, Ethernet240, weight 1, 1d15h47m

# Remove IPv4 so IPv6 becomes the last IP on Ethernet24
admin@sonic:~$ sudo config interface ip remove Ethernet24 10.0.0.44/31

# Try to remove the last IP from Ethernet24
admin@sonic:~$ sudo config interface ip remove Ethernet24 FC00::59/126
Error: Cannot remove the last IP entry of interface Ethernet24. A static ip route is still bound to the RIF.
```
Despite the above error, no static route is bound to Ethernet24; the route is on Ethernet240.

#### What I did/How I did it
Replaced the Python `in` substring check with a regex word-boundary (`\b`) match when validating whether a static route references the interface being modified.

#### How to verify it
Before fix
```
admin@sonic:~$ sudo config interface ip remove Ethernet24 10.0.0.44/31
admin@sonic:~$ sudo config interface ip remove Ethernet24 FC00::59/126
Usage: config interface ip remove [OPTIONS] <interface_name> <ip_addr>
Try 'config interface ip remove -h' for help.

Error: Cannot remove the last IP entry of interface Ethernet24. A static ip route is still bound to the RIF.
```
After fix
```
admin@sonic:~$ sudo config interface ip remove Ethernet24 10.0.0.44/31
admin@sonic:~$ sudo config interface ip remove Ethernet24 FC00::59/126
admin@sonic:~$
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
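To illustrate the difference between the two checks, here is a minimal sketch using the route line from the example above. The helper name `route_references_interface` is hypothetical and is not the function in the `config` code; only the `\b` technique comes from the commit message.

```python
import re


def route_references_interface(route_line: str, ifname: str) -> bool:
    """Return True only when ifname appears as a whole word in route_line."""
    # re.escape keeps any regex metacharacters in the name literal.
    pattern = r"\b" + re.escape(ifname) + r"\b"
    return re.search(pattern, route_line) is not None


route = "S>* 10.1.0.7/32 [1/0] via 18.2.202.1, Ethernet240, weight 1, 1d15h47m"

# Plain substring check: matches Ethernet24 inside Ethernet240 (false positive).
print("Ethernet24" in route)
# Word-boundary check: the trailing "0" is a word character, so no match.
print(route_references_interface(route, "Ethernet24"))
# The interface that actually carries the route still matches.
print(route_references_interface(route, "Ethernet240"))
```

Note that `\b` also treats `.` as a boundary, so a name followed by a dot (for example a sub-interface suffix) would still count as a match for the parent name.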
The CLI does not accept the empty `''` global namespace, which causes MGMT test failures.
Requesting backport to:
- [x] 202511
#### What I did
Added a new function which will include the global namespace as a valid namespace option on multi-asic when the command is not run within a namespace.
#### Previous command output (if the output of a command-line utility has changed)
`counterpoll queue enable`
```
Usage: counterpoll queue [OPTIONS] COMMAND [ARGS]...
Try 'counterpoll queue --help' for help.
Error: Invalid value for '-n' / '--namespace': '' is not one of 'asic0', 'asic1'.
```
#### New command output (if the output of a command-line utility has changed)
(No output, success)
Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
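A rough sketch of how the set of accepted `-n` values could be built so that the global namespace is included is shown below. The function name and signature are hypothetical, and restricting the choices to the current namespace when run inside one is an assumption.

```python
from typing import Iterable, List


def valid_namespace_choices(asic_namespaces: Iterable[str],
                            current_namespace: str = "") -> List[str]:
    """Return the namespaces accepted by -n (hypothetical helper).

    asic_namespaces:   ASIC namespaces on the device, e.g. ['asic0', 'asic1'].
    current_namespace: namespace the command runs in ('' means global).
    """
    if current_namespace:
        # Assumption: inside a namespace, only that namespace is valid.
        return [current_namespace]
    # Outside any namespace: the global '' is valid alongside every ASIC.
    return [""] + list(asic_namespaces)


print(valid_namespace_choices(["asic0", "asic1"]))
print(valid_namespace_choices(["asic0", "asic1"], current_namespace="asic0"))
```

With the empty string in the list, a default of `''` passes validation instead of failing with "is not one of 'asic0', 'asic1'".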
#### What I did
This PR introduces a new monitoring script that checks whether LAG (Link Aggregation Group) IDs are synchronized between the chassis database and the ASIC databases on VOQ chassis line cards. The script is designed to be run by Monit and alerts via syslog when mismatches are detected.

#### How I did it
Key changes:
- Added the `chassis_lag_id_checker` script, which retrieves and compares LAG IDs from chassis_db and asic_db, reporting mismatches per ASIC namespace
- Comprehensive test suite with fixtures for mocking Redis dumps and ASIC/device configurations
- Integration into setup.py for proper installation

#### How to verify it
Tested on a VOQ chassis and with unit tests.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
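The per-namespace comparison can be pictured with a small sketch like the one below. It is not the `chassis_lag_id_checker` implementation: the function name, the input shapes (plain sets of integer LAG IDs), and reporting mismatches in both directions are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def find_lag_id_mismatches(chassis_lag_ids: Set[int],
                           asic_lag_ids_by_ns: Dict[str, Set[int]]
                           ) -> Dict[str, Dict[str, List[int]]]:
    """Compare chassis_db LAG IDs with each ASIC namespace (hypothetical).

    Returns an entry only for namespaces whose LAG IDs differ.
    """
    mismatches = {}
    for namespace, asic_ids in sorted(asic_lag_ids_by_ns.items()):
        missing_in_asic = sorted(chassis_lag_ids - asic_ids)
        missing_in_chassis = sorted(asic_ids - chassis_lag_ids)
        if missing_in_asic or missing_in_chassis:
            mismatches[namespace] = {
                "missing_in_asic": missing_in_asic,
                "missing_in_chassis": missing_in_chassis,
            }
    return mismatches


# Made-up IDs: asic1 lacks LAG 3 and carries an unknown LAG 4.
report = find_lag_id_mismatches(
    {1, 2, 3},
    {"asic0": {1, 2, 3}, "asic1": {1, 2, 4}},
)
for namespace, detail in report.items():
    print(f"LAG ID mismatch in {namespace}: {detail}")
```

In a Monit-driven script, each reported line would typically be written to syslog and reflected in the exit status rather than printed.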
Fixes #4406

#### What I did
Enable express boot support for the Marvell Teralynx platform.

#### How I did it
Updated the fast-reboot script (which is soft-linked to the express-reboot script) to allow the Marvell Teralynx platform for express boot.

#### How to verify it
Execute the "express-reboot" command.
#### Previous command output (if the output of a command-line utility has changed) ``` root@sonic:/home/admin# show version SONiC Software Version: SONiC.202511.77-dirty-20260224.183644 SONiC OS Version: 13 Distribution: Debian 13.3 Kernel: 6.12.41+deb13-sonic-amd64 Build commit: 1a2154e68 Build date: Tue Feb 24 19:15:37 UTC 2026 Built by: marvell@cpss-rdanda20-new Platform: x86_64-wistron_sw_to3200k-r0 HwSKU: Wistron_sw_to3200k ASIC: marvell-teralynx ASIC Count: 1 Serial Number: N/A Model Number: N/A Hardware Revision: N/A Uptime: 13:51:48 up 8:37, 1 user, load average: 2.03, 2.43, 2.45 Date: Sun 22 Mar 2026 13:51:48 root@sonic:/home/admin# express-reboot eXpress Boot is not supported root@sonic:/home/admin# ``` #### New command output (if the output of a command-line utility has changed) ``` root@sonic:/home/admin# express-reboot -v Sun Mar 22 02:01:07 PM UTC 2026 Starting express-reboot Sun Mar 22 02:01:11 PM UTC 2026 Checking for active PFC storms... Sun Mar 22 02:01:11 PM UTC 2026 No active PFC storms detected. Safe to proceed with warm-reboot... Sun Mar 22 02:01:13 PM UTC 2026 Loading kernel without secure boot Sun Mar 22 02:01:14 PM UTC 2026 Starting lag_keepalive to send LACPDUs ... Sun Mar 22 02:01:21 PM UTC 2026 Pausing orchagent ... Sun Mar 22 02:01:22 PM UTC 2026 Collecting logs to check ssd health before express-reboot... Sun Mar 22 02:01:22 PM UTC 2026 Stopping aaastatsd.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopped aaastatsd.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopping featured.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopped featured.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopping hostcfgd.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopped hostcfgd.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopping tacacs-config.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopped tacacs-config.timer ... Sun Mar 22 02:01:22 PM UTC 2026 Stopping lldp ... Sun Mar 22 02:01:24 PM UTC 2026 Stopped lldp Sun Mar 22 02:01:24 PM UTC 2026 Stopping radv ... 
Sun Mar 22 02:01:24 PM UTC 2026 Stopped radv Sun Mar 22 02:01:24 PM UTC 2026 Stopping bgp ... Sun Mar 22 02:01:35 PM UTC 2026 Stopped bgp Sun Mar 22 02:01:35 PM UTC 2026 Stopping swss ... Sun Mar 22 02:01:36 PM UTC 2026 Stopped swss Sun Mar 22 02:01:36 PM UTC 2026 Initialize pre-shutdown ... Sun Mar 22 02:01:36 PM UTC 2026 Requesting express boot pre-shutdown ... Sun Mar 22 02:01:36 PM UTC 2026 Waiting for pre-shutdown ... Sun Mar 22 02:01:36 PM UTC 2026 Pre-shutdown succeeded, state: pre-shutdown-succeeded ... Sun Mar 22 02:01:36 PM UTC 2026 Stopping dash-ha ... Sun Mar 22 02:01:36 PM UTC 2026 Stopped dash-ha Sun Mar 22 02:01:36 PM UTC 2026 Stopping teamd ... Sun Mar 22 02:01:36 PM UTC 2026 Stopped teamd Sun Mar 22 02:01:36 PM UTC 2026 Stopping syncd ... Sun Mar 22 02:01:43 PM UTC 2026 Stopped syncd Sun Mar 22 02:01:43 PM UTC 2026 Backing up database ... Successfully copied 99.3kB to /host/warmboot Sun Mar 22 02:01:49 PM UTC 2026 Enabling Watchdog before express-reboot Sun Mar 22 02:01:50 PM UTC 2026 Rebooting with /sbin/kexec -e to SONiC-OS-202511.77-dirty-20260224.183644 ... 
0 root@sonic:/home/admin# show reboot-cause User issued 'express-reboot' command [User: root, Time: Sun Mar 22 02:04:38 PM UTC 2026] root@sonic:/home/admin# show warm state name restore_count state ------------- --------------- ----------------------- gearsyncd 1 coppmgrd 1 teamsyncd 1 reconciled fdbsyncd 1 reconciled syncd 1 intfmgrd 1 reconciled teammgrd 1 vxlanmgrd 1 reconciled portsyncd 1 rebootbackend 0 vrrpsyncd 1 reconciled orchagent 1 reconciled vrfmgrd 1 reconciled tunnelmgrd 1 reconciled vlanmgrd 1 reconciled nbrmgrd 1 warm-shutdown 0 warm-shutdown-succeeded neighsyncd 1 reconciled bgp 1 reconciled root@sonic:/home/admin# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 1dbe05c1bc28 docker-snmp:latest "/usr/bin/docker-snm…" 11 days ago Up 19 minutes snmp 251bbf5ea0fe docker-platform-monitor:latest "/usr/bin/docker_ini…" 11 days ago Up 19 minutes pmon 5aa6fb90c2c0 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 11 days ago Up 19 minutes mgmt-framework 0297eb7d4a29 docker-lldp:latest "/usr/bin/docker-lld…" 11 days ago Up 19 minutes lldp 4518a126f278 docker-sonic-gnmi:latest "/usr/local/bin/supe…" 11 days ago Up 19 minutes gnmi bdc58b171baf docker-eventd:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes eventd 16f2c5ac43e8 docker-router-advertiser:latest "/usr/bin/docker-ini…" 11 days ago Up 24 minutes radv fad1c32aaa02 docker-syncd-mrvl-teralynx-rpc:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes syncd d8c982fefad6 docker-teamd:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes teamd 22e97f1a5b45 docker-sysmgr:latest "/usr/local/bin/supe…" 11 days ago Up 24 minutes sysmgr 709a51254b33 docker-orchagent:latest "/usr/bin/docker-ini…" 11 days ago Up 24 minutes swss 7f18cb72c949 docker-database:latest "/usr/local/bin/dock…" 11 days ago Up 24 minutes database root@sonic:/home/admin# date Sun Mar 22 02:26:58 PM UTC 2026 Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
When a single EEPROM page fails to read during `sfputil show eeprom-hexdump`, the command no longer aborts. It continues dumping the remaining pages and returns success (exit 0), with the failed page’s error message included in the output.

#### What I did
- **Bug**: On some modules, `sfputil show eeprom-hexdump` failed with "Failed to read EEPROM for page" and exited with an error, so no EEPROM data was shown and techsupport dumps did not contain `dump/interface.xcvrs.eeprom.RAW`.
- **Fix**: Treat a single-page read failure as non-fatal: append the error line to the output and continue with the next pages. The command still exits 0 so techsupport and scripts get a full or partial dump.

#### How I did it
- **sfputil/main.py**
  - `eeprom_hexdump_pages_general()`: Removed the early `return return_code, output` on page read failure. Always append `output` and continue, then return `(0, '\n'.join(lines))`.
  - `eeprom_hexdump_pages_sff8472()`: Same behavior for SFF8472 (A0h / A2h pages): no early return on failure, append output and continue.
- **tests/sfputil_test.py**
  - `test_eeprom_hexdump_all_falure`: Updated to expect exit 0 and that both ports’ headers and error messages appear in the output (continue-on-failure behavior).
  - Added `test_eeprom_hexdump_pages_general_continues_on_single_page_failure`: one page fails, others succeed; asserts rc == 0 and that the output contains both the error and the successful page content.
  - Added `test_eeprom_hexdump_pages_sff8472_continues_on_single_page_failure`: same for SFF8472 (A0h ok, A2h lower fail, A2h upper ok).

#### How to verify it
- Unit tests: new and updated tests in `tests/sfputil_test.py` for continue-on-single-page-failure behavior.
- Manual: `sfputil show eeprom-hexdump` and `show techsupport` on some modules to confirm a partial EEPROM dump and the presence of `interface.xcvrs.eeprom.RAW`.

On EEPROM page read failure, log which page failed and continue dumping the other pages instead of returning. This fixes modules where, for example, some page fails but other pages are readable, and allows techsupport dumps to include interface.xcvrs.eeprom.RAW.

#### Which release branch to backport (provide reason below if selected)
- [x] 202511

##### **Before this change:**
Run "sfputil show eeprom-hexdump": fails with the error "Failed to read EEPROM for page 10h" and stops there.
Run "show techsupport": in the generated dumps, the file dump/interface.xcvrs.eeprom.RAW does not exist.

##### **After this change:**
Run "sfputil show eeprom-hexdump":
```bash
EEPROM hexdump for port Ethernet0
Lower page 0h
00000000 80 53 45 07 00 00 00 00 00 00 00 00 00 00 00 00 |.SE.............|
00000010 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 |..........@.....|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 03 00 00 00 00 00 80 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 02 4c 14 11 ff 4b 14 11 ff 4e 1c |......L...K...N.|
00000060 22 55 4d 1c 22 55 50 1c 44 11 4f 1c 44 11 ff 00 |"UM."UP.D.O.D...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Upper page 0h
00000080 80 4e 56 49 44 49 41 20 20 20 20 20 20 20 20 20 |.NVIDIA         |
00000090 20 00 00 00 56 4d 4f 44 5f 53 50 43 35 5f 43 50 | ...VMOD_SPC5_CP|
000000a0 4f 20 20 20 00 00 56 4d 4f 44 5f 53 50 43 35 5f |O   ..VMOD_SPC5_|
000000b0 43 50 4f 20 30 30 00 00 00 00 00 00 00 00 20 20 |CPO 00........  |
000000c0 20 20 20 20 20 20 20 20 50 00 00 00 00 00 00 00 |        P.......|
000000d0 00 00 00 00 04 00 00 00 00 00 00 00 00 00 d1 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Upper page 1h
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 80 46 00 00 00 9d 00 00 00 00 00 00 00 00 03 |..F.............|
000000a0 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 69 |...............i|
Upper page 2h
00000080 5f 00 f6 00 5a 00 05 00 88 b8 79 18 87 5a 7a 76 |_...Z.....y..Zzv|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 fc 35 27 10 ea 91 3d f9 00 00 00 00 00 00 00 00 |.5'...=.........|
000000c0 62 20 09 d0 4d ee 0c 5a 00 00 00 00 00 00 00 00 |b ..M..Z........|
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b |...............k|
Upper page 10h
Error: Failed to read EEPROM for page 10h, flat_offset 2176, page_offset 128, size 128!
Upper page 11h
Error: Failed to read EEPROM for page 11h, flat_offset 2304, page_offset 128, size 128!
```
Reading Upper page 10h fails, but the command continues on to page 11h; previously it returned immediately. Failures on multiple pages are all reported, as shown for pages 10h and 11h.
Run "show techsupport": in the generated dumps, the file dump/interface.xcvrs.eeprom.RAW exists.

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
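The continue-on-failure pattern described above can be sketched as follows. This is a simplified stand-in, not the code in `sfputil/main.py`: `dump_pages`, the `read_page` callback, and `fake_reader` are hypothetical names, and only the "always append the output, never return early, return `(0, '\n'.join(lines))`" shape is taken from the description.

```python
from typing import Callable, Iterable, List, Tuple


def dump_pages(pages: Iterable[int],
               read_page: Callable[[int], Tuple[int, str]]) -> Tuple[int, str]:
    """Dump every page, keeping error text for pages that fail to read.

    read_page(page) returns (return_code, text); a non-zero code means the
    read failed and text holds the error message.
    """
    lines: List[str] = []
    for page in pages:
        _return_code, output = read_page(page)
        lines.append(f"Upper page {page:x}h")
        # Success or failure, keep the text and move on to the next page.
        lines.append(output)
    # A single failed page is non-fatal, so the overall result is success.
    return 0, "\n".join(lines)


def fake_reader(page: int) -> Tuple[int, str]:
    """Stand-in reader for the demo: page 10h fails, other pages succeed."""
    if page == 0x10:
        return 1, "Error: Failed to read EEPROM for page 10h!"
    return 0, f"<hexdump of page {page:x}h>"


rc, text = dump_pages([0x2, 0x10, 0x11], fake_reader)
print(rc)
print(text)
```

Keeping the error line in the returned text means a caller that only checks the exit code still receives a dump showing which pages were unreadable.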
* metric and vnetname code into 202511

Signed-off-by: alawing <alawing@gmail.com>

* fix merge issue

Signed-off-by: alawing <alawing@gmail.com>

---------

Signed-off-by: alawing <alawing@gmail.com>
…validators

* Add th6 broadcom ASIC entry with all Nokia IXR7220-H6 HWSKUs: Nokia-IXR7220-H6-64, Nokia-IXR7220-H6-P128, Nokia-IXR7220-H6-O256
* Add unit test for th6 ASIC name resolution

Signed-off-by: dygodwin <dylan.godwin@nokia.com>
[202512] Cherry-pick PR #4223: Add Nokia TH6 HWSKUs to GCU field operation validators