Add domain-aware PCIe handling and improve pcie_common robustness#655

Open
t2sharma wants to merge 3 commits into
sonic-net:masterfrom
t2sharma:patch-1

@t2sharma

Description:
This PR enhances pcie_common.py to support domain-aware PCIe enumeration and improves overall robustness and safety of the utility.

Problem:
The current upstream implementation:

Assumes PCI domain is implicitly 0000
Uses non-domain-aware lspci output
Uses unsafe YAML parsing (yaml.load)
Does not handle duplicate PCIe entries
Lacks visibility into PCIe AER error statistics

This leads to incorrect PCIe detection on platforms where:

PCI domains are non-zero
Multiple devices share similar BDF across domains

Changes:

  1. Domain-aware PCIe enumeration
    Switched to:
    lspci -D
    lspci -D -n
    Parses full BDF: domain:bus:device.function

  2. Added domain support in validation
    get_pcie_check() now validates using:
    domain + bus + device + function
    Backward compatible (defaults to "0000")

  3. Safer YAML parsing:
    yaml.safe_load()

  4. Duplicate PCIe filtering:
    Avoids duplicate entries using (domain, bus, dev, fn) key

  5. Vendor ID capture:
    Adds vendor field for better device identification

  6. Improved subprocess handling
    Checks return codes
    Captures stderr

  7. AER stats support (new feature)
    Added:
    get_pcie_aer_stats()
    Reads from:
    /sys/bus/pci/devices/.../aer_dev_*

  8. Config file improvements
    Supports revision-based configs:
    pcie_.yaml
    Preserves YAML key order (sort_keys=False)
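
As a rough illustration of changes 1 and 4 above, the sketch below parses `lspci -D` style text into entries keyed by (domain, bus, dev, fn) and drops duplicates. It is not the code from this PR: the function name `parse_lspci_domain_output`, the `name` field, and the sample lines are assumptions made for the example.

```python
import re

# Full BDF as printed by `lspci -D`: domain:bus:device.function (hex digits).
BDF_RE = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4,}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])\s+(?P<rest>.*)$"
)


def parse_lspci_domain_output(text):
    """Parse `lspci -D` style text into a de-duplicated list of dicts.

    Hypothetical helper for illustration only.
    """
    seen = set()
    devices = []
    for line in text.splitlines():
        match = BDF_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        key = (match["domain"], match["bus"], match["dev"], match["fn"])
        if key in seen:
            continue  # same (domain, bus, dev, fn) already recorded
        seen.add(key)
        devices.append({
            "domain": match["domain"],
            "bus": match["bus"],
            "dev": match["dev"],
            "fn": match["fn"],
            "name": match["rest"],
        })
    return devices


# Made-up sample: two domains share the same bus:dev.fn, and one line repeats.
SAMPLE = """\
0000:00:00.0 Host bridge: Example Vendor Root Complex
0001:00:00.0 Host bridge: Example Vendor Root Complex
0001:00:00.0 Host bridge: Example Vendor Root Complex
"""

devices = parse_lspci_domain_output(SAMPLE)
```

With the domain in the key, the two `00:00.0` devices stay distinct while the repeated line is filtered out.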

Backward Compatibility:
Existing configs without domain continue to work (default = 0000)
No breaking change to existing workflows
Existing CLI behavior unchanged

Validation:
Verified PCIe detection on Arctos platform
Compared output with:
lspci -D

Confirmed:
Correct domain parsing
No duplicate entries
Accurate sysfs validation
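
The backward-compatible default and the sysfs check could look roughly like the following sketch. It assumes config entries are dicts of strings with `bus`, `dev`, `fn` and an optional `domain`. The helper names `sysfs_device_path` and `device_present` are hypothetical and do not come from the PR.

```python
import os

DEFAULT_DOMAIN = "0000"  # default used when a config entry has no domain


def sysfs_device_path(entry, sysfs_root="/sys/bus/pci/devices"):
    """Build the sysfs directory path for a config entry.

    Hypothetical helper: assumes `entry` values are already strings.
    """
    domain = entry.get("domain", DEFAULT_DOMAIN)
    bdf = "{}:{}:{}.{}".format(domain, entry["bus"], entry["dev"], entry["fn"])
    return os.path.join(sysfs_root, bdf)


def device_present(entry, sysfs_root="/sys/bus/pci/devices"):
    """Return True if the device directory exists under sysfs_root."""
    return os.path.exists(sysfs_device_path(entry, sysfs_root))


# A legacy entry without a domain and a domain-aware entry.
legacy_entry = {"bus": "00", "dev": "1f", "fn": "0"}
domain_entry = {"domain": "0001", "bus": "00", "dev": "1f", "fn": "0"}

legacy_path = sysfs_device_path(legacy_entry)
domain_path = sysfs_device_path(domain_entry)
```

A legacy entry resolves to the `0000` domain, so existing configs keep pointing at the same sysfs directory as before.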

Impact:
Enables correct PCIe validation on multi-domain platforms
Improves debugging visibility (AER stats)
Makes code safer and more maintainable
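
For readers unfamiliar with the AER counters, here is a minimal sketch of how per-device statistics can be read. It is not the PR's get_pcie_aer_stats(). It assumes the standard Linux sysfs files `aer_dev_correctable`, `aer_dev_fatal` and `aer_dev_nonfatal`, each holding `NAME COUNT` lines. The function name `read_aer_stats` and the temporary directory standing in for a device directory are illustrative only.

```python
import os
import tempfile

# Per-device AER counter files exposed by the Linux kernel in sysfs.
AER_FILES = ("aer_dev_correctable", "aer_dev_fatal", "aer_dev_nonfatal")


def read_aer_stats(device_dir):
    """Return {file_name: {counter_name: count}} for one device directory.

    Files that are missing or unreadable (for example when the device
    does not support AER) are skipped.
    """
    stats = {}
    for fname in AER_FILES:
        path = os.path.join(device_dir, fname)
        try:
            with open(path) as f:
                lines = f.read().splitlines()
        except OSError:
            continue
        counters = {}
        for line in lines:
            parts = line.split()
            # Keep only well-formed "NAME COUNT" lines.
            if len(parts) == 2 and parts[1].isdigit():
                counters[parts[0]] = int(parts[1])
        stats[fname] = counters
    return stats


# Demo against a temporary directory standing in for a sysfs device directory.
with tempfile.TemporaryDirectory() as fake_device_dir:
    with open(os.path.join(fake_device_dir, "aer_dev_correctable"), "w") as f:
        f.write("RxErr 0\nBadTLP 2\nTOTAL_ERR_COR 2\n")
    demo_stats = read_aer_stats(fake_device_dir)
```

Taking the device directory as a parameter keeps the parsing testable without real hardware.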

@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 15, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bmridul
Collaborator

bmridul commented Apr 16, 2026

Please add a sample UT log.

Comment thread sonic_platform_base/sonic_pcie/pcie_common.py Outdated
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@t2sharma
Author

@bgallagher-nexthop I checked the CI log (Azure Pipelines / Azure.sonic-platform-common) in detail. The failure is in tests/pcie_common_test.py, where the mock still expects ['sudo', 'lspci'] and ['sudo', 'lspci', '-n']. The updated implementation now uses the domain-aware commands ['sudo', 'lspci', '-D'] and ['sudo', 'lspci', '-D', '-n'], so neither mock branch matches, output is never assigned, and the test fails with UnboundLocalError. The unit test mock needs to be updated to match the new commands, and the expected PCIe entries need to include the new domain field returned by the implementation. Could you please help update the test script so the pipeline passes?
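
One possible shape for the updated mock is sketched below. How tests/pcie_common_test.py actually patches the command runner is not shown in this thread, so patching `subprocess.check_output`, the `fake_check_output` helper, and the sample output lines (including the placeholder vendor/device ID) are all assumptions. The point is that a mock keyed on the exact command fails loudly on an unexpected command instead of falling through silently.

```python
import subprocess
from unittest import mock

# Canned output keyed by the exact domain-aware command.
# The output lines are made-up placeholders, not real device data.
LSPCI_OUTPUTS = {
    ("sudo", "lspci", "-D"): "0000:00:00.0 Host bridge: Example Vendor Root Complex\n",
    ("sudo", "lspci", "-D", "-n"): "0000:00:00.0 0600: 1234:5678\n",
}


def fake_check_output(cmd, *args, **kwargs):
    """Hypothetical side_effect: reject any command the test did not plan for."""
    try:
        return LSPCI_OUTPUTS[tuple(cmd)]
    except KeyError:
        raise AssertionError("unexpected command: {!r}".format(cmd))


with mock.patch("subprocess.check_output", side_effect=fake_check_output):
    names = subprocess.check_output(["sudo", "lspci", "-D"])
    ids = subprocess.check_output(["sudo", "lspci", "-D", "-n"])
    try:
        # The old, non-domain-aware command is no longer expected.
        subprocess.check_output(["sudo", "lspci"])
        stale_command_rejected = False
    except AssertionError:
        stale_command_rejected = True
```

Raising on an unknown command turns a stale mock into an immediate, readable failure rather than a later UnboundLocalError.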

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

t2sharma and others added 3 commits April 23, 2026 15:09
Adding domain-aware PCIe handling and improve pcie_common robustness

Signed-off-by: t2sharma <dhananjais009@gmail.com>
Co-authored-by: Brian Gallagher <bgallagher@nexthop.ai>
Signed-off-by: t2sharma <dhananjais009@gmail.com>
Signed-off-by: t2sharma <dhananjais009@gmail.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@t2sharma
Author

manual ut -f
pcie_module_UT.txt
