Add domain-aware PCIe handling and improve pcie_common robustness#655
Add domain-aware PCIe handling and improve pcie_common robustness#655t2sharma wants to merge 3 commits into
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Pls add a sample UT log. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@bgallagher-nexthop I checked the CI log (Azure Pipelines / Azure.sonic-platform-common) in detail. The failure is in tests/pcie_common_test.py, where the mock still expects ['sudo', 'lspci'] and ['sudo', 'lspci', '-n']. The updated implementation now correctly uses domain-aware commands ['sudo', 'lspci', '-D'] and ['sudo', 'lspci', '-D', '-n'], so the mock falls through and output never gets assigned, causing the UnboundLocalError. We need to update the unit test mock to match the new commands, and also update the expected PCIe entries to include the new domain field returned by the implementation. Please help with updating of the script for Pipeline to Pass. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Adding domain-aware PCIe handling and improve pcie_common robustness Signed-off-by: t2sharma <dhananjais009@gmail.com>
Co-authored-by: Brian Gallagher <bgallagher@nexthop.ai> Signed-off-by: t2sharma <dhananjais009@gmail.com>
Signed-off-by: t2sharma <dhananjais009@gmail.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
manual ut -f |
Description:
This PR enhances pcie_common.py to support domain-aware PCIe enumeration and improves overall robustness and safety of the utility.
Problem:
The current upstream implementation:
Assumes PCI domain is implicitly 0000
Uses non-domain-aware lspci output
Uses unsafe YAML parsing (yaml.load)
Does not handle duplicate PCIe entries
Lacks visibility into PCIe AER error statistics
This leads to incorrect PCIe detection on platforms where:
PCI domains are non-zero
Multiple devices share similar BDF across domains
Changes:
Domain-aware PCIe enumeration
Switched to:
lspci -D
lspci -D -n
Parses full BDF: domain:bus:device.function
Added domain support in validation
get_pcie_check() now validates using:
domain + bus + device + function
Backward compatible (defaults to "0000")
Safer YAML parsing:
yaml.safe_load()
Duplicate PCIe filtering:
Avoids duplicate entries using (domain, bus, dev, fn) key
Vendor ID capture:
Adds vendor field for better device identification
Improved subprocess handling
Checks return codes
Captures stderr
AER stats support (new feature)
Added:
get_pcie_aer_stats()
Reads from:
/sys/bus/pci/devices/.../aer_dev_*
Config file improvements
Supports revision-based configs:
pcie_.yaml
Preserves YAML key order (sort_keys=False)
Backward Compatibility
Existing configs without domain continue to work (default = 0000)
No breaking change to existing workflows
Existing CLI behavior unchanged
Validation
Verified PCIe detection on Arctos platform
Compared output with:
lspci -D
Confirmed:
Correct domain parsing
No duplicate entries
Accurate sysfs validation
Impact:
Enables correct PCIe validation on multi-domain platforms
Improves debugging visibility (AER stats)
Makes code safer and more maintainable