Skip to content

Add complete bank support for CMIS transceivers#640

Merged
prgeor merged 11 commits into
sonic-net:masterfrom
nexthop-ai:bank-support-cmis
May 18, 2026
Merged

Add complete bank support for CMIS transceivers#640
prgeor merged 11 commits into
sonic-net:masterfrom
nexthop-ai:bank-support-cmis

Conversation

@bobby-nexthop
Copy link
Copy Markdown
Contributor

@bobby-nexthop bobby-nexthop commented Mar 24, 2026

Description

This PR extends the bank parameter support introduced in PR #632 to fully enable multi-bank CMIS transceiver support throughout the stack.

Changes include:

  1. Memory Map Bank Awareness:

    • Updated CmisFlatMemMap, CmisMemMap, and CCmisMemMap to accept bank parameter
    • Implemented bank-aware getaddr() method in CmisFlatMemMap that correctly calculates linear offsets using the formula from sonic-linux-kernel PR
    • Added bank property to CmisFlatMemMap for read-only access
    • Added constants: CMIS_EEPROM_PAGE_SIZE, CMIS_NUM_NON_BANKED_PAGES, CMIS_NUM_BANKED_PAGES, CMIS_ARCH_PAGES
    • Uses unified formula: linear_offset = (bank * CMIS_ARCH_PAGES + page) * page_size + offset
    • Each bank is treated as a full 256-page (32KB) architectural block for proper alignment
  2. API Factory Updates:

    • Added bank parameter to create_xcvr_api() and _create_cmis_api()
    • Updated CMIS API instantiation to pass bank parameter to memory maps
    • Modified id_mapping to pass bank parameter for all CMIS module types (0x18, 0x19, 0x1b, 0x1e)
  3. SfpBase Integration:

    • Updated refresh_xcvr_api() to pass bank parameter when creating xcvr_api
  4. Field Definitions:

    • Added BANKS_SUPPORTED_FIELD constant to consts.py
    • Added BanksSupported field to MODULE_CHAR_ADVT in CmisMemMap

This enables proper support for CMIS modules with multiple memory banks, allowing correct EEPROM access across all banks as defined in the CMIS specification. The offset calculation formula matches the optoe kernel driver implementation in sonic-linux-kernel PR #473.

Motivation and Context

We are implementing banking support as allowed by the CMIS spec. Our HLD is available at sonic-net/SONiC#2183

How Has This Been Tested?

I don't have banked hardware yet to test on, have confirmed that these changes are backwards compatible and don't break existing systems

Additional Information (Optional)

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@bobby-nexthop bobby-nexthop marked this pull request as ready for review March 27, 2026 22:18
This PR extends the bank parameter support introduced in PR sonic-net#632 to
fully enable multi-bank CMIS transceiver support throughout the stack.

Changes include:

1. **Memory Map Bank Awareness**:
   - Updated CmisFlatMemMap, CmisMemMap, and CCmisMemMap to accept bank parameter
   - Implemented bank-aware getaddr() method in CmisFlatMemMap that correctly
     calculates linear offsets using the formula from sonic-linux-kernel PR sonic-net#473
   - Added bank property to CmisFlatMemMap for read-only access
   - Added constants: CMIS_EEPROM_PAGE_SIZE, CMIS_NUM_NON_BANKED_PAGES,
     CMIS_NUM_BANKED_PAGES, CMIS_ARCH_PAGES
   - Uses unified formula: linear_offset = (bank * CMIS_ARCH_PAGES + page) * page_size + offset
   - Each bank is treated as a full 256-page (32KB) architectural block for proper alignment

2. **API Factory Updates**:
   - Added bank parameter to create_xcvr_api() and _create_cmis_api()
   - Updated CMIS API instantiation to pass bank parameter to memory maps
   - Modified id_mapping to pass bank parameter for all CMIS module types
     (0x18, 0x19, 0x1b, 0x1e)

3. **SfpBase Integration**:
   - Updated refresh_xcvr_api() to pass bank parameter when creating xcvr_api

4. **Field Definitions**:
   - Added BANKS_SUPPORTED_FIELD constant to consts.py
   - Added BanksSupported field to MODULE_CHAR_ADVT in CmisMemMap

This enables proper support for CMIS modules with multiple memory banks,
allowing correct EEPROM access across all banks as defined in the CMIS
specification. The offset calculation formula matches the optoe kernel
driver implementation in sonic-linux-kernel PR sonic-net#473.

Signed-off-by: Bobby McGonigle <bobby@nexthop.ai>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@prgeor prgeor requested review from Copilot and mihirpat1 April 20, 2026 04:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends CMIS transceiver support to be bank-aware end-to-end (SfpBase → API factory → CMIS mem maps) so EEPROM access can be targeted to a selected CMIS memory bank.

Changes:

  • Added bank parameter plumbing into SfpBase.refresh_xcvr_api() and XcvrApiFactory.create_xcvr_api() / _create_cmis_api().
  • Updated CMIS memory map classes to accept a bank and implemented a bank-aware getaddr() offset calculation.
  • Introduced a BanksSupported field definition and corresponding constant.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sonic_platform_base/sonic_xcvr/xcvr_api_factory.py Threads bank through CMIS API creation paths.
sonic_platform_base/sonic_xcvr/mem_maps/public/cmis.py Adds bank-aware addressing to CMIS mem maps and defines a new BanksSupported field.
sonic_platform_base/sonic_xcvr/mem_maps/public/c_cmis.py Passes bank through to the coherent CMIS mem map.
sonic_platform_base/sonic_xcvr/fields/consts.py Adds the BANKS_SUPPORTED_FIELD constant.
sonic_platform_base/sfp_base.py Ensures bank is provided when refreshing the cached transceiver API.

Comment thread sonic_platform_base/sonic_xcvr/xcvr_api_factory.py
Comment on lines +157 to +177
For lower memory (page 0, offset < 128):
linear_offset = offset

For paged memory:
offset_in_paged_area = (page * page_size + offset) - 128
bytes_per_bank = CMIS_ARCH_PAGES * page_size (256 * 128 = 32KB)
linear_offset = 128 + (bank * bytes_per_bank) + offset_in_paged_area

Simplified:
linear_offset = (bank * CMIS_ARCH_PAGES + page) * page_size + offset

Note: Each bank is treated as a full 256-page (32KB) architectural block,
even though only pages 10h-FFh (240 pages) are actually banked. This ensures
proper alignment and matches the kernel driver behavior.
"""
if page == 0 and offset < 128:
# Lower memory - not affected by banking
return offset

# For all paged memory (including bank 0), use the unified formula
# that treats each bank as a 256-page (32KB) block
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getaddr() only exempts the lower 128 bytes (page 0, offset < 128) from banking, but the constants/comments state pages 00h–0Fh are non-banked. With the current formula, upper page 00h (offset 128–255) and pages 01h–0Fh will be shifted by bank * 32KB, which is likely incorrect if those pages are meant to be shared across banks. If pages 00h–0Fh are truly non-banked, special-case them to always use bank 0 (or otherwise document why they must be bank-offset).

Suggested change
For lower memory (page 0, offset < 128):
linear_offset = offset
For paged memory:
offset_in_paged_area = (page * page_size + offset) - 128
bytes_per_bank = CMIS_ARCH_PAGES * page_size (256 * 128 = 32KB)
linear_offset = 128 + (bank * bytes_per_bank) + offset_in_paged_area
Simplified:
linear_offset = (bank * CMIS_ARCH_PAGES + page) * page_size + offset
Note: Each bank is treated as a full 256-page (32KB) architectural block,
even though only pages 10h-FFh (240 pages) are actually banked. This ensures
proper alignment and matches the kernel driver behavior.
"""
if page == 0 and offset < 128:
# Lower memory - not affected by banking
return offset
# For all paged memory (including bank 0), use the unified formula
# that treats each bank as a 256-page (32KB) block
Pages 00h-0Fh are architecturally non-banked and must always map to the
shared address space regardless of the selected bank:
linear_offset = page * page_size + offset
For banked pages 10h-FFh:
linear_offset = (bank * CMIS_ARCH_PAGES + page) * page_size + offset
Note: Each bank is still treated as a full 256-page (32KB)
architectural block for banked pages to preserve alignment and match
the kernel driver behavior.
"""
if page < CMIS_NUM_NON_BANKED_PAGES:
# Pages 00h-0Fh are not affected by banking
return page * page_size + offset
# Banked pages 10h-FFh use the unified formula that treats each bank
# as a 256-page (32KB) architectural block.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion is invalid, since the addressing approach of treating all upper memory as if it were banked is intentional.

Copy link
Copy Markdown
Contributor

@bgallagher-nexthop bgallagher-nexthop May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually addressed this by clamping the bank parameter to 0 for non-banked pages. This will result in us writing 0 to the bank select register when accessing non-banked pages.

Comment thread sonic_platform_base/sonic_xcvr/mem_maps/public/cmis.py
Comment thread sonic_platform_base/sonic_xcvr/mem_maps/public/cmis.py
@Natgerbi
Copy link
Copy Markdown

@bobby-nexthop, is this PR still relevant? I see that you also have PR sonic-net/SONiC#2183, which seems to be more maintained and includes the paging part that does not exist here.


# Constants matching optoe driver
CMIS_EEPROM_PAGE_SIZE = 128
CMIS_NUM_NON_BANKED_PAGES = 16 # pages 00h-0Fh
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobby-nexthop can we add a unit test to cover a scenario to ensure that NONE of the EEPROM memory map field mistakenly does not set bank >=1 for Unbanked Pages.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I have changed getaddr() to clamp bank to 0 when accessing unbanked pages.
  • I have added unit-tests that ensure we calculate the same offset regardless of what bank is set to.

linear_offset = 128 + (bank * bytes_per_bank) + offset_in_paged_area

Simplified:
linear_offset = (bank * CMIS_ARCH_PAGES + page) * page_size + offset
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobby-nexthop will be good to put an ASSERT here to make sure none of the memmap fields mistakenly access NON-banked pages. And in the unit test we can exercise this. This way we ensure this requirement at single place.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new unit-tests provide coverage to prevent incorrect offset calculation. Let me know if I am missing anything.

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
…ith a banked register

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
@abelamit
Copy link
Copy Markdown

@bobby-nexthop, is this PR still relevant? I see that you also have PR sonic-net/SONiC#2183, which seems to be more maintained and includes the paging part that does not exist here.

hi @bobby-nexthop - can you please comment if you plan to rebase this commit on top of the other commit proposed? It seems like they are not correlated.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@prgeor
Copy link
Copy Markdown
Collaborator

prgeor commented May 14, 2026

@bobby-nexthop, is this PR still relevant? I see that you also have PR sonic-net/SONiC#2183, which seems to be more maintained and includes the paging part that does not exist here.

@Natgerbi can you be more specific? feel free to quote the ref section or snapshot here

…n-banked pages

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
…bank_size

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).


# If we are accessing a non-banked page, there is no reason to set the bank
# to a non-zero value.
bank = 0 if page < CMIS_NUM_NON_BANKED_PAGES else self.bank
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Natgerbi @abelamit please note this TODO for your platform.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2026-05-15 at 1 33 57 PM

For reference, this is the relevant part of the CMIS spec this TODO is referring to (2 CDB instances are possible to support, accessible via bank selection).

except (OSError, IOError):
pass

def set_optoe_max_bank_size(self, max_bank_size):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bank size is set in optoe, we should move it into chassis. the scope of sfp is within one bank, not the whole eeprom

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, however, the API and the ability to access the EEPROM and detect module removal/insertion is at a per-bank SFP object level. We do ensure read-before-write semantics to ensure we don't write max_bank_size if it is already correct to prevent repeated writes. Do you have any recommendations on where we could put it in the chassis and how we could set it and also handle setting it upon transceiver removal/insertion?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While each SFP object is associated with a single bank, I chose this approach for the following reasons:

  • the write needs to be triggered again in the event of a new transceiver being plugged in with a different number of banks than the old one. Housing this logic in refresh_xcvr_api gives us that behaviour for free. If it was in the Chassis code, this would not be the case.
  • writing to max_bank_size is an implementation detail specific to the optoe driver, so it is nice to keep this logic in optoe-specific code.

The change has been implemented with read-before-write idempotency, so the following sequence of events would occur if a Chassis created 4 Sfp objects for a 4-banked transceiver:

  • Sfp object with bank=0 would do nothing (this is so unbanked transceivers continue to operate as they previously did before banking support was introduced)
  • Sfp object with bank=1 will perform the write to max_bank_size.
  • Sfp object with bank=2 will see the value has already been written and do nothing.
  • Sfp object with bank=3 will see the value has already been written and do nothing.

We will ultimately only write to the max_bank_size sysfs path once, while still keeping this optoe specific quirk internal to SfpOptoeBase.

except (OSError, IOError):
pass

def _read_optoe_max_bank_size(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also move to chassis if necessary

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

…OptoeBase

Signed-off-by: Brian Gallagher <bgallagher@nexthop.ai>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Natgerbi
Copy link
Copy Markdown

see that you also have PR sonic-net/SONiC#2183, which seems to be more maintained and includes the paging part that does not exist

What I meant is that section 7.5 of the HLD, “Memory Map Abstraction Changes (cmis.py),” already aligns with the new abstraction layer based on memory-map pages. However, I think the actual implementation of the page abstraction can be added later in a separate PR.

Copy link
Copy Markdown
Contributor

@micas-net micas-net left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me!


def _read_optoe_max_bank_size(self):
"""Determine optoe max_bank_size from the module's CMIS BanksSupported advertisement, or None if non-CMIS, flat-memory, or unreadable."""
id_byte = self.read_eeprom(0, 1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgallagher-nexthop the optoe API here assumes its accessing CMIS memory map even before API is refreshed.
TODO: Refresh the API and then set the max bank if the API supports it.

@prgeor prgeor merged commit 39c3bd6 into sonic-net:master May 18, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants