
Add AMDSMIObserver that uses amdsmi to measure energy #372

Open: stijnh wants to merge 1 commit into master from amdsmi-observer

stijnh (Member) commented Mar 31, 2026

This PR adds AMDSMIObserver, a new observer that uses amdsmi to monitor the GPU.

It can measure:

  • energy (in joules)
  • core clock frequency
  • memory clock frequency
  • temperature
  • core voltage

It requires no dependencies besides amdsmi, which is installed by default with ROCm.
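As a rough illustration of how energy can be derived from a cumulative hardware counter, here is a minimal sketch. It assumes the GPU exposes a monotonically increasing energy counter plus a per-tick resolution in microjoules, so that energy over an interval is the tick difference times the resolution. The names EnergyMeter, energy_joules and average_power_watts are hypothetical and not taken from this PR; the counter reader is injected so the sketch runs without a GPU.

```python
from typing import Callable, Optional, Tuple

# A raw counter reading: (accumulated ticks, microjoules per tick).
CounterReading = Tuple[int, float]


def energy_joules(ticks_start: int, ticks_end: int, resolution_uj: float) -> float:
    """Convert two raw energy-counter readings into joules.

    Assumes the counter only increases between the two reads (no reset or
    wraparound) and that each tick is worth `resolution_uj` microjoules.
    """
    if ticks_end < ticks_start:
        raise ValueError("energy counter decreased; it may have wrapped or been reset")
    return (ticks_end - ticks_start) * resolution_uj / 1_000_000


def average_power_watts(energy_j: float, elapsed_s: float) -> float:
    """Average power over an interval, P = E / t."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return energy_j / elapsed_s


class EnergyMeter:
    """Hypothetical helper that brackets a region of work with two counter reads.

    `read_counter` is any callable returning a CounterReading. On a real GPU it
    might wrap amdsmi.amdsmi_get_energy_count(handle); the keys of the dict that
    call returns may differ between ROCm releases, so that mapping is an assumption.
    """

    def __init__(self, read_counter: Callable[[], CounterReading]) -> None:
        self._read_counter = read_counter
        self._start: Optional[CounterReading] = None

    def start(self) -> None:
        self._start = self._read_counter()

    def stop(self) -> float:
        """Return the energy in joules consumed since start()."""
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        ticks_start, _ = self._start
        ticks_end, resolution_uj = self._read_counter()
        self._start = None
        return energy_joules(ticks_start, ticks_end, resolution_uj)


# Usage with made-up counter values standing in for a real GPU:
fake_reads = iter([(1_000_000, 10.0), (1_650_000, 10.0)])
meter = EnergyMeter(lambda: next(fake_reads))
meter.start()
print(meter.stop())  # 650_000 ticks * 10 uJ = 6.5 J
```

Injecting the reader keeps the unit conversion separate from the version-specific amdsmi call, which is where the ROCm API differences would be absorbed.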

I have tested this on LUMI with ROCm 6.3.4. The ROCm API changes frequently between releases, so I hope it also works with other versions.

stijnh changed the title from "Add AMDSMIObserver to uses amdsmi" to "Add AMDSMIObserver that uses amdsmi to measure energy" on Mar 31, 2026
wjp (Collaborator) commented Apr 1, 2026

It seems to work with ROCm 7.1 for me as well.

However, when testing with examples/hip/vector_add.py, I get an error unless I pass device_id=0 to AMDSMIObserver (ValueError: failed to detect AMD device: invalid UUID of backend: None). I don't see HipFunctions storing the UUID, so maybe I'm not yet using the right version of the KT HIP backend?
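One plausible reading of this error, inferred only from the message: when device_id is omitted, the observer tries to identify the GPU through a UUID provided by the compute backend, and the HIP backend currently provides none. The hypothetical sketch below (select_device_index and the UUID strings are invented for illustration, not the code in this PR) shows why passing device_id=0 would avoid the error: an explicit index short-circuits the UUID lookup.

```python
from typing import Optional, Sequence


def select_device_index(
    device_uuids: Sequence[str],
    backend_uuid: Optional[str] = None,
    device_id: Optional[int] = None,
) -> int:
    """Hypothetical sketch: choose which amdsmi device an observer monitors.

    An explicit device_id takes precedence. Otherwise the UUID reported by the
    compute backend is matched against the UUIDs reported by amdsmi.
    """
    if device_id is not None:
        if not 0 <= device_id < len(device_uuids):
            raise ValueError(f"device_id {device_id} is out of range")
        return device_id

    if backend_uuid is None:
        # The situation in the report: the backend exposes no UUID,
        # so there is nothing to match against.
        raise ValueError(f"failed to detect AMD device: invalid UUID of backend: {backend_uuid}")

    wanted = backend_uuid.lower()
    for index, uuid in enumerate(device_uuids):
        if uuid.lower() == wanted:
            return index
    raise ValueError(f"failed to detect AMD device: no GPU with UUID {backend_uuid}")


# Usage with made-up UUID strings:
uuids = ["gpu-aaaa", "gpu-bbbb"]
print(select_device_index(uuids, device_id=0))              # 0, UUID lookup skipped
print(select_device_index(uuids, backend_uuid="GPU-BBBB"))  # 1
```

Under this reading, raising an error instead of silently defaulting to device 0 avoids monitoring the wrong GPU on multi-GPU nodes; the fix would then be for the HIP backend to expose the device UUID.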
