Skip to content

perf[next-dace]: Update thread block size to (256, 1)#2598

Open
iomaganaris wants to merge 1 commit into
mainfrom
update_threadblocksize
Open

perf[next-dace]: Update thread block size to (256, 1)#2598
iomaganaris wants to merge 1 commit into
mainfrom
update_threadblocksize

Conversation

@iomaganaris
Copy link
Copy Markdown
Contributor

@iomaganaris iomaganaris commented May 11, 2026

Update thread block size to (256, 1) as a default. Thanks to @dganellari for finding this out 👍
Initially this seemed like it improves MI300A performance but I've also tested GH200 and A100 CPUs with blueline in a single node (icon_ch1-medium-stencils experiment) and I saw improvements in some stencils.
Here are the results per GT4Py program in GH200:
image
And here are the results from icon_ch1-medium that reports the timer of the whole dycore (taking the min timer for nh_solve from each output) from A100:

| case            | t_min (s)  | acceleration vs ACC |
|:----------------|:----------:|:-------------------:|
| ACC             | 0.00615907 |        1.000        |
| icon4py default | 0.00517106 |        1.191        |
| icon4py 256x1   | 0.00480199 |        1.282        |

I thought that since we see improvements in all cases we can use it as the default in GT4Py

Co-authored-by: Daniel Ganellari <gadaniel@ethz.ch>
Copy link
Copy Markdown
Contributor

@philip-paul-mueller philip-paul-mueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

However, I am not sure if it is also set somewhere in ICON4Py.
But from the numbers you posted I do not think that.

Copy link
Copy Markdown
Contributor

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good results, and nice color gradients in the plot!
However, I am not sure whether these default settings should be set in gt4py or in icon4py, in model_backends.py. To me, it feels more an application default, and it should belong to icon4py.

@iomaganaris
Copy link
Copy Markdown
Contributor Author

However, I am not sure whether these default settings should be set in gt4py or in icon4py, in model_backends.py. To me, it feels more an application default, and it should belong to icon4py.

I had the same trouble figuring out where this should be set. Since we have to set a default value in GT4Py for that though I thought it's better to set there the best values we know until now. If you think these should still go to icon4py let me know. I don't have a strong opinion tbh

@philip-paul-mueller
Copy link
Copy Markdown
Contributor

I also kind of think that it should be in ICON4Py, the question is just what value do we use in GT4Py?

  • We could use the old one for compatibility.
  • We could make it mandatory and force the user to make a choice.
  • We could use the new value.

@edopao
Copy link
Copy Markdown
Contributor

edopao commented May 12, 2026

I also kind of think that it should be in ICON4Py, the question is just what value do we use in GT4Py?

1. We could use the old one for compatibility.

2. We could make it mandatory and force the user to make a choice.

3. We could use the new value.

Adding a fourth option:
4. We could set default None, so use the dace default value.

I vote for either 1 or 4, but I don't have a strong opinion.

@iomaganaris
Copy link
Copy Markdown
Contributor Author

I think that 1. might add confusion so I would propose 4.
Thanks a lot for the suggestions 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants