@@ -988,6 +988,171 @@ pto.store_scalar %val, %ptr[%offset] : !pto.ptr<f32>, f32
 
 ---
 
+##### `pto.tput` - Synchronous Remote Write
+
+**Summary:** Lowers to `pto::comm::TPUT(...)` and copies data from local GM to remote GM through a VEC staging tile.
+
+**Arguments:**
+
+| Name | Type | Description |
+|------|------|-------------|
+| `dst` | GM memref / `pto.tensor_view` / `pto.partition_tensor_view` | Remote destination buffer |
+| `src` | GM memref / `pto.tensor_view` / `pto.partition_tensor_view` | Local source buffer |
+| `ping` | `pto.tile_buf` / local VEC memref | Required staging tile |
+| `pong` | `pto.tile_buf` / local VEC memref | Optional second staging tile for ping-pong transfer |
+| `atomicType` | `#pto.atomic_type<...>` | Atomic mode, default `atomic_none` |
+
+**Constraints & Verification:**
+
+- `dst` / `src` must be GM-shaped values with positive static shapes.
+- `dst` and `src` must have the same element type and static shape.
+- `ping` / `pong` must be local VEC tile-like values whose element type matches `src`.
+
+**Basic Example:**
+
+```mlir
+pto.tput %dst, %src, %ping {atomicType = #pto.atomic_type<atomic_none>} :
+  !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+
+pto.tput %dst, %src, %ping, %pong {atomicType = #pto.atomic_type<atomic_add>} :
+  !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+```
+
+---
+
+##### `pto.tget` - Synchronous Remote Read
+
+**Summary:** Lowers to `pto::comm::TGET(...)` and copies data from remote GM to local GM through a VEC staging tile.
+
+**Arguments:**
+
+| Name | Type | Description |
+|------|------|-------------|
+| `dst` | GM memref / `pto.tensor_view` / `pto.partition_tensor_view` | Local destination buffer |
+| `src` | GM memref / `pto.tensor_view` / `pto.partition_tensor_view` | Remote source buffer |
+| `ping` | `pto.tile_buf` / local VEC memref | Required staging tile |
+| `pong` | `pto.tile_buf` / local VEC memref | Optional second staging tile for ping-pong transfer |
+
+**Constraints & Verification:**
+
+- `dst` / `src` and `ping` / `pong` follow the same GM-shape and staging-tile constraints as `pto.tput`.
+- `dst` and `src` must have the same element type and static shape.
+
+**Basic Example:**
+
+```mlir
+pto.tget %dst, %src, %ping :
+  !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+```
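
The argument table lists an optional `pong` tile, while the example shows only the single-tile form. A double-buffered call might look like the sketch below. It assumes `pto.tget` takes `%pong` directly after `%ping`, the way the two-tile `pto.tput` form does; that placement is inferred, not confirmed by this section.

```mlir
// Sketch only: assumes the optional pong tile follows ping, mirroring the
// two-tile pto.tput form. %pong is a local VEC tile whose element type (f32)
// matches %src, as the staging constraints require.
pto.tget %dst, %src, %ping, %pong :
  !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
```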
+
+---
+
+##### `pto.tnotify` / `pto.twait` / `pto.ttest` - Communication Signal Ops
+
+**Summary:** Lower to `pto::comm::TNOTIFY`, `pto::comm::TWAIT`, and `pto::comm::TTEST` respectively; each operates on a GM `i32` signal buffer.
+
+**Arguments:**
+
+| Op | Operands | Attributes | Result |
+|----|----------|------------|--------|
+| `pto.tnotify` | `signal`, `value` | `notifyOp = #pto.notify_op<atomic_add/set>` | none |
+| `pto.twait` | `signal`, `cmpValue` | `cmp = #pto.wait_cmp<eq/ne/gt/ge/lt/le>` | none |
+| `pto.ttest` | `signal`, `cmpValue` | `cmp = #pto.wait_cmp<eq/ne/gt/ge/lt/le>` | `i1` |
+
+**Constraints & Verification:**
+
+- `signal` must be a GM-shaped value with element type `i32`.
+- `value` / `cmpValue` must be signless integer scalars.
+
+**Basic Example:**
+
+```mlir
+pto.tnotify %sig, %v {notifyOp = #pto.notify_op<set>} : !pto.partition_tensor_view<1xi32>, i32
+pto.twait %sig, %v {cmp = #pto.wait_cmp<ge>} : !pto.partition_tensor_view<1xi32>, i32
+%ok = pto.ttest %sig, %v {cmp = #pto.wait_cmp<eq>} : !pto.partition_tensor_view<1xi32>, i32 -> i1
+```
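
One plausible use of these ops is to publish data with `pto.tput` and then raise a flag that the peer waits on. The sketch below only combines op forms already shown in this document. It assumes that `pto.twait` blocks until the comparison between `signal` and `cmpValue` holds, which this section does not state explicitly. `%remote_buf`, `%local_buf`, `%remote_sig`, and `%local_sig` are placeholder values, and `arith.constant` comes from the upstream MLIR `arith` dialect.

```mlir
// Producer rank (sketch): write the payload, then set the peer's flag to 1.
%c1 = arith.constant 1 : i32
pto.tput %remote_buf, %local_buf, %ping {atomicType = #pto.atomic_type<atomic_none>} :
  !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>
pto.tnotify %remote_sig, %c1 {notifyOp = #pto.notify_op<set>} : !pto.partition_tensor_view<1xi32>, i32

// Consumer rank (sketch, a separate kernel): wait until the local flag is >= 1
// before touching the buffer that the producer wrote.
%one = arith.constant 1 : i32
pto.twait %local_sig, %one {cmp = #pto.wait_cmp<ge>} : !pto.partition_tensor_view<1xi32>, i32
```

Both sides are shown in one listing for brevity. Whether extra synchronization is needed between the remote write and the notification is not covered by this section.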
+
+---
+
+##### `pto.tbroadcast` - Collective Broadcast
+
+**Summary:** Lowers to `pto::comm::TBROADCAST(...)`.
+
+**Arguments:**
+
+| Name | Type | Description |
+|------|------|-------------|
+| `src` | GM-shaped value | Root source buffer |
+| `ping` / `pong` | local VEC tile-like values | Staging tiles; `pong` is optional |
+| `group` | variadic GM-shaped values | Parallel group members |
+| `root` | `i32` attr | Root rank index inside `group` |
+
+**Constraints & Verification:**
+
+- `group` must be non-empty and all members must have identical types.
+- `src` must have the same type as each `group` member.
+- `root` must be in range `[0, group.size)`.
+
+**Basic Example:**
+
+```mlir
+pto.tbroadcast %src, %ping, %g0, %g1, %g2 {root = 1, operandSegmentSizes = array<i32: 1, 1, 0, 3>} :
+  !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>
+```
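
The example passes `0` as the third entry of `operandSegmentSizes`, which suggests the segments are ordered `src`, `ping`, `pong`, `group` and that `pong` is omitted there. Under that assumption, a double-buffered broadcast would set the third entry to `1` and add the extra tile operand and its type, roughly as follows:

```mlir
// Sketch: the same broadcast with a pong tile. Assumes the segment order is
// (src, ping, pong, group), inferred from the single-tile example.
// root = 1 stays inside [0, 3) for the three group members.
pto.tbroadcast %src, %ping, %pong, %g0, %g1, %g2 {root = 1, operandSegmentSizes = array<i32: 1, 1, 1, 3>} :
  !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>
```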
+
+---
+
+##### `pto.comm_tgather` - Collective Gather
+
+**Summary:** Communication collective that lowers to `pto::comm::TGATHER(...)`. This op is distinct from tile-level `pto.tgather`.
+
+**Arguments:** `dst`, `ping`, optional `pong`, variadic `group`, `root`
+
+**Constraints & Verification:**
+
+- `group` must be non-empty and all members must have identical types.
+- `dst` element type must match the group element type.
+- `ping` / `pong` must be local VEC tile-like values with matching element type.
+
+---
+
+##### `pto.comm_tscatter` - Collective Scatter
+
+**Summary:** Communication collective that lowers to `pto::comm::TSCATTER(...)`. This op is distinct from tile-level `pto.tscatter`.
+
+**Arguments:** `src`, `ping`, optional `pong`, variadic `group`, `root`
+
+**Constraints & Verification:**
+
+- `group` must be non-empty and all members must have identical types.
+- `src` element type must match the group element type.
+- `ping` / `pong` must be local VEC tile-like values with matching element type.
+
+---
+
+##### `pto.treduce` - Collective Reduce
+
+**Summary:** Lowers to `pto::comm::TREDUCE(...)`.
+
+**Arguments:**
+
+| Name | Type | Description |
+|------|------|-------------|
+| `dst` | GM-shaped value | Root destination buffer |
+| `acc` | local VEC tile-like value | Accumulation tile |
+| `recvPing` / `recvPong` | local VEC tile-like values | Receive staging tiles |
+| `group` | variadic GM-shaped values | Parallel group members |
+| `reduceOp` | `#pto.reduce_op<sum/max/min>` | Reduction mode |
+| `root` | `i32` attr | Root rank index inside `group` |
+
+**Constraints & Verification:**
+
+- `group` must be non-empty and all members must have identical types.
+- `dst` element type must match the group element type.
+- `acc` and `recvPing` / `recvPong` must be local VEC tile-like values whose element type matches `dst`.
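
This section gives no textual form for `pto.treduce`. The sketch below is modeled on the `pto.tbroadcast` example and is an assumption throughout: the operand order (`dst`, `acc`, `recvPing`, `recvPong`, `group`), the `operandSegmentSizes` layout, the choice to omit `recvPong`, and the type list have not been checked against the op definition. It is meant only to show how the listed constraints fit together: three group members of identical type, f32 tiles matching `dst`, and a root index inside the group.

```mlir
// Hypothetical syntax modeled on the pto.tbroadcast example; verify against
// the op definition before use. recvPong is left out (segment size 0), and
// %recv stands in for the recvPing tile.
pto.treduce %dst, %acc, %recv, %g0, %g1, %g2 {reduceOp = #pto.reduce_op<sum>, root = 0, operandSegmentSizes = array<i32: 1, 1, 1, 0, 3>} :
  !pto.partition_tensor_view<128xf32>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.tile_buf<loc=vec, dtype=f32, rows=1, cols=128, v_row=1, v_col=128, blayout=row_major, slayout=none_box, fractal=512, pad=0>, !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>, !pto.partition_tensor_view<128xf32>
```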
+
+---
+
 ##### `pto.tmov` - Tile Move Between Local Domains
 
 **Summary:** Moves data between local memory domains (for example `mat/acc/vec/bias/scaling`) using tile buffers, and supports the same optional parameter families as the `TMOV/TMOV_FP` APIs in `pto-isa`.