refactor(compare): clarify baseline vs test semantics in compare scripts
Print baseline/test paths at the start of output and update argument
help text. In compare_tps, flip signed_change to (test-baseline)/baseline
so positive means test is faster and negative means regression.
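
The sign convention described above can be illustrated with a minimal sketch. The function name `signed_change` and the sample averages are assumptions for illustration; only the formula `(test - baseline) / baseline` and the zero guard come from the diff.

```python
def signed_change(baseline_avg: float, test_avg: float) -> float:
    """Relative change of test vs. baseline tok/s; 0 if baseline is not positive."""
    return (test_avg - baseline_avg) / baseline_avg if baseline_avg > 0 else 0


# Hypothetical averages, not taken from any real run.
faster = signed_change(100.0, 110.0)  # test is faster: positive value
slower = signed_change(100.0, 90.0)   # test regressed: negative value
print(f"{faster:+.1%} {slower:+.1%}")
```

Dividing by the baseline rather than the test value means the percentage reads as "how much the test run changed relative to the reference".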
-# Calculate signed relative change: positive means dir1 faster, negative means dir1 slower
-signed_change = (avg1 - avg2) / avg2 if avg2 > 0 else 0
+# Calculate signed relative change of test vs baseline: positive means test faster, negative means test slower
+signed_change = (avg2 - avg1) / avg1 if avg1 > 0 else 0

 messages = []
 failed = False
 if abs(signed_change) > threshold:
     sign = "+" if signed_change >= 0 else ""
     if signed_change < 0:
-        # dir1 slower than dir2 -> failure
+        # test slower than baseline -> failure
         label = "✗ SLOWER"
         failed = True
     else:
-        # dir1 faster than dir2 -> pass but notify
+        # test faster than baseline -> pass but notify
         label = "↑ FASTER"
-    messages.append(f" Average tok/s: {avg1:.2f} vs {avg2:.2f} {label} ({sign}{signed_change*100:.1f}%, threshold: ±{threshold*100:.0f}%)")
+    messages.append(f" Average tok/s: {avg1:.2f} (baseline) vs {avg2:.2f} (test) {label} ({sign}{signed_change*100:.1f}%, threshold: ±{threshold*100:.0f}%)")
     messages.append(f" Steps compared: {len(tps1)} vs {len(tps2)} (excluding step 1)")
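
The pass/fail branching in this hunk can be sketched as a small standalone function. The name `classify` is hypothetical, and the behavior when the change is within the threshold (empty label, no failure) is an assumption, since the diff does not show that branch.

```python
from typing import Tuple


def classify(signed_change: float, threshold: float) -> Tuple[str, bool]:
    """Return (label, failed) for a signed relative change of test vs. baseline."""
    if abs(signed_change) <= threshold:
        return "", False         # assumed: within tolerance, nothing to report
    if signed_change < 0:
        return "✗ SLOWER", True  # test slower than baseline -> failure
    return "↑ FASTER", False     # test faster than baseline -> pass but notify


# Hypothetical values: a 20% regression against a ±10% threshold.
label, failed = classify(-0.20, 0.10)
print(label, failed)
```

Only a regression beyond the threshold marks the comparison as failed; a speedup of the same size is reported but still passes.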