Context
MemMachine's benchmark blog discovered that LoCoMo category assignments in the paper differ from the source code:
'This finding suggests that some public LoCoMo results might be presenting misclassified data, making a direct and fair comparison challenging.'
They use the source code assignments as ground truth, not the paper's descriptions.
Action
- Compare our category assignments against the LoCoMo source code (github.com/snap-research/LoCoMo)
- Document any discrepancies with the paper
- Ensure our per-category results use the correct assignments
- If our categories were wrong, re-run and report corrected numbers
This matters for credibility: if we publish per-category numbers based on wrong category assignments, competitors will call it out.
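The comparison step above could be sketched roughly as follows. This is a minimal illustration, not our actual tooling: it assumes both sides can be reduced to a mapping from a question identifier to a category label, and the names `diff_categories`, `remap_summary`, and the toy IDs and labels are all hypothetical. How those mappings are loaded from our eval harness and from the LoCoMo repository is left out, since it depends on file layouts not described here.

```python
from collections import Counter


def diff_categories(ours, reference):
    """Return {question_id: (our_category, reference_category)} for each mismatch.

    `reference` is treated as ground truth (the source-code assignments).
    Questions missing from `ours` are reported with our_category = None.
    """
    mismatches = {}
    for qid, ref_cat in reference.items():
        our_cat = ours.get(qid)
        if our_cat != ref_cat:
            mismatches[qid] = (our_cat, ref_cat)
    return mismatches


def remap_summary(mismatches):
    """Count how often each (ours -> reference) category pair occurs."""
    return Counter(mismatches.values())


# Toy data only; real IDs and category labels come from the datasets.
reference = {"q1": 1, "q2": 2, "q3": 3, "q4": 4}
ours = {"q1": 1, "q2": 3, "q3": 2, "q4": 4}

mismatches = diff_categories(ours, reference)
summary = remap_summary(mismatches)

for (our_cat, ref_cat), count in sorted(summary.items()):
    print(f"ours={our_cat} -> source={ref_cat}: {count} question(s)")
```

A summary that shows whole categories swapped, rather than scattered one-off mismatches, would point to a labeling mix-up of the kind MemMachine describes. In that case the per-category numbers need re-running, not just a documentation note.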
Related
Milestone
v0.19.0