Skip to content

Copy of PR #2462#2468

Draft
shaahji wants to merge 5 commits into
mainfrom
ykhrustalev/lmeval-ort-chat-template
Draft

Copy of PR #2462#2468
shaahji wants to merge 5 commits into
mainfrom
ykhrustalev/lmeval-ort-chat-template

Conversation

@shaahji
Copy link
Copy Markdown
Collaborator

@shaahji shaahji commented May 20, 2026

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

lm-eval's `simple_evaluate(..., apply_chat_template=True)` requires the
underlying LM class to implement `tokenizer_name` and `apply_chat_template`.
The HFLM backend has both; the ORT GenAI backend does not, so any attempt
to evaluate a chat-tuned ONNX model with chat-formatted prompts raises
`NotImplementedError: To use this model with chat templates, please
implement the 'tokenizer_name' property.`

This adds the two members with the minimum surface area:
  - `tokenizer_name` returns the model path (for lm-eval's chat-aware
    result caching), matching the HFLM convention of slash-replacement.
  - `apply_chat_template` defers to the model's HF tokenizer via
    `AutoTokenizer.apply_chat_template`, mirroring HFLM's
    implementation.

The HF tokenizer is loaded once at `__init__` purely for chat-template
rendering; token-level encode/decode still goes through `og.Tokenizer`
and the runtime, so there is no change to generation behavior or any
existing code path.

Verified end-to-end on LFM2.5-350M (int4, k_quant_mixed) MBPP:
without chat-template hooks the eval raised at task start; with them
plus `num_fewshot=0` and a chat-friendly stop list, pass@1 went from
0.0/500 to 67/500 (13.4%) -- the original 0.0 was a prompt-format
artifact (instruct model + completion-style few-shot), not a
conversion regression.
… key, tests

- Lazy-load the HF tokenizer on the first ``apply_chat_template`` call rather
  than at ``__init__``. Callers that never enable chat templating no longer
  need HF tokenizer files (``tokenizer_config.json`` etc.) in the model
  directory; eager loading would have regressed those workflows.

- ``tokenizer_name`` now replaces both POSIX and Windows path separators with
  ``__`` so the lm-eval cache identifier is stable across platforms. The
  previous implementation only handled forward slashes, leaving backslashes
  in the key on Windows because ``str(Path(...))`` preserves the native
  separator.

- Add unit tests for both behaviours:
    - ``tokenizer_name`` parametrised over POSIX, relative, and Windows-style
      paths to lock in the normalisation contract.
    - ``apply_chat_template`` verified to (a) not load the HF tokenizer at
      construction, (b) load once on first call, and (c) reuse the cached
      tokenizer on subsequent calls. ``AutoTokenizer`` is patched so the
      tests run without any HF tokenizer files on disk.

All four new tests pass; ``test_olive_evaluator.py`` as a whole stays green
(85 passed). ``lintrunner`` reports no new warnings on the changed files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants