Copy of PR #2462 by shaahji · Pull Request #2468 · microsoft/Olive

shaahji · 2026-05-20T17:45:48Z

Describe your changes

Checklist before requesting a review

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

lm-eval's `simple_evaluate(..., apply_chat_template=True)` requires the underlying LM class to implement `tokenizer_name` and `apply_chat_template`. The HFLM backend has both; the ORT GenAI backend does not, so any attempt to evaluate a chat-tuned ONNX model with chat-formatted prompts raises `NotImplementedError: To use this model with chat templates, please implement the 'tokenizer_name' property.` This adds the two members with the minimum surface area: - `tokenizer_name` returns the model path (for lm-eval's chat-aware result caching), matching the HFLM convention of slash-replacement. - `apply_chat_template` defers to the model's HF tokenizer via `AutoTokenizer.apply_chat_template`, mirroring HFLM's implementation. The HF tokenizer is loaded once at `__init__` purely for chat-template rendering; token-level encode/decode still goes through `og.Tokenizer` and the runtime, so there is no change to generation behavior or any existing code path. Verified end-to-end on LFM2.5-350M (int4, k_quant_mixed) MBPP: without chat-template hooks the eval raised at task start; with them plus `num_fewshot=0` and a chat-friendly stop list, pass@1 went from 0.0/500 to 67/500 (13.4%) -- the original 0.0 was a prompt-format artifact (instruct model + completion-style few-shot), not a conversion regression.

… key, tests - Lazy-load the HF tokenizer on the first ``apply_chat_template`` call rather than at ``__init__``. Callers that never enable chat templating no longer need HF tokenizer files (``tokenizer_config.json`` etc.) in the model directory; eager loading would have regressed those workflows. - ``tokenizer_name`` now replaces both POSIX and Windows path separators with ``__`` so the lm-eval cache identifier is stable across platforms. The previous implementation only handled forward slashes, leaving backslashes in the key on Windows because ``str(Path(...))`` preserves the native separator. - Add unit tests for both behaviours: - ``tokenizer_name`` parametrised over POSIX, relative, and Windows-style paths to lock in the normalisation contract. - ``apply_chat_template`` verified to (a) not load the HF tokenizer at construction, (b) load once on first call, and (c) reuse the cached tokenizer on subsequent calls. ``AutoTokenizer`` is patched so the tests run without any HF tokenizer files on disk. All four new tests pass; ``test_olive_evaluator.py`` as a whole stays green (85 passed). ``lintrunner`` reports no new warnings on the changed files.

ykhrustalev added 5 commits May 12, 2026 17:06

Trim comments and docstrings on chat-template hooks

e174f59

Use object.__new__ in chat-template test helper to silence pylint E1120

34ff372

Skip chat-template tests when lm_eval is not installed

cb21551

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Copy of PR #2462#2468

Copy of PR #2462#2468
shaahji wants to merge 5 commits into
mainfrom
ykhrustalev/lmeval-ort-chat-template

shaahji commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaahji commented May 20, 2026

Describe your changes

Checklist before requesting a review

(Optional) Issue link

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants