Copy of PR #2462#2468
Draft
shaahji wants to merge 5 commits into
Draft
Conversation
lm-eval's `simple_evaluate(..., apply_chat_template=True)` requires the
underlying LM class to implement `tokenizer_name` and `apply_chat_template`.
The HFLM backend has both; the ORT GenAI backend does not, so any attempt
to evaluate a chat-tuned ONNX model with chat-formatted prompts raises
`NotImplementedError: To use this model with chat templates, please
implement the 'tokenizer_name' property.`
This adds the two members with the minimum surface area:
- `tokenizer_name` returns the model path (for lm-eval's chat-aware
result caching), matching the HFLM convention of slash-replacement.
- `apply_chat_template` defers to the model's HF tokenizer via
`AutoTokenizer.apply_chat_template`, mirroring HFLM's
implementation.
The HF tokenizer is loaded once at `__init__` purely for chat-template
rendering; token-level encode/decode still goes through `og.Tokenizer`
and the runtime, so there is no change to generation behavior or any
existing code path.
Verified end-to-end on LFM2.5-350M (int4, k_quant_mixed) MBPP:
without chat-template hooks the eval raised at task start; with them
plus `num_fewshot=0` and a chat-friendly stop list, pass@1 went from
0.0/500 to 67/500 (13.4%) -- the original 0.0 was a prompt-format
artifact (instruct model + completion-style few-shot), not a
conversion regression.
… key, tests
- Lazy-load the HF tokenizer on the first ``apply_chat_template`` call rather
than at ``__init__``. Callers that never enable chat templating no longer
need HF tokenizer files (``tokenizer_config.json`` etc.) in the model
directory; eager loading would have regressed those workflows.
- ``tokenizer_name`` now replaces both POSIX and Windows path separators with
``__`` so the lm-eval cache identifier is stable across platforms. The
previous implementation only handled forward slashes, leaving backslashes
in the key on Windows because ``str(Path(...))`` preserves the native
separator.
- Add unit tests for both behaviours:
- ``tokenizer_name`` parametrised over POSIX, relative, and Windows-style
paths to lock in the normalisation contract.
- ``apply_chat_template`` verified to (a) not load the HF tokenizer at
construction, (b) load once on first call, and (c) reuse the cached
tokenizer on subsequent calls. ``AutoTokenizer`` is patched so the
tests run without any HF tokenizer files on disk.
All four new tests pass; ``test_olive_evaluator.py`` as a whole stays green
(85 passed). ``lintrunner`` reports no new warnings on the changed files.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Describe your changes
Checklist before requesting a review
lintrunner -a(Optional) Issue link