Hugging Face Model integration in Superbench#803

Open
Aishwarya-Tonpe wants to merge 5 commits into main from hf-models-clean

Conversation


@Aishwarya-Tonpe Aishwarya-Tonpe commented Apr 13, 2026

Adds support for loading and benchmarking models from the HuggingFace Hub in the inference micro-benchmarks (ORT/TensorRT). Users can run any compatible HF-hosted model through the existing benchmark harness using --model_source huggingface --model_identifier <org/model>.

SuperBench previously only supported in-house model definitions with hardcoded architectures. Adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.

Key Changes

New modules:

  • HuggingFaceModelLoader — Downloads, caches, and loads models from HF Hub. Estimates parameter count from model config (few KB) and checks GPU
    memory before downloading full weights to avoid failed multi-GB downloads.
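The config-only pre-check described above can be sketched as pure logic; the helper names and the parameter-count heuristic here are illustrative assumptions, not the loader's actual code:

```python
# Hypothetical sketch: estimate model size from a few config fields (which cost
# only a small config download) and compare against free GPU memory before
# pulling multi-GB weights.

def estimate_param_count(config: dict) -> int:
    """Rough transformer parameter estimate from config fields."""
    hidden = config.get('hidden_size', 768)
    layers = config.get('num_hidden_layers', 12)
    vocab = config.get('vocab_size', 30522)
    # Embeddings plus ~12*hidden^2 weights per layer (attention + MLP).
    return vocab * hidden + layers * 12 * hidden * hidden

def fits_in_gpu(config: dict, free_bytes: int, bytes_per_param: int = 2) -> bool:
    """Check estimated fp16 weight size against free GPU memory."""
    return estimate_param_count(config) * bytes_per_param < free_bytes

bert_like = {'hidden_size': 768, 'num_hidden_layers': 12, 'vocab_size': 30522}
print(fits_in_gpu(bert_like, free_bytes=8 * 1024**3))  # ~110M params in fp16 easily fit in 8 GiB
```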

  • ModelSourceConfig — Dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.
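A minimal sketch of what such a dataclass could look like; field names beyond those listed above, and the exact validation rules, are assumptions:

```python
# Hypothetical sketch of ModelSourceConfig, not the PR's exact implementation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelSourceConfig:
    source: str = 'in-house'            # 'in-house' or 'huggingface'
    identifier: Optional[str] = None    # e.g. 'bert-base-uncased'
    torch_dtype: str = 'float16'
    revision: str = 'main'
    auth_token: Optional[str] = None
    device_map: Optional[str] = None

    def __post_init__(self):
        if self.source not in ('in-house', 'huggingface'):
            raise ValueError(f"Invalid model source '{self.source}'. Must be 'in-house' or 'huggingface'.")
        if self.source == 'huggingface' and not self.identifier:
            raise ValueError('Model identifier must be provided.')
```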

Micro-benchmarks (inference):

  • ORT inference — Downloads HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs
    automatically.

  • TensorRT inference — Same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the
    exported ONNX graph.

  • ONNX exporter — New export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models
    (>2GB).
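The export-time decisions (modality-dependent input names and dynamic axes, external data above 2GB) can be sketched as pure logic; the function and key names here are assumptions for illustration, not SuperBench's actual API:

```python
# Sketch of export_huggingface_model()'s decisions: pick ONNX input names and
# dynamic axes by modality, and switch to external data for >2GB models
# (ONNX protobufs are capped at 2GB).
TWO_GB = 2 * 1024 ** 3

def build_export_kwargs(is_vision: bool, param_bytes: int) -> dict:
    if is_vision:
        input_names = ['pixel_values']
        dynamic_axes = {'pixel_values': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
    else:
        input_names = ['input_ids', 'attention_mask']
        dynamic_axes = {
            'input_ids': {0: 'batch_size', 1: 'seq_length'},
            'attention_mask': {0: 'batch_size', 1: 'seq_length'},
            'output': {0: 'batch_size'},
        }
    return {
        'input_names': input_names,
        'output_names': ['output'],
        'dynamic_axes': dynamic_axes,
        'use_external_data': param_bytes > TWO_GB,
    }

kwargs = build_export_kwargs(is_vision=False, param_bytes=7 * 10**9)
print(kwargs['use_external_data'])  # a 7GB model exceeds the 2GB protobuf limit
```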

Testing

  • test_model_source_config.py — Unit tests for validation, defaults, and edge cases.
  • test_huggingface_loader.py — Unit tests for dtype conversion, model size calculation, memory estimation, and param count estimation.
  • test_huggingface_e2e.py — End-to-end integration tests covering micro-benchmarks with real HF models.

Usage

Training benchmarks: see examples/benchmarks/pytorch_huggingface_models.py

ORT inference
python examples/benchmarks/ort_inference_performance.py --model_source huggingface --model_identifier bert-base-uncased

TensorRT inference
python examples/benchmarks/tensorrt_inference_performance.py --model_source huggingface --model_identifier microsoft/resnet-50

Gated models
export HF_TOKEN=hf_xxxxx

@Aishwarya-Tonpe Aishwarya-Tonpe requested a review from a team as a code owner April 13, 2026 17:36
Copilot AI review requested due to automatic review settings April 13, 2026 17:36

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).

Changes:

  • Introduces ModelSourceConfig and HuggingFaceModelLoader for unified HF model configuration/loading and memory-fit checks.
  • Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
  • Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/benchmarks/micro_benchmarks/test_model_source_config.py Adds unit tests for ModelSourceConfig validation/defaulting.
tests/benchmarks/micro_benchmarks/test_huggingface_loader.py Adds unit tests for HF loader dtype handling, load flow, and size estimation.
tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py Adds integration tests that download real HF models and validate basic forward pass.
superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark.
superbench/benchmarks/model_benchmarks/pytorch_lstm.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_llama.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_gpt2.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_cnn.py Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path.
superbench/benchmarks/model_benchmarks/pytorch_bert.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_base.py Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier.
superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command.
superbench/benchmarks/micro_benchmarks/ort_inference_performance.py Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling.
superbench/benchmarks/micro_benchmarks/model_source_config.py New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs.
superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks.
superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output.
examples/benchmarks/tensorrt_inference_performance.py Updates example script to show in-house vs HF usage via CLI.
examples/benchmarks/pytorch_huggingface_models.py New example demonstrating HF-backed training benchmarks, incl. distributed option.
examples/benchmarks/ort_inference_performance.py Updates ORT example script to show in-house vs HF usage via CLI.


Comment on lines +130 to +136
logger.info(f'Loading HuggingFace model: {model_config.identifier}')

# Step 1: Download config only (few KB) to estimate memory
hf_token = os.environ.get('HF_TOKEN') or os.environ.get('HUGGING_FACE_HUB_TOKEN')
load_kwargs = {}
if hf_token:
    load_kwargs['token'] = hf_token

Copilot AI Apr 13, 2026


os is used in _create_huggingface_model() but is not imported in this file (based on the shown diff). This will raise NameError at runtime. Add a module-level import os in pytorch_base.py.

Copilot uses AI. Check for mistakes.

def test_missing_identifier(self):
    """Test missing identifier raises error."""
    with pytest.raises(ValueError, match='identifier must be provided'):

Copilot AI Apr 13, 2026


This test’s regex does not match the actual error message raised by ModelSourceConfig.__post_init__() ('Model identifier must be provided.'). Update the match= pattern (e.g., to 'Model identifier must be provided' or a case-insensitive regex) so the test reflects the real behavior.

Suggested change
-        with pytest.raises(ValueError, match='identifier must be provided'):
+        with pytest.raises(ValueError, match='Model identifier must be provided'):
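For context, pytest applies match= with re.search against str(excinfo.value), so a pattern "matches" whenever it is found anywhere in the message; a quick stdlib check of these two patterns:

```python
import re

message = 'Model identifier must be provided.'
# pytest.raises(match=...) does re.search, i.e. an unanchored regex search:
print(bool(re.search('identifier must be provided', message)))   # substring pattern is found
print(bool(re.search('^identifier must be provided', message)))  # anchored pattern is not
```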

do_constant_folding=True,
input_names=input_names,
output_names=['output'],
dynamic_axes=dynamic_axes,

Copilot AI Apr 13, 2026


For models >2GB, exporting without enabling external-data at export time can fail due to protobuf size limits (the subsequent convert_model_to_external_data() may never run). Pass the appropriate export-time option (e.g., use_external_data_format=use_external_data) so large-model exports succeed reliably.

Suggested change
     dynamic_axes=dynamic_axes,
+    use_external_data_format=use_external_data,

Comment on lines +324 to 346

# Get input names from the ONNX session to determine input format
input_names = [input.name for input in ort_sess.get_inputs()]

# Determine input format based on what the model expects
if 'pixel_values' in input_names:
    # Vision model: use pixel_values (batch_size, 3, 224, 224)
    pixel_values = np.random.randn(self._args.batch_size, 3, 224, 224).astype(dtype=precision)
    inputs = {'pixel_values': pixel_values}
elif 'input_ids' in input_names:
    # NLP model: use input_ids and attention_mask
    seq_len = getattr(self._args, 'seq_length', 512)
    input_ids = np.random.randint(0, 30000, (self._args.batch_size, seq_len)).astype(np.int64)
    attention_mask = np.ones((self._args.batch_size, seq_len), dtype=np.int64)
    inputs = {
        'input_ids': input_ids,
        'attention_mask': attention_mask
    }
else:
    # Default for in-house torchvision models: use 'input' (batch_size, 3, 224, 224)
    input_tensor = np.random.randn(self._args.batch_size, 3, 224, 224).astype(dtype=precision)
    inputs = {'input': input_tensor}


Copilot AI Apr 13, 2026


For many HF-exported NLP models the ONNX graph may require additional inputs beyond input_ids and attention_mask (e.g., token_type_ids, position_ids, sometimes past-key-values). As written, ort_sess.run() will fail with a missing-input error for those models. Build the inputs dict by iterating ort_sess.get_inputs() and generating a tensor for every required input name (using name/type heuristics), rather than hardcoding only two inputs.

Suggested change (replace the hardcoded if/elif/else above with inputs generated for every session input):

batch_size = self._args.batch_size
seq_len = getattr(self._args, 'seq_length', 512)

def _onnx_type_to_numpy_dtype(onnx_type):
    dtype_map = {
        'tensor(float16)': np.float16,
        'tensor(float)': np.float32,
        'tensor(double)': np.float64,
        'tensor(int64)': np.int64,
        'tensor(int32)': np.int32,
        'tensor(int16)': np.int16,
        'tensor(int8)': np.int8,
        'tensor(uint64)': np.uint64,
        'tensor(uint32)': np.uint32,
        'tensor(uint16)': np.uint16,
        'tensor(uint8)': np.uint8,
        'tensor(bool)': np.bool_,
    }
    return dtype_map.get(onnx_type, precision)

def _resolve_shape(name, shape):
    if not shape:
        return ()
    resolved_shape = []
    rank = len(shape)
    lower_name = name.lower()
    for axis, dim in enumerate(shape):
        if isinstance(dim, int) and dim > 0:
            resolved_shape.append(dim)
            continue
        if axis == 0:
            resolved_shape.append(batch_size)
        elif 'pixel_values' in lower_name or (lower_name == 'input' and rank == 4):
            if axis == 1:
                resolved_shape.append(3)
            else:
                resolved_shape.append(224)
        elif 'past' in lower_name or 'key_values' in lower_name:
            resolved_shape.append(seq_len if axis >= rank - 2 else 1)
        elif axis == 1:
            resolved_shape.append(seq_len)
        else:
            resolved_shape.append(1)
    return tuple(resolved_shape)

def _generate_input_tensor(ort_input):
    name = ort_input.name
    lower_name = name.lower()
    dtype = _onnx_type_to_numpy_dtype(ort_input.type)
    shape = _resolve_shape(name, ort_input.shape)
    rank = len(shape)
    if lower_name == 'input_ids':
        return np.random.randint(0, 30000, size=shape, dtype=np.int64)
    if lower_name == 'attention_mask':
        return np.ones(shape, dtype=np.int64)
    if lower_name == 'token_type_ids':
        return np.zeros(shape, dtype=np.int64)
    if lower_name == 'position_ids':
        if rank >= 2:
            positions = np.arange(shape[1], dtype=np.int64)
            return np.broadcast_to(positions, shape).copy()
        return np.arange(shape[0], dtype=np.int64)
    if 'pixel_values' in lower_name or (lower_name == 'input' and rank == 4):
        return np.random.randn(*shape).astype(dtype=dtype)
    if 'past' in lower_name or 'key_values' in lower_name:
        return np.zeros(shape, dtype=dtype)
    if dtype == np.bool_:
        return np.ones(shape, dtype=np.bool_)
    if np.issubdtype(dtype, np.integer):
        return np.zeros(shape, dtype=dtype)
    return np.random.randn(*shape).astype(dtype=dtype)

inputs = {}
for ort_input in ort_sess.get_inputs():
    inputs[ort_input.name] = _generate_input_tensor(ort_input)

@Aishwarya-Tonpe Aishwarya-Tonpe changed the title Hf models clean Hugging Face Model integration in Superbench Apr 14, 2026
Copilot AI review requested due to automatic review settings April 14, 2026 17:30

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.



Comment on lines +347 to +359
dynamic_axes = {
    'input_ids': {
        0: 'batch_size',
        1: 'seq_length'
    },
    'attention_mask': {
        0: 'batch_size',
        1: 'seq_length'
    },
    'output': {
        0: 'batch_size'
    },
}

Copilot AI Apr 14, 2026


For many NLP models the exported output shape is sequence-dependent (e.g., logits/hidden states often have a seq_length dimension). Currently only the batch dimension is marked dynamic for output, which can lock the exported ONNX to a fixed seq_length and break dynamic-shape inference/engine building. Consider adding the sequence dimension to output’s dynamic_axes when the model output is 3D (batch, seq, hidden/vocab).
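A sketch of the suggested shape of the fix, assuming the output rank is known (e.g. from a dry-run forward pass); the rank-3 heuristic is an assumption:

```python
# Mark the sequence dimension of a 3D output (batch, seq, hidden/vocab) as
# dynamic too, so the exported graph is not locked to a fixed seq_length.
def build_dynamic_axes(output_rank: int) -> dict:
    axes = {
        'input_ids': {0: 'batch_size', 1: 'seq_length'},
        'attention_mask': {0: 'batch_size', 1: 'seq_length'},
        'output': {0: 'batch_size'},
    }
    if output_rank == 3:  # e.g. logits/hidden states: (batch, seq, vocab/hidden)
        axes['output'][1] = 'seq_length'
    return axes

print(build_dynamic_axes(3)['output'])  # {0: 'batch_size', 1: 'seq_length'}
```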

Comment on lines +174 to +177
# Get GPU rank to create unique file paths and avoid race conditions
# when multiple processes export the same model simultaneously
gpu_rank = os.getenv('CUDA_VISIBLE_DEVICES', '0')
proc_rank = os.getenv('PROC_RANK', gpu_rank)

Copilot AI Apr 14, 2026


CUDA_VISIBLE_DEVICES is not a stable per-process rank (it can be a comma-separated list like 0,1), so using it for per-process output directories can cause collisions or odd directory names. Prefer LOCAL_RANK/RANK (torchrun) or MPI local-rank env vars when available; fall back to PID if no rank is set.

Suggested change (replace the CUDA_VISIBLE_DEVICES-based rank above with):

# Get a stable per-process rank to create unique file paths and avoid
# race conditions when multiple processes export the same model
# simultaneously. Do not use CUDA_VISIBLE_DEVICES here because it may
# be a comma-separated device list (for example, "0,1") rather than a
# unique per-process rank.
proc_rank = next(
    (
        os.getenv(env_name) for env_name in (
            'PROC_RANK',
            'LOCAL_RANK',
            'OMPI_COMM_WORLD_LOCAL_RANK',
            'MPI_LOCALRANKID',
            'SLURM_LOCALID',
            'RANK',
        ) if os.getenv(env_name) is not None
    ),
    str(os.getpid()),
)

Comment on lines +221 to +233
# Get the first input to determine shape and name
input_name = onnx_model.graph.input[0].name

# Vision models typically have 4D input (batch, channels, height, width)
# NLP models typically have 2D input (batch, sequence)
if input_name == 'pixel_values' or len(onnx_model.graph.input[0].type.tensor_type.shape.dim) == 4:
    # Vision model: batch x channels x height x width
    input_shapes = f'{input_name}:{self._args.batch_size}x3x224x224'
else:
    # NLP model: batch x sequence - need to specify all inputs with same batch and seq length
    seq_len = getattr(self._args, 'seq_length', 512)
    shapes_list = []
    for inp in onnx_model.graph.input:

Copilot AI Apr 14, 2026


ONNX graph.input may include initializers/weights (depending on how the model was saved), so graph.input[0] is not guaranteed to be a real runtime input tensor. This can cause incorrect shape detection and invalid --optShapes. Consider filtering out inputs whose names appear in graph.initializer (and/or using the exporter’s known input names) before selecting the first real input.

Suggested change (replace the first-input selection above with):

# Filter out initializer-backed graph inputs; ONNX graph.input may include weights/constants.
initializer_names = {initializer.name for initializer in onnx_model.graph.initializer}
runtime_inputs = [inp for inp in onnx_model.graph.input if inp.name not in initializer_names]
if not runtime_inputs:
    logger.error(f'No runtime inputs found in exported ONNX model: {onnx_path}')
    return False
# Get the first real runtime input to determine shape and name
first_input = runtime_inputs[0]
input_name = first_input.name
# Vision models typically have 4D input (batch, channels, height, width)
# NLP models typically have 2D input (batch, sequence)
if input_name == 'pixel_values' or len(first_input.type.tensor_type.shape.dim) == 4:
    # Vision model: batch x channels x height x width
    input_shapes = f'{input_name}:{self._args.batch_size}x3x224x224'
else:
    # NLP model: batch x sequence - need to specify all inputs with same batch and seq length
    seq_len = getattr(self._args, 'seq_length', 512)
    shapes_list = []
    for inp in runtime_inputs:

choices=['in-house', 'huggingface'],
default='in-house',
required=False,
help='Source of the model: inhouse (default) or huggingface.',

Copilot AI Apr 14, 2026


The help text says inhouse but the CLI choice/value is in-house. Align the help text with the actual accepted value to avoid confusing users.

Suggested change
-    help='Source of the model: inhouse (default) or huggingface.',
+    help='Source of the model: in-house (default) or huggingface.',


Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.



    device_map: Optional[str] = None,
    config: Optional[PretrainedConfig] = None,
    **kwargs
) -> Tuple[PreTrainedModel, PretrainedConfig, AutoTokenizer]:

Copilot AI Apr 14, 2026


tokenizer can be None when tokenizer loading fails, but the return type annotation claims it is always AutoTokenizer. Update the return type to reflect optionality (e.g., Optional[...]) so callers and type-checking/tests don't rely on a tokenizer always being present.

Suggested change
-) -> Tuple[PreTrainedModel, PretrainedConfig, AutoTokenizer]:
+) -> Tuple[PreTrainedModel, PretrainedConfig, Optional[AutoTokenizer]]:

Comment on lines +120 to +125
tokenizer = None
try:
    logger.info('Loading tokenizer...')
    tokenizer = AutoTokenizer.from_pretrained(model_identifier, trust_remote_code=True, **load_kwargs)
except Exception as e:
    logger.warning(f'Could not load tokenizer: {e}. Continuing without tokenizer.')

Copilot AI Apr 14, 2026


tokenizer can be None when tokenizer loading fails, but the return type annotation claims it is always AutoTokenizer. Update the return type to reflect optionality (e.g., Optional[...]) so callers and type-checking/tests don't rely on a tokenizer always being present.

    f'({self._get_model_size(model):.2f}M parameters)'
)

return model, config, tokenizer

Copilot AI Apr 14, 2026


tokenizer can be None when tokenizer loading fails, but the return type annotation claims it is always AutoTokenizer. Update the return type to reflect optionality (e.g., Optional[...]) so callers and type-checking/tests don't rely on a tokenizer always being present.

Comment on lines +56 to +57
if not self.identifier:
    raise ValueError('Model identifier must be provided.')

Copilot AI Apr 14, 2026


This error message likely breaks the newly added unit test that expects the message to match identifier must be provided (case-sensitive substring match). Either adjust the test expectation or change the raised message to match the intended contract; keeping the message stable/consistent is preferable since it becomes a public-ish validation surface.

Comment on lines +48 to +53
    raise ValueError(f"Invalid model source '{self.source}'.Must be 'in-house' or 'huggingface'.")

# Validate torch_dtype
valid_dtypes = ['float32', 'float16', 'bfloat16', 'int8']
if self.torch_dtype not in valid_dtypes:
    raise ValueError(f"Invalid torch_dtype '{self.torch_dtype}'.Must be one of {valid_dtypes}.")

Copilot AI Apr 14, 2026


Both error strings are missing a space after the period ('.Must'). Please add the missing space so the messages are readable and consistent with other validation errors.

Suggested change
-        raise ValueError(f"Invalid model source '{self.source}'.Must be 'in-house' or 'huggingface'.")
+        raise ValueError(f"Invalid model source '{self.source}'. Must be 'in-house' or 'huggingface'.")

         # Validate torch_dtype
         valid_dtypes = ['float32', 'float16', 'bfloat16', 'int8']
         if self.torch_dtype not in valid_dtypes:
-            raise ValueError(f"Invalid torch_dtype '{self.torch_dtype}'.Must be one of {valid_dtypes}.")
+            raise ValueError(f"Invalid torch_dtype '{self.torch_dtype}'. Must be one of {valid_dtypes}.")

choices=['in-house', 'huggingface'],
default='in-house',
required=False,
help='Source of the model: inhouse (default) or huggingface.',

Copilot AI Apr 14, 2026


Same as ORT: the help text references inhouse while the actual choice is in-house. Align the wording with the parser choices.

Suggested change
-    help='Source of the model: inhouse (default) or huggingface.',
+    help='Source of the model: in-house (default) or huggingface.',

Comment on lines +201 to +202
output_dir = f'/tmp/tensorrt_onnx_rank_{proc_rank}'
os.makedirs(output_dir, exist_ok=True)

Copilot AI Apr 14, 2026


Hard-coding exports to /tmp can be problematic in containerized/locked-down environments (noexec, limited disk, or different temp roots) and makes cleanup harder. Prefer using an existing benchmark cache/output directory if available in this benchmark (similar to ORT’s __model_cache_path) or tempfile.mkdtemp() under a configurable base directory.
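A sketch of the suggested alternative, using a configurable base directory; the SB_EXPORT_DIR variable name is hypothetical, used only for illustration:

```python
# Create a unique, per-process export directory under a configurable base
# instead of a hard-coded /tmp path.
import os
import tempfile

base_dir = os.environ.get('SB_EXPORT_DIR', tempfile.gettempdir())
export_dir = tempfile.mkdtemp(prefix='tensorrt_onnx_', dir=base_dir)
print(os.path.isdir(export_dir))  # mkdtemp creates the directory atomically, avoiding races
```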

del dummy_input
torch.cuda.empty_cache()
return file_name


Copilot AI Apr 14, 2026


The new export_huggingface_model() introduces multiple branches (vision vs NLP, dynamic axes behavior, and external-data conversion for >2GB) but there’s no targeted unit test coverage shown for this method. Consider adding mocked/unit tests that validate: (1) correct input/output names for vision and NLP paths, and (2) external-data conversion is invoked when the size threshold is exceeded (can be done by mocking parameter sizing and ONNX helpers).

Suggested change (extract testable helpers):

_ONNX_EXTERNAL_DATA_THRESHOLD_BYTES = 2 * 1024 * 1024 * 1024

def _get_model_parameter_size_bytes(self, model):
    """Return the total serialized parameter size in bytes for a model.

    This helper is intentionally isolated so unit tests can mock parameter
    shapes/sizes without performing a real ONNX export.

    Args:
        model: Model instance exposing ``parameters()``.

    Returns:
        int: Total parameter size in bytes.
    """
    total_size = 0
    for parameter in model.parameters():
        total_size += parameter.nelement() * parameter.element_size()
    return total_size

def _should_use_external_data_format(self, model):
    """Return whether ONNX external-data format should be used.

    Args:
        model: Model instance exposing ``parameters()``.

    Returns:
        bool: True when the model size exceeds the ONNX 2GB threshold.
    """
    return self._get_model_parameter_size_bytes(model) > self._ONNX_EXTERNAL_DATA_THRESHOLD_BYTES

def _build_huggingface_export_config(self, model, batch_size=1, seq_length=512):
    """Build dummy input and ONNX I/O metadata for HuggingFace export.

    This helper extracts the vision-vs-NLP branch logic into a directly
    testable unit so mocked tests can validate input/output names and
    dynamic axes without requiring a full export.

    Args:
        model: HuggingFace model instance to export.
        batch_size (int): Batch size of input. Defaults to 1.
        seq_length (int): Sequence length of input. Defaults to 512.

    Returns:
        tuple: (dummy_input, input_names, output_names, dynamic_axes)
    """
    config = getattr(model, 'config', None)
    model_type = getattr(config, 'model_type', '')
    # Vision models typically consume pixel_values with NCHW layout.
    if model_type in ('vit', 'swin', 'convnext', 'beit', 'deit', 'resnet', 'detr'):
        dummy_input = torch.randn((batch_size, 3, 224, 224), device='cuda')
        input_names = ['pixel_values']
        output_names = ['output']
        dynamic_axes = {
            'pixel_values': {
                0: 'batch_size',
            },
            'output': {
                0: 'batch_size',
            }
        }
        return dummy_input, input_names, output_names, dynamic_axes
    # Default HuggingFace NLP-style export.
    dummy_input = torch.ones((batch_size, seq_length), dtype=torch.int64, device='cuda')
    input_names = ['input_ids']
    output_names = ['output']
    dynamic_axes = {
        'input_ids': {
            0: 'batch_size',
            1: 'seq_length',
        },
        'output': {
            0: 'batch_size',
        }
    }
    return dummy_input, input_names, output_names, dynamic_axes
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 14, 2026 20:51

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.



Comment on lines +132 to +153
# Handle device mapping for large models
if device_map:
    model_kwargs['device_map'] = device_map
elif device == 'cuda' and torch.cuda.is_available():
    # Don't set device_map if device is explicitly cuda
    pass
elif device != 'cpu':
    model_kwargs['device_map'] = device

# Pass pre-downloaded config to from_pretrained so any overrides take effect
if config is not None:
    model_kwargs['config'] = config

try:
    model = AutoModel.from_pretrained(model_identifier, **model_kwargs)
except ValueError:
    logger.info('AutoModel failed, trying AutoModelForCausalLM...')
    model = AutoModelForCausalLM.from_pretrained(model_identifier, **model_kwargs)

# Move to device if not using device_map
if not device_map and device != 'auto':
    model = model.to(device)

Copilot AI Apr 14, 2026


The decision to call model.to(device) is based on the argument device_map, but model_kwargs['device_map'] can be set even when device_map (the arg) is None (e.g., when device != 'cpu' and CUDA is unavailable). In that case, from_pretrained(..., device_map=...) returns a dispatched model and calling .to(...) can error. Track the effective device_map used (e.g., effective_device_map = model_kwargs.get('device_map')) and only call .to(device) when no device_map was actually passed.
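The suggested fix reduces to tracking the effective device map; a pure-logic sketch mirroring the branching above (the helper name and bool signature are illustrative, not the PR's code):

```python
# Decide whether it is safe to call model.to(device) based on the device_map
# that was actually passed to from_pretrained, not the raw argument.
def should_move_to_device(device: str, device_map, cuda_available: bool) -> bool:
    model_kwargs = {}
    if device_map:
        model_kwargs['device_map'] = device_map
    elif device == 'cuda' and cuda_available:
        pass  # plain .to('cuda') later
    elif device != 'cpu':
        model_kwargs['device_map'] = device
    effective_device_map = model_kwargs.get('device_map')
    # Only move manually when from_pretrained did not dispatch the model.
    return effective_device_map is None and device != 'auto'

print(should_move_to_device('cuda', None, cuda_available=False))  # False: device_map='cuda' was passed
print(should_move_to_device('cuda', None, cuda_available=True))   # True: safe to call .to('cuda')
```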

Comment on lines +224 to +239
```python
    ValueError: If dtype string is invalid.
"""
dtype_map = {
    'float32': torch.float32,
    'float16': torch.float16,
    'bfloat16': torch.bfloat16,
    'int8': torch.int8,
    'fp32': torch.float32,
    'fp16': torch.float16,
    'bf16': torch.bfloat16,
}

if dtype_str.lower() not in dtype_map:
    raise ValueError(f"Invalid dtype '{dtype_str}'. Must be one of {list(dtype_map.keys())}")

return dtype_map[dtype_str.lower()]
```

Copilot AI Apr 14, 2026


Allowing torch_dtype='int8' and mapping it to torch.int8 is misleading: from_pretrained(..., torch_dtype=torch.int8) generally isn't a supported way to load int8 weights for Transformers (int8 inference typically requires dedicated quantization flows/backends). Consider rejecting int8 in _get_torch_dtype (or in ModelSourceConfig) and reserving int8 for post-export quantization (as you already do for ORT), or implement a supported HF quantization path explicitly.

Suggested change

Before:

```python
    ValueError: If dtype string is invalid.
"""
dtype_map = {
    'float32': torch.float32,
    'float16': torch.float16,
    'bfloat16': torch.bfloat16,
    'int8': torch.int8,
    'fp32': torch.float32,
    'fp16': torch.float16,
    'bf16': torch.bfloat16,
}
if dtype_str.lower() not in dtype_map:
    raise ValueError(f"Invalid dtype '{dtype_str}'. Must be one of {list(dtype_map.keys())}")
return dtype_map[dtype_str.lower()]
```

After:

```python
    ValueError: If dtype string is invalid or unsupported for standard HF loading.
"""
normalized_dtype = dtype_str.lower()
if normalized_dtype == 'int8':
    raise ValueError(
        "Unsupported dtype 'int8' for Hugging Face model loading via torch_dtype. "
        'Use a dedicated quantization/loading path for int8 models or apply int8 quantization '
        'after export.'
    )
dtype_map = {
    'float32': torch.float32,
    'float16': torch.float16,
    'bfloat16': torch.bfloat16,
    'fp32': torch.float32,
    'fp16': torch.float16,
    'bf16': torch.bfloat16,
}
if normalized_dtype not in dtype_map:
    raise ValueError(f"Invalid dtype '{dtype_str}'. Must be one of {list(dtype_map.keys())}")
return dtype_map[normalized_dtype]
```

Comment on lines +377 to +393
```python
# Export to ONNX for large models (>2GB), use external data format
model_size_gb = sum(p.numel() * p.element_size() for p in model.parameters()) / (1024**3)
use_external_data = model_size_gb > 2.0

if use_external_data:
    logger.info(f'Model size is {model_size_gb:.2f}GB, using external data format for ONNX export')

torch.onnx.export(
    wrapped_model,
    export_args,
    file_name,
    opset_version=14,
    do_constant_folding=True,
    input_names=input_names,
    output_names=['output'],
    dynamic_axes=dynamic_axes,
)
```

Copilot AI Apr 14, 2026


For models larger than ~2GB, torch.onnx.export(...) may fail before the later convert_model_to_external_data(...) step due to protobuf size limits, because the export itself still attempts to serialize initializers into the main ONNX file. For large-model support to be reliable, enable PyTorch's large/external-data export mode at export time (e.g., using the appropriate large_model / use_external_data_format option supported by your PyTorch version) rather than only converting after the fact.
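The decision the reviewer describes — pick the export path *before* calling `torch.onnx.export`, so oversized models never hit the 2GB protobuf limit mid-serialization — can be sketched with a pure size estimate. `needs_external_data` is an illustrative helper, not SuperBench code, and the export calls themselves are only indicated in comments:

```python
PROTOBUF_LIMIT_GB = 2.0  # single-file ONNX models are capped by protobuf's 2GB limit

def needs_external_data(param_count, bytes_per_param, threshold_gb=PROTOBUF_LIMIT_GB):
    """Estimate serialized weight size and compare against the protobuf limit."""
    size_gb = param_count * bytes_per_param / (1024 ** 3)
    return size_gb > threshold_gb

# if needs_external_data(n_params, 4):  # 4 bytes/param for fp32 weights
#     use the exporter's large-model/external-data mode at export time,
#     writing initializers as external tensor files alongside the .onnx graph
# else:
#     plain torch.onnx.export(...) into a single file
```

Converting to external data only after a plain export, as the PR currently does, never reaches the conversion step for the models that need it most.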

Comment on lines +223 to +228
```python
# Vision models typically have 4D input (batch, channels, height, width)
# NLP models typically have 2D input (batch, sequence)
if input_name == 'pixel_values' or len(onnx_model.graph.input[0].type.tensor_type.shape.dim) == 4:
    # Vision model: batch x channels x height x width
    input_shapes = f'{input_name}:{self._args.batch_size}x3x224x224'
else:
```

Copilot AI Apr 14, 2026


Hard-coding 3x224x224 will produce incorrect shapes for many vision models (e.g., models trained/evaluated at 384px, grayscale, or non-3-channel inputs). Since you're already inspecting the ONNX graph, prefer deriving H/W/C from the declared input shape when static, or (when dynamic/unknown) using model/config metadata (e.g., image_size, num_channels) with sensible defaults.
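For the 4D vision case, the derivation suggested above could look like the sketch below: keep static dims declared in the ONNX graph and substitute config metadata only for dynamic ones. `build_input_shape` is a hypothetical helper; the `num_channels`/`image_size` keys mirror common HF config names but are assumptions here.

```python
def build_input_shape(input_name, declared_dims, batch_size, config=None):
    """Build a trtexec-style shapes string for a 4D (NCHW) vision input.

    declared_dims: per-axis values from the ONNX graph; 0 or a symbolic
    name means the axis is dynamic and needs a fallback value.
    """
    config = config or {}
    defaults = [
        batch_size,
        config.get('num_channels', 3),
        config.get('image_size', 224),
        config.get('image_size', 224),
    ]
    resolved = []
    for i, dim in enumerate(declared_dims):
        # keep static dims from the graph; substitute dynamic ones
        resolved.append(dim if isinstance(dim, int) and dim > 0 else defaults[i])
    return f"{input_name}:{'x'.join(str(d) for d in resolved)}"
```

For a 384px model with fully dynamic axes, `build_input_shape('pixel_values', [0, 0, 0, 0], 8, {'image_size': 384})` yields `'pixel_values:8x3x384x384'` instead of the hard-coded 224px shape.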

Comment on lines +19 to +22
```python
@pytest.fixture
def loader(self):
    """Create a loader instance for testing."""
    return HuggingFaceModelLoader(cache_dir='/tmp/test_cache', token=None)
```

Copilot AI Apr 14, 2026


Using a hard-coded /tmp/... path makes the test non-portable (e.g., Windows runners) and can cause interference across parallel test runs. Prefer tmp_path/tmp_path_factory to generate an isolated cache directory per test.
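A portable version of the fixture would derive the cache directory from pytest's per-test `tmp_path` rather than a fixed `/tmp` path. The helper below is a minimal sketch (`make_isolated_cache` is a hypothetical name; only the pathlib part is exercised here, with the pytest wiring shown in comments):

```python
import pathlib

def make_isolated_cache(base: pathlib.Path) -> str:
    """Return a fresh cache dir under `base` (pytest passes tmp_path as base)."""
    cache = base / 'hf_cache'
    cache.mkdir(parents=True, exist_ok=True)
    return str(cache)

# In the test file, the fixture would then become:
# @pytest.fixture
# def loader(tmp_path):
#     return HuggingFaceModelLoader(cache_dir=make_isolated_cache(tmp_path), token=None)
```

Because `tmp_path` is unique per test and OS-appropriate, this also removes cross-talk between parallel runs.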
