[WIP] deepseek_ocr_2 exp by hengtaoguo · Pull Request #4352 · AI-Hypercomputer/maxtext

hengtaoguo · 2026-07-05T00:27:54Z

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

why is this change being made,
the problem being solved and any relevant context,
why this is a good solution,
some information about the specific implementation,
shortcomings of the solution and possible future improvements.

If the change fixes a bug or a Github issue, please include a link, e.g.,:
FIXES: b/123456
FIXES: #123456

You can also provide a comma-separated list. If you don't want to close a bug but
simply to reference it, use BUGS, e.g.:
BUGS: b/123456

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

# Checkpoint conversion
python -m maxtext.checkpoint_conversion.to_maxtext src/maxtext/configs/base.yml model_name=deepseek_ocr_2 tokenizer_type=huggingface hf_trust_remote_code=false base_output_directory=gs://hengtaoguo-maxtext-logs/checkpoints/deepseek_ocr_2/unscanned/2026-07-04-11-55 scan_layers=false weight_dtype=bfloat16 attention=dot_product use_multimodal=True hardware=cpu skip_jax_distributed_system=True sparse_matmul=false attention_bias=true checkpoint_storage_use_ocdbt=False checkpoint_storage_use_zarr3=False --eager_load_method=safetensors --lazy_load_tensors=False hf_access_token=xxx

# Decode
python -m maxtext.inference.decode src/maxtext/configs/base.yml model_name=deepseek_ocr_2 tokenizer_path=deepseek-ai/DeepSeek-OCR-2 tokenizer_type=huggingface hf_trust_remote_code=false load_parameters_path=gs://hengtaoguo-maxtext-logs/checkpoints/deepseek_ocr_2/unscanned/2026-07-04-11-55/0/items scan_layers=false use_multimodal=true image_path=/home/hengtaoguo_google_com/projects/maxtext_text.png prompt=\<image\>\\n\<\|grounding\|\>Convert\ the\ document\ to\ markdown.\  max_prefill_predict_length=1200 max_target_length=1400 weight_dtype=bfloat16 checkpoint_storage_use_ocdbt=False checkpoint_storage_use_zarr3=False attention=dot_product skip_jax_distributed_system=True sparse_matmul=false attention_bias=true hf_access_token=xxx

# Result
Input `<image>
<|grounding|>Convert the document to markdown.` -> `text[[5, 30, 970, 199]]
MaxText is a high performance, highly scalable, open-source LLM library and reference implementation written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training.

text[[4, 364, 987, 978]]
MaxText provides a library of high performance models to choose from, including Gemma, Llama, DeepSeek, Qwen, and Mistral. For each of these models, MaxText supports pre-training (up to tens of thousands of chips) and scalable post-training, with popular techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO, a type of Reinforcement Learning) and Group Sequence Policy Optimization (GSPO, a type of Reinforcement Learning). 1.0**1.0**00.0**1.0**00.0**1.`
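
The decoded result above interleaves grounding tags such as `text[[5, 30, 970, 199]]` with the recognized text. A minimal sketch of splitting such output into regions follows, assuming each tag has the form `label[[a, b, c, d]]` on its own line and is followed by the text it covers. The `Region` type and the `parse_grounded_output` helper are hypothetical names for illustration and are not part of this PR or of MaxText. The four integers are kept as raw values because the PR does not state their coordinate convention.

```python
import re
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """One grounded block: a tag label, four raw integers, and its text."""
    label: str
    box: Tuple[int, int, int, int]
    text: str


# Assumed tag format: `label[[a, b, c, d]]` alone on a line.
_TAG = re.compile(
    r"^(\w+)\[\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\][ \t]*$",
    re.MULTILINE,
)


def parse_grounded_output(output: str) -> List[Region]:
    """Split decoded output into regions, one per grounding tag."""
    matches = list(_TAG.finditer(output))
    regions = []
    for i, match in enumerate(matches):
        # The text of a region runs until the next tag or the end of output.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        text = output[match.end():end].strip()
        box = tuple(int(v) for v in match.groups()[1:])
        regions.append(Region(match.group(1), box, text))
    return regions


sample = (
    "text[[5, 30, 970, 199]]\n"
    "MaxText is a high performance library.\n"
    "\n"
    "text[[4, 364, 987, 978]]\n"
    "MaxText provides a library of models."
)
regions = parse_grounded_output(sample)
for region in regions:
    print(region.label, region.box, region.text)
```

Any trailing tokens after the last tag stay attached to the last region's text, so degenerate tails in the generation remain visible rather than being silently dropped.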

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-07-05T00:38:06Z

Codecov Report

❌ Patch coverage is 12.60946% with 499 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/models/deepseek_ocr.py	14.78%	219 Missing ⚠️
src/maxtext/multimodal/processor_deepseek_ocr.py	16.57%	146 Missing ⚠️
...xtext/checkpoint_conversion/utils/param_mapping.py	2.06%	95 Missing ⚠️
src/maxtext/multimodal/processor.py	0.00%	22 Missing ⚠️
src/maxtext/layers/encoders.py	0.00%	7 Missing ⚠️
src/maxtext/multimodal/utils.py	0.00%	4 Missing ⚠️
src/maxtext/models/deepseek.py	25.00%	3 Missing ⚠️
src/maxtext/checkpoint_conversion/utils/utils.py	0.00%	2 Missing ⚠️
src/maxtext/checkpoint_conversion/to_maxtext.py	0.00%	1 Missing ⚠️


update

2409fab

hengtaoguo force-pushed the hengtaoguo-ocr branch from d345a33 to 2409fab on July 5, 2026 00:31

hengtaoguo wants to merge 1 commit into main from hengtaoguo-ocr
