[WIP] deepseek_ocr_2 exp#4352

Draft
hengtaoguo wants to merge 1 commit into main from hengtaoguo-ocr

@hengtaoguo

Collaborator

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/123456
FIXES: #123456

You can also provide a comma-separated list. If you don't want to close a bug but
simply want to reference it, use BUGS, e.g.:
BUGS: b/123456

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

# Checkpoint conversion
python -m maxtext.checkpoint_conversion.to_maxtext src/maxtext/configs/base.yml \
  model_name=deepseek_ocr_2 \
  tokenizer_type=huggingface \
  hf_trust_remote_code=false \
  base_output_directory=gs://hengtaoguo-maxtext-logs/checkpoints/deepseek_ocr_2/unscanned/2026-07-04-11-55 \
  scan_layers=false \
  weight_dtype=bfloat16 \
  attention=dot_product \
  use_multimodal=True \
  hardware=cpu \
  skip_jax_distributed_system=True \
  sparse_matmul=false \
  attention_bias=true \
  checkpoint_storage_use_ocdbt=False \
  checkpoint_storage_use_zarr3=False \
  --eager_load_method=safetensors \
  --lazy_load_tensors=False \
  hf_access_token=xxx

# Decode
python -m maxtext.inference.decode src/maxtext/configs/base.yml \
  model_name=deepseek_ocr_2 \
  tokenizer_path=deepseek-ai/DeepSeek-OCR-2 \
  tokenizer_type=huggingface \
  hf_trust_remote_code=false \
  load_parameters_path=gs://hengtaoguo-maxtext-logs/checkpoints/deepseek_ocr_2/unscanned/2026-07-04-11-55/0/items \
  scan_layers=false \
  use_multimodal=true \
  image_path=/home/hengtaoguo_google_com/projects/maxtext_text.png \
  prompt=\<image\>\\n\<\|grounding\|\>Convert\ the\ document\ to\ markdown.\  max_prefill_predict_length=1200 \
  max_target_length=1400 \
  weight_dtype=bfloat16 \
  checkpoint_storage_use_ocdbt=False \
  checkpoint_storage_use_zarr3=False \
  attention=dot_product \
  skip_jax_distributed_system=True \
  sparse_matmul=false \
  attention_bias=true \
  hf_access_token=xxx
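
The `prompt=` argument in the decode command is backslash-escaped for the shell, which makes it hard to read. As a minimal sketch (not part of this PR), Python's standard `shlex` module can show the string the process would receive under POSIX shell rules. Note that `\\n` reaches the program as the two characters backslash and `n`. How and where that sequence becomes a real newline is not shown here and is left as an assumption.

```python
import shlex

# The prompt argument exactly as typed in the decode command above.
escaped = r"prompt=\<image\>\\n\<\|grounding\|\>Convert\ the\ document\ to\ markdown.\ "

# shlex.split applies POSIX-style unescaping, mirroring what the shell passes on.
(arg,) = shlex.split(escaped)
key, value = arg.split("=", 1)

print(repr(value))

# An equivalent single-quoted form that is easier to read and to paste.
print(f"{key}={shlex.quote(value)}")
```

The second line printed is an alternative spelling of the same argument using single quotes, so the trailing space and the special characters need no individual escaping.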

# Result
Input `<image>
<|grounding|>Convert the document to markdown.` -> `text[[5, 30, 970, 199]]
MaxText is a high performance, highly scalable, open-source LLM library and reference implementation written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training.

text[[4, 364, 987, 978]]
MaxText provides a library of high performance models to choose from, including Gemma, Llama, DeepSeek, Qwen, and Mistral. For each of these models, MaxText supports pre-training (up to tens of thousands of chips) and scalable post-training, with popular techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO, a type of Reinforcement Learning) and Group Sequence Policy Optimization (GSPO, a type of Reinforcement Learning). 1.0**1.0**00.0**1.0**00.0**1.`
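
The result above interleaves region headers such as `text[[5, 30, 970, 199]]` with the recognized text. A minimal sketch for splitting that output into structured records might look like the following. The `Region` class and `parse_regions` helper are hypothetical names introduced for illustration, and reading the four integers as `(x1, y1, x2, y2)` is an assumption that the output itself does not confirm.

```python
import re
from dataclasses import dataclass

# Matches a region header line such as "text[[5, 30, 970, 199]]".
_HEADER = re.compile(r"^(\w+)\[\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\]$")


@dataclass
class Region:
    label: str                       # e.g. "text"
    box: tuple[int, int, int, int]   # assumed order: (x1, y1, x2, y2)
    text: str                        # recognized content following the header


def parse_regions(output: str) -> list[Region]:
    """Group each header line with the non-empty lines that follow it."""
    regions: list[Region] = []
    for raw in output.splitlines():
        line = raw.strip()
        match = _HEADER.match(line)
        if match:
            coords = tuple(int(g) for g in match.groups()[1:])
            regions.append(Region(match.group(1), coords, ""))
        elif regions and line:
            last = regions[-1]
            last.text = f"{last.text}\n{line}" if last.text else line
    return regions


sample = (
    "text[[5, 30, 970, 199]]\n"
    "MaxText is a high performance, highly scalable, open-source LLM library.\n"
    "\n"
    "text[[4, 364, 987, 978]]\n"
    "MaxText provides a library of high performance models to choose from.\n"
)
regions = parse_regions(sample)
for region in regions:
    print(region.label, region.box, region.text[:40])
```

The sample text is shortened from the decode result. Text that appears before the first header is dropped by this sketch, and any trailing repetition from the model stays attached to the last region.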

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.
