@@ -203,57 +203,10 @@ await backend.register(model)
 | `disk` | ~10-20s | Preserved | Large checkpoints |
 | `restart` | ~30-60s | Lost | Single-GPU fallback |
 
-## Known Issues and Workarounds
 
-### 1. DeviceMesh Memory Imbalance Error
 
-**Symptom**: SGLang fails to start with a memory imbalance error.
 
-**Solution**: Set this environment variable (done automatically by SGLangBackend):
-```bash
-export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK=True
-```
-
-### 2. update_weights_from_tensor Fails with TP > 1
-
-**Reference**: [SGLang #3726](https://github.com/sgl-project/sglang/issues/3726)
-
-**Solution**: Use `weight_sync_method="lora"` or `"disk"` instead of tensor sync.
-
-### 3. OOM on Weight Update
-
-**Reference**: [SGLang #8076](https://github.com/sgl-project/sglang/issues/8076)
-
-**Solution**: Use disk-based sync or reduce `mem_fraction_static`.
-
-### 4. dp_size Must Be 1 for Weight Updates
-
-**Reference**: [SGLang #4283](https://github.com/sgl-project/sglang/issues/4283)
-
-**Solution**: Don't use data parallelism for inference; use tensor parallelism (TP) instead.
-
-### 5. Garbled Output with Small Tensor Buckets
-
-**Reference**: [SGLang #14178](https://github.com/sgl-project/sglang/issues/14178)
-
-**Solution**: Use LoRA-based sync instead of tensor sync.
-
-## Performance Comparison
-
-Based on external benchmarks (H100, Llama 3.1 8B):
-
-| Metric | vLLM | SGLang | Improvement |
-|--------|------|--------|-------------|
-| Throughput (tok/s) | ~12,500 | ~16,200 | ~29% |
-| TTFT (ms) | ~45 | ~35 | ~22% |
-| P99 Latency (ms) | ~120 | ~95 | ~21% |
-
-*Source: [aimultiple.com benchmark](https://aimultiple.com/llm-inference-benchmark)*
 
 
-The performance advantage comes from:
-- RadixAttention's automatic prefix caching
-- Zero-overhead scheduler design
-- Optimized FlashInfer kernels
 
 ## Benchmarking Your Setup
 
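The removed known-issues section above prescribes a weight-sync method based on parallelism settings: tensor sync fails with TP > 1, weight updates require `dp_size == 1`, and disk sync is suggested for large checkpoints. That decision logic can be condensed into a small helper; this is only an illustrative sketch — the function name and the `"tensor"` method value are assumptions, not part of the documented SGLangBackend API:

```python
def choose_weight_sync_method(tp_size: int, dp_size: int,
                              large_checkpoint: bool = False) -> str:
    """Hypothetical helper encoding the workarounds listed above.

    - dp_size must be 1 for weight updates (SGLang #4283)
    - tensor sync fails with tp_size > 1 (SGLang #3726)
    - disk sync suits large checkpoints (slower, ~10-20s)
    """
    if dp_size != 1:
        # Data parallelism breaks weight updates; scale with TP instead.
        raise ValueError("weight updates require dp_size == 1; use TP instead")
    if large_checkpoint:
        return "disk"    # slower, but recommended for large checkpoints
    if tp_size > 1:
        return "lora"    # tensor sync is broken with TP > 1
    return "tensor"      # assumed name for direct tensor sync when TP == 1


# Example: 4-way tensor parallel, normal-sized checkpoint
print(choose_weight_sync_method(tp_size=4, dp_size=1))  # → lora
```

The same table-driven choice could equally live in a config file; the point is only that the known issues constrain which sync method is safe for a given parallelism layout.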