Posts for: #tensor-parallel

Running DeepSeek-V4-Flash on Two GB10s: Notes on a Two-Node Deployment of a 284B Model

2026-05-31

#deepseek #vllm #gb10 #dgx-spark #llm-inference #tensor-parallel

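Notes from deploying DeepSeek-V4-Flash (284B/13B-active, official FP8) on two DGX Spark (GB10) machines: why 149GB of weights cannot fit on a single 128GB machine, how to choose the right vLLM engine for GB10's sm_121 architecture, a subtle problem in which torch was silently downgraded during the source build, and the actual throughput after MTP tuning.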

[Read more]

vLLM TP=2 Cross-Node Deployment in Practice: Running Qwen3.5-35B-A3B on Two DGX Sparks

2026-04-12

#AI #LLM #NVIDIA #DGX Spark #vLLM #Tensor Parallel

A record of the first deployment of Qwen3.5-35B-A3B with vLLM TP=2 on two DGX Spark machines, covering the process and the benchmark results.

[Read more]