Supported Hardware
Ascend NPUs
Cambricon MLUs
Iluvatar CoreX GPUs
Moore Threads GPUs

Deploy mainstream open models on supported accelerators with a unified service-engine stack and production-oriented tooling.
Improve throughput with asynchronous scheduling, graph optimization, multi-stream execution, and dynamic load balancing.
Reduce inference cost through hardware-aware optimization, efficient memory management, and global KV cache control.
Broad Support
xLLM supports mainstream open models across heterogeneous accelerator targets.
Got Questions?
Whether you are getting started or debugging a deployment, the public project resources are the fastest place to find answers.
Resources
Core project resources in one place.