Coordinator
GPU node
Client
Offline
Fetching...
Model
Qwen2.5
72B
72B
AWQ · 4-bit
Layers
80
20 × 4
Nodes
—
4 GPU + clients
Context
32K
tokens
Shard Config
ModelQwen2.5-72B-Instruct
QuantizationAWQ · 4-bit
Parameters72.7B
Transformer layers80
Coordinatorlayers 0–19 + head/draft
GPU node 2layers 20–39 (20)
GPU node 3layers 40–59 (20)
GPU node 4layers 60–79 (20)
Per-node VRAM~10–16 GB (L4 24 GB)
Node clientsrouting / edge (×6)
Context32 768 tokens
TransportEncrypted wire · ChaCha20
GPUs4× NVIDIA L4 (CUDA)
x402 Payment Flow
1
Request
POST /v1/chat/completions → HTTP 402 with CIRC treasury + amount.
2
Send CIRC
8fQgfsRnRkKSeNUhevT7wp8mhNvMSJdLn1fJi4oVpump — $0.001/call.
3
Retry + sig
X-Payment-Signature: <tx> — verified on-chain, single-use, <5 min.
4
Stream + earn
Workers 80% proportional to layers. Coordinator 20%. Localhost free.
API Endpoints
7 routes · POST /v1/chat/completions gated x402
▼
POST/v1/chat/completionsOpenAI-compat chat · streaming SSEx402 CIRC
POST/v1/completionsRaw text completionx402 CIRC
GET/v1/modelsAvailable modelsFree
GET/v1/workersWorker registry + layer assignmentsFree
POST/v1/workers/registerRegister a worker nodeCluster key
POST/v1/workers/heartbeatWorker keepaliveFree
GET/healthCoordinator health + pipeline statusFree
DLLM CHAT
Qwen2.5-72B-Instruct
— workers
80 LAYERS · 4 GPU NODES · INFERENCE READY
Free on-site demo · the x402 API is at inference.circuitllm.xyz
SYS
|
TEMP 0.7