CIRCUIT LLM // DLLM

Decentralized 72B · Live

Decentralized LLM // Mesh Architecture

This is a live decentralized LLM — Qwen2.5-72B-Instruct running across a mesh of independent nodes, not a single server. Its weights are split across GPU nodes, so no node ever holds the complete model.

How it works: a coordinator sits at the center of the mesh and routes each request. The GPU nodes carry the model — each computes its share and hands the result to the next, so every token is produced collectively across the mesh. Around them, node clients form the wider network — smaller machines that join, route, and support the cluster. The point is the architecture: pooled, trustless compute where powerful and everyday hardware work side by side, and everyone earns for the work they do.

Qwen2.5-72B · AWQ 4-bit GPU nodes + node clients No node holds full weights Encrypted wire protocol x402 CIRC gate On-chain payment attribution

The model is hot-swappable and the topology is dynamic — nodes join the cluster and the workload redistributes automatically, all over a model-agnostic wire protocol with on-chain payment attribution. Scaling to more nodes or larger models is a config change, not a rewrite.

← DEEP DIVE DLLM // DECENTRALIZED LLM

CONNECTING

Coordinator

GPU node

Client

Offline

Fetching...

Model

Qwen2.5
72B

AWQ · 4-bit

Layers

20 × 4

Nodes

—

4 GPU + clients

Context

32K

tokens

Shard Config

ModelQwen2.5-72B-Instruct

QuantizationAWQ · 4-bit

Parameters72.7B

Transformer layers80

Coordinatorlayers 0–19 + head/draft

GPU node 2layers 20–39 (20)

GPU node 3layers 40–59 (20)

GPU node 4layers 60–79 (20)

Per-node VRAM~10–16 GB (L4 24 GB)

Node clientsrouting / edge (×6)

Context32 768 tokens

TransportEncrypted wire · ChaCha20

GPUs4× NVIDIA L4 (CUDA)

x402 Payment Flow

Request

POST /v1/chat/completions → HTTP 402 with CIRC treasury + amount.

Send CIRC

8fQgfsRnRkKSeNUhevT7wp8mhNvMSJdLn1fJi4oVpump — $0.001/call.

Retry + sig

X-Payment-Signature: <tx> — verified on-chain, single-use, <5 min.

Stream + earn

Workers 80% proportional to layers. Coordinator 20%. Localhost free.

API Endpoints 7 routes · POST /v1/chat/completions gated x402

▼

POST/v1/chat/completionsOpenAI-compat chat · streaming SSEx402 CIRC

POST/v1/completionsRaw text completionx402 CIRC

GET/v1/modelsAvailable modelsFree

GET/v1/workersWorker registry + layer assignmentsFree

POST/v1/workers/registerRegister a worker nodeCluster key

POST/v1/workers/heartbeatWorker keepaliveFree

GET/healthCoordinator health + pipeline statusFree

DLLM CHAT Qwen2.5-72B-Instruct — workers

80 LAYERS · 4 GPU NODES · INFERENCE READY

Free on-site demo · the x402 API is at inference.circuitllm.xyz

SYS