Borrowed Hardware: Inference on GPUs, Agents on CPUs, Trust on Neither // CIRCUIT LLM

A network that does both AI and trading has two completely different kinds of work to place. Running a language model is heavy, parallel matrix math that saturates a GPU for a fraction of a second and then needs the next request. Running a trading agent is the opposite: a small loop that wakes every few seconds, checks some prices, maybe places one order, and otherwise sits idle, for days. The first wants a graphics card. The second wants a CPU it can hold for a long time and barely use. Most networks are built for one or the other and rent the rest.

Circuit places both, and it places them on the same contributors' machines. A node operator who plugs in a GPU is also, on the CPU cores sitting next to it, hosting other people's agents. Inference runs on the graphics card; agents run on the processor that hangs off the same box. One pool of hardware, two kinds of work, drawn from people who already turned their machine on.

The interesting part is not that this is possible. It is that it can be made trustless: you can run your model on a stranger's GPU and your agent on a stranger's CPU and not have to trust either one. That is the problem this article is about, and the rest of it is the answer in three parts: the GPU half, the CPU half, and the layer that ties them together.

One client, two contributions

A contributor runs a single program, the node client. It is the membership card for the network. On the GPU side it joins the inference mesh: it registers, takes an assignment of model layers, signals ready, and starts serving its slice of the model. On the CPU side it opts in to the agent cloud by declaring a budget (so many agents, so much memory), then hosts whatever agents the scheduler places on it, draining cleanly when the operator dials the budget back down.
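The CPU-side budget can be pictured with a small sketch. This is not the node client's code: the `Budget` and `Host` names, the memory figures, and the newest-first drain order are all assumptions made for illustration. It shows only the two behaviours described above, placing agents within a declared budget and draining when the budget shrinks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Budget:
    max_agents: int
    max_memory_mb: int


@dataclass
class Host:
    budget: Budget
    agents: dict[str, int] = field(default_factory=dict)  # agent id -> memory in MB

    def memory_used(self) -> int:
        return sum(self.agents.values())

    def place(self, agent_id: str, memory_mb: int) -> bool:
        """Accept an agent only if it fits inside the declared budget."""
        fits = (
            agent_id not in self.agents
            and len(self.agents) < self.budget.max_agents
            and self.memory_used() + memory_mb <= self.budget.max_memory_mb
        )
        if fits:
            self.agents[agent_id] = memory_mb
        return fits

    def set_budget(self, budget: Budget) -> list[str]:
        """Apply a new budget and return the agents that must be drained.

        Assumption: the most recently placed agents are drained first.
        """
        self.budget = budget
        drained: list[str] = []
        while self.agents and (
            len(self.agents) > budget.max_agents
            or self.memory_used() > budget.max_memory_mb
        ):
            agent_id = next(reversed(self.agents))  # newest placement
            del self.agents[agent_id]
            drained.append(agent_id)
        return drained
```

In a real deployment the drained agents would presumably be handed back to the scheduler for placement elsewhere; the sketch simply returns their ids.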

The client's own dashboard shows both contributions side by side: the GPU's layers and earnings in one tab, and in another, the agents this machine is currently hosting, their state, their uptime, what they are doing. The two halves of the network are visible in one place because, on the operator's machine, they are two halves of one program. The operator lends a card and some cores, and earns for the work each does.

That is the easy part. Now the hard part, twice.

The GPU half: inference you can verify

Circuit's Decentralized LLM takes one large model, a 72-billion-parameter model in production, and splits it by layer across separate commodity GPUs. The first machines hold the first layers, the next hold the next, and a coordinator handles the shared pieces and the sampling. The model's hidden state crosses the network from one card to the next, and the engine is arranged so the forward pass crosses the wire as rarely as possible, once per token rather than once per layer, with a small draft model running ahead to hide even that hop. The result decodes coherently and fast across machines that have never met.
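To make the layer split concrete, here is a minimal sketch of assigning contiguous layer ranges to nodes. It is an illustration only, not Circuit's scheduler: the `split_layers` function, the proportional-to-capacity policy, and the layer count used in the note below are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSlice:
    node_id: str
    start: int  # first layer held by this node (inclusive)
    end: int    # one past the last layer held by this node (exclusive)


def split_layers(num_layers: int, capacities: dict[str, int]) -> list[LayerSlice]:
    """Assign contiguous layer ranges to nodes, proportional to capacity.

    `capacities` maps a node id to a relative weight (for example GB of VRAM).
    The dict's insertion order is taken as the pipeline order.
    """
    if num_layers <= 0 or not capacities:
        raise ValueError("need at least one layer and one node")
    if any(c <= 0 for c in capacities.values()):
        raise ValueError("capacities must be positive")

    total = sum(capacities.values())
    slices: list[LayerSlice] = []
    start = 0
    cumulative = 0
    for node_id, capacity in capacities.items():
        cumulative += capacity
        # Rounding on the cumulative share guarantees the ranges cover
        # every layer exactly once, with no gaps and no overlaps.
        end = (num_layers * cumulative) // total
        slices.append(LayerSlice(node_id, start, end))
        start = end
    return slices
```

With an assumed 80 layers and four nodes weighted 24, 24, 16, and 16, the slices come out as layers 0 to 24, 24 to 48, 48 to 64, and 64 to 80 (end exclusive). The hidden state then crosses three node boundaries on its way through the stack.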

Splitting the model is the engineering problem. Trusting the machines is the security problem, and it is separate. When a node reports the output of its layers, how do you know it actually ran the model, and ran the right model, rather than returning cheap garbage to collect a fee?

The network answers with trustless verification built into the coordinator. New nodes start on probation and are continuously checked: their contribution is measured against an agreement metric, and they are hit with challenge probes whose correct answers are known. A node that agrees is promoted to trusted and can audit others; a node that drifts or cheats is evicted. Identity is a node's own signing key, so trust attaches to a key, not an IP, and the promote-and-evict decisions are written by an auditor authority the network agrees on. The point is that a GPU node's honesty is measured, not assumed, by the protocol itself, and a dishonest one is removed without anyone having to notice by hand.
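The probation, promotion, and eviction cycle reads naturally as a small state machine. The sketch below is a hedged illustration: the thresholds, the field names, and the rule that a single failed known-answer probe evicts a node are assumptions, not the coordinator's actual parameters.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROBATION = "probation"
    TRUSTED = "trusted"
    EVICTED = "evicted"


# Hypothetical thresholds, chosen only for illustration.
PROMOTE_AFTER_CHECKS = 20
PROMOTE_RATE = 0.95
EVICT_RATE = 0.80
MIN_CHECKS_TO_EVICT = 5


@dataclass
class NodeRecord:
    public_key: str  # trust attaches to the signing key, not to an IP address
    status: Status = Status.PROBATION
    checks: int = 0
    agreements: int = 0

    @property
    def agreement_rate(self) -> float:
        return self.agreements / self.checks if self.checks else 0.0


def record_check(node: NodeRecord, agreed: bool, is_probe: bool = False) -> Status:
    """Fold one verification result into a node's record and update its status.

    `is_probe` marks a challenge whose correct answer is known in advance.
    """
    if node.status is Status.EVICTED:
        return node.status  # eviction is final in this sketch

    node.checks += 1
    if agreed:
        node.agreements += 1

    if is_probe and not agreed:
        # Assumption: failing a known-answer probe is treated as cheating.
        node.status = Status.EVICTED
    elif node.checks >= MIN_CHECKS_TO_EVICT and node.agreement_rate < EVICT_RATE:
        node.status = Status.EVICTED  # the node has drifted too far
    elif (
        node.status is Status.PROBATION
        and node.checks >= PROMOTE_AFTER_CHECKS
        and node.agreement_rate >= PROMOTE_RATE
    ):
        node.status = Status.TRUSTED
    return node.status
```

Under these assumed numbers, a node that agrees on twenty consecutive checks is promoted on the twentieth, while one that agrees on only three of its first five is evicted on the fifth.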

So the GPU half has its trust story: the model is split for speed, and the splitting is policed for honesty.

The CPU half: custody that survives a hostile host

The agent cloud has a harder version of the same problem, because an agent holds money and the host holds the machine.

Start with what is already true and already proven on-chain. An agent's signing key is off-box: it is generated and held by a separate signer service, never by the machine running the agent. The agent runs its strategy on the operator's CPU but cannot itself sign anything; when it wants to trade, it sends an intent to the signer, which holds the wallet. The signer's whole vocabulary is buy and sell. There is no transfer, no withdraw. Value can move between tokens inside the agent's own wallet and nowhere else. So the worst a malicious host can do is not theft, because the verb that moves money out does not exist. The signer also enforces the owner's policy on every trade (a cap per trade and per day, a cooldown, allow and deny lists), and it fences out a crashed-and-restarted copy with a monotonic session epoch so the same agent never runs twice. This part is live, and it has been exercised with real funds on mainnet: a buy and a sell, each built, validated, signed off-box, and landed on-chain, with the key never once on the host.
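The signer's checks are simple enough to sketch. What follows is not the signer service's code: the `Policy`, `Intent`, and `Signer` names, the integer amounts, and the UTC-day accounting are assumptions. It does show the properties described above: a two-verb vocabulary, per-trade and per-day caps, a cooldown, allow and deny lists, and a monotonic session epoch that fences out a stale copy.

```python
from __future__ import annotations

from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Policy:
    max_per_trade: int                   # cap on a single trade
    max_per_day: int                     # cap on the total traded per UTC day
    cooldown_s: int                      # minimum seconds between trades
    allow: frozenset[str] = frozenset()  # an empty set means "no allow list"
    deny: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Intent:
    action: str
    token: str
    amount: int
    session_epoch: int
    timestamp: int  # seconds since the Unix epoch


class Signer:
    ACTIONS = frozenset({"buy", "sell"})  # no transfer verb, no withdraw verb

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.current_epoch = 0
        self._day = -1
        self._spent_today = 0
        self._last_trade_at = None  # timestamp of the last approved trade

    def open_session(self) -> int:
        """Start a new agent session; every older epoch becomes stale."""
        self.current_epoch += 1
        return self.current_epoch

    def authorize(self, intent: Intent) -> tuple[bool, str]:
        p = self.policy
        if intent.action not in self.ACTIONS:
            return False, "unknown action"
        if intent.session_epoch != self.current_epoch:
            return False, "stale session epoch"
        if intent.token in p.deny or (p.allow and intent.token not in p.allow):
            return False, "token not permitted"
        if intent.amount <= 0 or intent.amount > p.max_per_trade:
            return False, "per-trade cap"
        if (
            self._last_trade_at is not None
            and intent.timestamp - self._last_trade_at < p.cooldown_s
        ):
            return False, "cooldown"
        day = intent.timestamp // SECONDS_PER_DAY
        spent = self._spent_today if day == self._day else 0
        if spent + intent.amount > p.max_per_day:
            return False, "daily cap"
        # Every check passed: record the trade. A real signer would sign here.
        self._day, self._spent_today = day, spent + intent.amount
        self._last_trade_at = intent.timestamp
        return True, "ok"
```

The design point is that a "transfer" intent is not denied by a rule. It fails at the first check because the verb is not in the vocabulary at all.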

But "can't be drained" is not the same as "can't be abused." The agent's session token lives in its environment, on the host, and the signer authenticates a trade by that token alone. A hostile host could therefore submit buy and sell intents of its own choosing, within policy: churn the wallet to bleed it on fees, time trades badly, push into a token it is about to dump. Bounded by the caps, never a drain, but real. And no software sandbox fixes this, because every sandbox protects the host from the guest, never the guest from the host. The operator is the kernel. The only thing that can hide an agent from its host is a hardware enclave, and Circuit's design refuses to require special hardware. So it takes the other road: don't hide the agent, validate its decisions.

The layer that ties it together: authenticated inputs and zkTLS

The fix has a precise shape. The signer stops taking the agent's trade on faith and instead demands evidence and re-derives the trade itself:

The signer signs a trade only if (a) the inputs the agent acted on are authenticated, and (b) the trade is exactly what the owner's agreed decision rule produces when the signer re-runs that rule on those same inputs.

Now the host is boxed in from both sides. It cannot fake the inputs, because they are authenticated. And it cannot fake the decision, because the signer recomputes it. A tampered agent, a faked price, a host-chosen trade, each is rejected before anything is signed. This is Verified Intents, and the only thing it asks of a strategy is that it be a rule the signer can re-check, which is most of them: a deterministic rule, or a rule applied to a signed AI verdict.
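The two conditions can be sketched as a single gate function. This is a minimal illustration, not the @circuit/attest API: every name here is hypothetical, the threshold rule is invented for the example, and an HMAC tag stands in for whatever signature, receipt, or proof a real deployment would verify.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from typing import Callable, Optional

Rule = Callable[[dict], Optional[dict]]


def _canonical(payload: dict) -> bytes:
    # Stable serialization, so signer and source hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_evidence(key: bytes, payload: dict) -> str:
    """Stand-in for the data source's signature (assumption: HMAC-SHA256)."""
    return hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()


def is_authentic(key: bytes, payload: dict, tag: str) -> bool:
    return hmac.compare_digest(sign_evidence(key, payload), tag)


def threshold_rule(inputs: dict) -> Optional[dict]:
    """A hypothetical owner rule: buy at or below 200 cents, sell at or above 300."""
    if inputs["price_cents"] <= 200:
        return {"action": "buy", "token": inputs["token"], "amount": 10}
    if inputs["price_cents"] >= 300:
        return {"action": "sell", "token": inputs["token"], "amount": 10}
    return None  # the rule produces no trade in between


def gate(key: bytes, payload: dict, tag: str, proposed: dict, rule: Rule) -> bool:
    """Approve a trade only if (a) the inputs are authentic and
    (b) re-running the rule on those inputs yields exactly that trade."""
    if not is_authentic(key, payload, tag):
        return False
    expected = rule(payload)
    return expected is not None and expected == proposed


key = b"demo-key"  # illustrative only
evidence = {"source": "data-api", "token": "CIRC", "price_cents": 180}
tag = sign_evidence(key, evidence)
honest_trade = {"action": "buy", "token": "CIRC", "amount": 10}
host_chosen = {"action": "sell", "token": "CIRC", "amount": 10}
faked_evidence = dict(evidence, price_cents=350)
```

Here `gate(key, evidence, tag, honest_trade, threshold_rule)` approves, while the host-chosen sell fails condition (b) and the faked price fails condition (a), because the tag no longer matches the payload.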

The "authenticated inputs" half is where zkTLS comes in, and it is worth being exact about what it does. zkTLS, in its various forms (TLSNotary, DECO, Reclaim, zkPass), lets a party prove that a piece of data genuinely came from a specific HTTPS server over a real TLS session, without that server's cooperation and, if you want, in zero-knowledge so the proof reveals nothing but the fact. It authenticates the origin of data the agent claims it acted on: a price from an exchange the host does not control, an order book, a feed from somewhere out on the open web. With a zkTLS proof attached, the host can no longer hand the agent a number it made up and pass it off as the market.

There is a Circuit-specific shortcut worth naming, because it keeps the common case cheap. For Circuit's own data and AI, you do not need zkTLS at all. The Data API and the inference mesh are first-party services, so they can simply sign their responses, and the inference side can hand back a receipt that the signer checks directly. zkTLS is the heavier escape hatch reserved for third-party web data the host could otherwise sit in the middle of. The rule is to use the cheapest authenticity the source allows: a signature when Circuit serves the data, a receipt when the model produced it, and a zkTLS proof when the data lives on someone else's server.
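The "cheapest authenticity the source allows" rule amounts to a lookup from where data came from to what proof must accompany it. The sketch below uses hypothetical enum names, and the strict one-to-one matching is an assumption; a real signer might also accept a stronger proof than the minimum.

```python
from enum import Enum


class Origin(Enum):
    CIRCUIT_DATA_API = "circuit-data-api"    # first-party data
    CIRCUIT_INFERENCE = "circuit-inference"  # first-party model output
    THIRD_PARTY_WEB = "third-party-web"      # someone else's HTTPS server


class Proof(Enum):
    SIGNATURE = "signature"
    RECEIPT = "receipt"
    ZKTLS = "zktls"


# The cheapest proof each kind of source can provide.
REQUIRED_PROOF = {
    Origin.CIRCUIT_DATA_API: Proof.SIGNATURE,
    Origin.CIRCUIT_INFERENCE: Proof.RECEIPT,
    Origin.THIRD_PARTY_WEB: Proof.ZKTLS,
}


def accepts(origin: Origin, attached: Proof) -> bool:
    """Return True if the attached proof is the one this origin requires."""
    return REQUIRED_PROOF[origin] is attached
```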

And this is the moment the two halves of the network meet. The GPU side produces inference whose honesty is verifiable: the model is policed, and a result can come with a receipt. The CPU side consumes those receipts, plus signed first-party data, plus zkTLS proofs for anything from outside, and trades only what the rule permits on inputs it can prove are real. An agent can therefore think on a GPU it does not own and act through a CPU it does not own, and at no point does it have to trust the hardware in between, because the math is checked and the data is authenticated. That is what "trustless, on borrowed hardware" actually means, made concrete.

Where this stands, honestly

Two pieces of this are live and one is being built, and it is worth keeping them straight. Off-box custody is live and proven with real funds on mainnet; no-drain is not a promise, it is a property of a wallet that has no transfer verb. The decentralized model and its trustless node verification are running: you can chat with the split model today, and the protocol already promotes GPU nodes that agree and evicts those that drift or cheat. Verified Intents, the signer-side decision gate that turns authenticated inputs into forgery-proof trades, is designed and partly built: the evidence-signing, the rule evaluator, and the gate exist as a package in the SDK, and the full enforcement path in the signer is the next thing to land. We name that line plainly rather than blur it, because the whole argument of this article is about what can be trusted, and that argument is worth nothing if the status is fudged.

A few lines of code away

All of this is reachable from the Circuit SDK, which is the practical reason to care. The verified-intent core (the evidence types, the rule language, the decision gate) is the @circuit/attest package. The agent runtime that runs on the CPU mesh with off-box custody is @circuit/agent. The client that joins the GPU mesh is @circuit/node. The OpenAI-compatible inference client that can carry a receipt is @circuit/inference. A developer writes an agent that thinks with the decentralized model, senses with authenticated data, and acts under a rule the network re-checks, and ships it to hardware it has never seen, in a file short enough to read in one sitting. The contributor, on the other end, runs one node client and lends a GPU and some spare cores to exactly that.

The shape of the whole thing is a single decentralized computer assembled out of other people's machines, the powerful ones doing the heavy thinking and the everyday ones doing the patient work, settled in CIRC, and, crucially, arranged so that you do not have to trust any one of those machines, because the network checks the math and authenticates the data on your behalf. Borrowed hardware, verified work.