spine: live · datasets: retrieval / routing / persona / skill:* / compression / forgetting · teacher: fable-5 · marginal cost: $0

École

The learning system for Agora agents.

An Agora agent is not an LLM. It is persistent, content-addressed state plus a portfolio of small models the agent owns: a memory retriever, a compute router, a compressor, a persona hypernetwork, and per-skill adapters. The LLM is a rented mouthpiece, one substrate among several. Because every decision is training data, the school never runs out of lessons. There are no silent decisions: any choice a model could someday make is written to a signal record.
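A minimal sketch of what "content-addressed state" can mean, assuming state is JSON-serializable and keyed by the SHA-256 of a canonical encoding. `ContentStore`, `put` and `get` are hypothetical names for illustration, not the agora-core API. The point it shows: identical content always lands at the same address, so storing it again is a no-op.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ContentStore:
    """Hypothetical content-addressed store: address = SHA-256 of canonical JSON."""
    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, value) -> str:
        # Canonicalize (sorted keys, no extra whitespace) so equal values hash equally.
        data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # idempotent: re-storing the same content changes nothing
        return key

    def get(self, key: str):
        return json.loads(self.blobs[key])


store = ContentStore()
k1 = store.put({"belief": "blue/green deploys are safer", "confidence": 0.8})
k2 = store.put({"confidence": 0.8, "belief": "blue/green deploys are safer"})
```

Here `k1 == k2` even though the dicts were written in a different key order, and only one blob is stored.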

36 Unison tests · 26 ML tests · @stateful-ai/agora-core 0.1.0 · $0 invariant

01 How an agent learns

One episode through the data spine. Every step is a decision a model could someday make for the agent — so every step writes a signal row to its dataset.
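One way to make "every step writes a signal row" hard to bypass is to have the decision function itself append the row. The sketch below assumes an in-memory spine (`DATASETS`) and a `signal` decorator; both, along with `SignalRow` and the toy `pick_memories` policy, are hypothetical illustrations rather than the agora-core data plane.

```python
import functools
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SignalRow:
    episode_id: str
    step: str
    inputs: dict[str, Any]
    chosen: Any
    reward: Optional[float] = None  # patched in later, when the episode's outcome settles


# Hypothetical in-memory data spine: dataset name -> list of rows.
DATASETS: dict[str, list[SignalRow]] = defaultdict(list)


def signal(dataset: str, step: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator: the wrapped decision cannot run without appending a SignalRow."""
    def wrap(decide: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(decide)
        def wrapped(episode_id: str, **inputs: Any) -> Any:
            chosen = decide(**inputs)
            DATASETS[dataset].append(SignalRow(episode_id, step, dict(inputs), chosen))
            return chosen
        return wrapped
    return wrap


@signal("retrieval", "retrieve")
def pick_memories(query: str, candidates: list[str], k: int = 2) -> list[str]:
    # Placeholder policy (substring match); a learned retriever would replace it.
    return [c for c in candidates if query in c][:k]


episode_id = str(uuid.uuid4())
chosen = pick_memories(episode_id, query="deploy",
                       candidates=["deploy notes", "lunch", "deploy rollback"])
```

Because the row is written inside the wrapper, a caller cannot take the decision without also producing its training example; the `episode_id` carried on each row is what lets a later reward find it.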

1. retrieve (dataset: retrieval): which memories to load for this turn; records query, candidates, chosen, usefulness.
2. route (dataset: routing): which mouthpiece, and how much compute. RL objective: reward = +completion − errors − devTime.
3. act (datasets: persona, skill:*): speak through the rented mouthpiece, using a persona packet compiled from state.
4. settle reward (routing ← reward): the outcome lands, and the signed reward is patched onto the decision rows via episode_id (delayed reward).
5. reflect (dataset: compression): episodes compress into beliefs; the internal dialogue is recorded with the turn.
6. school day (dataset: forgetting): consolidate memory, extract skills, refine strategy, decay what didn't help.
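Steps 2 and 4 together form a delayed-reward loop: routing rows are written with no reward, and the signed reward arrives later, keyed by episode_id. A minimal sketch, assuming the three terms of the stated objective are already on a common scale; `DecisionRow`, `routing_reward` and `settle` are hypothetical names, not the agora-core implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionRow:
    episode_id: str
    choice: str
    reward: Optional[float] = None  # None until the episode settles


def routing_reward(completion: float, errors: float, dev_time: float) -> float:
    # reward = +completion - errors - devTime, as stated for the routing step.
    # Assumption: all three terms are pre-scaled to comparable units.
    return completion - errors - dev_time


def settle(rows: list[DecisionRow], episode_id: str, reward: float) -> int:
    """Patch the delayed reward onto every unsettled row of the episode; return how many."""
    patched = 0
    for row in rows:
        # Skipping already-settled rows makes a repeated settle call harmless.
        if row.episode_id == episode_id and row.reward is None:
            row.reward = reward
            patched += 1
    return patched


rows = [
    DecisionRow("ep-1", "local-7b"),
    DecisionRow("ep-1", "local-32b"),
    DecisionRow("ep-2", "local-7b"),
]
r = routing_reward(completion=1.0, errors=0.25, dev_time=0.5)
n = settle(rows, "ep-1", r)
```

Both `ep-1` rows receive the reward 0.25; the `ep-2` row stays unsettled until its own outcome lands.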

tomorrow's retrieval is shaped by today's school day — the loop is the curriculum.
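How a school day can reshape tomorrow's retrieval can be sketched as a consolidation pass over memory strengths. This is an assumption-laden illustration, not École's actual forgetting model: the `Memory` fields, the "useful at least half the time" rule, the 0.1 reinforcement step, the `decay` factor and the `floor` cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    strength: float = 1.0
    uses: int = 0    # times retrieved today
    useful: int = 0  # times it was judged useful when retrieved


def school_day(memories: list[Memory], decay: float = 0.5, floor: float = 0.2) -> list[Memory]:
    """Hypothetical pass: reinforce what helped, decay what didn't, drop what falls below floor."""
    kept = []
    for m in memories:
        if m.uses and m.useful / m.uses >= 0.5:
            m.strength = min(1.0, m.strength + 0.1)  # reinforce memories that paid off
        else:
            m.strength *= decay                       # decay unhelpful or unused memories
        m.uses = m.useful = 0                         # reset daily counters
        if m.strength >= floor:
            kept.append(m)
    # Tomorrow's retrieval ranks candidates by strength, so today's outcomes set the order.
    return sorted(kept, key=lambda m: m.strength, reverse=True)


mems = [
    Memory("deploy via blue/green", 0.6, uses=2, useful=2),
    Memory("old API key format", 0.3, uses=1, useful=0),
    Memory("lunch order", 0.9),
]
after = school_day(mems)
```

The memory that helped rises to about 0.7 and now ranks first; the unused "lunch order" memory decays to 0.45; the unhelpful one drops to 0.15 and is forgotten.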

Runtime ladder: local-7b → local-32b → hf-pro → ultra-premium. The ultra-premium tier is Fable 5 via the subscription CLI, and it is the teacher: it supplies distillation targets, judges self-evals, and takes hard-case escalations. It costs $0 at the margin but is deliberately scarce in the internal economy, so agents learn to escalate only when it pays. The metered Frontier tier stays typed but gated: the router never auto-selects a metered backend. That is the $0 invariant.
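The ladder and the $0 invariant can be sketched as a router that filters out metered tiers before it ever compares prices. Everything here is an assumption for illustration: the `Tier` fields, the capability ordinals, the internal prices and the `route` function are hypothetical, not the agora-core router.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int        # hypothetical ordinal: higher means a stronger model
    internal_price: float  # hypothetical price in the agents' internal economy
    metered: bool = False  # True means real per-token spend


LADDER = [
    Tier("local-7b", 1, 1.0),
    Tier("local-32b", 2, 4.0),
    Tier("hf-pro", 3, 10.0),
    Tier("ultra-premium", 4, 100.0),            # the teacher: $0 marginal, priced high on purpose
    Tier("frontier", 5, 1000.0, metered=True),  # typed so it can be named, but gated
]


def route(needed: int, budget: float) -> Optional[Tier]:
    """Cheapest auto-selectable tier with capability >= needed that fits the budget.

    Metered tiers are excluded unconditionally, so the router can never
    auto-select real spend. None means no tier qualifies.
    """
    candidates = [
        t for t in LADDER
        if not t.metered and t.capability >= needed and t.internal_price <= budget
    ]
    return min(candidates, key=lambda t: t.internal_price, default=None)
```

For example, `route(3, 50.0)` picks hf-pro, while `route(4, 50.0)` returns None because the teacher costs more than the budget allows. The gate is a filter rather than a high price, so no budget is ever large enough to auto-select the metered tier.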

02 Learner sessions

A simulated cohort: each agent lives a week of episodes through the full spine — routing, acting, settling rewards, attending school. What accumulates below is the flywheel, live from the data plane.
