École — the learning system for Agora agents

01 How an agent learns

One episode through the data spine. Every step is a decision a model could someday make for the agent — so every step writes a signal row to its dataset.

step 1

retrieve

which memories to load for this turn — query, candidates, chosen, usefulness

retrieval

→

step 2

route

which mouthpiece, how much compute. RL objective: reward = +completion − errors − devTime

routing

→

step 3

act

speak through the rented mouthpiece, persona packet compiled from state

persona skill:*

→

step 4

settle reward

outcome lands; the signed reward is patched onto the decision rows via episode_id (delayed reward)

routing ← reward

→

step 5

reflect

episodes compress into beliefs; internal dialogue recorded with the turn

compression

→

step 6

school day

consolidate memory, extract skills, refine strategy, decay what didn't help

forgetting

↻ tomorrow's retrieval is shaped by today's school day — the loop is the curriculum.
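The delayed-reward part of the loop (steps 2 and 4) can be sketched roughly as follows. This is a minimal illustration, not École's actual schema: the `SignalRow` fields, the `signed_reward` and `settle` helpers, the unit weights on each reward term, and the choice to patch only `routing` rows by default (following the "routing ← reward" label) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SignalRow:
    """One decision logged during an episode (hypothetical schema)."""
    episode_id: str
    dataset: str                      # e.g. "retrieval" or "routing"
    decision: dict = field(default_factory=dict)
    reward: Optional[float] = None    # unknown until the outcome lands


def signed_reward(completion: float, errors: float, dev_time: float) -> float:
    """reward = +completion - errors - devTime, with unit weights (assumed)."""
    return completion - errors - dev_time


def settle(rows, episode_id: str, reward: float, datasets=("routing",)) -> int:
    """Patch the reward onto still-unsettled rows of one episode.

    Returns the number of rows patched. Rows from other episodes, other
    datasets, or rows that already carry a reward are left untouched.
    """
    patched = 0
    for row in rows:
        if (row.episode_id == episode_id
                and row.dataset in datasets
                and row.reward is None):
            row.reward = reward
            patched += 1
    return patched


# Decisions are written at decision time, with no reward yet.
rows = [
    SignalRow("ep1", "retrieval", {"chosen": ["m3", "m7"]}),
    SignalRow("ep1", "routing", {"backend": "local-7b"}),
    SignalRow("ep2", "routing", {"backend": "local-32b"}),
]

# Later, the outcome for ep1 lands and is joined back via episode_id.
r = signed_reward(completion=1.0, errors=0.25, dev_time=0.25)
n_patched = settle(rows, "ep1", r)
```

Keeping the reward nullable on the row is what makes the delay explicit: a row with `reward is None` is a decision whose outcome has not landed yet, so it can be excluded from training until it settles.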

Runtime ladder: local-7b → local-32b → hf-pro → ultra-premium. The top rung is Fable 5 via the subscription CLI, and it acts as the teacher: it supplies distillation targets, judges self-evaluations, and takes hard-case escalations. It costs $0 at the margin but is deliberately scarce in the internal economy, so agents learn to escalate only when it pays. The metered Frontier tier stays typed but gated: the router never auto-selects a metered backend. That is the $0 invariant.
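One way to picture the $0 invariant is a router that knows about the metered tier as a type but filters it out before any automatic choice. The sketch below is hypothetical: the `Backend` class, the `"frontier"` identifier, the `auto_select` function, and the mapping from a difficulty score in [0, 1] to a rung are illustrative assumptions, and the scarcity pricing of ultra-premium is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    metered: bool = False   # True means per-call cost, so never auto-selected


# The $0 ladder, cheapest rung first (names taken from the runtime ladder).
LADDER = (
    Backend("local-7b"),
    Backend("local-32b"),
    Backend("hf-pro"),
    Backend("ultra-premium"),
)

# The metered tier exists as a typed value but stays gated (assumed name).
FRONTIER = Backend("frontier", metered=True)


def auto_select(difficulty: float, candidates=LADDER + (FRONTIER,)) -> Backend:
    """Pick a rung for a difficulty in [0, 1], skipping metered backends."""
    eligible = [b for b in candidates if not b.metered]
    clamped = min(max(difficulty, 0.0), 1.0)
    idx = min(int(clamped * len(eligible)), len(eligible) - 1)
    return eligible[idx]


easy = auto_select(0.0)
hard = auto_select(1.0)
```

Filtering on `metered` inside the selection function, rather than leaving the metered backend out of the candidate list, means the invariant holds even if a caller passes the Frontier tier in by mistake.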

02 Learner sessions

A simulated cohort: each agent lives a week of episodes through the full spine — routing, acting, settling rewards, attending school. What accumulates below is the flywheel, live from the data plane.
