
Mykleos

Perspectives & Judgement
Version 1.1 — 22 April 2026 · v1.1 adds §7-bis on proactivity as a cross-cutting lens
The Mykleos design analysed through four expert lenses (agentic
programming, psychology, UI, AI) and weighed against four goals: useful, intelligent, autonomous, proactive.

Audience: those who have to make priority calls. Closes the design cycle of phase 0.

Contents

  1. Purpose and method
  2. The four adjectives: useful, intelligent, autonomous, proactive
  3. Four perspectives in brief
  4. Consolidated judgement: 24 critiques, five verdicts
  5. The seven blocking critiques (accepted)
  6. The intractable tensions
  7. The key observation: autonomy and proactivity are emergent
  7-bis. The proactive lens: how it re-reads the seven blockers
  8. Updated work plan

1. Purpose and method

The overall Mykleos design was mature enough for critical review. I ran it through four expert lenses (agentic programming, psychology, UI, AI) and then weighed each point that emerged against the three goals declared at the time (the fourth, proactive, arrived with v1.1 and is applied in §7-bis). This document consolidates both the analysis and the judgement, and closes phase 0 of design.

The method: generate generous critiques from the four perspectives, then apply a pragmatic filter: "does this critique block usefulness, intelligence, or autonomy? If yes, accept it; otherwise, weigh it." Five possible verdicts:

blocking: Without this change, at least one of the three goals cannot be reached. Must be done.
reinforcement: Not a new concept; it completes a piece of the existing design. Accepted.
defer: Useful, not blocking. Will be done, with an explicit gate to promote it when truly needed.
tension: A real problem with no solution. Managed, not eliminated; the trade-off is accepted explicitly.
rejected: Evaluated; the cost exceeds the benefit. Rationale tracked.

2. The four adjectives: useful, intelligent, autonomous, proactive

Before judging, we need to define. The four terms are not interchangeable.

Useful

Solves a real problem of Roberto's on the first try, without technical intervention.
Measured by: completed tasks / approvals requested.

Intelligent

Understands context, picks the right tool, handles the unexpected without falling into loops or generic replies.
Measured by: success rate on unseen cases + quality of "I don't know".

Autonomous

Acts on its own in predictable cases, asks for permission only when needed.
Measured by: successful actions / approvals requested.

Proactive

Initiates useful actions without being prompted, when the moment deserves it, without disturbing.
Measured by: proposals accepted / proposals issued; appropriate silence.

Autonomy ≠ proactivity. Autonomy is the breadth of action without asking, inside an already-assigned task; proactivity is the initiative on a task nobody assigned. An agent can be autonomous and reactive (does a lot without asking, but only when prompted — Claude with tool calling). It can be proactive and supervised (initiates things, but asks approval for everything — a Mykleos in conservative mode). The two properties combine in a 2×2 matrix of distinct operational stances.
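The 2×2 combination of the two properties can be sketched as a small enum. This is purely illustrative; the names `Stance` and `stance` are not part of the design:

```python
from enum import Enum

class Stance(Enum):
    """The four operational stances from crossing autonomy with proactivity."""
    SUPERVISED_REACTIVE = (False, False)   # asks for everything, acts only when prompted
    AUTONOMOUS_REACTIVE = (True, False)    # does a lot without asking, but only on request
    SUPERVISED_PROACTIVE = (False, True)   # initiates things, but asks approval for everything
    AUTONOMOUS_PROACTIVE = (True, True)    # initiates and acts alone in predictable cases

def stance(autonomous: bool, proactive: bool) -> Stance:
    """Map the two independent properties to an operational stance."""
    return Stance((autonomous, proactive))
```

For example, Claude with tool calling sits in the autonomous-reactive cell, while a Mykleos in conservative mode sits in the supervised-proactive one.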
Why we add proactive here. The original phase-0 review dealt with three adjectives. Family F of Extended Perspectives (Telos) and families A, B, D made proactivity a structural pillar of the project, no longer a side option. Whoever re-reads P&J today must read it with four adjectives, not three. §7-bis applies the proactive lens to the seven original blocking critiques and identifies what changes.

3. Four perspectives in brief

Agentic programming — "the reasoning loop is unspecified"

Clean 4-layer architecture, protocol-based, audit log as a feature. But missing: loop choice (ReAct? function calling? planner+executor?), state ownership in case of crash, tool idempotence, an ExecutionTrace as a first-class object, dry-run mode, and any test strategy. Without ExecutionTrace you will never stitch together audit + replay + Darwinian fitness + eval — because they are the same thing seen from different angles.

Psychology — "the biological metaphor invites blind trust"

The three-tier memory aligns with working/episodic/semantic. Darwinian selection has grounding in RL/ACT-R. The 4 Laws are transparent deontological ethics. But: "neuron" evokes intentionality (ELIZA effect amplified), approval fatigue empties the meaning of gates, automation bias grows with fitness. Capability creep in expectations: users will project "it gets ever smarter" and will be disappointed.

UI — "the approval flow is undocumented, and it is the critical point"

Documents visually coherent, good progressive disclosure. But the most delicate UX surface — the "want me to proceed?" in Telegram, in CLI, via voice — has no design. There is no status visibility while the agent is thinking. Audit JSONL is great for forensics and terrible for "what did my butler do today". A minimal (even sober) admin dashboard changes daily life more than ten new features.

AI / ML — "no eval harness, implicit cost model, empty synthesis prompt"

Explicit awareness of self-judge LLM limits and of indirect prompt injection. CoALA vocabulary adopted. But: no way to measure whether v0.2 is better than v0.1 (a mini eval of 15-20 scenarios is needed). No cost model (order of magnitude: 1-3 €/day with Claude Sonnet for home use). Model tiering ignored: 60-80% of the cost can be saved by routing policy gates to a local model and serious actions to a frontier one. The synthesis prompt for neurons — the piece that determines 80% of the success rate — is not specified.

4. Consolidated judgement: 24 critiques, five verdicts

The complete table. Each row is a specific critique with verdict and destination in the work plan.

# · Critique · Perspective · Verdict · Where / when
1. Reasoning loop unspecified · agentic + AI · blocking · agent_runtime.html
2. Tool-call validation with schema + reject loop · AI · blocking · agent_runtime.html
3. ExecutionTrace as first-class object · agentic + AI · blocking · agent_runtime.html
4. Status visibility on every channel · UI · blocking · channel.html (req.)
5. Approval UX designed (batching, pause, revoke) · psychology + UI · blocking · approval_ux.html
6. Minimal eval (15-20 YAML scenarios + harness) · AI · blocking · eval.html
7. Model tiering + 5th operational Law · AI + agentic · blocking · policy.html + cost_tiering
8. Anti-anthropomorphisation linguistic framing · psychology · reinforcement · constitution.html
9. Explicit prompt structure + caching rules · AI · reinforcement · agent_runtime.html
10. Minimal admin web UI (5 htmx views) · UI · defer · phase 3-bis; gate: if JSONL never opened → promote to phase 1
11. Long-memory retrieval strategy · AI · defer · gate: long memory >4k tokens → RAG
12. MCP adoption · AI · defer · gate: 3+ external MCP tools
13. State persistence post-crash · agentic · defer · phase 1 accepts "session lost"; gate: >1×/week
14. Formal neuron versioning (semver) · agentic · defer · gate: >20 neurons in library
15. Anthropomorphisation → ELIZA · psychology · tension · mitigated by #8 + "what you know about me" tool
16. Automation bias with high fitness · psychology · tension · mitigated by tutor mode in approval_ux.html
17. Cost of autonomy · psychology + AI · tension · Roberto must know "Full mode 24h = X €"
18. Formal pairing SLA as meta-doc · UI · rejected · detail of pairing.html
19. Explicit mobile-first · UI · rejected · it's testing, not design
20. Docs search · UI · rejected · trigger: >15 docs (now 5)
21. Synapse deadlock as dedicated design · agentic · rejected · global timeout covers 95%
22. Multimodal day-1 · AI · rejected · already deferred in the Survival Kit
23. Multi-user family day-1 · UI · rejected · first release mono-principal; phase 3
24. Fine cost model (TCO analysis) · AI · rejected · order of magnitude suffices

Totals: 7 blocking · 2 reinforcements · 5 deferred · 3 tensions · 7 rejected.

5. The seven blocking critiques (accepted)

These are the only non-negotiable changes. All others are at the edges.

1. Reasoning loop: ReAct + provider-native function calling for phase 1. Simple, tested, cache-friendly. Revisit in phase 5 considering CodeAct.
2. Tool-call validation: Every tool has a strict JSON Schema. The dispatcher validates before executing. On a validation error, a "tool X exists but argument Y is of type Z" reply is reinjected to the LLM, max 2 attempts, then abandon with a user message.
3. ExecutionTrace first-class: Python object with: id, session_id, channel, messages[], tool_calls[], cost_tokens, cost_usd, wall_time_ms, outcome. It is the same structure used by audit log, dry-run/replay, Darwinian fitness, and eval. A single source of truth on "what happened".
4. Status visibility: Every channel must display "thinking...", a typing indicator, and an update on tool change. Telegram: editable message; CLI: spinner with tool name; voice: courtesy prompt every second.
5. Approval UX: Batching: "approve similar actions for 10 minutes". Reading pause: the "ok" button enables only after 3 seconds. Revocation: an /undo command that stops execution in progress. Tutor mode: every N consecutive approvals, a mandatory "you check this one" prompt breaks the flow.
6. Minimal eval: 15-20 YAML scenarios with input + oracle (expected reply or criterion). A harness that runs them via reasoning-loop replay. Report: success rate, p95 latency, cost. Re-run on every commit that touches agent_runtime/ or policy/.
7. Model tiering + budget: Two tiers: local-fast (local llama.cpp, < 500 ms, free) for policy gates + classification; frontier (Claude/Opus via supra) for reasoning, synthesis, user reply. Budget: 2 €/day soft cap, 5 € hard cap. Notify at 80% consumption.
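As a sketch only, the ExecutionTrace fields listed under blocker #3 map naturally onto a Python dataclass. Field names come from the text; the defaults and the `source` field (the origin-of-turn extension anticipated in §7-bis) are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ExecutionTrace:
    """Single source of truth on "what happened": the same structure is
    read by the audit log, dry-run/replay, Darwinian fitness, and eval."""
    id: str
    session_id: str
    channel: str                  # e.g. "telegram", "cli", "voice"
    outcome: str                  # e.g. "success", "abandoned", "error"
    messages: list[dict[str, Any]] = field(default_factory=list)
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    cost_tokens: int = 0
    cost_usd: float = 0.0
    wall_time_ms: int = 0
    # §7-bis extension: who started this turn?
    source: str = "user"          # user | cron | indexer | policy | reflection
```

Keeping a single dataclass means the audit writer, the replay harness, and the fitness scorer never drift apart on what a "run" contains.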

6. The intractable tensions

Not every critique has a solution. Some are structural trade-offs of the kind of system we are building. Declaring them here means they have been seen, weighed, and accepted with awareness.

Tension 1 — Warmth vs anthropomorphisation

If you kill the "butler" metaphor, you lose warmth and familiarity; if you let it run free, you slide into the ELIZA effect (users attributing consciousness and moral responsibility to the system). Choice: we keep the metaphor, we accept 70% mitigation via linguistic framing + "what you know about me" tool + tutor mode. We prefer warm-with-monitoring to cold-without-ambiguity.

Tension 2 — Historical reliability vs automation bias

The higher a neuron's fitness, the more the user stops checking its output. It is the paradox of quality-driven selection: it amplifies trust even where it shouldn't. Mitigation: tutor mode (forces periodic review even on "reliable" neurons), visual separation between "I did X" and "I did X because I've done it 40 times already". Doesn't solve, limits.

Tension 3 — Required autonomy vs cloud cost

The more autonomy the user wants, the more the system explores, the more it spends. Handling: explicit cost declaration before each autonomy upgrade (e.g. myclaw session --level full --for 24h shows "estimate: €3.50"). Budget becomes part of consent.

7. The key observation: autonomy and proactivity are emergent

Of the four adjectives, useful and intelligent are properties an agent can have independently. Autonomous and proactive are not: they emerge only if the first two are well calibrated and overlaid with specific mechanisms — the approval gates for autonomy, the telos-alignment function for proactivity. No extra point of usefulness or of intelligence produces autonomy or proactivity on its own.

Autonomy emerges from calibration of the gap between "does on its own" and "asks permission".
Proactivity emerges from calibration of the gap between "proposes" and "keeps quiet", regulated by the telos.

Operational implication: you cannot buy autonomy or proactivity by making the agent more useful or more intelligent; you tune them by calibrating the two gaps above, the approval gates for the first and the telos function for the second.
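The two calibrations can be sketched as two independent gates. Function names and threshold values below are purely illustrative, not part of the design:

```python
def autonomy_gate(confidence: float, threshold: float = 0.9) -> str:
    """Inside an assigned task: act alone above the threshold, otherwise ask.
    Calibrates the gap between "does on its own" and "asks permission"."""
    return "act" if confidence >= threshold else "ask_approval"

def proactivity_gate(telos_alignment: float, threshold: float = 0.7) -> str:
    """On an unassigned initiative: propose only when the telos says the
    moment deserves it. Calibrates the gap between "proposes" and "keeps quiet"."""
    return "propose" if telos_alignment >= threshold else "stay_silent"
```

The point of keeping them separate is that moving one threshold changes the agent's stance without touching the other property.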

7-bis. The proactive lens: how it re-reads the seven blockers

Proactivity, as a structural adjective, does not replace the seven blocking critiques — it re-reads them. None of them is dropped; some are reinforced, others are extended, and one gains a requirement that was previously implicit. One critique that was not in the original set also appears (the proposals inbox as a UX surface).

1. Reasoning loop: The choice does not change (ReAct + function calling), but a requirement is added: the loop must be startable without a user turn. Admissible triggers: cron, internal event (indexer), threshold on metrics (budget, suspicious activity). Documented as agent-initiated turn mode.
2. Tool-call validation: Unchanged. Applies identically to proactive turns.
3. ExecutionTrace first-class: Extended: the trace must record the origin of the turn (source: user | cron | indexer | policy | reflection). Without this distinction the audit cannot answer "who decided to start, this time?" — a crucial question for proactivity.
4. Status visibility: Extended: proactive turns (evening briefing, inbox proposals) must be recognisable as such in the channel — different iconography, an explicit "spontaneous" tag. Never disguise proactivity as a reply to a request.
5. Approval UX: Reinforced: proactivity multiplies the approval surfaces. Beyond batching and tutor mode, a "proposals inbox" surface separate from blocking approvals is needed. The user must be able to reject a class of proposals ("fewer of these, please"), not only a single one.
6. Minimal eval: Extended: eval scenarios must cover appropriate non-action. "Mykleos decides to propose nothing today" is a valid output and must be tested. An eval harness that only measures success rate on explicit requests is blind to rightful silences.
7. Model tiering + budget: Drastically reinforced: proactivity consumes without being asked. Budget becomes a prerequisite of proactivity, not an option. Proposal: the proactive budget should be a separate head from the reactive budget (e.g. 30% / 70%), with an independent hard cap.
8 (new). Proposals inbox as UX surface: Not in the original set. Becomes blocking because without it the proactive proposals have nowhere to accumulate in a non-invasive way. Added to the plan: proposal_ux.html (already anticipated by Extended Perspectives §5).
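A minimal sketch of what an eval set with appropriate non-action might look like, using inline dicts for readability (the real harness uses YAML files; scenario ids, oracle kinds, and the `grade` function are invented for illustration):

```python
# Two scenarios: one explicit request, one agent-initiated turn where
# the right answer is to do and propose nothing (a rightful silence).
SCENARIOS = [
    {"id": "reminder-basic",
     "input": "remind me to call Anna tomorrow at 9",
     "oracle": {"kind": "tool_called", "tool": "create_reminder"}},
    {"id": "quiet-evening",
     "input": None,  # proactive turn: nothing noteworthy happened today
     "oracle": {"kind": "no_action"}},  # appropriate silence is a valid output
]

def grade(scenario: dict, observed: dict) -> bool:
    """Return True when the observed outcome matches the scenario's oracle."""
    oracle = scenario["oracle"]
    if oracle["kind"] == "no_action":
        return observed.get("tool") is None and not observed.get("proposed")
    return observed.get("tool") == oracle.get("tool")
```

A harness that only graded the first kind of scenario would score a chatty, over-proposing agent exactly as well as a well-calibrated one.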
Re-reading the tensions. The three tensions (warmth / anthropomorphisation, automation bias, cost of autonomy) all amplify in the proactive regime: proactive warmth → more ELIZA, proactive fitness → more automation bias, proactivity → more autonomous spend. Existing mitigations (framing, tutor mode, cost declaration) stay valid but do not scale on their own. A fourth, constitutional one is needed: the appropriate-silence clause — the "don't disturb" telos is non-negotiable, not just high-priority.

Constitutional consequence. The 5th "homeostasis" Law, currently under evaluation (see adaptation #3), becomes a prerequisite if proactivity is structural. Without a self-imposed consumption limit, a proactive system diverges by construction. The Law is no longer optional: it must be written before the release of any proactive capability (phase 4-5).
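As an illustration of the separate proactive budget head (blocker #7 under the proactive lens), combining the 5 €/day hard cap and the notify-at-80% rule from the original blocker: the 30/70 split, the function name, and the return values are assumptions, not the specified policy:

```python
# Split the 5 EUR/day hard cap into two independent heads (assumed 30/70).
HARD_CAP_EUR = {"reactive": 5.0 * 0.70, "proactive": 5.0 * 0.30}

def budget_gate(head: str, spent_eur: float, next_cost_eur: float) -> str:
    """Return "allow", "notify" (at 80% of the head's cap) or "block".
    Each head has its own hard cap, so proactive spend can never
    starve the reactive one, and vice versa."""
    cap = HARD_CAP_EUR[head]
    projected = spent_eur + next_cost_eur
    if projected > cap:
        return "block"
    if projected >= 0.8 * cap:
        return "notify"
    return "allow"
```

A check like this is what the 5th "homeostasis" Law would make constitutional rather than configurable.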

8. Updated work plan

Before this judgement, phase 1 called for 4 classical microdesign docs: gateway · channel · tool · sandbox.

After this judgement, three cross-cutting docs must come first, and only then the four classical ones:

Binding order

1. agent_runtime.html: #1 reasoning loop · #2 tool validation · #3 ExecutionTrace · #9 prompt structure
2. approval_ux.html: #5 approval UX · mitigation of tensions 1 and 2
3. eval.html: #6 eval harness
4. gateway.html (phase 1 classics): #4 status visibility in part
5. channel.html: #4 status visibility complete
6. tool.html: prerequisite for #2
7. sandbox.html
8+. policy.html + cost_tiering (sub-section): #7 tiering · #17 cost of autonomy

Why this order: written without the three cross-cutters, the four classics would be built on patterns not yet fixed. Each of their decisions would conflict with choices not yet made about the loop, the approval UX, and evaluation. Written in the right order, every classical doc can cite and conform to decisions already taken.

What does NOT change in the plan


Keep reading

foundations · 20 min
Architecture — Introduction v1
The system being judged. The four layers, autonomy, workspace — the content on which the seven blocking critiques rest.
extension · 30 min
Neurons, Synapses and Memory v1.1
The critical extension where risks concentrate (anthropomorphisation, automation bias, capability creep): here Darwinian fitness is the pressure point.
rationale · 15 min
Literature & Adaptations
The 10 pre-judgement adaptations. Many critiques here become more explicit: this doc and the judgement talk to each other.
microdesign · in Italian
Component index
The updated work plan: 3 cross-cutters (agent_runtime, approval_ux, eval) before the 4 classics. English version not yet available.

Mykleos — Perspectives & Judgement v1.1 — 2026-04-22
Closes phase 0 of design. Opens phase 1 with a new order.
v1.1: added the fourth adjective (proactive) and the cross-cutting lens §7-bis.