
Metnos

introvertive fast-path — how the assistant learns to answer instantly
Microarchitecture

Audience: anyone who wants to understand why Metnos sometimes answers
in half a second, and sometimes takes ten.
Reading time: 10 minutes.

Table of contents

  1. The idea, in two lines
  2. Why a fast-path is necessary
  3. The two layers
  4. L0 — auto-produced cache (fastpath.py)
  5. L1 — feedback-promoted autopaths (autopath.py)
  6. Aging, death and inheritance
  7. The arguments extractor
  8. Configuration
  9. Promotion to a synthesised executor

1. The idea, in two lines

When a request reaches Metnos, before bothering the LLM planner we try to recognise it. If we have already seen it and we have already decided well how to answer, we replay it and return to the user in half a second instead of ten. No language models on the critical path, just memory.

Figure 1. The memoization layers before the engine: the request (user query) first meets the L0 fastpath (hash + cosine), then the L1 autopath (user feedback ✓, cluster); if neither fires, it falls through to the engine (Engine v2).

2. Why a fast-path is necessary

The Metnos planner is a local Qwen 3.6 35B-A3B: it thinks well but takes about twelve seconds to decide the first step of a turn. For many requests this wait is disproportionate. “What time is it?” does not need an LLM: it needs get_now. Even “download this page and describe it in two lines” does not need the planner if it is a request we make often and whose sequence we already know: get_urls followed by describe_entries.

Hence the question: how do we recognise that a request is already known? And when is "almost known" close enough? The answer is the introvertive fast-path, organised in two layers. Each captures a different degree of certainty, and both converge on the same outcome: the assistant executes the right sequence without thinking twice.

3. The two layers

| Layer | What it recognises | How | Cost per hit |
| --- | --- | --- | --- |
| L0 | A request already solved successfully, identical or semantically very close | Exact hash (0a) + BGE-M3 cosine (0b) | < 5 ms (hash) / < 150 ms (cosine) |
| L1 | An intent belonging to a cluster confirmed by positive user feedback | Intent hash + cluster_id lookup | ~30 ms |

The flow is strictly sequential: first L0 is attempted, then L1 if L0 missed. If both miss, the LLM planner (Engine v2) takes over as always.
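The cascade can be pictured with a few lines of Python. This is only a sketch: the function names (`answer`, `route`) and the dictionary stand-ins for the two layers are hypothetical, not the actual interfaces of fastpath.py or autopath.py.

```python
from typing import Callable, Optional

Plan = list[str]  # a plan is modelled here as an ordered list of tool names


def answer(
    query: str,
    l0_lookup: Callable[[str], Optional[Plan]],
    l1_lookup: Callable[[str], Optional[Plan]],
    planner: Callable[[str], Plan],
) -> tuple[str, Plan]:
    """Try L0, then L1, and only then fall through to the planner."""
    plan = l0_lookup(query)
    if plan is not None:
        return "L0", plan
    plan = l1_lookup(query)
    if plan is not None:
        return "L1", plan
    return "engine", planner(query)


# Toy stand-ins for the two memoization layers (hypothetical contents).
L0_CACHE = {"what time is it?": ["get_now"]}
L1_CACHE = {
    "download this page and describe it in two lines": ["get_urls", "describe_entries"],
}


def route(query: str) -> tuple[str, Plan]:
    # The planner stand-in returns a placeholder plan instead of calling an LLM.
    return answer(query, L0_CACHE.get, L1_CACHE.get, lambda _q: ["plan_from_llm"])
```

The order of the two `if` blocks is the whole point: L1 is consulted only after an L0 miss, and the planner only after both.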

Note. The planner does not disappear: the fast-path is an additional route, not a substitute one. When the request is new, ambiguous, or below the threshold of any of the searches, the planner takes the wheel as if the fast-path were not there.

4. L0 — auto-produced cache (fastpath.py)

The first layer lives in runtime/engine/fastpath.py and manages a SQLite database (fastpaths.sqlite) of plans already executed. Entries are produced automatically: every time a turn completes successfully (full plan from the engine, L1 hit, or promotion from cosine 0b), Metnos records the canonical query, its hash, the BGE-M3 embedding, the complete plan (framework) and the intent (verb + object). No approval is needed: the chains are made of executors already vetted and tested.
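To make the shape of an entry concrete, here is a minimal sketch of how such a record could be stored with Python's standard sqlite3 module. The table layout, the column names, the canonicalisation rule (lowercase, collapsed whitespace) and the JSON encoding of the embedding are all assumptions made for illustration; the real schema of fastpaths.sqlite is not described here.

```python
import hashlib
import json
import sqlite3


def canonicalise(query: str) -> str:
    # Assumed rule: lowercase and collapse runs of whitespace.
    return " ".join(query.lower().split())


def canonical_hash(query: str) -> str:
    return hashlib.sha256(canonicalise(query).encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS fastpaths (
            query_hash      TEXT PRIMARY KEY,
            canonical_query TEXT NOT NULL,
            embedding       TEXT NOT NULL,  -- JSON list of floats (assumed encoding)
            framework       TEXT NOT NULL,  -- JSON-encoded plan
            intent_verb     TEXT NOT NULL,
            intent_object   TEXT NOT NULL,
            n_uses          INTEGER NOT NULL DEFAULT 1
        )
        """
    )
    return conn


def record_success(conn, query, embedding, framework, verb, obj) -> str:
    """Insert a new entry, or bump n_uses if the same canonical query exists."""
    h = canonical_hash(query)
    exists = conn.execute(
        "SELECT 1 FROM fastpaths WHERE query_hash = ?", (h,)
    ).fetchone()
    if exists:
        conn.execute(
            "UPDATE fastpaths SET n_uses = n_uses + 1 WHERE query_hash = ?", (h,)
        )
    else:
        conn.execute(
            "INSERT INTO fastpaths "
            "(query_hash, canonical_query, embedding, framework, intent_verb, intent_object) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (h, canonicalise(query), json.dumps(embedding), json.dumps(framework), verb, obj),
        )
    conn.commit()
    return h
```

Because the write is idempotent on the hash, re-recording a query that was pruned simply recreates the row.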

Two sub-layers of matching

The lookup proceeds in two phases:
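Going by the table in §3, the two phases are an exact hash match (0a) followed by a cosine comparison of embeddings (0b). A minimal sketch under stated assumptions: the entry layout, the `lookup` function and the 0.9 threshold are hypothetical, and short toy vectors stand in for real BGE-M3 embeddings.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup(
    query_hash: str,
    query_vec: Sequence[float],
    entries: list[dict],
    threshold: float = 0.9,  # assumed value, not taken from the source
) -> Optional[tuple[str, dict]]:
    # Phase 0a: exact match on the canonical hash.
    for entry in entries:
        if entry["hash"] == query_hash:
            return "0a", entry
    # Phase 0b: nearest stored embedding, accepted only above the threshold.
    best = max(entries, key=lambda e: cosine(query_vec, e["vec"]), default=None)
    if best is not None and cosine(query_vec, best["vec"]) >= threshold:
        return "0b", best
    return None  # miss: the caller moves on to L1
```

A linear scan is enough for a few hundred rows; nothing here implies how the real module indexes its vectors.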

Auto-production and self-healing. If a fastpath is removed (by aging, death, or negative user feedback), it recreates itself on the next successful repetition. This property makes pruning low-cost: defaults can be aggressive without risk of permanent loss.

Safety valves

5. L1 — feedback-promoted autopaths (autopath.py)

The second layer lives in runtime/engine/autopath.py and operates on a different logic: it does not repeat the same query, but generalises to a cluster of intents confirmed by positive user feedback.

When the user gives a “✓” feedback, Metnos records the turn's framework, its hash and the semantic cluster (intent hash + cluster ID). After a minimum number of confirmations (configurable, default 1) on the same framework hash and cluster, the plan becomes a reusable autopath: the next time an intent in the same cluster arrives, the plan is replayed without going through the planner.
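The promotion rule can be sketched as a counter keyed by cluster and framework hash. The class and method names below are hypothetical, and picking the most-confirmed framework when several qualify is an assumption of this sketch, not documented behaviour.

```python
from collections import Counter
from typing import Optional


class AutopathStore:
    """Counts positive feedback per (cluster_id, framework_hash) pair."""

    def __init__(self, min_obs: int = 1):
        self.min_obs = min_obs  # mirrors the default of 1 confirmation
        self._confirmations: Counter = Counter()
        self._plans: dict[tuple[str, str], list[str]] = {}

    def record_positive_feedback(
        self, cluster_id: str, framework_hash: str, plan: list[str]
    ) -> None:
        key = (cluster_id, framework_hash)
        self._confirmations[key] += 1
        self._plans[key] = plan

    def lookup(self, cluster_id: str) -> Optional[list[str]]:
        # Keep only frameworks of this cluster that reached the minimum.
        qualified = [
            (count, key)
            for key, count in self._confirmations.items()
            if key[0] == cluster_id and count >= self.min_obs
        ]
        if not qualified:
            return None
        _, best_key = max(qualified)  # assumption: most confirmations wins
        return self._plans[best_key]
```

With the default of one observation, a single "✓" is enough for the next intent in the same cluster to be replayed.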

Anti-autopaths and champion/challenger

L0 vs L1 boundary. L0 is the repetition of the same query (it admits query-specific plans, via exact hash). L1 is the generalisation to a cluster/intent with explicit user consent. The L0 fast-path always wins because it comes first in the cascade: even if L1 has an autopath for that intent, an exact fastpath skips the rest of the chain.

6. Aging, death and inheritance

L0 fastpaths age and die deterministically, with no LLM involved. The nightly task_state_reaper job applies three aging rules and four death conditions.

Aging

| Rule | Criterion | Default | Env |
| --- | --- | --- | --- |
| Never reused | Created more than N days ago but never served a second time | 14 days | METNOS_FASTPATH_GRACE_DAYS |
| Stale | Last use more than N days ago | 30 days | METNOS_FASTPATH_STALE_DAYS |
| LRU cap | Total entries above the cap; least recently used are pruned | 500 | METNOS_FASTPATH_MAX |
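The three aging rules are simple enough to express as one deterministic pass. The sketch below is illustrative only: the `Entry` fields, the function name and the convention that `n_uses == 1` means "served only at creation" are assumptions, not the actual logic of task_state_reaper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    key: str
    created_at: datetime
    last_used_at: datetime
    n_uses: int  # assumed convention: 1 means never served a second time


def select_for_pruning(
    entries: list[Entry],
    now: datetime,
    grace_days: int = 14,
    stale_days: int = 30,
    max_entries: int = 500,
) -> dict[str, str]:
    """Return {entry key: rule that condemns it}."""
    doomed: dict[str, str] = {}
    for e in entries:
        if e.n_uses <= 1 and now - e.created_at > timedelta(days=grace_days):
            doomed[e.key] = "never_reused"
        elif now - e.last_used_at > timedelta(days=stale_days):
            doomed[e.key] = "stale"
    # LRU cap: among survivors, keep only the most recently used ones.
    survivors = sorted(
        (e for e in entries if e.key not in doomed),
        key=lambda e: e.last_used_at,
        reverse=True,
    )
    for e in survivors[max_entries:]:
        doomed[e.key] = "lru_cap"
    return doomed
```

No model is consulted at any point: the decision depends only on timestamps and counters.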

Death (only with complete catalogue)

| Code | Cause | Inheritance |
| --- | --- | --- |
| C1 | A tool in the plan no longer exists in the catalogue (retired, renamed, archived). Replay would fail. | No |
| C2 provenance | The fastpath was promoted to a synthetic executor (see §9) and that executor is now in the catalogue. | Yes |
| C2 name | An executor named {verb}_{object} matching the intent exists, but no tool in the plan belongs to that family. The fastpath would shadow the executor. | Yes |
| C2 prefilter | For multi-step plans: the deterministic routing prefilter on the canonical query indicates that a single executor now covers the intent (even under a different name). | Yes |

Point inheritance

When a fastpath dies by supersession (C2), its usage counts (n_uses) are transferred to the heir executor via the executor aging system. Accumulated demand is not lost.
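As an illustration, here is a sketch of two of the four death conditions (C1 and C2 provenance) together with the transfer of points. The function name, the dictionary layout and the order in which the checks run are assumptions; C2 name and C2 prefilter are left out because they depend on routing details not covered here.

```python
from typing import Optional


def reap(
    fastpath: dict,
    catalogue: set[str],
    promoted_to: dict[str, str],
    executor_uses: dict[str, int],
) -> Optional[str]:
    """Return the death code for a fastpath, or None if it survives.

    promoted_to maps a fastpath id to the executor it was promoted into
    (a hypothetical view of the promotions table).
    """
    # C1: a tool of the plan has left the catalogue, so replay would fail.
    if any(tool not in catalogue for tool in fastpath["plan"]):
        return "C1"  # no heir, so nothing is transferred
    # C2 provenance: the recorded heir executor is now in the catalogue.
    heir = promoted_to.get(fastpath["id"])
    if heir is not None and heir in catalogue:
        executor_uses[heir] = executor_uses.get(heir, 0) + fastpath["n_uses"]
        return "C2 provenance"
    return None
```

The only state that outlives the fastpath is its `n_uses`, added to the heir's counter.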

Auto-production economy. A fastpath pruned by mistake recreates itself on the next successful repetition. Pruning costs zero and defaults can be aggressive.

7. The arguments extractor

Recognising the request is half the work. The other half is reconstructing the concrete argument values: which paths, which URLs, which date, which threshold. Metnos has a deterministic extractor (args_extractor.py) that works by rules:

8. Configuration

Fast-path parameters are controlled by METNOS_* environment variables. A TOML file (~/.config/metnos/runtime.toml) provides persistent values; the default hardcoded in the module is the last safety net.
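One plausible reading of this layering is: environment variable first, then the TOML value, then the hardcoded default. The sketch below assumes that order and assumes the TOML keys carry the same names as the variables; neither detail is confirmed by this page, and `resolve_int` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Hardcoded defaults, copied from the Layer 0 table.
DEFAULTS = {
    "METNOS_FASTPATH_STALE_DAYS": 30,
    "METNOS_FASTPATH_GRACE_DAYS": 14,
    "METNOS_FASTPATH_MAX": 500,
}


def resolve_int(
    name: str,
    toml_values: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Resolve one integer setting: env, then TOML, then module default."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return int(environ[name])
    if name in toml_values:
        return int(toml_values[name])
    return DEFAULTS[name]  # last safety net
```

Passing the parsed TOML content as a plain mapping keeps the sketch independent of any particular TOML parser.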

Layer 0 (fastpath)

| Variable | Default | Meaning |
| --- | --- | --- |
| METNOS_FASTPATH_STALE_DAYS | 30 | Calendar days after which an unused entry is pruned |
| METNOS_FASTPATH_GRACE_DAYS | 14 | Grace days for never-reused entries |
| METNOS_FASTPATH_MAX | 500 | Maximum rows (LRU cap) |

Layer 1 (autopath)

| Variable | Default | Meaning |
| --- | --- | --- |
| METNOS_AUTOPATH_MIN_OBS | 1 | Minimum positive observations to promote an autopath |
| METNOS_AUTOPATH_TTL_ANTI | 2592000 (30 d) | Anti-autopath duration in seconds |
| METNOS_AUTOPATH_TTL_REPEAT | 3600 (1 h) | Soft window for repeated feedback |

Promotion to executor

| Variable | Default | Meaning |
| --- | --- | --- |
| METNOS_FP_PROMOTE_MIN_CLUSTER | 3 | Minimum distinct fastpaths in the cluster |
| METNOS_FP_PROMOTE_MIN_USES | 15 | Minimum cumulative usage |
| METNOS_FP_PROMOTE_MIN_AGE_DAYS | 30 | Minimum cluster age |
| METNOS_FP_PROMOTE_MAX_PER_NIGHT | 3 | Maximum new emissions per night |
| METNOS_FASTPATH_AUTOPROMOTE | off | Enables Tier 2 auto-promotion (no human approval) |

9. Promotion to a synthesised executor

When a group of recurring L0 fastpaths shares the same plan structure (framework hash) and the same intent, a nightly job (task_fastpath_promotion) evaluates them as candidates for becoming a full synthetic executor. Promotion is cluster-based, never per-instance: at least 3 distinct fastpaths, 15 cumulative uses and 30 days of age are required. Only multi-step patterns are promoted: single-step ones already have their executor, and the fastpath value there is skipping the LLM, not the plan.
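The eligibility test reduces to four comparisons plus a nightly cap. A minimal sketch, assuming a hypothetical `Cluster` summary record and assuming that candidates are ranked by cumulative usage when more than the nightly maximum qualify (the actual ordering used by task_fastpath_promotion is not stated here):

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    framework_hash: str
    n_fastpaths: int   # distinct fastpaths sharing this plan structure and intent
    total_uses: int    # cumulative n_uses across the cluster
    age_days: int
    n_steps: int       # number of steps in the shared plan


def eligible(
    c: Cluster, min_cluster: int = 3, min_uses: int = 15, min_age_days: int = 30
) -> bool:
    return (
        c.n_steps > 1  # only multi-step patterns are promoted
        and c.n_fastpaths >= min_cluster
        and c.total_uses >= min_uses
        and c.age_days >= min_age_days
    )


def select_candidates(clusters: list[Cluster], max_per_night: int = 3) -> list[Cluster]:
    # Assumed ranking: most-used clusters first, truncated to the nightly cap.
    ranked = sorted(
        (c for c in clusters if eligible(c)),
        key=lambda c: c.total_uses,
        reverse=True,
    )
    return ranked[:max_per_night]
```

The defaults mirror the METNOS_FP_PROMOTE_* values listed in §8.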

Two promotion tiers

Provenance

Every emitted candidate records the IDs and canonical hashes of the source fastpaths in a promotions table. When the executor enters the catalogue, C2 provenance-based death is exact: the link between the original fastpath and its heir is recorded, not inferred from the name.

© 2026 Roberto Brunialti · Metnos documentation