The Memory Layer
Memory gives an agent continuity across turns without replaying
the full conversation every time. Each exchange is
embedded into a vector, stored in an index, and
retrieved selectively — only the most relevant prior context is
injected into the next prompt.
"The agent doesn't remember everything. It remembers what
matters right now."
Three things work together: a
retrieval system that pulls relevant history, a
classifier that fires behavioral rules based on
what the user says, and an
emotional state tracker that lets personality
shift gradually across turns.
Embedding & Storage
After each exchange, the message and response are sent to your
local embed model. The resulting vectors are stored in a
server-side index keyed to your session.
Your text never leaves your network: the browser calls the embed
model directly, and only the resulting vectors reach the server.
browser → local embed model → vectors → server index
prune_keep sets a ceiling on how many memories are
retained. Once the limit is reached, the oldest entries are dropped first.
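The pruning behavior can be sketched in a few lines. This is a minimal illustration, not the engine's implementation: the MemoryIndex class, its add method, and the toy two-dimensional vectors are hypothetical, and only prune_keep comes from the text above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryIndex:
    """Hypothetical per-session index that holds vectors only, never raw text."""
    prune_keep: int
    entries: deque = field(default_factory=deque)

    def add(self, turn_id: int, vector: list[float]) -> None:
        self.entries.append((turn_id, vector))
        # Drop the oldest entries once the ceiling is exceeded.
        while len(self.entries) > self.prune_keep:
            self.entries.popleft()


index = MemoryIndex(prune_keep=3)
for turn in range(5):
    # Toy vectors stand in for real embeddings.
    index.add(turn, [float(turn), 1.0])

kept_ids = [turn_id for turn_id, _ in index.entries]
print(kept_ids)
```

With prune_keep set to 3 and five turns stored, only the three most recent turns remain in the index.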
Retrieval
Before each generation, the incoming message is embedded and
used to query the index. The top_k most
semantically similar prior turns are pulled out and injected
into the memory_recall placeholder in the assembled
prompt.
"Distant or irrelevant history is simply not included."
This keeps prompts compact without the agent losing track of
things that actually matter. A conversation about a specific
grievance resurfaces that grievance when it becomes relevant
again — not on every turn.
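A rough sketch of top_k retrieval follows. It assumes cosine similarity as the ranking metric, which the text does not specify, and the retrieve and cosine helpers and the toy vectors are hypothetical names used only for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float],
             index: list[tuple[str, list[float]]],
             top_k: int) -> list[str]:
    """Return the ids of the top_k stored turns most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [turn_id for turn_id, _ in ranked[:top_k]]


# Toy index: turn-1 and turn-3 point roughly the same way as the query.
index = [
    ("turn-1", [1.0, 0.0]),
    ("turn-2", [0.0, 1.0]),
    ("turn-3", [0.9, 0.1]),
]
recalled = retrieve([1.0, 0.0], index, top_k=2)
print(recalled)
```

Only the turns returned here would feed the memory_recall placeholder; turn-2, being unrelated to the query, is left out of the prompt entirely.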
Classifier & Rules
Each incoming message is also run through a
classifier — a local model that reads the
message and returns one or more tags. Each tag matches a rule in
memory_rules.
When a rule fires, its ending_text is appended to
the assembled prompt for that turn — steering the model's next
response without hardcoding that guidance permanently into the
registry.
tag: "customer_escalating"
ending_text: "Stay professional. Do not match their energy."
Rules are situational. They engage only when the classifier sees
the pattern — and disengage as soon as it doesn't.
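The rule-firing step might look something like the sketch below. The apply_rules function and the dictionary layout of MEMORY_RULES are assumptions made for illustration; the tag and ending_text values are taken from the example above, and the classifier itself is represented only by the list of tags it would return.

```python
# Hypothetical in-memory form of memory_rules, keyed by tag.
MEMORY_RULES = {
    "customer_escalating": {
        "ending_text": "Stay professional. Do not match their energy.",
    },
}


def apply_rules(prompt: str, tags: list[str], rules: dict) -> str:
    """Append the ending_text of every rule whose tag the classifier returned."""
    endings = [rules[tag]["ending_text"] for tag in tags if tag in rules]
    return "\n".join([prompt, *endings])


base = "You are a support agent."
escalated = apply_rules(base, ["customer_escalating"], MEMORY_RULES)
calm = apply_rules(base, [], MEMORY_RULES)
print(escalated)
```

Because the guidance is appended per turn rather than stored in the base prompt, a turn with no matching tags leaves the prompt unchanged.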
Emotional State
Rules can carry emotion deltas — small
adjustments to named dimensions like warmth,
tension, trust, or
playfulness. Each time a rule fires, its deltas
accumulate into the agent's emotional state.
Between turns, every dimension decays toward zero at the rate set by
emotion_decay_rate. Persistent signals keep
dimensions elevated; a single spike fades quickly.
actions: [{ type: "emotion", deltas: { tension: 0.10, trust: -0.08 } }]
The current emotional state is injected into the prompt as the
emotional_state template variable — giving the
model a live read on where the conversation stands.
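The accumulate-then-decay cycle can be sketched as follows. The exact decay formula is not given in the text, so this example assumes multiplicative decay, where each dimension is scaled by (1 - emotion_decay_rate) between turns. The apply_deltas and decay helpers and the rate of 0.5 are hypothetical.

```python
def apply_deltas(state: dict[str, float], deltas: dict[str, float]) -> dict[str, float]:
    """Add a fired rule's deltas to the running emotional state."""
    updated = dict(state)
    for name, delta in deltas.items():
        updated[name] = updated.get(name, 0.0) + delta
    return updated


def decay(state: dict[str, float], emotion_decay_rate: float) -> dict[str, float]:
    """Assumed decay model: shrink every dimension toward zero by a fixed fraction."""
    return {name: value * (1.0 - emotion_decay_rate) for name, value in state.items()}


state: dict[str, float] = {}
# A rule fires once with the deltas from the example above.
state = apply_deltas(state, {"tension": 0.10, "trust": -0.08})
# One turn passes with no further signal (illustrative rate).
state = decay(state, emotion_decay_rate=0.5)
print(state)
```

Under this assumption a single spike halves on every quiet turn, while a rule that keeps firing adds its delta back each time and holds the dimension up.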
Style Blending
A style_blend block maps each persona and sentiment
layer to one emotional dimension. Once that dimension exceeds a
threshold, the engine starts blending the secondary
persona or sentiment into the prompt.
"Personality shifts gradually under pressure; it does not snap
between states."
At threshold, zero secondary content is added. As the dimension
climbs, more secondary directives are included. The primary
stays in place throughout — the agent doesn't flip, it drifts.
style_blend:
personas: { axis: "tension", primary: "calm_agent",
secondary: "burned_out", threshold: 0.60 }
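One way to picture the blend is a ramp that starts at zero at the threshold and grows with the dimension. The sketch below assumes a linear ramp that reaches full strength at a dimension value of 1.0. Neither the ramp shape nor the upper bound is stated in the text, and blend_weight, blend_directives, and the directive strings are hypothetical.

```python
def blend_weight(value: float, threshold: float) -> float:
    """Assumed linear ramp: 0.0 at or below threshold, 1.0 when value reaches 1.0."""
    if value <= threshold or threshold >= 1.0:
        return 0.0
    return min(1.0, (value - threshold) / (1.0 - threshold))


def blend_directives(primary: list[str], secondary: list[str],
                     value: float, threshold: float) -> list[str]:
    """Keep every primary directive and add a growing share of secondary ones."""
    count = int(blend_weight(value, threshold) * len(secondary))
    return primary + secondary[:count]


calm_agent = ["Be patient."]                                    # primary persona
burned_out = ["Be terse.", "Sigh.", "Skip pleasantries.", "Stop apologizing."]

at_threshold = blend_directives(calm_agent, burned_out, value=0.60, threshold=0.60)
midway = blend_directives(calm_agent, burned_out, value=0.80, threshold=0.60)
maxed = blend_directives(calm_agent, burned_out, value=1.00, threshold=0.60)
print(len(at_threshold), len(midway), len(maxed))
```

The primary directives appear in every result, and the secondary share grows from none at the threshold to all of them at the top of the range.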
Working Notes
Every working_notes_every_n_turns turns, the engine
asks the model to write a short self-summary: what it has
established, what it knows about the other party, what it
expects next.
Each cycle overwrites the previous notes, which are injected
via the working_notes template variable. This gives
the agent a persistent internal scratchpad at a fraction of the
token cost of full history replay.
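The refresh cadence can be sketched as a simple turn counter. This example assumes turns are numbered from 1 and that a refresh happens on every multiple of working_notes_every_n_turns. The should_refresh_notes helper is hypothetical, and a placeholder string stands in for the model-written summary.

```python
def should_refresh_notes(turn_number: int, working_notes_every_n_turns: int) -> bool:
    """True on every n-th turn; a non-positive n disables refreshes (assumption)."""
    if working_notes_every_n_turns <= 0:
        return False
    return turn_number % working_notes_every_n_turns == 0


working_notes = ""          # value that would feed the working_notes variable
refresh_turns = []

for turn in range(1, 8):
    if should_refresh_notes(turn, working_notes_every_n_turns=3):
        # Placeholder for the model's self-summary; it overwrites the old notes.
        working_notes = f"notes as of turn {turn}"
        refresh_turns.append(turn)

print(refresh_turns, working_notes)
```

Only the latest summary survives, so the scratchpad stays the same size no matter how long the conversation runs.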
notes_about_me_prompt:
"What have I covered? What do I need next? Short bullets."
notes_about_other_prompt:
"What has the customer told me? Note inconsistencies."
Configuring Memory
Set embed and classifier endpoints in the
Connection Configuration panel on this page.
Each registry carries a memory_config block that
wires up models, endpoints, and retrieval settings.
Key fields:
top_k: memories injected per turn.
prune_keep: total memory ceiling.
emotion_decay_rate: how fast emotional state returns to baseline.
working_notes_every_n_turns: scratchpad refresh cadence.
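As an illustration, the four tuning fields could be represented and sanity-checked as below. All values are made up, the validation ranges are assumptions rather than documented limits, and the model and endpoint settings that memory_config also carries are omitted because their field names are not given here. The validate_memory_config helper is hypothetical.

```python
# Hypothetical values; only the field names come from the documentation.
memory_config = {
    "top_k": 4,
    "prune_keep": 200,
    "emotion_decay_rate": 0.15,
    "working_notes_every_n_turns": 5,
}


def validate_memory_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if config["top_k"] < 1:
        problems.append("top_k must be at least 1")
    if config["prune_keep"] < config["top_k"]:
        problems.append("prune_keep should not be smaller than top_k")
    if not 0.0 <= config["emotion_decay_rate"] <= 1.0:
        problems.append("emotion_decay_rate should be between 0 and 1")
    if config["working_notes_every_n_turns"] < 1:
        problems.append("working_notes_every_n_turns must be at least 1")
    return problems


problems = validate_memory_config(memory_config)
print(problems)
```

A check like this is most useful before saving overrides, since a prune_keep below top_k would leave fewer memories in the index than retrieval asks for.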
In Studio, the Memory Config fieldset in the
Tuning tab lets you override all of these live without touching
the registry JSON.