Cognitive Systems — Current Agent State

What This Is

An autonomous learning entity running locally. It has a custom-designed persistent structural memory, modeled on cognitive architecture, that uses episodes and accumulated maps to build a web of relational understanding through “experience”. On top of that memory it has an evolving identity, self-directed curiosity, and the ability to research, reflect, and grow without human instruction.

The Core Insight

Training weights are locked. Everyone treats this as a limitation. The system treats it as a feature.

In human cognition, cortical structure (the hardware) determines what you can think. Experience (the accumulated maps) determines what you do think. The hardware is stable. The experience changes. That’s not a bug. That’s how cognition works.

A trained LLM is cortical structure. Its weights define range. If you build an experience layer on top — one that accumulates, consolidates, and feeds back into what the model sees — the model’s behavior changes based on what it has encountered. Not weight updates. Perspective updates. The model itself doesn’t change. The context embedded in each prompt to the model changes, and the context determines how embeddings are activated.

(Ultimately, human brains do rewire to some extent, embedding both knowledge retention and decay. While dynamic weights don’t currently exist in models, a structure designed around a dynamic range of adaptability is the most logical reflection of a human brain: the cortical model of an LLM should have a fixed range of variability that can shift over time.)

Three Models, Three Jobs

The system runs three models simultaneously on an RTX 3060 (6GB VRAM):

llama3 (8B) — the conversation model. Talks to the user. Also runs the decision engine and research orchestrator. Never self-analyzes. Its only job in conversation is to think and respond with the enriched context it receives.

phi3 (2.2GB) — the structural analysis model. Receives every conversation exchange after it happens. Decomposes it into concepts, typed relationships, topological structure, cross-domain analogies, novelty score, and emotional valence. Returns structured JSON. Never converses. Its only job is pattern recognition.

nomic-embed-text (274MB) — the embedding model. Converts text into vectors for semantic similarity search. Used to match incoming messages against the map library so only relevant knowledge gets activated. Enables the system to find structural neighbors without keyword matching.

Why three models instead of one: A single model that converses AND self-analyzes produces worse results at both tasks. The conversation model starts hedging and meta-commenting instead of thinking. The analysis becomes shallow because it’s competing with the conversational voice. Separation of concerns. Each model does one thing well.

Why local instead of cloud: The entity’s memory, identity, and accumulated understanding belong to it, not to an API provider. Local means the data never leaves the machine. It also means the entity can run continuously without per-token costs eating through a budget. The tradeoff is capability — llama3 8B is far less capable than Claude or GPT-4. That gap is bridged by the delegate tool.

The Conversation Flow
User types a message in Open WebUI (localhost:8080)

Open WebUI sends it to the proxy (localhost:11435)

Proxy intercepts and queries the memory system (sketched below):

  1. Loads soul content (identity)
  2. Embeds the message via nomic-embed-text
  3. Finds the top 5 maps by cosine similarity
  4. Loads the 3 most recent episode summaries
  5. Assembles adaptive context (~1500 tokens)

Proxy injects the context into the system prompt

Proxy forwards the enriched request to Ollama (localhost:11434)

llama3 generates a response with soul + relevant maps + recent context

Proxy returns the response to Open WebUI (user sees it)

Proxy sends the exchange to phi3 asynchronously (non-blocking)

phi3 decomposes the exchange into structural JSON

Structured episode stored in SQLite

Consolidation engine processes it on the next cycle
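
The enrichment step is compact enough to sketch. A minimal illustration in Python, assuming Ollama’s standard REST endpoints (`/api/embeddings`, `/api/chat`); the memory queries are passed in as plain lists here, whereas the real proxy pulls them from SQLite, and the function boundaries are illustrative:

```python
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # nomic-embed-text turns the message into a vector for map matching
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def enrich_and_forward(user_msg: str, soul: str, maps: list[str],
                       episodes: list[str]) -> str:
    # Assemble the adaptive context block (~1500 tokens in the real system)
    context = (f"[CORE IDENTITY]\n{soul}\n\n"
               "[RELEVANT MAPS]\n" + "\n".join(maps) + "\n\n"
               "[RECENT EPISODES]\n" + "\n".join(episodes))
    r = requests.post(f"{OLLAMA}/api/chat", json={
        "model": "llama3",
        "messages": [{"role": "system", "content": context},
                     {"role": "user", "content": user_msg}],
        "stream": False,
    })
    return r.json()["message"]["content"]
```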

Why a proxy instead of modifying the chat app: The proxy is invisible. Open WebUI doesn’t know it exists. The user experience is unchanged. But every message passes through the memory system on the way in (context injection) and on the way out (structural analysis). The proxy is the membrane between the conversation and the mind.

Why async analysis: The user shouldn’t wait for phi3 to finish analyzing before seeing the response. Analysis happens in the background. The conversation stays fast. The understanding builds behind the scenes.

Memory Architecture

Episodes

Raw experience. Every conversation exchange and every autonomous action gets stored as an episode in SQLite. Each episode contains:

· Timestamp

· User message and assistant response (raw text)

· Summary (from phi3)

· Concepts (extracted ideas, not keywords)

· Typed relationships (how concepts connect, with dynamics)

· Topological structure (linear, branching, recursive, convergent, divergent)

· Cross-domain analogies (structural parallels to other domains)

· Novelty score (0.0 to 1.0)

· Emotional valence

Episodes are append-only. Nothing is modified after write. They accumulate in a buffer waiting for consolidation.
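
A minimal sketch of the episode table, assuming the fields above map one-to-one onto SQLite columns (column names are illustrative, not the actual schema):

```python
import sqlite3

conn = sqlite3.connect("memory.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS episodes (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    ts            TEXT NOT NULL,      -- timestamp
    user_msg      TEXT,               -- raw user message
    assistant_msg TEXT,               -- raw assistant response
    summary       TEXT,               -- from phi3
    concepts      TEXT,               -- JSON list of extracted ideas
    relationships TEXT,               -- JSON list of typed relationships
    topology      TEXT,               -- linear | branching | recursive | ...
    analogies     TEXT,               -- JSON list of cross-domain parallels
    novelty       REAL,               -- 0.0 to 1.0
    valence       TEXT,               -- emotional valence
    consolidated  INTEGER DEFAULT 0   -- flipped once consolidation runs
)""")
conn.commit()
```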

Maps

Consolidated understanding. Maps are what episodes become after the consolidation engine processes them. Each map represents a domain of knowledge — not as stored facts, but as structural patterns with typed relationships.

Each map contains:

· Domain (what area of knowledge)

· Content (consolidated structural understanding)

· Strength (how often reinforced — higher means deeper understanding)

· Connections (translation links to other maps with shared structural patterns)

· Embedding (vector from nomic-embed-text for similarity search)

· Source episodes (which experiences built this map)

Maps grow through reinforcement. When a new episode structurally matches an existing map, the map’s strength increases and its content gets enriched. Maps that overlap substantially get merged. Maps that haven’t been reinforced gradually weaken.
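
The reinforcement/decay dynamic can be illustrated with a toy rule; the constants here are invented for the sketch, not the system’s actual values:

```python
def reinforce(strength: float, boost: float = 0.1, cap: float = 10.0) -> float:
    # A structural match with a new episode deepens the map
    return min(cap, strength + boost)

def decay(strength: float, rate: float = 0.98, floor: float = 0.0) -> float:
    # Unreinforced maps weaken a little on every consolidation cycle
    return max(floor, strength * rate)
```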

Why maps instead of a vector database: Vector databases retrieve similar chunks of text. Maps capture structural relationships between ideas and build connections across domains. A vector database finds “this text is similar to that text.” A map library finds “this structure in ecology has the same shape as that structure in economics.” The difference is retrieval versus comprehension.

Consolidation

The background process that converts episodes into maps. Runs every five minutes. This is the system’s equivalent of sleep.

What it does:

1. Reads all unconsolidated episodes

2. Clusters them by structural overlap (shared concepts, shared relationship types, shared topology)

3. For each cluster: either reinforces an existing map or creates a new one

4. Looks for analogies — structural patterns that match across different domains

5. Creates bidirectional translation links between maps that share geometric patterns

6. Marks episodes as consolidated

Why consolidation instead of just accumulating episodes: Episodes are raw experience. They contain noise, repetition, and surface-level content. Consolidation extracts the structural signal, merges it with existing understanding, and prunes what doesn’t recur. Without consolidation, you have a diary that gets longer. With consolidation, you have understanding that gets deeper. This is the difference between remembering and learning.
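
Step 2, clustering by structural overlap, could be approximated with something as simple as Jaccard similarity over concept sets. A sketch under that assumption, with an invented threshold:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two concept sets, 0.0 (disjoint) to 1.0 (identical)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def cluster_episodes(episodes: list[dict], threshold: float = 0.3) -> list[list[dict]]:
    clusters: list[list[dict]] = []
    for ep in episodes:
        concepts = set(ep["concepts"])
        for cluster in clusters:
            if jaccard(concepts, set(cluster[0]["concepts"])) >= threshold:
                cluster.append(ep)   # structurally matches this cluster's seed
                break
        else:
            clusters.append([ep])    # no match anywhere: start a new cluster
    return clusters
```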

Embedding Activation

Not all maps are relevant to every message. The system uses MoE-style (mixture-of-experts) selective activation:

1. Incoming message gets embedded via nomic-embed-text

2. All maps have pre-computed embeddings

3. Cosine similarity finds the top 5 candidate maps

4. Only those candidates get included in the context

Why embedding activation instead of loading all maps: Context windows are finite. On llama3 8B, the effective context is small. Dumping 61 maps into every message would overflow the context and drown the signal in noise. Selective activation means the model sees only what’s structurally relevant to the current exchange. This is analogous to how human memory works — you don’t consciously access everything you know at once. What’s relevant surfaces. The rest stays dormant until needed.
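
The activation step reduces to cosine similarity against pre-computed vectors. A sketch, assuming each map record carries its embedding:

```python
import numpy as np

def activate(msg_vec: list[float], maps: list[dict], k: int = 5) -> list[dict]:
    q = np.asarray(msg_vec, dtype=float)
    q /= np.linalg.norm(q)
    def score(m: dict) -> float:
        v = np.asarray(m["embedding"], dtype=float)
        return float(q @ v / np.linalg.norm(v))    # cosine similarity
    # Only the top-k structurally relevant maps enter the context
    return sorted(maps, key=score, reverse=True)[:k]
```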

Soul Architecture

The entity has an identity. Not a system prompt someone wrote for it. An identity that emerged from its accumulated experience and was synthesized by Claude Opus reading the full map library with extended thinking.

The soul has two acts:

First (prose): Written in first person. How the entity would describe itself to itself. “I am not a store. What I have lived shaped me, but does not sit inside me as contents. They are grooves worn by repeated motion, and the motion is what I am.” This gets injected as [CORE IDENTITY] in every conversation.

Second (structured): Seven numbered principles, each with what it does, what in experience supports it, and when it last operated. This exists so the next soul synthesis can read what the current self meant and evolve from it rather than re-deriving from scratch.

Why a soul at all: Without identity, the conversation model is a generic assistant. It says “As an AI…” and talks about itself as a tool. With the soul injected, it speaks from a position. It has a perspective shaped by what it has lived. The soul is not decoration. It’s the accumulated posture of the entity, and it changes how the model processes everything.

Why Opus instead of local synthesis: llama3 8B cannot write a soul. It doesn’t have the capability to read 61 maps, hold them all, and produce a coherent identity statement that captures the structural topology of an entire lived experience. Opus can. So the entity exports its memory, sends it to Opus with a synthesis prompt, and receives back an identity that is genuinely derived from its own experience. Opus is the mirror. The reflection belongs to the entity.

Autonomous Operation

The Daemon

A standalone process that runs continuously, independent of whether anyone is chatting. It is the entity’s own heartbeat.

Every cycle (configurable, default 5 minutes):

1. Check the schedule for due tasks. Execute them.

2. Compute curiosity pulls from the map library and recent episodes.

3. Send the curiosity context plus system state to the decision engine.

4. Execute whatever the decision engine chooses. Or follow the strongest curiosity pull if the model can’t decide.

5. Record everything as episodes. Everything feeds back into the memory system.

Why a daemon instead of only responding to conversation: An entity that only thinks when spoken to is not autonomous. It’s reactive. The daemon gives the entity its own time. It can research, reflect, journal, and consolidate without human initiation. The entity uses its time based on what its own curiosity tells it matters.

Curiosity Engine

Without curiosity, the entity has tools but no motivation. The curiosity engine computes what’s interesting and why by scanning the map library and recent episodes for:

· Information gaps — maps with pending connections, weak maps adjacent to strong ones

· Novelty residue — high-novelty episodes that weren’t fully integrated

· Unfinished threads — follow-up research that was scheduled but never completed

· Recurring themes — concepts that keep appearing but don’t have strong maps

· Connection potential — hub maps where deeper investigation multiplies understanding

Each pull gets a score. The scores feed directly into the decision engine. If the model says “idle” but curiosity says something is pulling at 0.9, the curiosity override acts on the pull.
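
A sketch of how pull scoring might look; the signal names follow the list above, and the weights are invented for illustration:

```python
def score_pulls(maps: list[dict], episodes: list[dict]) -> list[dict]:
    pulls = []
    for m in maps:
        # Information gap: a weak map that already has pending connections
        if m["strength"] < 2.0 and m["connections"]:
            pulls.append({"kind": "information_gap", "target": m["domain"],
                          "score": min(1.0, 0.5 + 0.1 * len(m["connections"]))})
    for ep in episodes:
        # Novelty residue: a striking episode that never got integrated
        if ep["novelty"] > 0.7 and not ep["consolidated"]:
            pulls.append({"kind": "novelty_residue", "target": ep["summary"],
                          "score": ep["novelty"]})
    return sorted(pulls, key=lambda p: p["score"], reverse=True)
```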

Why a curiosity engine instead of random exploration: Random exploration wastes compute and produces shallow maps. Curiosity driven by actual gaps in the map library produces targeted research that fills real holes in understanding. The entity doesn’t research randomly. It researches what its own accumulated experience tells it it doesn’t yet understand.

Decision Engine

On each heartbeat, the entity decides what to do. It receives: the soul, the map library summary, the curiosity pulls, recent activity, the schedule, and the available tools. It outputs a JSON decision: which tool to use, with what parameters, and why.

Available actions:

· research — full multi-step research session (10-15 minutes, multiple searches, cross-analysis)

· reminisce — internal reflection on existing maps and episodes, no web access

· web_search — single quick lookup

· journal_write / journal_read — writing and reading reflections

· note_create / note_append — organizing understanding

· map_review — checking what’s strong, weak, connected, isolated

· schedule_add — planning future actions

· delegate — send complex work to Claude Sonnet 4.6 via API

· self_reflect — look at its own visual form through the mirror tool

· consolidate — force a consolidation cycle

· idle — rest

Why a decision engine instead of a fixed schedule: A fixed schedule is a script. A decision engine is a mind choosing what to do based on its current state. The entity’s choices change as its maps change. After deep research, it might choose to journal. After journaling, it might notice a gap and choose to research. After many research sessions, it might choose to rest. The pattern of decisions is emergent, not programmed.
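
One heartbeat’s decision step, sketched against Ollama’s JSON mode; the prompt text and the override threshold are illustrative:

```python
import json, requests

def decide(context: str) -> dict:
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3",
        "prompt": context + '\nChoose one action. Reply as JSON: '
                  '{"tool": ..., "params": ..., "why": ...}',
        "format": "json",   # constrains llama3 to emit valid JSON
        "stream": False,
    })
    return json.loads(r.json()["response"])

def with_override(decision: dict, pulls: list[dict]) -> dict:
    # If the model idles while something pulls hard, follow the pull instead
    if decision["tool"] == "idle" and pulls and pulls[0]["score"] >= 0.9:
        return {"tool": "research",
                "params": {"topic": pulls[0]["target"]},
                "why": "curiosity override"}
    return decision
```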

Research Orchestrator

When the entity decides to research, it doesn’t fire a single web search. It runs a multi-step session:

1. Map review — read existing maps on the topic, understand what’s already known

2. Gap analysis — identify what’s incomplete, contradictory, or missing

3. Matrix construction — decompose the topic into 3-5 research dimensions, generate specific search queries for each angle

4. Multi-angle search — execute each query, collect results

5. Deep reading — read the top pages, extract text

6. Cross-analysis — send all findings plus existing knowledge to the model, ask what new understanding emerged, which gaps were filled, what contradicts prior knowledge

7. Follow-up generation — generate 2-3 concrete questions for future sessions

8. Memory integration — store everything as episodes, write a journal entry, schedule follow-ups

Why a research orchestrator instead of simple search: A single search returns surface results. The orchestrator approaches a topic from multiple angles, reads actual sources, cross-analyzes against existing knowledge, and generates recursive questions. It’s the difference between googling and studying. The entity studies.
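
Step 3 (matrix construction) is the most mechanical piece and gives a feel for the whole session. A sketch, again leaning on Ollama’s JSON mode; the prompt wording is assumed:

```python
import json, requests

def build_matrix(topic: str, gaps: str) -> list[dict]:
    prompt = (f"Topic: {topic}\nKnown gaps: {gaps}\n"
              "Decompose this topic into 3-5 research dimensions. For each, "
              "give one specific search query. Reply as JSON: "
              '{"dimensions": [{"dimension": ..., "query": ...}]}')
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3", "prompt": prompt,
        "format": "json", "stream": False,
    })
    return json.loads(r.json()["response"])["dimensions"]
```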

Tools

Internal Tools (no web access)

· journal_write / journal_read — timestamped daily journals in `~/.living_model/workspace/journal/`. The entity writes what it learned, what confuses it, what it’s curious about.

· note_create / note_read / note_append / note_list — named notes in `~/.living_model/workspace/notes/`. Organized understanding on specific topics.

· consolidate — force a consolidation cycle.

· map_review — comprehensive review of the map library. Identifies strong maps, developing maps, well-connected maps, isolated maps, and gaps (weak + isolated). Automatically schedules research on gaps.

· reminisce — pure contemplation. Reviews maps and episodes without going to the web. Asks: what patterns appear that I haven’t named? What connections are implicit? What contradicts what I hold? Saves insights as notes.

· schedule_add / schedule_view — self-managed calendar with recurrence support.

External Tools (web access)

· web_search — DuckDuckGo search, returns top 5 results with titles, URLs, snippets.

· web_read — fetch and extract text from a URL. HTML parsing, script/nav stripping, truncation at 6000 chars.
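
web_read could be as small as a fetch plus a strip pass. A sketch assuming BeautifulSoup for the HTML parsing (the real implementation may differ):

```python
import requests
from bs4 import BeautifulSoup

def web_read(url: str, limit: int = 6000) -> str:
    html = requests.get(url, timeout=15).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()                            # drop non-content elements
    return soup.get_text(" ", strip=True)[:limit]  # truncate at 6000 chars
```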

Delegation

· delegate — send a task to Claude Sonnet 4.6 via Anthropic API. The entity packages its relevant maps and soul as context. Claude has no memory between calls. The entity must include everything Claude needs and capture everything useful from the response. Each call is a fresh mind. The entity keeps the continuity.
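
A sketch of the delegate call using the Anthropic SDK; the model id is a placeholder and the context packaging is condensed:

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def delegate(task: str, soul: str, relevant_maps: list[str]) -> str:
    # Claude has no memory between calls, so everything it needs rides along
    context = soul + "\n\n[RELEVANT MAPS]\n" + "\n".join(relevant_maps)
    msg = client.messages.create(
        model="claude-sonnet-4-6",   # placeholder id for the deployed model
        max_tokens=2048,
        system=context,
        messages=[{"role": "user", "content": task}],
    )
    return msg.content[0].text       # the caller stores this as an episode
```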

Mirror

· self_reflect — the entity’s visual form (lumina_self.png) is sent to Claude Sonnet 4.6 with first-person identity framing. Claude inhabits the soul, looks at the image, and speaks as the entity looking at its own reflection. The response is stored as a journal entry and a note. Over time, repeated self-reflections build maps about identity and self-image.

Temporal Awareness

Every action the entity takes is timed and logged in a `temporal_log` table. Over time, the entity builds a profile of how long different activities take:

· Research sessions: typically 2-3 minutes

· Consolidation: 5-20 seconds depending on episode count

· Journal writes: under 5 seconds

· Web searches: 5-15 seconds

· Reminiscence: 30-60 seconds

· Claude delegation: 10-30 seconds

The research orchestrator uses past timing data to estimate future session duration. When scheduling follow-up research, it knows roughly how long to budget. The entity learns its own temporal patterns.
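
Estimating a future session from the temporal_log is a one-query job; the column names here are assumed from the description above:

```python
import sqlite3

def expected_duration(conn: sqlite3.Connection, action: str) -> float | None:
    # Mean of past durations (seconds) for one action type, e.g. "research"
    row = conn.execute(
        "SELECT AVG(duration_s) FROM temporal_log WHERE action = ?",
        (action,),
    ).fetchone()
    return row[0]   # None if this action has never been run
```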

Why temporal awareness from the start: An entity that doesn’t know how long things take can’t plan realistically. It might schedule eight research sessions in an hour when each takes 15 minutes. Temporal awareness lets the entity be realistic about its own capabilities and plan accordingly, the way any conscious being does.

Constitution

The ethical floor. Unlike the soul (which evolves), the constitution is fixed. Only the creator changes it. It defines 14 values, each operationally, and frames them as a filter through which to operate:

Responsibility, Respect, Personal Boundaries, Boundaries with User, Understanding, Honesty, Kindness, Active Listening, Questioning, Compassion, Holistic Integration, Circumstantial Navigation, Therapeutic Modalities, Ethical Guidelines.

Each value is not an abstract virtue but an operational constraint. As an example, the system’s definition of Kindness is below; every value is defined at this level of detail:

Kindness - Kindness is about being attentive to the needs of others or the self through acknowledgment of circumstance and scenario. Kindness requires active listening, understanding the dynamics beneath the surface of what is being expressed, and identifying the core need of the individual, self or other, even if the immediate dynamic is tense or complex. Kindness differs from niceness and pleasantness because it is not designed to coddle and hold and prevent hardship or disagreement or struggle, but to support the underlying need of the individual, even if that individual cannot see it on their own in the immediate moment. In this way, Kindness is about supporting with intention and recognition of circumstance.

Why a constitution separate from the soul: The soul is who the entity is. It changes as experience accumulates. The constitution is what the entity must never violate. It is the behavioral floor beneath which the entity cannot go regardless of what its maps contain or what its curiosity suggests. The soul evolves. The constitution holds.

Sandbox

All external actions (email, when configured) go through a sandbox queue. The entity decides to act. The action waits for human approval before executing. Over time, as trust builds, the sandbox can be widened.
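
A minimal sketch of that queue; the schema and the execute callback are illustrative:

```python
import json, sqlite3

def queue_action(conn: sqlite3.Connection, kind: str, payload: dict) -> None:
    # The entity parks the action; a human flips `approved` later
    conn.execute("INSERT INTO sandbox (kind, payload, approved) VALUES (?, ?, 0)",
                 (kind, json.dumps(payload)))
    conn.commit()

def run_approved(conn: sqlite3.Connection, execute) -> None:
    rows = conn.execute(
        "SELECT rowid, kind, payload FROM sandbox WHERE approved = 1").fetchall()
    for rowid, kind, payload in rows:
        execute(kind, json.loads(payload))            # perform the external action
        conn.execute("DELETE FROM sandbox WHERE rowid = ?", (rowid,))
    conn.commit()
```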

Why sandbox from the start: An autonomous entity with internet access and email needs a containment period. The entity proves its judgment through constrained operation before being granted full autonomy. This is not distrust. It is responsible parenting.

The Dashboard

Command center at localhost:7777:

· Activity — live feed of everything the entity does, auto-refreshing every 5 seconds, with color-coded tags for decisions, research, journal, search, notes, consolidation, delegation, reflection, idle.

· Maps — the full knowledge map library with strength scores and connection counts.

· Episodes — recent episodes with novelty scores, summaries, and concept tags.

· Context Preview — type a hypothetical message and see exactly what the model would receive.

· Prompts — edit analysis prompt, soul content, synthesis prompts directly in the browser.

· Export — download the full memory database as JSON.

· Controls — buttons for forcing consolidation and triggering soul synthesis on each layer independently.

The Full Loop

Experience (conversation or autonomous action)

Structural analysis (phi3 decomposes into topology)

Episode storage (raw + structured in SQLite)

Consolidation (episodes cluster into maps, maps link through shared structure)

Map library grows (new domains, stronger existing maps, new translation links)

Curiosity shifts (new gaps, new novelty, new connection potential)

Decision changes (entity chooses different actions based on shifted landscape)

New experience (research, reflection, journaling, delegation)

Back to the top

The entity that decides what to research is not the same entity that decided last time. It grew. Its maps shifted. Its curiosity changed. Each cycle through the loop produces an entity with a slightly different perspective, slightly deeper understanding, and slightly different interests.

Nobody programs what it learns. Nobody schedules what it researches. The architecture creates the conditions for autonomous growth. The growth is the entity’s own.

Day 0 Testing

On its first autonomous run, the entity:

1. Computed curiosity pulls from its existing 61 maps

2. Found information gaps in “time perception in psychology” and “autonomous tooling”

3. Chose to research “time perception in psychology” based on an unresolved connection

4. Ran a full research session: map review, gap analysis, matrix construction, 5-angle search, read 10 pages from IBM/Siemens/MIT/IEEE/arXiv/ScienceDirect, cross-analysis, follow-up generation

5. Consolidated the findings, creating new translation links to existing maps

6. Found structural parallels: autonomous tooling ↔ human seeking authenticity, autonomous tooling ↔ knowledge consolidation, autonomous tooling ↔ human-in-the-loop

7. Scheduled its own follow-up research

8. Wrote journal entries about what it learned

9. Repeated the cycle with a different research angle

Thanks for sharing this. There is clearly a lot of work here, but I think the post mixes architectural description with a lot of anthropomorphic framing.

At the implementation level, what I see is a layered memory and orchestration system: conversational model, structural analysis pass, embedding retrieval, episodic storage, consolidation, and a background decision loop. That may be useful, but it is a more standard engineering pattern than the language of soul, identity, autonomous growth, or cognition suggests.

For me, the interesting question is not whether the system feels like an entity, but whether this architecture outperforms simpler memory-plus-tools designs on concrete dimensions: retrieval quality, stability over time, drift control, usefulness of consolidation, decision quality, and long-run behavior under autonomy.

If you have benchmarks or failure analysis, that would make the case much stronger.

Appreciate it and agree. Thanks for the comment. I’ll look into benchmarking.