Who’s had enough of unverifiable ‘memory’ claims, silent edits, and post-hoc explanations that don’t survive scrutiny?
Hi everyone,
I’ve just published a new Space that tackles a problem I keep seeing in agent demos and “AI memory” claims: people describe what happened, but you can’t independently verify it. This Space is a proof-first flight recorder for AI/agent runs.
Instead of asking anyone to trust logs, screenshots, or vibes, it produces a tamper-evident, hash-chained event timeline and lets you export a ZIP bundle that any third party can verify locally.
What it does (in plain terms)
- Records each step of an agent run as an event (prompt, tool call/result, output, memory read/write, retrieval, errors, notes, etc.)
- Stores events in an append-only JSONL log where every event links to the previous one via a hash (`prev_event_hash_sha256`)
- If a single line is edited, removed, or reordered, verification fails (that’s the point)
- Optional Ed25519 signatures if you want cryptographic “this came from my key” provenance
- Finalisation commits a session anchor and the recorder refuses any further writes to that session (no post-hoc “quiet changes”)
- Exportable proof bundles (`rft_flight_bundle_<session_id>.zip`) that others can upload back into the Space (or verify locally)
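For intuition, the hash-chain idea from the bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the Space’s actual code: only the `prev_event_hash_sha256` field name comes from this post; the function names and the `event_hash_sha256` field are my own assumptions.

```python
import hashlib
import json

def _canonical(event: dict) -> bytes:
    # Deterministic serialisation so the same event always hashes the same.
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode()

def append_event(log: list[dict], payload: dict) -> dict:
    # Each event links back to the previous event's SHA-256 digest.
    prev = log[-1]["event_hash_sha256"] if log else None
    body = {"payload": payload, "prev_event_hash_sha256": prev}
    event = dict(body)
    event["event_hash_sha256"] = hashlib.sha256(_canonical(body)).hexdigest()
    log.append(event)
    return event

def verify_chain(log: list[dict]) -> bool:
    prev = None
    for event in log:
        body = {"payload": event["payload"],
                "prev_event_hash_sha256": event["prev_event_hash_sha256"]}
        if event["prev_event_hash_sha256"] != prev:
            return False  # a line was removed or reordered
        if hashlib.sha256(_canonical(body)).hexdigest() != event["event_hash_sha256"]:
            return False  # a line was edited in place
        prev = event["event_hash_sha256"]
    return True
```

Edit any payload, drop a line, or swap two lines, and `verify_chain` returns `False` — which is exactly the “tamper-evident” property the Space relies on.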
The UX: no guessing what to click
There’s a Quickstart (1-click) tab that runs a complete demo flow:
Start session → append events → verify → finalise → export bundle
…and it auto-fills the session id across the other tabs so you can explore without getting lost.
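That lifecycle — and the “no writes after finalisation” rule from above — could look roughly like this. A hypothetical sketch: the class and method names are mine, not the Space’s API.

```python
import hashlib
import json

class FlightRecorder:
    """Append-only recorder; finalise() seals the session against further writes."""

    def __init__(self):
        self.events: list[str] = []  # JSONL lines
        self.finalised = False

    def append(self, payload: dict) -> None:
        if self.finalised:
            # No post-hoc "quiet changes" once the anchor is committed.
            raise RuntimeError("session finalised; no further writes accepted")
        prev = (hashlib.sha256(self.events[-1].encode()).hexdigest()
                if self.events else None)
        line = json.dumps({"payload": payload, "prev_event_hash_sha256": prev},
                          sort_keys=True)
        self.events.append(line)

    def finalise(self) -> str:
        # Commit a session anchor: a digest over the whole log.
        self.finalised = True
        return hashlib.sha256("\n".join(self.events).encode()).hexdigest()
```

Any later `append` raises instead of silently extending the history, and the returned anchor is what a third party would check the exported log against.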
Why this exists
The real governance question isn’t “can agents access memory?”
It’s: can you prove they didn’t silently rewrite it?
This Space makes the “audit trail” the actual permission layer: if the history is changed, it won’t verify.
Included “brutal tests”
This repo includes a `brutal_test.py` script with two hard checks:
- Two-tab spam test (concurrent writers): session must still PASS verification
- Tamper ZIP test: modify exported event payload → import verification must FAIL
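The tamper test boils down to recomputing the chain inside the bundle. A rough sketch under assumed names: `events.jsonl` is my guess at the bundle layout, and in the real system the finalisation anchor would also cover the final line.

```python
import hashlib
import io
import json
import zipfile

def export_bundle(payloads: list[dict]) -> bytes:
    """Build a hash-chained JSONL log and pack it into an in-memory zip."""
    lines, prev = [], None
    for p in payloads:
        body = json.dumps({"payload": p, "prev_event_hash_sha256": prev},
                          sort_keys=True, separators=(",", ":"))
        prev = hashlib.sha256(body.encode()).hexdigest()
        lines.append(body)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("events.jsonl", "\n".join(lines))
    return buf.getvalue()

def verify_bundle(bundle: bytes) -> bool:
    """Recompute the chain; an edited, dropped, or reordered line fails."""
    raw = zipfile.ZipFile(io.BytesIO(bundle)).read("events.jsonl")
    prev = None
    for line in raw.decode().splitlines():
        if json.loads(line)["prev_event_hash_sha256"] != prev:
            return False
        prev = hashlib.sha256(line.encode()).hexdigest()
    return True
```

Modifying one payload inside the exported zip changes that line’s digest, so the next line’s `prev_event_hash_sha256` no longer matches and verification fails — the behaviour the tamper ZIP test asserts.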
Part of the RFTSystems verification suite
This Space is part of a wider collection focused on live verification / receipts / auditability. If you’re into verifiable agent behaviour, you’ll probably like the full suite.
Collection:
Related Spaces in the suite:
What would be great (feedback)
If you try it, I’d love feedback on:
- missing event types you’d want for real agent runs
- whether the exported bundle format is clear enough for third-party review
- what would make “verification” feel more obvious to non-technical users
- any edge cases where you think the recorder could be tricked
If you build agents and care about reproducibility + auditability, this should be useful. Cheers.
RFTSystems - Liam Grinstead
#AI #Agents #LLM #Auditability #Verification #Reproducibility #Security #MLOps #AIEngineering #Provenance #Cryptography #Ed25519 #HashChain #Forensics #Observability #Governance #RAG #MemorySystems
#Gradio #OpenSource