Agent Flight Recorder - a Hugging Face Space by RFTSystems

Who’s had enough of unverifiable ‘memory’ claims, silent edits, and post-hoc explanations that don’t survive scrutiny?

Hi everyone :waving_hand: I’ve just published a new Space that’s meant to solve this problem I keep seeing in agent demos and “AI memory” claims: people describe what happened… but you can’t independently verify it. This Space is a proof-first flight recorder for AI/agent runs.

Instead of asking anyone to trust logs, screenshots, or vibes, it produces a tamper-evident, hash-chained event timeline and lets you export a ZIP bundle that any third party can verify locally.

What it does (in plain terms)

  • Records each step of an agent run as an event (prompt, tool call/result, output, memory read/write, retrieval, errors, notes, etc.)
  • Stores events in an append-only JSONL log where every event links to the previous one via hash (prev_event_hash_sha256)
  • If a single line is edited, removed, or reordered, verification fails (that’s the point)
  • Optional Ed25519 signatures if you want cryptographic “this came from my key” provenance
  • Finalisation commits a session anchor and the recorder refuses any further writes to that session (no post-hoc “quiet changes”)
  • Exportable proof bundles (rft_flight_bundle_<session_id>.zip) that others can upload back into the Space (or verify locally)

The UX: no guessing what to click

There’s a Quickstart (1-click) tab that runs a complete demo flow:
Start session → append events → verify → finalise → export bundle
…and it auto-fills the session id across the other tabs so you can explore without getting lost.

Why this exists

The real governance question isn’t “can agents access memory?”
It’s: can you prove they didn’t silently rewrite it?
This Space makes the “audit trail” the actual permission layer: if the history is changed, it won’t verify.

Included “brutal tests”

This repo includes a brutal_test.py script with two hard checks:

  • Two-tab spam test (concurrent writers): session must still PASS verification
  • Tamper ZIP test: modify exported event payload → import verification must FAIL

Part of the RFTSystems verification suite

This Space is part of a wider collection focused on live verification / receipts / auditability. If you’re into verifiable agent behaviour, you’ll probably like the full suite.
Collection:

Related Spaces in the suite:

What would be great (feedback)

If you try it, I’d love feedback on:

  • missing event types you’d want for real agent runs
  • whether the exported bundle format is clear enough for third-party review
  • what would make “verification” feel more obvious to non-technical users
  • any edge cases where you think the recorder could be tricked

If you build agents and care about reproducibility + auditability, this should be useful. Cheers.

RFTsystems - Liam Grinstead :nerd_face:

#AI #Agents #LLM #Auditability #Verification #Reproducibility #Security #MLOps #AIEngineering #Provenance #Cryptography #Ed25519 #HashChain #Forensics #Observability #Governance #RAG #MemorySystems :locked: Gradio #OpenSource

1 Like