DenseMem: FoldedMemory RAM Tech

Title: I built a KV cache compression protocol — 256x ratio, 0.9994 fidelity, running live on an RTX 4090


Hey r/LocalLLaMA,

I’ve been running a 72B model’s full KV cache in 640MB of DDR5 RAM on my RTX 4090 + Core i9. Wanted to share what I built.

DenseMem v0.2.0 — FoldedMemory Protocol

The problem: a 72B model at 32K context needs ~160GB of KV cache. That’s H100 territory. Most of us can’t touch it.
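Napkin math behind that number (layer count and hidden dim are assumptions for a dense 72B-class model with full multi-head attention; the exact total moves with precision, batch size, and whether the model uses GQA):

```python
# Napkin KV sizing; all architecture numbers here are assumptions,
# not taken from the DenseMem repo.
def kv_cache_bytes(n_layers, hidden_dim, seq_len, bytes_per_elem, batch=1):
    # 2x for keys and values; one hidden_dim vector per token per layer
    return 2 * n_layers * hidden_dim * seq_len * bytes_per_elem * batch

gb = kv_cache_bytes(n_layers=80, hidden_dim=8192, seq_len=32_768,
                    bytes_per_elem=2) / 1e9          # fp16
print(f"fp16, batch 1: {gb:.0f} GB")                 # ~86 GB
print(f"fp32 (or fp16, batch 2): {2 * gb:.0f} GB")   # ~172 GB, the ~160 GB ballpark
```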

The insight: KV cache activations aren’t random. They’re highly structured and correlated. SVD at rank=64 exploits that geometry. The compression is lossy in theory but in practice the fidelity holds at 0.9994 cosine similarity — because real transformer activations live in a low-dimensional subspace.
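Here's a toy version of the core move. Rank and shapes are illustrative, and the rank reduction alone only buys ~13x on this block shape; the shipped 256x presumably stacks INT8 and other tricks on top. None of this is the repo's code:

```python
import numpy as np

RANK = 64  # illustrative; matches the rank named above

def compress(block: np.ndarray):
    """Keep only the top-RANK singular directions of a (tokens, dim) block."""
    U, S, Vt = np.linalg.svd(block, full_matrices=False)
    return U[:, :RANK] * S[:RANK], Vt[:RANK, :]

def decompress(US: np.ndarray, Vt: np.ndarray) -> np.ndarray:
    return US @ Vt

def cosine_fidelity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Structured input (low-rank signal + small noise) survives almost intact...
structured = rng.normal(size=(4096, 64)) @ rng.normal(size=(64, 1024)) \
             + 0.01 * rng.normal(size=(4096, 1024))
print(cosine_fidelity(structured, decompress(*compress(structured))))  # ~0.999+

# ...while pure noise does not.
noise = rng.normal(size=(4096, 1024))
print(cosine_fidelity(noise, decompress(*compress(noise))))  # well below 1
```

The noise case is exactly why the negative control in the benchmark below matters: it shows the fidelity comes from structure in the activations, not from the method being trivially lossless.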

Live benchmark (RTX 4090 + Core i9 + DDR5):

  • Compression: 256x
  • Fidelity: 0.9994 cosine similarity
  • Negative control (random noise): 0.12 — confirms it’s exploiting structure, not luck
  • Avg fetch latency: 1.95 ms (timing harness sketched after this list)
  • Max fetch latency under load: 3.96 ms
  • Evictions: 2,944, all clean
  • Live test: 16,384 MB → 63.9 MB (≈256x)
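The latency harness is nothing exotic; a minimal stand-in (the real fetch() lives in the repo, and `bench_fetch` plus its arguments are invented for illustration):

```python
import statistics
import time

def bench_fetch(fetch, keys, warmup=100):
    """Time fetch(key) per call; returns (avg_ms, max_ms). Purely illustrative."""
    for k in keys[:warmup]:               # warm the hot tier first
        fetch(k)
    lat_ms = []
    for k in keys[warmup:]:
        t0 = time.perf_counter()
        fetch(k)
        lat_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.mean(lat_ms), max(lat_ms)

# usage (hypothetical): avg, worst = bench_fetch(cache.fetch, block_ids)
```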

Architecture:

  • Two-tier hierarchy: VRAM hot, DDR5 warm
  • Attention-weighted eviction: score = 0.5*attn + 0.3*recency + 0.2*freq
  • Prefetcher: layer lookahead + sequential token prediction
  • Two-method API: store() and fetch() (toy sketch below)
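A toy sketch of how the scorer and the two-method API fit together. Every class and field name here is invented for illustration; only store() and fetch() come from the actual protocol:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Entry:
    data: bytes
    attn: float = 0.0       # running attention weight for this block
    last_access: float = field(default_factory=time.monotonic)
    freq: int = 0

class FoldedCache:
    """Toy two-tier cache; internals are illustrative, not the shipped code."""
    def __init__(self, hot_capacity: int):
        self.hot: dict[str, Entry] = {}   # stands in for VRAM
        self.warm: dict[str, Entry] = {}  # stands in for DDR5
        self.hot_capacity = hot_capacity

    def _score(self, e: Entry, now: float) -> float:
        # attention-weighted eviction: 0.5*attn + 0.3*recency + 0.2*freq
        recency = 1.0 / (1.0 + now - e.last_access)
        freq = min(e.freq / 100.0, 1.0)   # crude normalization
        return 0.5 * e.attn + 0.3 * recency + 0.2 * freq

    def store(self, key: str, data: bytes, attn: float = 0.0) -> None:
        self.hot[key] = Entry(data, attn=attn)
        if len(self.hot) > self.hot_capacity:
            now = time.monotonic()
            victim = min(self.hot, key=lambda k: self._score(self.hot[k], now))
            self.warm[victim] = self.hot.pop(victim)  # demote, don't drop

    def fetch(self, key: str) -> bytes | None:
        e = self.hot.get(key) or self.warm.get(key)
        if e is None:
            return None
        e.freq += 1
        e.last_access = time.monotonic()
        return e.data
```

The lowest-scoring block gets demoted to warm rather than dropped, which is what keeps evictions "clean": nothing leaves the hierarchy, it just gets cheaper to hold.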

Current limitation:

Hit rate is 25% — my i9’s 2-channel DDR5 is the bottleneck (~38 GB/s). On Threadripper PRO 8-channel DDR5 (~224 GB/s) I’m projecting 65-75% hit rate with sub-2ms latency.
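Why bandwidth is the lever: a pure size/bandwidth lower bound (ignores DRAM latency, contention, and PCIe hops; the 64 MB working set is my rounding of the 63.9 MB figure above):

```python
def transfer_ms(size_mb: float, bandwidth_gb_s: float) -> float:
    # MB divided by GB/s conveniently comes out in milliseconds
    return size_mb / bandwidth_gb_s

print(transfer_ms(64, 38))    # ~1.7 ms  on 2-channel DDR5 (~38 GB/s)
print(transfer_ms(64, 224))   # ~0.29 ms on 8-channel DDR5 (~224 GB/s)
```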

Running live:

Qwen2.5-7B at 32K context on a single 4090. Every tick, KV blocks are PCA-compressed and quantized to INT8 before being paged out to DDR5 (minimal INT8 sketch below). Usable context went from 4K to 32K: an 8x expansion via DenseMem.
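A minimal stand-in for the INT8 step (symmetric per-tensor scaling here; illustrative only, the shipped pipeline may scale per-channel or per-block):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    # symmetric per-tensor quantization; scheme assumed, not from the repo
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)  # avoid div-by-zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.default_rng(1).normal(size=(256, 64)).astype(np.float32)
q, s = quantize_int8(x)
err = np.abs(x - dequantize_int8(q, s)).max()
print(q.nbytes, "bytes vs", x.nbytes, "| max abs err:", err)  # 4x smaller than fp32
```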

Cost:

Uncompressed 72B KV cache at 32K ctx: $32,000 in HBM3e.
FoldedMemory: $1.88 in DDR5.
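Implied pricing, so the comparison is reproducible (the $/GB figures are reverse-engineered from the totals above, not quoted prices):

```python
# Reverse-engineered $/GB assumptions, not actual quotes:
hbm3e_usd_per_gb = 200.0               # 160 GB  * $200.00/GB = $32,000
ddr5_usd_per_gb = 2.94                 # 0.64 GB * $2.94/GB  ~= $1.88
print(f"${160 * hbm3e_usd_per_gb:,.0f}")   # $32,000
print(f"${0.64 * ddr5_usd_per_gb:.2f}")    # $1.88
```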

GitHub: https://github.com/thorshammerztp-arch/densemem-protocol
Patent pending (US 64/045,595).

Happy to answer questions on the compression math, architecture, or benchmark methodology.


Built by a solo developer / Navy veteran on personal hardware. No funding.
