# ESPResso V2: Carbon Footprint Prediction Model
A LUPI-enhanced multi-encoder with dual CLS tokens for predicting product-level carbon footprints in textiles. 546K parameters and four output heads (raw materials, transport, processing, packaging); achieves R2 = 0.988 on the test set. Developed at the University of Amsterdam.
## Architecture
```
Input Features
  |
  +-- MaterialEncoder ---------> mat_emb [B, 96]
  |       emb(48) + pct -> MLP(96) + 2-head self-attn + LN
  |
  +-- StepLocProxy ------------> proxy_t [B, 96], proxy_p [B, 96]
  |       step_emb(32) + coords(32) -> MLP(96)
  |       dual CLS tokens + 4-head self-attn
  |       gated geo fusion (27 features -> sigmoid gate)
  |
  +-- ProductEncoder ----------> product_emb [B, 64]
  |       cat_emb(24) + subcat_emb(24) + weight + zscore + coverage + flags
  |
  +-- TransportEncoder --------> transport_emb [B, 96]  (TRAINING ONLY)
  |       6 log-distances -> MLP(96)
  |
  +-- MaterialLocAssignment ---> mat_loc_feature [B, 32]
          cross-attn(48) + Sinkhorn(3 iters) -> MLP(32)
  |
  v
Residual Trunk (192-dim, 3 blocks, LN + Linear + GELU + Dropout)
  |
  +-- head_raw_materials -----> pred [B, 1]
  +-- head_transport ---------> pred [B, 1]  (trunk + mat_loc_feature)
  +-- head_processing --------> pred [B, 1]  (+ gradient-isolated branch)
  +-- head_packaging ---------> pred [B, 1]  (+ direct shortcut bypass)
```
## Key Design Decisions
- LUPI Distillation: Transport data (per-leg distances, modes) is unavailable at inference. The TransportEncoder sees ground-truth distances during training, and RKD distillation (rkd_alpha=0.5) transfers this knowledge to the StepLocProxy. The distillation weight warms up over 10 epochs to a peak of 0.10, then decays to a floor of 0.02.
- Dual CLS Tokens: Two learned CLS tokens prepended to the step-location sequence. CLS_transport is distilled toward the privileged encoder; CLS_processing trains freely via task loss. Prevents transport optimization from corrupting processing attention.
- Gated Geo Fusion: 27 geographic features (haversine stats + distance histogram + top-K pairs) pass through a learned sigmoid gate, preventing the MLP from ignoring the attention mechanism.
- MaterialLocAssignment: Cross-attention with Sinkhorn normalization (3 iterations) learns soft doubly-stochastic assignments between materials and locations (the dataset does not specify which material is processed where).
- Packaging Shortcut: A direct bypass predicts packaging from mass and category embeddings; its output is summed with the trunk output.
- Gradient-Isolated Processing: A separate MLP operating on the detached StepLocProxy output provides an independent gradient path for the processing head.
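The LUPI distillation schedule above can be sketched as a small function. The card specifies only the warmup length (10 epochs), peak (0.10), and floor (0.02); the linear-warmup / cosine-decay shape is an assumption for illustration.

```python
import math

def rkd_weight(epoch, warmup=10, peak=0.10, floor=0.02, total=105):
    """Distillation-weight schedule: linear warmup to the peak, then
    decay toward the floor. Warmup length, peak, and floor follow the
    model card; the cosine decay shape is an assumption."""
    if epoch < warmup:
        return peak * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))
```

At each training step, the RKD loss term would be scaled by `rkd_weight(epoch)` before being added to the task loss.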
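The Sinkhorn step in MaterialLocAssignment can be illustrated in plain Python: exponentiate a material-by-location score matrix, then alternate row and column normalization so the result approaches a doubly-stochastic soft assignment. In the model these scores come from cross-attention; the input matrix here is purely illustrative.

```python
import math

def sinkhorn(scores, n_iters=3):
    """Sinkhorn normalization of a material-by-location score matrix.
    After a few iterations, each row (a material's assignment) and each
    column (a location's load) sums to approximately 1."""
    K = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each material distributes over locations.
        K = [[v / sum(row) for v in row] for row in K]
        # Column normalization: each location's incoming mass sums to 1.
        col = [sum(K[i][j] for i in range(len(K))) for j in range(len(K[0]))]
        K = [[K[i][j] / col[j] for j in range(len(K[0]))] for i in range(len(K))]
    return K
```

With 3 iterations, as used by the model, the assignment is only approximately doubly stochastic, which keeps the operation cheap and differentiable.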
## Results
Overall test set:
| Metric | Value |
|---|---|
| R2 | 0.988 |
| MAE | 0.479 kgCO2e |
| SMAPE | 6.3% |
Per-component:
| Component | MAE | R2 | SMAPE |
|---|---|---|---|
| Raw materials | 0.229 kgCO2e | 0.992 | 6.2% |
| Processing | 0.307 kgCO2e | 0.971 | 9.7% |
| Transport | 0.048 kgCO2e | 0.691 | 18.6% |
| Packaging | 0.010 kgCO2e | 0.964 | 4.2% |
Transport is weakest (R2=0.691) because the proxy must infer distances from indirect geographic signals without actual route data at inference time. LUPI distillation substantially improves over a naive proxy.
### Tier Degradation
Six data-availability tiers (A = minimal through F = full traceability). Overall degradation factor: 2.4x (Tier A MAE / Tier F MAE). Degradation is concentrated in the transport and processing heads, where geographic and manufacturing data provide the most signal.
Tier distribution during training: A:10%, B:15%, C:20%, D:20%, E:20%, F:15%.
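The tier distribution above can be expressed as a simple sampler; drawing one tier per training example this way is an illustrative assumption about how the curriculum is implemented.

```python
import random

# Tier probabilities from the model card (A = minimal, F = full traceability).
TIER_PROBS = [("A", 0.10), ("B", 0.15), ("C", 0.20),
              ("D", 0.20), ("E", 0.20), ("F", 0.15)]

def sample_tier(rng=random):
    """Draw a data-availability tier for one training example."""
    r, acc = rng.random(), 0.0
    for tier, p in TIER_PROBS:
        acc += p
        if r < acc:
            return tier
    return TIER_PROBS[-1][0]  # guard against floating-point rounding
```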
## Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | AdamW (differential LR: 0.2x attention, 1x MLP) |
| Learning rate | 5e-4 base, cosine schedule |
| Batch size | 1024 |
| Weight decay | 0.01 (MLP), 0.005 (attention), 0.0 (embeddings) |
| Curriculum warmup | 30 epochs |
| LUPI priv_ratio | 0.60 |
| Target transform | log1p + z-score normalization |
| Epochs | 105 |
| Parameters | ~546,000 |
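The learning-rate setup in the table can be sketched as follows. The base LR (5e-4), cosine schedule, epoch count, and 0.2x attention multiplier come from the table; the zero floor and per-epoch stepping are assumptions.

```python
import math

BASE_LR, EPOCHS = 5e-4, 105
ATTN_MULT = 0.2  # differential LR: attention params run at 0.2x base

def cosine_lr(epoch, base=BASE_LR, total=EPOCHS, floor=0.0):
    """Cosine learning-rate schedule over the full run; the floor value
    is an assumption (the card specifies only base LR and schedule)."""
    return floor + 0.5 * (base - floor) * (1 + math.cos(math.pi * epoch / total))
```

In a framework like PyTorch, the differential LR would typically be realized as two optimizer parameter groups, one at `cosine_lr(epoch)` for MLP weights and one at `ATTN_MULT * cosine_lr(epoch)` for attention weights.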
## Loss Function
Three-group hierarchical loss:
- Main task: 4 per-component losses (MSE for raw/processing, log-cosh for transport/packaging) with analytical UW-SO + DB-MTL log-normalization, 10% floor per head
- Auxiliary: distance prediction, mode fraction, weight prediction (aux_alpha=0.1)
- Structural: RKD distillation, proxy diversity (cosine), attention entropy regularization (entropy_alpha=0.01)
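The log-cosh loss used for the transport and packaging heads can be written in a numerically stable form; this is a standard formulation, shown per-element for a single prediction.

```python
import math

def log_cosh(pred, target):
    """Numerically stable log-cosh loss: behaves like 0.5*x^2 near zero
    and like |x| for large residuals, so outliers do not dominate the
    gradient the way MSE does."""
    x = pred - target
    # log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2); avoids overflow
    # for large |x|, where cosh(x) itself would not be representable.
    return abs(x) + math.log1p(math.exp(-2 * abs(x))) - math.log(2.0)
```

The quadratic-near-zero / linear-in-the-tails behavior is a reasonable match for the transport head, whose targets are small but occasionally spiky.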
## Training Data
Dataset: Tr4m0ryp/espresso-v2-carbon-water-data — 49,732 records with a 70/15/15 train/val/test split, stratified by product category.
## Limitations
- Trained on synthetic data from ESPResso V2 pipeline; not a substitute for formal LCA
- Transport head is weakest due to LUPI proxy limitations
- Emission factors from EcoInvent 3.12 (2024 vintage)
- Covers 47 textile product categories; not designed for non-textile products
- Requires the same preprocessing pipeline (log1p + z-score) used during training
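The log1p + z-score target transform noted in the last point can be sketched as a forward/inverse pair; the mean and std values are fit on the training targets (the parameter names here are illustrative).

```python
import math

def transform(y, mean, std):
    """Forward target transform: log1p, then z-score normalization
    using statistics fit on the training set."""
    return (math.log1p(y) - mean) / std

def inverse_transform(z, mean, std):
    """Invert the transform to recover predictions in kgCO2e."""
    return math.expm1(z * std + mean)
```

Predictions from the model are in z-scored log space and must pass through `inverse_transform` before being reported as kgCO2e.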
## Citation
```bibtex
@misc{espresso-v2-2026,
  title={ESPResso V2: LLM-Orchestrated Synthetic Data Pipeline and Neural Estimation of Product-Level Carbon and Water Footprints in Textiles},
  author={Ouallaf, Moussa},
  year={2026},
  institution={University of Amsterdam},
  url={https://github.com/tr4m0ryp/ESPResso-V2}
}
```
## License
CC BY-SA 4.0.