ESPResso V2: Carbon Footprint Prediction Model

LUPI-Enhanced Multi-Encoder with Dual CLS for predicting product-level carbon footprints in textiles. 546K parameters, 4 output heads (raw materials, transport, processing, packaging). Achieves R2=0.988 on test set. Developed at the University of Amsterdam.

Architecture

Input Features
    |
    +-- MaterialEncoder ---------> mat_emb [B, 96]
    |     emb(48) + pct -> MLP(96) + 2-head self-attn + LN
    |
    +-- StepLocProxy ------------> proxy_t [B, 96], proxy_p [B, 96]
    |     step_emb(32) + coords(32) -> MLP(96)
    |     dual CLS tokens + 4-head self-attn
    |     gated geo fusion (27 features -> sigmoid gate)
    |
    +-- ProductEncoder ----------> product_emb [B, 64]
    |     cat_emb(24) + subcat_emb(24) + weight + zscore + coverage + flags
    |
    +-- TransportEncoder --------> transport_emb [B, 96]  (TRAINING ONLY)
    |     6 log-distances -> MLP(96)
    |
    +-- MaterialLocAssignment ---> mat_loc_feature [B, 32]
          cross-attn(48) + Sinkhorn(3 iters) -> MLP(32)
    |
    v
Residual Trunk (192-dim, 3 blocks, LN + Linear + GELU + Dropout)
    |
    +-- head_raw_materials -----> pred [B, 1]
    +-- head_transport ---------> pred [B, 1]  (trunk + mat_loc_feature)
    +-- head_processing --------> pred [B, 1]  (+ gradient-isolated branch)
    +-- head_packaging ---------> pred [B, 1]  (+ direct shortcut bypass)

Key Design Decisions

  • LUPI Distillation: Per-leg transport distances and modes are unavailable at inference. The TransportEncoder sees ground-truth distances during training, and RKD distillation (rkd_alpha=0.5) transfers its knowledge to the StepLocProxy. The distillation weight warms up over 10 epochs to a peak of 0.10, then decays to a floor of 0.02.
  • Dual CLS Tokens: Two learned CLS tokens prepended to the step-location sequence. CLS_transport is distilled toward the privileged encoder; CLS_processing trains freely via task loss. Prevents transport optimization from corrupting processing attention.
  • Gated Geo Fusion: 27 geographic features (haversine stats + distance histogram + top-K pairs) pass through a learned sigmoid gate, preventing the MLP from ignoring the attention mechanism.
  • MaterialLocAssignment: Cross-attention with Sinkhorn normalization (3 iterations) learns soft doubly-stochastic assignments between materials and locations (the dataset does not specify which material is processed where).
  • Packaging Shortcut: Direct bypass predicts packaging from mass + category embeddings. Summed with trunk output.
  • Gradient-Isolated Processing: Separate MLP from detached StepLocProxy output provides independent gradient path for processing head.
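
The LUPI distillation weight schedule described above (10-epoch warmup, peak 0.10, floor 0.02) can be sketched as follows. The linear warmup and cosine decay shapes, and the function name, are assumptions for illustration; only the warmup length, peak, and floor come from the description above.

```python
import math

def rkd_weight(epoch, warmup=10, peak=0.10, floor=0.02, total_epochs=105):
    """Weight applied to the RKD distillation term at a given epoch.

    Linear warmup to the peak, then (assumed) cosine decay to the floor.
    """
    if epoch < warmup:
        # linear ramp from 0 at epoch 0 to `peak` at `warmup`
        return peak * epoch / warmup
    # cosine decay from `peak` down to `floor` over the remaining epochs
    t = (epoch - warmup) / max(1, total_epochs - warmup)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * min(t, 1.0)))
```

At epoch 0 the weight is 0, at epoch 10 it reaches the 0.10 peak, and by the final epoch it has settled at the 0.02 floor.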
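
The Sinkhorn step in MaterialLocAssignment can be sketched as alternating row and column normalization of the exponentiated cross-attention logits. This is a dependency-free illustration; the temperature, masking, and any log-space implementation details of the actual module are not specified in the card.

```python
import math

def sinkhorn(scores, n_iters=3):
    """Push exp(scores) toward a doubly-stochastic soft assignment
    between materials (rows) and locations (columns)."""
    m = max(max(row) for row in scores)
    # subtract the max logit before exponentiating for numerical stability
    P = [[math.exp(s - m) for s in row] for row in scores]
    n_rows, n_cols = len(P), len(P[0])
    for _ in range(n_iters):
        # row normalization: each material's assignment sums to 1
        P = [[v / sum(row) for v in row] for row in P]
        # column normalization: each location's assignment sums to 1
        col = [sum(P[i][j] for i in range(n_rows)) for j in range(n_cols)]
        P = [[P[i][j] / col[j] for j in range(n_cols)] for i in range(n_rows)]
    return P
```

After three iterations the matrix is approximately doubly stochastic (columns sum to exactly 1 after the final step, rows very nearly so), giving soft material-to-location assignments without hard supervision.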

Results

Overall test set:

Metric  Value
R2      0.988
MAE     0.479 kgCO2e
SMAPE   6.3%

Per-component:

Component      MAE           R2     SMAPE
Raw materials  0.229 kgCO2e  0.992  6.2%
Processing     0.307 kgCO2e  0.971  9.7%
Transport      0.048 kgCO2e  0.691  18.6%
Packaging      0.010 kgCO2e  0.964  4.2%

Transport is the weakest head (R2=0.691) because the proxy must infer distances from indirect geographic signals; actual route data is never available at inference time. LUPI distillation nonetheless improves substantially over a proxy trained without privileged supervision.

Tier Degradation

Six data availability tiers (A=minimal through F=full traceability). Overall degradation factor: 2.4x (Tier A MAE / Tier F MAE). Degradation concentrates in transport and processing heads where geographic/manufacturing data provides the most signal.

Tier distribution during training: A:10%, B:15%, C:20%, D:20%, E:20%, F:15%.

Training Configuration

Parameter          Value
Optimizer          AdamW (differential LR: 0.2x attention, 1x MLP)
Learning rate      5e-4 base, cosine schedule
Batch size         1024
Weight decay       0.01 (MLP), 0.005 (attention), 0.0 (embeddings)
Curriculum warmup  30 epochs
LUPI priv_ratio    0.60
Target transform   log1p + z-score normalization
Epochs             105
Parameters         ~546,000
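
The differential learning rate and per-group weight decay can be expressed as optimizer parameter groups. The name-matching rules below (substrings "attn" / "emb") are illustrative assumptions, not the repository's actual code; only the multipliers and decay values come from the table above.

```python
BASE_LR = 5e-4

def build_param_groups(named_params, base_lr=BASE_LR):
    """Split parameters into AdamW groups: attention at 0.2x LR with
    weight decay 0.005, embeddings with no decay, everything else (MLP)
    at the base LR with decay 0.01."""
    groups = {
        "attention": {"params": [], "lr": 0.2 * base_lr, "weight_decay": 0.005},
        "embedding": {"params": [], "lr": base_lr, "weight_decay": 0.0},
        "mlp": {"params": [], "lr": base_lr, "weight_decay": 0.01},
    }
    for name, param in named_params:
        if "attn" in name:
            groups["attention"]["params"].append(param)
        elif "emb" in name:
            groups["embedding"]["params"].append(param)
        else:
            groups["mlp"]["params"].append(param)
    return list(groups.values())
```

The resulting list has the shape expected by torch.optim.AdamW, which applies each group's lr and weight_decay independently under the shared cosine schedule.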

Loss Function

Three-group hierarchical loss:

  • Main task: 4 per-component losses (MSE for raw/processing, log-cosh for transport/packaging) with analytical UW-SO + DB-MTL log-normalization, 10% floor per head
  • Auxiliary: distance prediction, mode fraction, weight prediction (aux_alpha=0.1)
  • Structural: RKD distillation, proxy diversity (cosine), attention entropy regularization (entropy_alpha=0.01)
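
The log-cosh loss used for the transport and packaging heads can be written in a numerically stable form. This is a generic sketch of the standard identity, not the repository's exact code:

```python
import math

def log_cosh(pred, target):
    """Stable log(cosh(pred - target)) via
    log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2),
    which avoids overflowing cosh for large residuals."""
    x = abs(pred - target)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)
```

Near zero it behaves like 0.5 * x^2 (MSE-like), while for large residuals it grows like |x| - log 2, making it less sensitive to outliers than plain MSE.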

Training Data

Dataset: Tr4m0ryp/espresso-v2-carbon-water-data

49,732 records, 70/15/15 train/val/test split stratified by category.

Limitations

  • Trained on synthetic data from ESPResso V2 pipeline; not a substitute for formal LCA
  • Transport head is weakest due to LUPI proxy limitations
  • Emission factors from EcoInvent 3.12 (2024 vintage)
  • Covers 47 textile product categories; not designed for non-textile products
  • Requires the same preprocessing pipeline (log1p + z-score) used during training
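
A minimal sketch of that target transform and its inverse, where mu and sigma stand for the training-set mean and standard deviation of log1p(y); the function names are illustrative:

```python
import math

def to_model_space(y_kg, mu, sigma):
    """log1p then z-score, matching the training target transform."""
    return (math.log1p(y_kg) - mu) / sigma

def to_kg(z, mu, sigma):
    """Undo the z-score, then expm1, to recover kgCO2e."""
    return math.expm1(z * sigma + mu)
```

Predictions are only meaningful in kgCO2e after applying the inverse transform with the same statistics used during training.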

Citation

@misc{espresso-v2-2026,
  title={ESPResso V2: LLM-Orchestrated Synthetic Data Pipeline and Neural Estimation of Product-Level Carbon and Water Footprints in Textiles},
  author={Ouallaf, Moussa},
  year={2026},
  institution={University of Amsterdam},
  url={https://github.com/tr4m0ryp/ESPResso-V2}
}

License

CC BY-SA 4.0.
