# CIMP: Contrastive Image-Metadata Pre-training (ResNet-18, crop 512)

A contrastive encoder that aligns HAADF-STEM microscopy images with their acquisition metadata in a shared 128-d embedding space. This variant uses a ResNet-18 image encoder trained from scratch on 512×512 patches at an effective batch size of 512, and is the best-performing ResNet configuration reported in the accompanying paper.
## Model Details
- Architecture: ResNet-18 image encoder (trained from scratch, single-channel input) + 3-layer MLP metadata encoder (hidden dim 256)
- Embedding dimension: 128
- Image input: Single-channel grayscale, 512×512 pixels
- Metadata input: 7-d z-scored vector (pixel_size, dwell_time, convergence_angle, beam_current, gain, offset, inner_collection_angle)
- Loss: Symmetric cross-entropy (CLIP-style) with learnable temperature and bias
- Parameters: ~11M (ResNet-18 backbone)
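The symmetric cross-entropy objective with a learnable temperature and bias can be sketched as below; `clip_loss`, `log_temp`, and `bias` are illustrative names under assumed conventions, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, meta_emb, log_temp, bias):
    """CLIP-style symmetric cross-entropy over the in-batch similarity matrix.

    img_emb, meta_emb: (B, D) L2-normalized embeddings.
    log_temp, bias: learnable scalars (illustrative parameterization).
    """
    logits = img_emb @ meta_emb.t() * log_temp.exp() + bias  # (B, B)
    targets = torch.arange(img_emb.size(0))                  # diagonal is the match
    loss_i = F.cross_entropy(logits, targets)                # image -> metadata
    loss_m = F.cross_entropy(logits.t(), targets)            # metadata -> image
    return 0.5 * (loss_i + loss_m)

B, D = 8, 128
img = F.normalize(torch.randn(B, D), dim=-1)
meta = F.normalize(torch.randn(B, D), dim=-1)
loss = clip_loss(img, meta, torch.tensor(0.0), torch.tensor(0.0))
```

Both directions are averaged so that neither modality dominates the gradient.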
## Retrieval Performance
Evaluated on the held-out validation split (733 images from the CMMP dataset).
| Metric | Value |
|---|---|
| Top-1 | 0.8594 |
| Top-5 | 1.0000 |
| Top-10 | 1.0000 |
| Best epoch | 956 / 1000 |
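The Top-k values above correspond to image-to-metadata retrieval over the whole validation set: for each image embedding, rank all metadata embeddings by cosine similarity and check whether the true match falls within the top k. A minimal sketch of the metric, assuming L2-normalized embeddings (`topk_retrieval` is an illustrative helper, not the repository's API):

```python
import torch

def topk_retrieval(img_emb, meta_emb, ks=(1, 5, 10)):
    """Image -> metadata retrieval accuracy at several cutoffs k."""
    sims = img_emb @ meta_emb.t()                       # (N, N) cosine similarities
    ranks = sims.argsort(dim=1, descending=True)        # columns sorted by similarity
    target = torch.arange(sims.size(0)).unsqueeze(1)    # (N, 1) true match index
    hit_pos = (ranks == target).float().argmax(dim=1)   # rank of the true match
    return {k: (hit_pos < k).float().mean().item() for k in ks}

# Sanity check: perfectly aligned embeddings give Top-1 = 1.0.
acc = topk_retrieval(torch.eye(16), torch.eye(16))
```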
### Context among CMMP variants
| Variant | Top-1 | Top-5 | Top-10 |
|---|---|---|---|
| This model (ResNet-18, crop 512, batch 512) | 0.859 | 1.000 | 1.000 |
| ResNet-18, crop 256 | 0.844 | 0.969 | 0.984 |
| ViT-pretrained, crop 256 | 0.828 | 0.969 | 1.000 |
## Linear-Probe Metadata Recovery
A Ridge regression ($\alpha = 1.0$) trained on the frozen visual embedding recovers all seven acquisition parameters. Coefficient of determination ($R^2$), symmetric mean absolute percentage error (SMAPE, computed in physical units), and Pearson $r$ per dimension:
| Dimension | $R^2$ | SMAPE | Pearson $r$ |
|---|---|---|---|
| pixel_size | 0.748 | 39.5% | 0.867 |
| dwell_time | 0.819 | 25.9% | 0.905 |
| convergence_angle | 0.629 | 11.6% | 0.793 |
| beam_current | 0.695 | 33.4% | 0.835 |
| gain | 0.862 | 5.0% | 0.929 |
| offset | 0.824 | 9.1% | 0.912 |
| inner_collection_angle | 0.626 | 8.5% | 0.792 |
| Mean | 0.743 | 19.0% | 0.862 |
The higher SMAPE on pixel_size, dwell_time, and beam_current is expected: those dimensions are stored log10-transformed because they span several orders of magnitude in physical units, so small residuals in log-space amplify when exponentiated back.
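A closed-form sketch of such a probe and of the SMAPE metric; `ridge_probe` and `smape` are illustrative helpers, the real evaluation fits on train embeddings and scores a held-out split, and unlike scikit-learn's `Ridge` this version also regularizes the bias column for brevity:

```python
import numpy as np

def ridge_probe(X, y, alpha=1.0):
    """Closed-form ridge fit of a scalar target y (N,) from frozen embeddings X (N, D)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)
    pred = Xb @ w
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return w, r2

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    return 100.0 * np.mean(2.0 * np.abs(y_pred - y_true)
                           / (np.abs(y_true) + np.abs(y_pred)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # stand-in "embeddings"
y = X @ rng.normal(size=8) + 0.5       # noise-free linear target
w, r2 = ridge_probe(X, y)
```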
## Training Configuration
| Parameter | Value |
|---|---|
| Dataset | CMMP HAADF-STEM (7,330 images, 6,597/733 train/val) |
| Image encoder | ResNet-18 (trained from scratch, 1-channel input) |
| Metadata encoder | 3-layer MLP, hidden dim 256 |
| Crop size | 512×512 (on-the-fly from full-resolution images) |
| Loss function | CLIP (symmetric cross-entropy) with learnable logit bias |
| Optimizer | AdamW (lr=1e-4, weight_decay=0.01) |
| Scheduler | Cosine annealing (LR floor ~1e-10 by epoch 1000) |
| Batch size | 64 per GPU × 8 GPUs = 512 effective |
| Epochs | 1000 (best checkpoint at epoch 956) |
| Hardware | 8× H100 GPUs |
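The optimizer and scheduler rows above can be reconstructed roughly as follows; the `Linear` module is a stand-in for the CMMP model, and the exact scheduler arguments are an assumption consistent with the stated LR floor:

```python
import torch

# Stand-in module; the real run optimizes the CMMP image + metadata encoders.
model = torch.nn.Linear(128, 128)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=1000, eta_min=1e-10,  # cosine decay to ~1e-10 by epoch 1000
)
```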
## Usage
```python
import torch

from models import CMMP

# Load model
model = CMMP(
    meta_input_dim=7,
    embed_dim=128,
    image_encoder="resnet18",
    image_size=512,
    meta_hidden_dim=256,
    meta_num_layers=3,
)
model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.eval()

# Embed an image and its metadata
image = torch.randn(1, 1, 512, 512)  # single-channel grayscale in [0, 1]
metadata = torch.randn(1, 7)         # z-scored metadata vector
with torch.no_grad():
    img_emb, meta_emb, temp, bias = model(image, metadata)
# img_emb:  (1, 128) L2-normalized image embedding
# meta_emb: (1, 128) L2-normalized metadata embedding
```
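For real inputs, the 7-d metadata vector must be z-scored with the training pipeline's per-dimension statistics; the sketch below uses placeholder means and stds, and the log10 handling of pixel_size, dwell_time, and beam_current follows the linear-probe notes:

```python
import numpy as np

# Hypothetical raw metadata row (physical units), ordered as in the model card:
# pixel_size, dwell_time, convergence_angle, beam_current, gain, offset, inner_collection_angle
raw = np.array([0.05, 2.0, 25.0, 40.0, 1.0, 0.1, 60.0])

# pixel_size, dwell_time, and beam_current span orders of magnitude,
# so they are stored log10-transformed before z-scoring.
log_dims = [0, 1, 3]
x = raw.copy()
x[log_dims] = np.log10(x[log_dims])

# Placeholder statistics -- substitute the mean/std saved by the training run.
mean, std = np.zeros(7), np.ones(7)
metadata = ((x - mean) / std).reshape(1, 7).astype(np.float32)
```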
## Files
- `model.pth` – Best checkpoint (epoch 956, highest Top-1 on val)
- `last.pth` – Final checkpoint (epoch 1000)
- `config.json` – Full training configuration (`args.json` from the run)
- `training_log.csv` – Per-epoch training metrics
- `split_indices.npy` – Train/val split indices (seed 67) for reproducibility
- `linear_probe_metadata.json` – Ridge-probe metadata recovery metrics
## Related Models
- `Stemson-AI/cmmp-resnet18-256` – Earlier ResNet-18 variant trained at crop 256
- `Stemson-AI/cmmp-vit-pretrained-256` – ViT-B/16 variant
- `Stemson-AI/cmmp-vit-pretrained-256-with-sample-atomagined` – ViT variant trained with atomagined simulated data
## Citation
```bibtex
@misc{cimp2026,
  title={Contrastive Image-Metadata Pre-training for Materials Transmission Electron Microscopy},
  author={Channing, Georgia and Keller, Debora and Rossell, Marta D. and Torr, Philip and Erni, Rolf and Helveg, Stig and Eliasson, Henrik},
  year={2026},
}
```