---
license: mit
tags:
- image-classification
- remote-sensing
- resnet
- pytorch
- transformers
- self-supervised-learning
- contrastive-learning
- moco
---

# MoCo-TP-ResNet-50

ResNet-50 model pre-trained with MoCo v2 and Temporal Pairing (TP), which treats images of the same location captured at different times as positive pairs, for geography-aware self-supervised learning on remote sensing imagery.

## Model Details

- **Architecture:** ResNet-50
- **Pre-training:** MoCo v2 with Temporal Pairing (TP)
- **Input size:** 224×224×3
- **Feature dimension:** 2048 (before classification head)
- **Parameters:** ~23.6M
- **Training:** Self-supervised pre-training on the fMoW dataset (200 epochs)

## Usage

### Feature Extraction

```python
from transformers import AutoModelForImageClassification
import torch

# Load the model for feature extraction
model = AutoModelForImageClassification.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    trust_remote_code=True
)

# Inference: extract features
model.eval()
input_image = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
with torch.no_grad():
    outputs = model(pixel_values=input_image, return_dict=True)
    features = outputs["features"]  # Shape: (1, 2048)
```

### Fine-tuning for Classification

To fine-tune the model on a specific classification task, add a classification head:

```python
from transformers import AutoModelForImageClassification, AutoConfig

# Load the config and set the number of target classes
config = AutoConfig.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    trust_remote_code=True
)
config.num_labels = 10  # Your number of classes

# Load the model
model = AutoModelForImageClassification.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    config=config,
    trust_remote_code=True
)

# The model automatically replaces the identity head with a classification head.
# You can now fine-tune it on your dataset.
```

## Model Architecture

The model consists of:

- **Backbone:** ResNet-50 (conv1, bn1, layer1-4)
- **Feature extractor:** Adaptive average pooling + flattening
- **Classification head:** Linear layer (2048 -> num_labels), or Identity for feature extraction

## Pre-training Details

This model was pre-trained using:

- **Method:** MoCo v2 (Momentum Contrast) with Temporal Pairing
- **Dataset:** fMoW (Functional Map of the World)
- **Epochs:** 200
- **Loss:** InfoNCE contrastive loss (as in MoCo)
- **Augmentation:** MoCo v2 augmentations (random resized crop, color jitter, random grayscale, Gaussian blur, random horizontal flip)

## Citation

If you use this model, please cite the original Geography-Aware SSL paper:

```bibtex
@inproceedings{ayush2021geography,
  title={Geography-Aware Self-Supervised Learning},
  author={Ayush, Kumar and Uzkent, Burak and Meng, Chenlin and Tanmay, Kumar and Burke, Marshall and Lobell, David and Ermon, Stefano},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}
```

**Original Repository:** [sustainlab-group/geography-aware-ssl](https://github.com/sustainlab-group/geography-aware-ssl)

## License

MIT License - for academic use only.
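
## Appendix: Preprocessing Real Images

The usage examples above feed a random tensor into the model. For real remote sensing images, inputs should be converted to a 224×224 RGB tensor normalized the same way as during pre-training. The sketch below assumes the standard MoCo v2 evaluation pipeline with ImageNet normalization statistics; both the statistics and the file name `example.jpg` are illustrative assumptions, so verify them against the original repository's data loading code.

```python
import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet-style preprocessing (assumption: this model was
# pre-trained with these statistics; check the original repo to confirm).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # hypothetical input file
pixel_values = preprocess(image).unsqueeze(0)     # shape: (1, 3, 224, 224)
```

The resulting `pixel_values` tensor can be passed directly to the `model(pixel_values=...)` calls shown in the usage examples.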