---
license: mit
tags:
- image-classification
- remote-sensing
- resnet
- pytorch
- transformers
- self-supervised-learning
- contrastive-learning
- moco
---

# MoCo-TP-ResNet-50

ResNet-50 model pre-trained with MoCo v2 and Temporal Pairing (TP), which treats images of the same location captured at different times as positive pairs, for geography-aware self-supervised learning on remote sensing imagery.

## Model Details

- **Architecture:** ResNet-50
- **Pre-training:** MoCo v2 with Temporal Pairing (TP)
- **Input size:** 224×224×3
- **Feature dimension:** 2048 (before classification head)
- **Parameters:** ~23.6M
- **Training:** Self-supervised pre-training on the fMoW dataset (200 epochs)

## Usage

### Feature Extraction

```python
from transformers import AutoModelForImageClassification
import torch

# Load the model for feature extraction
model = AutoModelForImageClassification.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    trust_remote_code=True
)

# Inference: extract features
model.eval()
input_image = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
with torch.no_grad():
    outputs = model(pixel_values=input_image, return_dict=True)
    features = outputs["features"]  # Shape: (1, 2048)
```

### Fine-tuning for Classification

To fine-tune the model on a specific classification task, add a classification head:

```python
from transformers import AutoModelForImageClassification, AutoConfig

# Load the config and set the number of target classes
config = AutoConfig.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    trust_remote_code=True
)
config.num_labels = 10  # Your number of classes

# Load the model
model = AutoModelForImageClassification.from_pretrained(
    "BiliSakura/MoCo-TP-ResNet-50",
    config=config,
    trust_remote_code=True
)

# The model automatically replaces the identity head with a classification head.
# You can now fine-tune it on your dataset.
```

## Model Architecture

The model consists of:

- **Backbone:** ResNet-50 (conv1, bn1, layer1-4)
- **Feature extractor:** Adaptive average pooling + flattening
- **Classification head:** Linear layer (2048 -> num_labels), or Identity for feature extraction

## Pre-training Details

This model was pre-trained using:

- **Method:** MoCo v2 (Momentum Contrast) with Temporal Pairing
- **Dataset:** fMoW (Functional Map of the World)
- **Epochs:** 200
- **Loss:** InfoNCE contrastive loss (as in MoCo)
- **Augmentation:** MoCo v2 augmentations (random resized crop, color jitter, random grayscale, Gaussian blur, random horizontal flip)

## Citation

If you use this model, please cite the original Geography-Aware SSL paper:

```bibtex
@inproceedings{ayush2021geography,
  title={Geography-Aware Self-Supervised Learning},
  author={Ayush, Kumar and Uzkent, Burak and Meng, Chenlin and Tanmay, Kumar and Burke, Marshall and Lobell, David and Ermon, Stefano},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}
```

**Original Repository:** [sustainlab-group/geography-aware-ssl](https://github.com/sustainlab-group/geography-aware-ssl)

## License

MIT License - for academic use only.
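
## Appendix: Preprocessing Real Images

The usage examples above feed a random tensor into the model. For real remote sensing images, inputs should be converted to a 224×224 RGB tensor normalized the same way as during pre-training. The sketch below assumes the standard MoCo v2 evaluation pipeline with ImageNet normalization statistics; both the statistics and the file name `example.jpg` are illustrative assumptions, so verify them against the original repository's data loading code.

```python
import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet-style preprocessing (assumption: this model was
# pre-trained with these statistics; check the original repo to confirm).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # hypothetical input file
pixel_values = preprocess(image).unsqueeze(0)     # shape: (1, 3, 224, 224)
```

The resulting `pixel_values` tensor can be passed directly to the `model(pixel_values=...)` calls shown in the usage examples.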