12b7eea334e195972fda879fccaa3d4a

This model is a fine-tuned version of google/umt5-base on the Helsinki-NLP/opus_books [de-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7793
  • Data Size: 1.0
  • Epoch Runtime: 205.0323
  • BLEU: 9.2706
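Since the card does not yet document usage, here is a minimal inference sketch. The repository id is taken from this model's page; the assumption that the checkpoint accepts raw German text with no task prefix is not confirmed by the card.

```python
# Minimal inference sketch. Assumes the checkpoint expects raw German input
# with no task prefix; the card does not document the expected input format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/12b7eea334e195972fda879fccaa3d4a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate a German sentence into French.
inputs = tokenizer("Der Hund läuft im Park.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```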

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
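For reference, the hyperparameters above map onto Seq2SeqTrainingArguments roughly as sketched below. The output directory name is hypothetical, and the dataset preprocessing and Trainer wiring are omitted.

```python
# Sketch of training arguments matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-base-opus-books-de-fr",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# Launched on 4 GPUs (e.g. via torchrun or accelerate), the effective train
# and eval batch sizes are 8 x 4 = 32, matching the totals listed above.
```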

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | BLEU   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 11.1853         | 0         | 16.8775       | 0.0954 |
| No log        | 1     | 872   | 10.7783         | 0.0078    | 18.1996       | 0.1123 |
| No log        | 2     | 1744  | 10.1798         | 0.0156    | 20.4251       | 0.1555 |
| 0.2703        | 3     | 2616  | 9.7975          | 0.0312    | 23.4899       | 0.1377 |
| 0.8809        | 4     | 3488  | 7.7114          | 0.0625    | 29.8361       | 0.2158 |
| 9.1948        | 5     | 4360  | 4.6778          | 0.125     | 41.7857       | 0.9647 |
| 4.2417        | 6     | 5232  | 2.7966          | 0.25      | 64.4291       | 2.8274 |
| 3.2175        | 7     | 6104  | 2.4069          | 0.5       | 110.3889      | 4.3369 |
| 2.8296        | 8     | 6976  | 2.1887          | 1.0       | 201.2492      | 5.4576 |
| 2.6175        | 9     | 7848  | 2.0855          | 1.0       | 201.5218      | 6.0483 |
| 2.5123        | 10    | 8720  | 2.0231          | 1.0       | 203.7766      | 6.5073 |
| 2.4066        | 11    | 9592  | 1.9783          | 1.0       | 204.0477      | 6.8182 |
| 2.2775        | 12    | 10464 | 1.9493          | 1.0       | 205.4439      | 7.0689 |
| 2.2363        | 13    | 11336 | 1.9083          | 1.0       | 204.6456      | 7.2838 |
| 2.1466        | 14    | 12208 | 1.8907          | 1.0       | 204.6194      | 7.4938 |
| 2.1032        | 15    | 13080 | 1.8707          | 1.0       | 204.2757      | 7.6754 |
| 2.0502        | 16    | 13952 | 1.8485          | 1.0       | 206.7366      | 7.8382 |
| 1.9568        | 17    | 14824 | 1.8301          | 1.0       | 207.7873      | 7.9788 |
| 1.9315        | 18    | 15696 | 1.8225          | 1.0       | 203.7663      | 8.1180 |
| 1.9148        | 19    | 16568 | 1.8116          | 1.0       | 205.0227      | 8.2364 |
| 1.8543        | 20    | 17440 | 1.7997          | 1.0       | 201.9450      | 8.3445 |
| 1.8677        | 21    | 18312 | 1.7845          | 1.0       | 203.1910      | 8.4587 |
| 1.8084        | 22    | 19184 | 1.7864          | 1.0       | 202.1538      | 8.5304 |
| 1.7365        | 23    | 20056 | 1.7742          | 1.0       | 204.7126      | 8.5680 |
| 1.7111        | 24    | 20928 | 1.7745          | 1.0       | 202.5110      | 8.6804 |
| 1.6884        | 25    | 21800 | 1.7700          | 1.0       | 206.2581      | 8.7424 |
| 1.668         | 26    | 22672 | 1.7731          | 1.0       | 203.0778      | 8.8157 |
| 1.631         | 27    | 23544 | 1.7631          | 1.0       | 201.7815      | 8.9063 |
| 1.5975        | 28    | 24416 | 1.7713          | 1.0       | 204.1630      | 8.9118 |
| 1.5381        | 29    | 25288 | 1.7637          | 1.0       | 202.4205      | 8.9410 |
| 1.5759        | 30    | 26160 | 1.7693          | 1.0       | 203.7057      | 8.9786 |
| 1.5469        | 31    | 27032 | 1.7613          | 1.0       | 202.3730      | 9.0594 |
| 1.4868        | 32    | 27904 | 1.7612          | 1.0       | 203.5100      | 9.0807 |
| 1.4895        | 33    | 28776 | 1.7609          | 1.0       | 203.5876      | 9.1370 |
| 1.4205        | 34    | 29648 | 1.7661          | 1.0       | 202.8884      | 9.2094 |
| 1.4164        | 35    | 30520 | 1.7723          | 1.0       | 202.6136      | 9.2228 |
| 1.4079        | 36    | 31392 | 1.7753          | 1.0       | 202.4356      | 9.2698 |
| 1.3664        | 37    | 32264 | 1.7793          | 1.0       | 205.0323      | 9.2706 |
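The BLEU column appears to be on the standard 0-100 sacreBLEU scale, though the card does not name the scorer. A minimal sketch of how such a score could be computed with the `evaluate` library follows; the prediction/reference strings are placeholders, not drawn from the actual eval set.

```python
# Hedged BLEU sketch using the `evaluate` library; assumes sacreBLEU-style
# scoring on detokenized text. The strings below are placeholders only.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Le chien court dans le parc."]   # model outputs
references = [["Le chien court dans le parc."]]  # one reference list per sample
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```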

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Model size

  • 1.0B parameters (F32, Safetensors)
