12b7eea334e195972fda879fccaa3d4a

This model is a fine-tuned version of google/umt5-base on the Helsinki-NLP/opus_books [de-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7793
  • Data Size: 1.0
  • Epoch Runtime: 205.0323
  • BLEU: 9.2706
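Since the card does not yet document usage, here is a minimal inference sketch. The repository id is taken from this model's page; the assumption that the checkpoint accepts raw German text with no task prefix is not confirmed by the card.

```python
# Minimal inference sketch. Assumes the checkpoint expects raw German input
# with no task prefix; the card does not document the expected input format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/12b7eea334e195972fda879fccaa3d4a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate a German sentence into French.
inputs = tokenizer("Der Hund läuft im Park.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```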

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
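For reference, the hyperparameters above map onto Seq2SeqTrainingArguments roughly as sketched below. The output directory name is hypothetical, and the dataset preprocessing and Trainer wiring are omitted.

```python
# Sketch of training arguments matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-base-opus-books-de-fr",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# Launched on 4 GPUs (e.g. via torchrun or accelerate), the effective train
# and eval batch sizes are 8 x 4 = 32, matching the totals listed above.
```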

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | BLEU   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 11.1853         | 0         | 16.8775       | 0.0954 |
| No log        | 1     | 872   | 10.7783         | 0.0078    | 18.1996       | 0.1123 |
| No log        | 2     | 1744  | 10.1798         | 0.0156    | 20.4251       | 0.1555 |
| 0.2703        | 3     | 2616  | 9.7975          | 0.0312    | 23.4899       | 0.1377 |
| 0.8809        | 4     | 3488  | 7.7114          | 0.0625    | 29.8361       | 0.2158 |
| 9.1948        | 5     | 4360  | 4.6778          | 0.125     | 41.7857       | 0.9647 |
| 4.2417        | 6     | 5232  | 2.7966          | 0.25      | 64.4291       | 2.8274 |
| 3.2175        | 7     | 6104  | 2.4069          | 0.5       | 110.3889      | 4.3369 |
| 2.8296        | 8     | 6976  | 2.1887          | 1.0       | 201.2492      | 5.4576 |
| 2.6175        | 9     | 7848  | 2.0855          | 1.0       | 201.5218      | 6.0483 |
| 2.5123        | 10    | 8720  | 2.0231          | 1.0       | 203.7766      | 6.5073 |
| 2.4066        | 11    | 9592  | 1.9783          | 1.0       | 204.0477      | 6.8182 |
| 2.2775        | 12    | 10464 | 1.9493          | 1.0       | 205.4439      | 7.0689 |
| 2.2363        | 13    | 11336 | 1.9083          | 1.0       | 204.6456      | 7.2838 |
| 2.1466        | 14    | 12208 | 1.8907          | 1.0       | 204.6194      | 7.4938 |
| 2.1032        | 15    | 13080 | 1.8707          | 1.0       | 204.2757      | 7.6754 |
| 2.0502        | 16    | 13952 | 1.8485          | 1.0       | 206.7366      | 7.8382 |
| 1.9568        | 17    | 14824 | 1.8301          | 1.0       | 207.7873      | 7.9788 |
| 1.9315        | 18    | 15696 | 1.8225          | 1.0       | 203.7663      | 8.1180 |
| 1.9148        | 19    | 16568 | 1.8116          | 1.0       | 205.0227      | 8.2364 |
| 1.8543        | 20    | 17440 | 1.7997          | 1.0       | 201.9450      | 8.3445 |
| 1.8677        | 21    | 18312 | 1.7845          | 1.0       | 203.1910      | 8.4587 |
| 1.8084        | 22    | 19184 | 1.7864          | 1.0       | 202.1538      | 8.5304 |
| 1.7365        | 23    | 20056 | 1.7742          | 1.0       | 204.7126      | 8.5680 |
| 1.7111        | 24    | 20928 | 1.7745          | 1.0       | 202.5110      | 8.6804 |
| 1.6884        | 25    | 21800 | 1.7700          | 1.0       | 206.2581      | 8.7424 |
| 1.668         | 26    | 22672 | 1.7731          | 1.0       | 203.0778      | 8.8157 |
| 1.631         | 27    | 23544 | 1.7631          | 1.0       | 201.7815      | 8.9063 |
| 1.5975        | 28    | 24416 | 1.7713          | 1.0       | 204.1630      | 8.9118 |
| 1.5381        | 29    | 25288 | 1.7637          | 1.0       | 202.4205      | 8.9410 |
| 1.5759        | 30    | 26160 | 1.7693          | 1.0       | 203.7057      | 8.9786 |
| 1.5469        | 31    | 27032 | 1.7613          | 1.0       | 202.3730      | 9.0594 |
| 1.4868        | 32    | 27904 | 1.7612          | 1.0       | 203.5100      | 9.0807 |
| 1.4895        | 33    | 28776 | 1.7609          | 1.0       | 203.5876      | 9.1370 |
| 1.4205        | 34    | 29648 | 1.7661          | 1.0       | 202.8884      | 9.2094 |
| 1.4164        | 35    | 30520 | 1.7723          | 1.0       | 202.6136      | 9.2228 |
| 1.4079        | 36    | 31392 | 1.7753          | 1.0       | 202.4356      | 9.2698 |
| 1.3664        | 37    | 32264 | 1.7793          | 1.0       | 205.0323      | 9.2706 |
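The BLEU column appears to be on the standard 0-100 sacreBLEU scale, though the card does not name the scorer. A minimal sketch of how such a score could be computed with the `evaluate` library follows; the prediction/reference strings are placeholders, not drawn from the actual eval set.

```python
# Hedged BLEU sketch using the `evaluate` library; assumes sacreBLEU-style
# scoring on detokenized text. The strings below are placeholders only.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Le chien court dans le parc."]   # model outputs
references = [["Le chien court dans le parc."]]  # one reference list per sample
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```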

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Model size

  • 1.0B parameters (F32, Safetensors)
