--- library_name: peft license: apache-2.0 base_model: Qwen/Qwen3-32B tags: - axolotl - base_model:adapter:Qwen/Qwen3-32B - lora - transformers datasets: - sam2ai/en-oriya-translation pipeline_tag: text-generation model-index: - name: qwen3-32b-en-indic-mt results: [] --- [

](https://github.com/axolotl-ai-cloud/axolotl)

See axolotl config

axolotl version: `0.12.2` ```yaml base_model: Qwen/Qwen3-32B # Automatically upload checkpoint and final model to HF hub_model_id: sam2ai/qwen3-32b-en-indic-mt #plugins: #- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin strict: false chat_template: qwen3 datasets: - path: sam2ai/en-oriya-translation type: chat_template field_messages: conversations message_property_mappings: role: from content: value roles: assistant: - gpt user: - human val_set_size: 0.0 output_dir: ./outputs/Qwen3/Qwen3-32b-wat25 dataset_prepared_path: last_run_prepared sequence_len: 1096 sample_packing: true eval_sample_packing: true load_in_4bit: true adapter: qlora lora_r: 16 lora_alpha: 32 lora_target_modules: - q_proj - k_proj - v_proj - o_proj - down_proj - up_proj lora_mlp_kernel: true lora_qkv_kernel: true lora_o_kernel: true wandb_project: QWEN3-WAT2025 wandb_entity: wandb_watch: wandb_name: Qwen3-27B-en-indic-mt wandb_log_model: gradient_accumulation_steps: 4 micro_batch_size: 2 num_epochs: 1 optimizer: adamw_torch_4bit lr_scheduler: cosine learning_rate: 0.0002 bf16: auto tf32: false gradient_checkpointing: offload gradient_checkpointing_kwargs: use_reentrant: false resume_from_checkpoint: logging_steps: 1 flash_attention: true warmup_ratio: 0.1 evals_per_epoch: 4 saves_per_epoch: 1 weight_decay: 0.0 special_tokens: # save_first_step: true # unc # # # omment this to validate checkpoint saving works with your config ```

# qwen3-32b-en-indic-mt This model is a fine-tuned version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) on the sam2ai/en-oriya-translation dataset. ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.0002 - train_batch_size: 2 - eval_batch_size: 2 - seed: 42 - distributed_type: multi-GPU - num_devices: 8 - gradient_accumulation_steps: 4 - total_train_batch_size: 64 - total_eval_batch_size: 16 - optimizer: Use OptimizerNames.ADAMW_TORCH_4BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments - lr_scheduler_type: cosine - lr_scheduler_warmup_steps: 274 - training_steps: 2745 ### Training results ### Framework versions - PEFT 0.17.0 - Transformers 4.55.2 - Pytorch 2.7.0+gitf717b2a - Datasets 4.0.0 - Tokenizers 0.21.4