RP model trained with GRPO
Note LoRA GRPO
Note LoRA GRPO with reasoning
Note Model merging of all RP models and base model ft evolution merging
Note MoE version, finetuned
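
Sketch of the group-relative advantage step that GRPO is named for, to make the notes above concrete. Each reward in a group of completions sampled for the same prompt is normalized by the group's mean and standard deviation. The function name, the `eps` value, and the example rewards are hypothetical, and the use of population standard deviation is an assumption (some implementations use the sample standard deviation); this is not the training code behind these models.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions.

    rewards: list of scalar rewards, one per completion in the group.
    eps: small constant (assumed value) to avoid division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see note above
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one RP prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed for the baseline.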