Article: Preference Tuning LLMs with Direct Preference Optimization Methods (Jan 18, 2024)
Post: New TRL + OpenEnv example! 💥 Fine-tune an LLM to play Sudoku using an RL environment via OpenEnv. Includes a script that runs on one or multiple GPUs with vLLM, plus a Colab-ready notebook. Enjoy!
Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb
Script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/sudoku.py
Article: Transformers v5: Simple model definitions powering the AI ecosystem (Dec 1, 2025)