zswzswzsw committed on
Commit
66407c5
·
verified ·
1 Parent(s): 033d853

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)
  1. .github/CODEOWNERS +18 -0
  2. .github/PULL_REQUEST_TEMPLATE.md +44 -0
  3. .github/dependabot.yml +9 -0
  4. .github/workflows/check-pr-title.yml +52 -0
  5. .github/workflows/checkpoint_converter.yml +136 -0
  6. .github/workflows/cpu_unit_tests.yml +83 -0
  7. .github/workflows/disabled/e2e_prime.yml +66 -0
  8. .github/workflows/doc.yml +92 -0
  9. .github/workflows/e2e_ascend.yml +142 -0
  10. .github/workflows/e2e_dapo.yml +110 -0
  11. .github/workflows/e2e_eval_aime24.yml +116 -0
  12. .github/workflows/e2e_ppo_trainer.yml +407 -0
  13. .github/workflows/e2e_ppo_trainer_megatron.yml +379 -0
  14. .github/workflows/e2e_sft.yml +121 -0
  15. .github/workflows/e2e_spin.yml +89 -0
  16. .github/workflows/e2e_sppo.yml +87 -0
  17. .github/workflows/gpu_unit_tests.yml +100 -0
  18. .github/workflows/model.yml +144 -0
  19. .github/workflows/pre-commit-full.yml +30 -0
  20. .github/workflows/pre-commit.yml +33 -0
  21. .github/workflows/sanity.yml +95 -0
  22. .github/workflows/scorecard.yml +66 -0
  23. .github/workflows/secrets_scan.yml +22 -0
  24. .github/workflows/sgl.yml +124 -0
  25. .github/workflows/type-coverage-check.yml +29 -0
  26. .github/workflows/vllm.yml +131 -0
  27. .gitignore +126 -0
  28. .pre-commit-config.yaml +8 -0
  29. .readthedocs.yaml +19 -0
  30. LICENSE +202 -0
  31. Notice.txt +1 -0
  32. README.md +269 -0
  33. docker/Apptainerfile.rocm +57 -0
  34. docker/Dockerfile.awsefa +53 -0
  35. docker/Dockerfile.ngc.vllm +47 -0
  36. docker/Dockerfile.ngc.vllm0.8 +75 -0
  37. docker/Dockerfile.ngc.vllm0.8.sagemaker +46 -0
  38. docker/Dockerfile.rocm +55 -0
  39. docker/Dockerfile.sglang +55 -0
  40. docker/Dockerfile.vemlp.vllm.te +41 -0
  41. docker/Dockerfile.vllm.sglang.megatron +124 -0
  42. docker/Dockerfile.vllm.sglang.megatron.deepseek +115 -0
  43. docs/Makefile +20 -0
  44. docs/README.md +19 -0
  45. docs/README_vllm0.7.md +73 -0
  46. docs/README_vllm0.8.md +55 -0
  47. docs/_static/js/runllm-widget.js +14 -0
  48. docs/_static/logo.png +0 -0
  49. docs/advance/checkpoint.rst +159 -0
  50. docs/advance/dpo_extension.rst +271 -0
.github/CODEOWNERS ADDED
@@ -0,0 +1,18 @@
+ /docs @eric-haibin-lin @zhaochenyang20 @hongpeng-guo
+ /docs/amd_tutorial @yushengsu-thu
+ /docs/slang_multiturn @zhaochenyang20 @SwordFaith
+
+ /recipe/dapo @tongyx361 @PeterSH6
+ /recipe/spin @zhaochenyang20
+ /recipe/sppo @zhaochenyang20
+
+ /third_party/sglang @zhaochenyang20 @SwordFaith
+ /third_party/vllm @PeterSH6 @wuxibin89
+ /verl/single_controller @zw0610 @wuxibin89
+ /verl/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
+ /verl/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq
+ /verl/workers/rollout/sglang_rollout @zhaochenyang20 @SwordFaith @chenhaiq
+
+ /tests/single_controller @zw0610 @wuxibin89
+ /tests/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
+ /tests/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq
.github/PULL_REQUEST_TEMPLATE.md ADDED
@@ -0,0 +1,44 @@
+ ### What does this PR do?
+
+ > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.
+
+ ### Checklist Before Starting
+
+ - [ ] Search for similar PRs. Paste at least one query link here: ...
+ - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
+   - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
+   - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
+   - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
+   - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
+   - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
+
+ ### Test
+
+ > For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.
+
+ ### API and Usage Example
+
+ > Demonstrate how the API changes if any, and provide usage example(s) if possible.
+
+ ```python
+ # Add code snippet or script demonstrating how to use this
+ ```
+
+ ### High-Level Design
+
+ > Demonstrate the high-level design if this PR is complex.
+
+ ### Specific Changes
+
+ > List the specific changes.
+
+ ### Checklist Before Submitting
+
+ > [!IMPORTANT]
+ > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
+
+ - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
+ - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
+ - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
+ - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
+ - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
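The PR title convention described in the template above can be validated mechanically. Below is a minimal, hypothetical sketch of such a checker; the real `tests/special_sanity/check_pr_title.py` invoked by CI may differ in details (the function name and exact regex here are assumptions, not the repository's actual implementation):

```python
import re

# Module and type lists as spelled out in the PR template above.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

def check_pr_title(title: str) -> bool:
    """Return True if `title` matches `[{modules}] {type}: {description}`."""
    # An optional [BREAKING] prefix is allowed for API-breaking changes.
    title = title.removeprefix("[BREAKING]")
    m = re.fullmatch(r"\[([^\]]+)\]\s*(\w+):\s*(.+)", title)
    if m is None:
        return False
    modules = [s.strip() for s in m.group(1).split(",")]
    return all(mod in MODULES for mod in modules) and m.group(2) in TYPES
```

For example, `check_pr_title("[BREAKING][fsdp, megatron] feat: dynamic batching")` accepts the template's own example, while a free-form title like `"fix stuff"` is rejected.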
.github/dependabot.yml ADDED
@@ -0,0 +1,9 @@
+ ## Enable dependabot to check the project's dependencies
+ ## Dependabot will open pull requests to update dependencies automatically
+
+ version: 2
+ updates:
+   - package-ecosystem: pip
+     directory: "/"
+     schedule:
+       interval: weekly
.github/workflows/check-pr-title.yml ADDED
@@ -0,0 +1,52 @@
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #    - `cpu_unit_tests.yml`: runs pytest on all scripts matching `tests/**/test_*_on_cpu.py`
+ #    - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #    - Since the cpu/gpu unit tests run all tests under `tests` by default, make sure tests are manually excluded from them when
+ #      - a new workflow yaml is added to `.github/workflows`
+ #      - new tests are added to a workflow mentioned in 2.
+
+
+ on:
+   pull_request:
+     types: [opened, edited, synchronize]
+
+ jobs:
+   check-title:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Checkout code
+         uses: actions/checkout@v4
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: '3.11'
+
+       - name: Run PR title checker
+         run: python3 tests/special_sanity/check_pr_title.py
+         env:
+           PR_TITLE: ${{ github.event.pull_request.title }}
.github/workflows/checkpoint_converter.yml ADDED
@@ -0,0 +1,136 @@
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #    - `cpu_unit_tests.yml`: runs pytest on all scripts matching `tests/**/test_*_on_cpu.py`
+ #    - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #    - Since the cpu/gpu unit tests run all tests under `tests` by default, make sure tests are manually excluded from them when
+ #      - a new workflow yaml is added to `.github/workflows`
+ #      - new tests are added to a workflow mentioned in 2.
+
+
+
+ name: checkpoint_converter
+ # latest version: Megatron-LM core_r0.11.0 https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch
+   push:
+     branches:
+       - main
+       - v0.*
+   pull_request:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Recipes
+       - "!recipe/**"
+       # FSDP
+       - "!verl/workers/**/*dp_*.py"
+       # Entrypoints
+       - ".github/workflows/checkpoint_converter.yml"
+       - ".github/workflows/e2e_ppo_trainer_megatron.yml"
+       - "examples/data_preprocess/gsm8k.py"
+       - "tests/special_e2e/run_ppo_trainer_megatron.sh"
+       - "verl/trainer/main_ppo.py"
+       - "verl/trainer/config/ppo_megatron_trainer.yaml"
+
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read-only permissions for repository contents.
+ permissions:
+   contents: read
+
+ jobs:
+   checkpoint_converter:
+     runs-on: [L20x8]
+     timeout-minutes: 20 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install -e .[test]
+       - name: Running Huggingface to Megatron dist_ckpt converter (Qwen/Qwen2.5-0.5B)
+         run: |
+           ray stop --force
+           python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/Qwen/Qwen2.5-0.5B --test
+       - name: Running Huggingface to Megatron dist_ckpt converter (deepseek-ai/deepseek-coder-1.3b-instruct)
+         run: |
+           ray stop --force
+           python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/deepseek-ai/deepseek-coder-1.3b-instruct --output_path checkpoints/deepseek-ai/deepseek-coder-1.3b-instruct --test
+       - name: Clean up
+         run: |
+           rm -rf checkpoints
+   checkpoint_converter_large_moe_models:
+     runs-on: [L20x8]
+     timeout-minutes: 30 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+       HF_ENDPOINT: "https://hf-mirror.com"
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install -e .[test]
+       - name: Download Model to Use
+         run: |
+           huggingface-cli download Qwen/Qwen1.5-MoE-A2.7B-Chat --local-dir ${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat
+           export HF_HUB_OFFLINE=1
+       - name: Running Huggingface to Megatron dist_ckpt CPU converter (Qwen/Qwen1.5-MoE-A2.7B-Chat)
+         run: |
+           ray stop --force
+           python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat --output_path checkpoints/Qwen/Qwen1.5-MoE-A2.7B-Chat --use_cpu_initialization
+       - name: clean up
+         run: |
+           rm -rf checkpoints
.github/workflows/cpu_unit_tests.yml ADDED
@@ -0,0 +1,83 @@
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #    - `cpu_unit_tests.yml`: runs pytest on all scripts matching `tests/**/test_*_on_cpu.py`
+ #    - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #    - Since the cpu/gpu unit tests run all tests under `tests` by default, make sure tests are manually excluded from them when
+ #      - a new workflow yaml is added to `.github/workflows`
+ #      - new tests are added to a workflow mentioned in 2.
+
+
+ name: cpu_unit_tests
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch
+   push:
+     branches:
+       - main
+       - v0.*
+   pull_request:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       - .github/workflows/cpu_unit_tests.yml
+       - "!recipe/**/*.py"
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read-only permissions for repository contents.
+ permissions:
+   contents: read
+
+ jobs:
+   cpu_unit_tests:
+     runs-on: ubuntu-latest
+     timeout-minutes: 10 # Increase this timeout value as needed
+     strategy:
+       matrix:
+         python-version: ["3.10"]
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+         with:
+           python-version: ${{ matrix.python-version }}
+       - name: Install the current repository
+         run: |
+           pip install -e .[test,prime,geo]
+           pip install --upgrade "ray>=2.40.0" pillow
+       - name: Running CPU unit tests
+         run: |
+           [ ! -d "$HOME/verl-data" ] && git clone --depth 1 https://github.com/eric-haibin-lin/verl-data ~/verl-data
+           python3 examples/data_preprocess/geo3k.py
+           echo '[pytest]' > pytest.ini
+           echo 'python_files = *_on_cpu.py' >> pytest.ini
+           pytest -s -x tests/
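The `pytest.ini` written in the workflow step above sets `python_files = *_on_cpu.py`, which restricts pytest's collection to test files whose names end with `_on_cpu.py`. A small illustration of that filename matching rule (the file paths below are made up for the example):

```python
import fnmatch

# Candidate test files; only names matching *_on_cpu.py are collected
# under the pytest.ini override shown in the workflow above.
candidates = [
    "tests/trainer/test_config_on_cpu.py",
    "tests/trainer/test_config.py",
    "tests/models/test_tokenizer_on_cpu.py",
]
selected = [
    p for p in candidates
    if fnmatch.fnmatch(p.rsplit("/", 1)[-1], "*_on_cpu.py")
]
# selected keeps only the two *_on_cpu.py files
```

This is why GPU-only tests need no explicit skip markers in this workflow: they are simply never collected on the CPU runner.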
.github/workflows/disabled/e2e_prime.yml ADDED
@@ -0,0 +1,66 @@
+ name: e2e_prime
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch
+   push:
+     branches:
+       - disabled_ci
+   pull_request:
+     branches:
+       - disabled_ci
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Other recipes
+       - "!recipe/**"
+       # Megatron
+       - "!verl/workers/**/megatron_*.py"
+       # Home
+       - "recipe/prime"
+       # Entrypoints
+       - ".github/workflows/e2e_prime.yml"
+       - "examples/data_preprocess/gsm8k.py"
+       - "tests/special_e2e/run_prime.sh"
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read-only permissions for repository contents.
+ permissions:
+   contents: read
+
+ jobs:
+   e2e_prime:
+     runs-on: [L20x8]
+     timeout-minutes: 50 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install --no-deps -e .[test,gpu]
+       - name: Prepare gsm8k dataset
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/gsm8k.py
+       - name: Running GSM8K E2E with prime alg
+         run: |
+           ray stop --force
+           bash tests/special_e2e/run_prime.sh
.github/workflows/doc.yml ADDED
@@ -0,0 +1,92 @@
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #    - `cpu_unit_tests.yml`: runs pytest on all scripts matching `tests/**/test_*_on_cpu.py`
+ #    - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #    - Since the cpu/gpu unit tests run all tests under `tests` by default, make sure tests are manually excluded from them when
+ #      - a new workflow yaml is added to `.github/workflows`
+ #      - new tests are added to a workflow mentioned in 2.
+
+
+ name: doc_test
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch
+   push:
+     branches:
+       - main
+       - v0.*
+   pull_request:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       - "docs/**"
+       - .github/workflows/doc.yml
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read permissions for contents plus what deploy-pages needs.
+ permissions:
+   contents: read # for checkout
+   pages: write # for deploy-pages
+   id-token: write # for deploy-pages
+
+ jobs:
+   doc_test:
+     runs-on: ubuntu-latest
+     timeout-minutes: 5 # Increase this timeout value as needed
+     strategy:
+       matrix:
+         python-version: ["3.10"]
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+         with:
+           python-version: ${{ matrix.python-version }}
+       - name: Install the current repository
+         run: |
+           pip install -e .[test]
+           pip install -r docs/requirements-docs.txt
+
+       - name: Run doc make html
+         run: |
+           cd docs
+           make clean
+           make html SPHINXOPTS="--keep-going -w _build/sphinx.log"
+           if grep -q ": ERROR:" _build/sphinx.log; then
+             echo "🚨 Sphinx doc build contained ERRORs - see _build/sphinx.log"
+             exit 1
+           fi
+           if grep -q "WARNING: document isn't included in any toctree" _build/sphinx.log; then
+             echo "🚨 Sphinx doc build contained WARNING. Please include newly added docs in index.rst. See _build/sphinx.log for details"
+             exit 1
+           fi
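The shell step above treats the Sphinx log as the source of truth: the build fails on any `: ERROR:` line or on documents missing from a toctree. The same check can be expressed compactly; this is a hypothetical sketch mirroring the grep logic, not code from the repository:

```python
# Mirror of the log checks in the doc workflow above: fail the build if the
# Sphinx log contains errors or documents missing from any toctree.
def doc_build_ok(log_text: str) -> bool:
    if ": ERROR:" in log_text:
        return False
    if "WARNING: document isn't included in any toctree" in log_text:
        return False
    return True
```

Running Sphinx with `--keep-going -w <log>` is what makes this pattern work: the build collects every error into the log instead of stopping at the first one.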
.github/workflows/e2e_ascend.yml ADDED
@@ -0,0 +1,142 @@
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #    - `cpu_unit_tests.yml`: runs pytest on all scripts matching `tests/**/test_*_on_cpu.py`
+ #    - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #    - Since the cpu/gpu unit tests run all tests under `tests` by default, make sure tests are manually excluded from them when
+ #      - a new workflow yaml is added to `.github/workflows`
+ #      - new tests are added to a workflow mentioned in 2.
+
+
+ name: e2e_ascend
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch
+   push:
+     branches:
+       - main
+       - v0.*
+   pull_request:
+     branches:
+       - main
+     paths:
+       - "**/*.py"
+       - "requirements-npu.txt"
+       # Other entrypoints
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Recipes
+       - "!recipe/**"
+       # Entrypoints
+       - ".github/workflows/e2e_ascend.yml"
+       - "examples/data_preprocess/gsm8k.py"
+       - "examples/data_preprocess/geo3k.py"
+       - "tests/special_e2e/ppo_trainer"
+       - "verl/trainer/main_ppo.py"
+       - "verl/trainer/config/ppo_trainer.yaml"
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ permissions:
+   contents: read
+
+ jobs:
+   test:
+     name: verl Ascend test (self-host)
+     runs-on: [self-hosted, npu-0]
+     timeout-minutes: 30 # Increase this timeout value as needed
+     container:
+       image: crispig/verl_npu:cann8.1rc1-py3.10-torch2.5.1-vllm-ascend0.7.3.post1-250616
+       volumes:
+         - /usr/local/dcmi:/usr/local/dcmi
+         - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
+         - /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/
+         - /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info
+         - /etc/ascend_install.info:/etc/ascend_install.info
+         # Use the self-hosted cache to speed up pip and model downloads
+         # - /home/action/actions-runner/_work/cache:/github/home/.cache/
+       options: >-
+         --device /dev/davinci0
+         --device /dev/davinci_manager
+         --device /dev/devmm_svm
+         --device /dev/hisi_hdc
+         --network host
+         --privileged
+         --shm-size 16g
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     steps:
+       - name: Check npu and CANN info
+         run: |
+           cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
+           npu-smi info
+       - name: Checkout volcengine/verl repo
+         uses: actions/checkout@v4
+       - name: Install the current repository
+         run: |
+           pip3 install hf_transfer peft
+           pip3 install -r requirements-npu.txt
+           pip install -e .
+       - name: Install torchvision
+         run: |
+           pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu
+       - name: Prepare gsm8k dataset
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/gsm8k.py
+       - name: Prepare geo3k dataset
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/geo3k.py
+       - name: Running gsm8k e2e training tests with LoRA on ASCEND NPU
+         run: |
+           ray stop --force
+           bash tests/special_e2e/sft/run_sft.sh
+           rm -rf $HOME/ckpts
+       - name: Running gsm8k e2e training tests with GRPO on ASCEND NPU
+         run: |
+           ray stop --force
+           bash tests/special_npu/run_qwen2_5_05b_grpo.sh
+           rm -rf $HOME/ckpts
+       - name: Running geo3k e2e training tests with GRPO on ASCEND NPU
+         run: |
+           ray stop --force
+           bash tests/special_npu/run_qwen2_5_vl_3b_npu.sh
+           rm -rf $HOME/ckpts
+       - name: Running gsm8k e2e training tests with DAPO on ASCEND NPU
+         run: |
+           ray stop --force
+           bash tests/special_npu/run_qwen2_5_05b_dapo.sh
+           rm -rf $HOME/ckpts
.github/workflows/e2e_dapo.yml ADDED
@@ -0,0 +1,110 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` suffix are run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #   - `cpu_unit_tests.yml`: runs pytest on all scripts matching the pattern `tests/**/test_*_on_cpu.py`
+ #   - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #   - Since the cpu/gpu unit tests run all tests under `tests/` by default, please make sure tests are manually excluded from them when
+ #     - a new workflow yaml is added to `.github/workflows`, or
+ #     - new tests are added to a workflow mentioned in 2.
+
+
+ name: e2e_dapo
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch.
+   # For push, for now only anti-patterns are specified, so it is more conservative
+   # and achieves higher coverage.
+   push:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!examples/*trainer*"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Megatron
+       - "!verl/workers/**/megatron_*.py"
+   pull_request:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Other recipes
+       - "!recipe/**"
+       # Megatron
+       - "!verl/workers/**/megatron_*.py"
+       # Home
+       - "recipe/dapo"
+       # Entrypoints
+       - ".github/workflows/e2e_dapo.yml"
+       - "examples/data_preprocess/gsm8k.py"
+       - "tests/special_e2e/run_dapo.sh"
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read-only permissions on repository contents.
+ permissions:
+   contents: read
+
+ jobs:
+   e2e_dapo:
+     runs-on: [L20x8]
+     timeout-minutes: 40 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install --no-deps -e .[test,gpu]
+       - name: Prepare GSM8K dataset
+         run: |
+           python3 examples/data_preprocess/gsm8k.py
+       - name: Running the E2E test with the DAPO algorithm
+         run: |
+           ray stop --force
+           bash tests/special_e2e/run_dapo.sh
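The comment header above splits unit tests by filename: CPU suites collect only `tests/**/test_*_on_cpu.py`, while GPU suites take everything else. A minimal sketch of that convention, using a throwaway directory and file names invented purely for illustration:

```shell
# Hypothetical test tree, only to illustrate the `on_cpu.py` naming convention.
mkdir -p /tmp/verl_tests_demo/tests/trainer
touch /tmp/verl_tests_demo/tests/trainer/test_ppo_on_cpu.py   # selected by the CPU suite
touch /tmp/verl_tests_demo/tests/trainer/test_ppo.py          # left to the GPU suite
# A CPU unit-test run would collect only files matching test_*_on_cpu.py:
find /tmp/verl_tests_demo/tests -name 'test_*_on_cpu.py'
```

In practice the same glob can be handed directly to pytest for collection.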
.github/workflows/e2e_eval_aime24.yml ADDED
@@ -0,0 +1,116 @@
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` suffix are run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #   - `cpu_unit_tests.yml`: runs pytest on all scripts matching the pattern `tests/**/test_*_on_cpu.py`
+ #   - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #   - Since the cpu/gpu unit tests run all tests under `tests/` by default, please make sure tests are manually excluded from them when
+ #     - a new workflow yaml is added to `.github/workflows`, or
+ #     - new tests are added to a workflow mentioned in 2.
+
+
+ name: e2e_eval_aime24
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch.
+   # For push, for now only anti-patterns are specified, so it is more conservative
+   # and achieves higher coverage.
+   push:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!*.md"
+       - "!docker/**"
+       - "!docs/**"
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       - "!recipe/r1/README.md"
+   pull_request:
+     branches:
+       - main
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!*.md"
+       - "!docker/**"
+       - "!docs/**"
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Home
+       - "recipe/r1"
+       - "!recipe/r1/README.md"
+       # Other recipes
+       - "!recipe/**"
+       # Entrypoints
+       - ".github/workflows/e2e_eval_aime24.yml"
+       - "tests/special_e2e/run_r1_distill_qwen_aime24_eval.sh"
+       - "verl/trainer/main_generation.py"
+       - "verl/trainer/config/generation.yaml"
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read-only permissions on repository contents.
+ permissions:
+   contents: read
+
+ jobs:
+   e2e_eval_aime24:
+     runs-on: [L20x8]
+     timeout-minutes: 40 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install --no-deps -e .[test,gpu,math]
+           pip3 install math-verify
+       - name: Prepare aime24 dataset
+         run: |
+           ray stop --force
+           python3 recipe/r1/data_process.py --task aime2024
+       - name: Running generation and evaluation on AIME 2024
+         run: |
+           ray stop --force
+           bash tests/special_e2e/run_r1_distill_qwen_aime24_eval.sh
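Every workflow in this set carries the same `concurrency` stanza: runs are grouped by workflow name and ref, and `cancel-in-progress` evaluates to true for any ref other than `refs/heads/main`, so superseded PR runs are cancelled while pushes to main always run to completion. A purely illustrative shell sketch of how that expression evaluates (the PR ref name below is made up):

```shell
# Emulate `cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}` for two refs.
for ref in refs/heads/main refs/pull/123/merge; do
  if [ "$ref" != "refs/heads/main" ]; then
    echo "$ref cancel-in-progress=true"    # in-flight runs on this ref get cancelled
  else
    echo "$ref cancel-in-progress=false"   # main is never cancelled
  fi
done
```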
.github/workflows/e2e_ppo_trainer.yml ADDED
@@ -0,0 +1,407 @@
+ name: e2e_ppo_trainer
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch.
+   # For push, for now only anti-patterns are specified, so it is more conservative
+   # and achieves higher coverage.
+   push:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Recipes
+       - "!recipe/**"
+       # Megatron
+       - "!verl/workers/**/megatron_*.py"
+
+   pull_request:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!**/*.md"
+       - "!docker/**"
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Docs
+       - "!docs/**"
+       # Recipes
+       - "!recipe/**"
+       # Megatron
+       - "!verl/workers/**/megatron_*.py"
+       # Entrypoints
+       - ".github/workflows/e2e_ppo_trainer.yml"
+       - "examples/data_preprocess/gsm8k.py"
+       - "examples/data_preprocess/geo3k.py"
+       - "tests/special_e2e/ppo_trainer"
+       - "verl/trainer/main_ppo.py"
+       - "verl/trainer/config/ppo_trainer.yaml"
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read-only permissions on repository contents.
+ permissions:
+   contents: read
+
+ jobs:
+   pre_commit_for_ppo:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ["3.12"]
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+         with:
+           python-version: ${{ matrix.python-version }}
+       - name: Set ruff --output-format=github
+         run: |
+           sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
+           git add .pre-commit-config.yaml
+       - uses: pre-commit/[email protected]
+         with:
+           extra_args: "" # Overriding default "--all-files"
+
+   e2e_ppo_trainer_vllm:
+     runs-on: [L20x8]
+     timeout-minutes: 60 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install --no-deps -e .[test,vllm]
+       - name: Prepare GSM8K dataset
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/gsm8k.py
+       # Function RM
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP_SIZE=8)
+         run: |
+           ray stop --force
+           VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm after resuming
+         run: |
+           ray stop --force
+           RESUME_MODE=auto VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Test merging FSDP checkpoints (Qwen Actor)
+         run: |
+           exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp-size8"
+           python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (DDP_SIZE=2, FSDP_SIZE=4)
+         run: |
+           ray stop --force
+           VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Test merging DDP+FSDP checkpoints (Qwen Actor)
+         run: |
+           exp_name="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4"
+           python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP2)
+         run: |
+           ray stop --force
+           VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8" STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Test merging FSDP2 checkpoints (Qwen Actor)
+         run: |
+           exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8"
+           python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
+       - name: Running GSM8K E2E without rmpad using function rm
+         run: |
+           ray stop --force
+           RM_PAD=False bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (GRPO)
+         run: |
+           ray stop --force
+           ADV_ESTIMATOR=grpo USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (ReMax)
+         run: |
+           ray stop --force
+           ADV_ESTIMATOR=remax USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using customized reward function
+         run: |
+           ray stop --force
+           CUSTOM_REWARD_FN=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with in-reward kl and kl loss
+         run: |
+           ray stop --force
+           USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       # LoRA tests
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm
+         run: |
+           ray stop --force
+           ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon
+         run: |
+           ray stop --force
+           ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True TOTAL_TRAIN_STEPS=1 SAVE_FREQ=1 FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Test GRPO LoRA checkpoints merging function
+         run: |
+           export EXP_NAME="qwen2.5-0.5b-function-reward-minimal"
+           ls checkpoints/verl-test/${EXP_NAME}/global_step_1/actor
+           cat checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface/config.json
+           python3 -m verl.model_merger merge --backend fsdp --local_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/ --target_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon with fsdp2
+         run: |
+           ray stop --force
+           ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       # Model RM
+       - name: Running GRPO GSM8K E2E training tests with FSDP on 8 L20 GPUs (DeepSeek)
+         run: |
+           ray stop --force
+           MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E with rmpad using model rm
+         run: |
+           ray stop --force
+           bash tests/special_e2e/ppo_trainer/run_model_reward.sh
+       - name: Running GSM8K E2E without rmpad using model rm
+         run: |
+           ray stop --force
+           RM_PAD=False bash tests/special_e2e/ppo_trainer/run_model_reward.sh
+       - name: Running GSM8K E2E with rmpad using model rm and ulysses sp=2
+         run: |
+           ray stop --force
+           SP_SIZE=2 bash tests/special_e2e/ppo_trainer/run_model_reward.sh
+       - name: Running GSM8K E2E with rmpad using model rm and dynamic batch size
+         run: |
+           ray stop --force
+           SEQ_BALANCE=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh
+       - name: Running GSM8K E2E with rmpad using model rm with Liger Kernel enabled
+         run: |
+           ray stop --force
+           LIGER=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh
+       - name: Running GSM8K E2E with rmpad using model rm with Fused Kernel enabled
+         run: |
+           ray stop --force
+           FUSED_KERNELS=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh
+       - name: Running GSM8K E2E with rmpad using model rm with Triton Fused Kernel backend enabled
+         run: |
+           ray stop --force
+           FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton bash tests/special_e2e/ppo_trainer/run_model_reward.sh
+
+   e2e_ppo_trainer_vllm_vlm:
+     runs-on: [L20x8]
+     needs: pre_commit_for_ppo
+     timeout-minutes: 40 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0
+       options: --gpus all --shm-size=50g # Visual dataloader requires large memory
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install -e .[test,gpu,vllm,geo,trl]
+       # Geo3k
+       - name: Prepare GEO3K dataset
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/geo3k.py
+       - name: Running GEO3K VLM GRPO E2E training tests on 8 L20 GPUs with rmpad using function rm
+         run: |
+           ray stop --force
+           TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
+           MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
+           MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
+           ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
+           SP_SIZE=2 \
+           bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+
+       - name: Running GEO3K VLM PPO E2E training tests on 8 L20 GPUs with rmpad using function rm
+         run: |
+           ray stop --force
+           TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
+           MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
+           MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
+           ADV_ESTIMATOR=gae RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
+           SP_SIZE=2 \
+           bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+
+   e2e_ppo_trainer_sglang:
+     runs-on: [L20x8]
+     needs: pre_commit_for_ppo
+     timeout-minutes: 40 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install -e .[test,gpu,sglang] --no-deps
+       - name: Prepare gsm8k dataset
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/gsm8k.py
+       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt
+         run: |
+           ray stop --force
+           ENGINE=sglang bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on sglang async
+         run: |
+           ray stop --force
+           ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GSM8K E2E training tests on vllm async
+         run: |
+           ray stop --force
+           export VLLM_USE_V1=1
+           ray start --head
+           ENGINE=vllm ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+
+   e2e_ppo_trainer_sglang_multiturn_with_tool:
+     runs-on: [L20x8]
+     needs: pre_commit_for_ppo
+     timeout-minutes: 40 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install -e .[test,gpu,sglang] --no-deps
+       - name: Prepare gsm8k dataset with tool
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/gsm8k_multiturn_w_tool.py --local_dir $HOME/data/gsm8k_verl_sgl_multi_turn_preprocessed
+       - name: Running GSM8K with tool E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt with sglang
+         run: |
+           ray stop --force
+           bash tests/special_e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh
+       - name: Running GSM8K with tool E2E training tests with FSDP2
+         run: |
+           ray stop --force
+           FSDP_STRATEGY=fsdp2 bash tests/special_e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh
+
+   e2e_ppo_trainer_sglang_vlm:
+     runs-on: [L20x8]
+     needs: pre_commit_for_ppo
+     timeout-minutes: 60 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=50g # Visual dataloader requires large memory
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install -e .[test,geo,gpu,sglang]
+       # Geo3k
+       - name: Prepare GEO3K dataset
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/geo3k.py
+       - name: Running GEO3K VLM E2E training tests on 8 L20 GPUs with rmpad using function rm
+         run: |
+           ray stop --force
+           TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
+           MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
+           MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
+           ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
+           ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
+           ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
+           bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GEO3K VLM E2E with rmpad using torch fused kernel (Qwen2.5-VL)
+         run: |
+           ray stop --force
+           FUSED_KERNELS=True TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
+           MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
+           MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
+           ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
+           ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
+           ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
+           bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+       - name: Running GEO3K VLM E2E with rmpad using triton fused kernel (Qwen2.5-VL)
+         run: |
+           ray stop --force
+           FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton \
+           TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
+           MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
+           MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
+           ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
+           ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
+           ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
+           bash tests/special_e2e/ppo_trainer/run_function_reward.sh
+
+   e2e_ppo_trainer_sglang_vlm_multiturn_with_tool:
+     runs-on: [L20x8]
+     needs: pre_commit_for_ppo
+     timeout-minutes: 40 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     container:
+       image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+       options: --gpus all --shm-size=10g
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install -e .[test,geo,gpu,sglang]
+       - name: Prepare geo3k dataset with tool
+         run: |
+           ray stop --force
+           python3 examples/data_preprocess/geo3k_multiturn_w_tool.py --local_dir $HOME/data/geo3k_verl_sgl_multi_turn_preprocessed
+       - name: Running GEO3K with tool E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt with sglang
+         run: |
+           ray stop --force
+           bash tests/special_e2e/run_geo3k_fsdp_sgl_multiturn_w_tool.sh
+       - name: Running GEO3K with tool E2E training tests with FSDP2
+         run: |
+           ray stop --force
+           FSDP_STRATEGY=fsdp2 bash tests/special_e2e/run_geo3k_fsdp_sgl_multiturn_w_tool.sh
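The `setup` job in the next workflow publishes its dynamically created runner label through step outputs. A minimal sketch of the underlying `$GITHUB_OUTPUT` mechanism: a step appends `key=value` lines to the file the variable points at, and later jobs read them via `steps.<id>.outputs.<key>`. The values below are made up; a temp file stands in for the runner-provided path.

```shell
# Stand-in for the file GitHub Actions provides via $GITHUB_OUTPUT.
GITHUB_OUTPUT=$(mktemp)
# Publish outputs exactly the way the create-runner step does.
echo "runner_label=gpu-runner-42" >> "$GITHUB_OUTPUT"
echo "mlp_task_id=t-1" >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"
```

Each appended line becomes one named output of the step, which is how `runs-on` in the training job can fall back with `needs.setup.outputs.runner-label || 'L20x8'`.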
.github/workflows/e2e_ppo_trainer_megatron.yml ADDED
@@ -0,0 +1,379 @@
+ # # Tests layout
+
+ # Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+ # - `tests/trainer` for testing functionality related to `verl/trainer`
+ # - `tests/models` for testing functionality related to `verl/models`
+ # - ...
+
+ # There are a few folders with the `special_` prefix, created for special purposes:
+ # - `special_distributed`: unit tests that must run with multiple GPUs
+ # - `special_e2e`: end-to-end tests with training/generation scripts
+ # - `special_npu`: tests for NPUs
+ # - `special_sanity`: a suite of quick sanity tests
+ # - `special_standalone`: a set of tests designed to run in dedicated environments
+
+ # Accelerators for tests
+ # - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
+ # - Test scripts with the `on_cpu.py` suffix are run on CPU resources in a Linux environment.
+
+ # # Workflow layout
+
+ # All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+ # 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+ # 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+ # 3. End-to-end tests: `e2e_*.yml`
+ # 4. Unit tests
+ #   - `cpu_unit_tests.yml`: runs pytest on all scripts matching the pattern `tests/**/test_*_on_cpu.py`
+ #   - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix
+ #   - Since the cpu/gpu unit tests run all tests under `tests/` by default, please make sure tests are manually excluded from them when
+ #     - a new workflow yaml is added to `.github/workflows`, or
+ #     - new tests are added to a workflow mentioned in 2.
+
+ name: e2e_ppo_trainer_megatron
+ # latest version: Megatron-LM core_r0.11.0 https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0
+
+ on:
+   # Trigger the workflow on push or pull request,
+   # but only for the main branch.
+   # For push, for now only anti-patterns are specified, so it is more conservative
+   # and achieves higher coverage.
+   push:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Recipes
+       - "!recipe/**"
+       # FSDP
+       - "!verl/workers/**/*dp_*.py"
+   pull_request:
+     branches:
+       - main
+       - v0.*
+     paths:
+       - "**/*.py"
+       # Other entrypoints
+       - "!docker/**"
+       # Docs
+       - "!**/*.md"
+       - "!docs/**"
+       - "!examples/**"
+       - "!tests/**"
+       - "!verl/trainer/main_*.py"
+       - "!verl/trainer/fsdp_sft_trainer.py"
+       # Recipes
+       - "!recipe/**"
+       # FSDP
+       - "!verl/workers/**/*dp_*.py"
+       # Entrypoints
+       - ".github/workflows/e2e_ppo_trainer_megatron.yml"
+       - "examples/data_preprocess/gsm8k.py"
+       - "examples/data_preprocess/geo3k.py"
+       - "tests/special_e2e/run_ppo_trainer_megatron.sh"
+       - "verl/trainer/main_ppo.py"
+       - "verl/trainer/config/ppo_megatron_trainer.yaml"
+
+ # Cancel jobs on the same ref if a new one is triggered
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ # Declare read-only permissions on repository contents.
+ permissions:
+   contents: read
+
+ env:
+   IMAGE: "verl-ci-cn-beijing.cr.volces.com/whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3"
+   DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
+
+ jobs:
+   setup:
+     runs-on: ubuntu-latest
+     outputs:
+       runner-label: ${{ steps.create-runner.outputs.runner_label }}
+       mlp-task-id: ${{ steps.create-runner.outputs.mlp_task_id }}
+     steps:
+       - name: create runner
+         id: create-runner
+         shell: bash
+         run: |
+           if [[ "${{ github.event.repository.full_name }}" != "volcengine/verl" ]]; then
+             echo "no need to create a runner, skip"
+             exit 0
+           fi
+           resp=$(curl -X POST "${{ env.DYNAMIC_RUNNER_ENDPOINT }}/create" \
+             -d '{"Image": "${{ env.IMAGE }}"}')
+           runner_label=$(echo $resp | jq -r '.runner_label')
+           if [[ -z $runner_label || $runner_label == "null" ]]; then
+             echo "failed to create runner"
+             exit 1
+           fi
+           echo "runner_label=$runner_label" >> $GITHUB_OUTPUT
+           mlp_task_id=$(echo $resp | jq -r '.task_id')
+           echo "mlp_task_id=$mlp_task_id" >> $GITHUB_OUTPUT
+
+   e2e_ppo_trainer_megatron-deepseek:
+     needs: setup
+     runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
+     timeout-minutes: 60 # Increase this timeout value as needed
+     env:
+       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+       NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+       HF_ENDPOINT: "https://hf-mirror.com"
+       HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+     steps:
+       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+         with:
+           fetch-depth: 0
+       - name: Install the current repository
+         run: |
+           pip3 install --no-deps -e .[test]
+       - name: Prepare GSM8K dataset
+         run: |
+           python3 examples/data_preprocess/gsm8k.py
138
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
139
+ run: |
140
+ ray stop --force
141
+ ALL_OFFLOAD=True SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh
142
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
143
+ run: |
144
+ ray stop --force
145
+ export VLLM_USE_V1=1
146
+ ray start --head
147
+ MODE=async RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 bash tests/special_e2e/run_ppo_trainer_megatron.sh
148
+ - name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)
149
+ run: |
150
+ exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"
151
+ python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
152
+ python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
153
+ - name: Running GRPO GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)
154
+ run: |
155
+ ray stop --force
156
+ ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh
157
+ - name: clean up
158
+ run: |
159
+ rm -rf checkpoints
160
+ e2e_ppo_trainer_megatron-qwen3:
161
+ needs: setup
162
+ runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
163
+ timeout-minutes: 60 # Increase this timeout value as needed
164
+ env:
165
+ HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
166
+ HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
167
+ NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
168
+ HF_ENDPOINT: "https://hf-mirror.com"
169
+ HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
170
+ steps:
171
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
172
+ with:
173
+ fetch-depth: 0
174
+ - name: Install the current repository
175
+ run: |
176
+ pip3 install --no-deps -e .[test]
177
+ - name: Prepare GSM8K dataset
178
+ run: |
179
+ python3 examples/data_preprocess/gsm8k.py
180
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) with validation and saving
181
+ run: |
182
+ ray stop --force
183
+ ALL_OFFLOAD=True VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
184
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) testing learning rate scheduler
185
+ run: |
186
+ ray stop --force
187
+ LR_WARMUP_STEPS=1 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
188
+
189
+ - name: Test Megatron checkpoints merging function (Qwen3 Actor and Critic)
190
+ run: |
191
+ exp_name="qwen3-0.6b-megatron-gsm8k-minimal"
192
+ python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
193
+ python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
194
+ - name: clean up
195
+ run: |
196
+ rm -rf checkpoints
197
+ e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding:
198
+ needs: setup
199
+ runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
200
+ timeout-minutes: 60 # Increase this timeout value as needed
201
+ env:
202
+ HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
203
+ HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
204
+ NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
205
+ HF_ENDPOINT: "https://hf-mirror.com"
206
+ HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
207
+ steps:
208
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
209
+ with:
210
+ fetch-depth: 0
211
+ - name: Install the current repository
212
+ run: |
213
+ pip3 install --no-deps -e .[test]
214
+ - name: Prepare GSM8K dataset
215
+ run: |
216
+ python3 examples/data_preprocess/gsm8k.py
217
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with tie-embedding Megatron (Qwen) with train tp > infer tp
218
+ run: |
219
+ ray stop --force
220
+ VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=2 INFER_TP=1 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh
221
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen) with train tp < infer tp
222
+ run: |
223
+ ray stop --force
224
+ VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=1 INFER_TP=2 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh
225
+ - name: clean up
226
+ run: |
227
+ rm -rf checkpoints
228
+ e2e_ppo_trainer_megatron-qwen-override-transformer-config:
229
+ needs: setup
230
+ runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
231
+ timeout-minutes: 60 # Increase this timeout value as needed
232
+ env:
233
+ HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
234
+ HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
235
+ NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
236
+ HF_ENDPOINT: "https://hf-mirror.com"
237
+ HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
238
+ steps:
239
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
240
+ with:
241
+ fetch-depth: 0
242
+ - name: Install the current repository
243
+ run: |
244
+ pip3 install --no-deps -e .[test]
245
+ - name: Prepare GSM8K dataset
246
+ run: |
247
+ python3 examples/data_preprocess/gsm8k.py
248
+ - name: Prepare dist_ckpt of Qwen2.5-0.5B, uneven layer distribution only supports dist_ckpt
249
+ run: |
250
+ python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/verl-test/qwen2.5-0.5b-megatron
251
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
252
+ run: |
253
+ ray stop --force
254
+ SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_first_pipeline_stage=8 +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=4 actor_rollout_ref.actor.megatron.use_dist_checkpointing=true actor_rollout_ref.actor.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron actor_rollout_ref.ref.megatron.use_dist_checkpointing=true actor_rollout_ref.ref.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron critic.megatron.use_dist_checkpointing=true critic.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron reward_model.megatron.use_dist_checkpointing=true reward_model.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron
255
+ cp -r checkpoints checkpoints-dut
256
+ SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
257
+ - name: Test Megatron checkpoints merging function (Qwen Actor and Critic)
258
+ run: |
259
+ exp_name="qwen2.5-0.5b-megatron-gsm8k-minimal"
260
+ python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
261
+ python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
262
+ - name: clean up
263
+ run: |
264
+ rm -rf checkpoints
265
+ e2e_ppo_trainer_megatron-deepseek-override-transformer-config:
266
+ needs: setup
267
+ runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
268
+ timeout-minutes: 60 # Increase this timeout value as needed
269
+ env:
270
+ HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
271
+ HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
272
+ NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
273
+ HF_ENDPOINT: "https://hf-mirror.com"
274
+ HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
275
+ steps:
276
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
277
+ with:
278
+ fetch-depth: 0
279
+ - name: Install the current repository
280
+ run: |
281
+ pip3 install --no-deps -e .[test]
282
+ - name: Prepare GSM8K dataset
283
+ run: |
284
+ python3 examples/data_preprocess/gsm8k.py
285
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
286
+ run: |
287
+ ray stop --force
288
+ SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=2 COMMON_VPP=null bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_embedding_in_pipeline_split=true +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_loss_in_pipeline_split=true
289
+ - name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)
290
+ run: |
291
+ exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"
292
+ python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
293
+ python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
294
+ - name: clean up
295
+ run: |
296
+ rm -rf checkpoints
297
+ e2e_ppo_trainer_megatron-moe-expert-parallel:
298
+ needs: setup
299
+ runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
300
+ timeout-minutes: 60 # Increase this timeout value as needed
301
+ env:
302
+ HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
303
+ HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
304
+ NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
305
+ HF_ENDPOINT: "https://hf-mirror.com"
306
+ HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
307
+ steps:
308
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
309
+ with:
310
+ fetch-depth: 0
311
+ - name: Install the current repository
312
+ run: |
313
+ pip3 install --no-deps -e .[test]
314
+ - name: Prepare GSM8K dataset
315
+ run: |
316
+ python3 examples/data_preprocess/gsm8k.py
317
+ - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
318
+ run: |
319
+ ray stop --force
320
+ ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \
321
+ PPO_MAX_TOKEN_LEN=512 FWD_MAX_TOKEN_LEN=512 \
322
+ MAX_PROMPT_LENGTH=256 MAX_RESPONSE_LENGTH=256 \
323
+ MODEL_ID=Qwen/Qwen1.5-MoE-A2.7B-Chat \
324
+ COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=4 COMMON_ETP=1 INFER_TP=8 \
325
+ USE_DIST_CKPT=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
326
+ - name: clean up
327
+ run: |
328
+ rm -rf checkpoints
329
+ e2e_ppo_trainer_megatron-qwen2_5vl-3b:
330
+ needs: setup
331
+ runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
332
+ timeout-minutes: 60 # Increase this timeout value as needed
333
+ env:
334
+ HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
335
+ HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
336
+ NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
337
+ HF_ENDPOINT: "https://hf-mirror.com"
338
+ HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
339
+ steps:
340
+ - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
341
+ with:
342
+ fetch-depth: 0
343
+ - name: Install the current repository
344
+ run: |
345
+ pip3 install --no-deps -e .[test]
346
+ - name: Prepare Geo3k dataset
347
+ run: |
348
+ python3 examples/data_preprocess/geo3k.py
349
+ - name: Prepare dist_ckpt of Qwen2.5-VL-3B, only supports dist_ckpt
350
+ run: |
351
+ python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct --output_path checkpoints/verl-test/qwen2.5-vl-3b-megatron
352
+ - name: Running Geo3k E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
353
+ run: |
354
+ ray stop --force
355
+ TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/geo3k/test.parquet MAX_PROMPT_LENGTH=1024 MAX_RESPONSE_LENGTH=2048 MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False SKIP_SAVE_HF_MODEL=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 COMMON_TP=2 USE_DIST_CKPT=true DIST_CKPT_PATH=checkpoints/verl-test/qwen2.5-vl-3b-megatron bash tests/special_e2e/run_ppo_trainer_megatron.sh
356
+ - name: clean up
357
+ run: |
358
+ rm -rf checkpoints
359
+
360
+ cleanup:
361
+ runs-on: ubuntu-latest
362
+ needs: [setup,
363
+ e2e_ppo_trainer_megatron-deepseek,
364
+ e2e_ppo_trainer_megatron-qwen3,
365
+ e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding,
366
+ e2e_ppo_trainer_megatron-qwen-override-transformer-config,
367
+ e2e_ppo_trainer_megatron-deepseek-override-transformer-config,
368
+ e2e_ppo_trainer_megatron-moe-expert-parallel,
369
+ e2e_ppo_trainer_megatron-qwen2_5vl-3b]
370
+ if: always()
371
+ steps:
372
+ - name: remove runner
373
+ run: |
374
+ if [[ -z "${{ needs.setup.outputs.mlp-task-id }}" ]]; then
375
+ echo "no need remove runner, skip"
376
+ exit 0
377
+ fi
378
+ resp=$(curl -X POST "${{ env.DYNAMIC_RUNNER_ENDPOINT }}/delete" \
379
+ -d '{"TaskId": "${{ needs.setup.outputs.mlp-task-id }}"}')
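The `setup` and `cleanup` jobs above drive a dynamic-runner HTTP API: `create` returns JSON carrying `runner_label` and `task_id`, the job fails fast when `runner_label` is empty or the string `"null"`, and `delete` tears the runner down by task id. A minimal offline sketch of that contract, assuming only the response shape visible in the jobs (the simulated JSON values and the `sed`-based parsing are illustrative; the real steps use `curl` + `jq`):

```shell
# Simulated response from the hypothetical ${DYNAMIC_RUNNER_ENDPOINT}/create call.
resp='{"runner_label":"dyn-L20x8-42","task_id":"task-123"}'

# Extract fields without jq, mirroring `jq -r '.runner_label'` / `.task_id`.
runner_label=$(printf '%s' "$resp" | sed -n 's/.*"runner_label":"\([^"]*\)".*/\1/p')
if [ -z "$runner_label" ] || [ "$runner_label" = "null" ]; then
  # Same failure mode as the workflow step: abort when no runner was allocated.
  echo "create runner failed" >&2
  exit 1
fi
task_id=$(printf '%s' "$resp" | sed -n 's/.*"task_id":"\([^"]*\)".*/\1/p')

# The workflow writes these to $GITHUB_OUTPUT; here we just print them.
echo "runner_label=$runner_label"
echo "mlp_task_id=$task_id"
```

Because `cleanup` runs with `if: always()` and keys off `mlp-task-id`, a runner is deleted even when the GPU jobs fail, and the delete is skipped entirely for forks where `setup` never created one.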
.github/workflows/e2e_sft.yml ADDED
@@ -0,0 +1,121 @@
+# # Tests layout
+
+# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+# - `tests/trainer` for testing functionality related to `verl/trainer`
+# - `tests/models` for testing functionality related to `verl/models`
+# - ...
+
+# There are a few folders with the `special_` prefix, created for special purposes:
+# - `special_distributed`: unit tests that must run with multiple GPUs
+# - `special_e2e`: end-to-end tests with training/generation scripts
+# - `special_npu`: tests for NPUs
+# - `special_sanity`: a suite of quick sanity tests
+# - `special_standalone`: a set of tests designed to run in dedicated environments
+
+# Accelerators for tests
+# - By default tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
+
+# # Workflow layout
+
+# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+# 3. End-to-end tests: `e2e_*.yml`
+# 4. Unit tests
+#   - `cpu_unit_tests.yml`, runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
+#   - `gpu_unit_tests.yml`, runs pytest on all test scripts without the `on_cpu.py` suffix.
+#   - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
+#     - a new workflow yaml is added to `.github/workflows`
+#     - new tests are added to the workflows mentioned in 2.
+
+name: e2e_sft
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.*
+  pull_request:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!tests/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      # Recipes
+      - "!recipe/**"
+      # Megatron
+      - "!verl/workers/**/megatron_*.py"
+      # Entrypoints
+      - ".github/workflows/e2e_sft.yml"
+      - "examples/data_preprocess/gsm8k.py"
+      - "tests/special_e2e/sft"
+      - "verl/trainer/fsdp_sft_trainer.py"
+      - "verl/trainer/config/sft_trainer.yaml"
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+# Declare permissions: just read content.
+permissions:
+  contents: read
+
+jobs:
+  e2e_sft:
+    runs-on: [L20x8]
+    timeout-minutes: 20 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+      HF_ENDPOINT: "https://hf-mirror.com"
+      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+    container:
+      image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository
+        run: |
+          pip3 install peft
+          pip3 install --no-deps -e .[test,gpu]
+      - name: Prepare gsm8k dataset
+        run: |
+          ray stop --force
+          python3 examples/data_preprocess/gsm8k.py
+      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm
+        run: |
+          ray stop --force
+          bash tests/special_e2e/sft/run_sft.sh
+      - name: Running GSM8K E2E training tests on 8 L20 GPUs w/o rmpad using function rm
+        run: |
+          ray stop --force
+          RM_PAD=False bash tests/special_e2e/sft/run_sft.sh
+      - name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallelism
+        run: |
+          ray stop --force
+          SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
+      - name: Check loss difference between sequence parallel vs. default implementation
+        run: |
+          ray stop --force
+          ENTRYPOINT="tests/special_e2e/sft/test_sp_loss_match.py" SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
+      - name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallelism and liger
+        run: |
+          ray stop --force
+          SP_SIZE=2 LIGER=True bash tests/special_e2e/sft/run_sft.sh
+      - name: Running GSM8K E2E training tests with LoRA
+        run: |
+          ray stop --force
+          LORA_RANK=32 bash tests/special_e2e/sft/run_sft.sh
+      # TODO: multiturn
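The SFT steps above toggle features purely through environment variables (`RM_PAD`, `SP_SIZE`, `LIGER`, `LORA_RANK`), so each step is the same script with a different knob set. A minimal sketch of the consumption pattern a `run_sft.sh`-style script typically uses; the defaults shown here are illustrative, not taken from the actual script:

```shell
# Read each toggle from the environment, falling back to a default when unset.
# The real script would splice these into the trainer command line.
RM_PAD=${RM_PAD:-True}
SP_SIZE=${SP_SIZE:-1}
LORA_RANK=${LORA_RANK:-0}

# Echo the resolved configuration (a real script would launch torchrun here).
echo "rm_pad=${RM_PAD} sp_size=${SP_SIZE} lora_rank=${LORA_RANK}"
```

With this pattern, `SP_SIZE=2 LIGER=True bash run_sft.sh` needs no extra plumbing: any variable the caller exports simply overrides the default.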
.github/workflows/e2e_spin.yml ADDED
@@ -0,0 +1,89 @@
+name: e2e_spin
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!tests/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      # Other recipes
+      - "!recipe/**"
+      # Megatron
+      - "!verl/workers/**/megatron_*.py"
+      # Home
+      - "recipe/spin"
+      # Entrypoints
+      - ".github/workflows/e2e_spin.yml"
+      - "examples/data_preprocess/gsm8k.py"
+      - "tests/special_e2e/run_spin.sh"
+      - "!examples"
+  pull_request:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!tests/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      # Other recipes
+      - "!recipe/**"
+      # Megatron
+      - "!verl/workers/**/megatron_*.py"
+      # Home
+      - "recipe/spin"
+      # Entrypoints
+      - ".github/workflows/e2e_spin.yml"
+      - "examples/data_preprocess/gsm8k.py"
+      - "tests/special_e2e/run_spin.sh"
+      - "!examples"
+
+# Declare permissions: just read content.
+permissions:
+  contents: read
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+jobs:
+  e2e_spin:
+    runs-on: [L20x8]
+    timeout-minutes: 40 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+      HF_ENDPOINT: "https://hf-mirror.com"
+      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+    container:
+      image: ocss884/verl-sglang:ngc-th2.6.0-cu126-sglang0.4.5.post3
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository
+        run: |
+          pip3 install -e .[test,gpu,sglang] --no-deps
+      - name: Prepare gsm8k dataset
+        run: |
+          python3 examples/data_preprocess/gsm8k.py --local_dir ./data/gsm8k
+      - name: Prepare Model checkpoint
+        run: |
+          huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ./models/Qwen2.5-0.5B-Instruct
+      - name: Running the E2E test with the SPIN algorithm
+        run: |
+          ray stop --force
+          bash tests/special_e2e/run_spin.sh
.github/workflows/e2e_sppo.yml ADDED
@@ -0,0 +1,87 @@
+name: e2e_sppo
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!tests/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      # Other recipes
+      - "!recipe/**"
+      # Megatron
+      - "!verl/workers/**/megatron_*.py"
+      # Home
+      - "recipe/sppo"
+      # Entrypoints
+      - ".github/workflows/e2e_sppo.yml"
+      - "examples/data_preprocess/gsm8k.py"
+      - "tests/special_e2e/run_sppo.sh"
+  pull_request:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!tests/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      # Other recipes
+      - "!recipe/**"
+      # Megatron
+      - "!verl/workers/**/megatron_*.py"
+      # Home
+      - "recipe/sppo"
+      # Entrypoints
+      - ".github/workflows/e2e_sppo.yml"
+      - "examples/data_preprocess/gsm8k.py"
+      - "tests/special_e2e/run_sppo.sh"
+
+# Declare permissions: just read content.
+permissions:
+  contents: read
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+jobs:
+  e2e_sppo:
+    runs-on: [L20x8]
+    timeout-minutes: 40 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+      HF_ENDPOINT: "https://hf-mirror.com"
+      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+    container:
+      image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository
+        run: |
+          pip3 install -e .[test,gpu,sglang] --no-deps
+      - name: Prepare MATH dataset
+        run: |
+          python3 examples/data_preprocess/math_dataset.py --local_dir ./data/math
+      - name: Prepare Model checkpoint
+        run: |
+          huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ./models/Qwen2.5-0.5B-Instruct
+      - name: Running the E2E test with the SPPO algorithm
+        run: |
+          ray stop --force
+          bash tests/special_e2e/run_sppo.sh
.github/workflows/gpu_unit_tests.yml ADDED
@@ -0,0 +1,100 @@
+# # Tests layout
+
+# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+# - `tests/trainer` for testing functionality related to `verl/trainer`
+# - `tests/models` for testing functionality related to `verl/models`
+# - ...
+
+# There are a few folders with the `special_` prefix, created for special purposes:
+# - `special_distributed`: unit tests that must run with multiple GPUs
+# - `special_e2e`: end-to-end tests with training/generation scripts
+# - `special_npu`: tests for NPUs
+# - `special_sanity`: a suite of quick sanity tests
+# - `special_standalone`: a set of tests designed to run in dedicated environments
+
+# Accelerators for tests
+# - By default tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
+
+# # Workflow layout
+
+# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+# 3. End-to-end tests: `e2e_*.yml`
+# 4. Unit tests
+#   - `cpu_unit_tests.yml`, runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
+#   - `gpu_unit_tests.yml`, runs pytest on all test scripts without the `on_cpu.py` suffix.
+#   - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
+#     - a new workflow yaml is added to `.github/workflows`
+#     - new tests are added to the workflows mentioned in 2.
+
+name: GPU unit tests
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.4.x
+    paths:
+      - "**/*.py"
+      - .github/workflows/gpu_unit_tests.yml
+  pull_request:
+    branches:
+      - main
+      - v0.4.x
+    paths:
+      # The order in which you define paths patterns matters:
+      # A matching negative pattern (prefixed with !) after a positive match will exclude the path.
+      # A matching positive pattern after a negative match will include the path again.
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      - "!recipe/**"
+      # Entrypoints
+      - .github/workflows/gpu_unit_tests.yml
+      - "tests/**test_*.py"
+      # Ignore CPU tests
+      - "!tests/*_on_cpu.py"
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+# Declare permissions: just read content.
+permissions:
+  contents: read
+
+jobs:
+  gpu_unit_tests:
+    runs-on: [L20x8]
+    timeout-minutes: 40 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1"
+      HF_HUB_ENABLE_HF_TRANSFER: 1
+    container:
+      image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository
+        run: |
+          pip3 install hf_transfer
+          pip3 install --no-deps -e .[test]
+          pip3 install --upgrade "ray>=2.40.0"
+          pip3 install cupy-cuda12x
+      - name: Run all GPU unit tests
+        run: |
+          pytest -s -x --ignore-glob="*test_linear_cross_entropy_tp.py" --ignore-glob='*on_cpu.py' --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob='tests/special*' tests/
+      - name: Testing LinearCrossEntropyTP Correctness, Computation Time and Memory Consumption
+        run: |
+          LOW_MEMORY=True torchrun --standalone --nnodes=1 --nproc-per-node=8 tests/utils/test_linear_cross_entropy_tp.py
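The CPU/GPU split described in the comment block at the top of this workflow hinges entirely on the `on_cpu.py` filename suffix: the CPU suite collects only matching files, while this workflow excludes them via `--ignore-glob='*on_cpu.py'`. A small self-contained sketch of that selection rule (the directory and file names below are made up for illustration):

```shell
# Create a throwaway directory with one GPU test and one CPU test.
demo=$(mktemp -d)
touch "$demo/test_ops.py" "$demo/test_ops_on_cpu.py"

# CPU suite: only files matching test_*_on_cpu.py.
cpu_files=$(cd "$demo" && ls test_*_on_cpu.py)

# GPU suite: every test_*.py file except the on_cpu ones,
# mirroring pytest's --ignore-glob='*on_cpu.py'.
gpu_files=$(cd "$demo" && ls test_*.py | grep -v 'on_cpu\.py$')

echo "cpu: $cpu_files"
echo "gpu: $gpu_files"
rm -r "$demo"
```

This is why a newly added test file must either follow the suffix convention or be added to one of the `--ignore-glob` patterns above, otherwise it silently lands in both suites or the wrong one.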
.github/workflows/model.yml ADDED
@@ -0,0 +1,144 @@
+# # Tests layout
+
+# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+# - `tests/trainer` for testing functionality related to `verl/trainer`
+# - `tests/models` for testing functionality related to `verl/models`
+# - ...
+
+# There are a few folders with the `special_` prefix, created for special purposes:
+# - `special_distributed`: unit tests that must run with multiple GPUs
+# - `special_e2e`: end-to-end tests with training/generation scripts
+# - `special_npu`: tests for NPUs
+# - `special_sanity`: a suite of quick sanity tests
+# - `special_standalone`: a set of tests designed to run in dedicated environments
+
+# Accelerators for tests
+# - By default tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
+
+# # Workflow layout
+
+# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+# 3. End-to-end tests: `e2e_*.yml`
+# 4. Unit tests
+#   - `cpu_unit_tests.yml`, run pytest on all scripts with the file name pattern `tests/**/test_*_on_cpu.py`
+#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
+#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
+#     - a new workflow yaml is added to `.github/workflows`
+#     - new tests are added to a workflow mentioned in 2.
+
+name: model_rmpad
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.*
+  pull_request:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "verl/**/*.py"
+      # Entrypoints
+      - ".github/workflows/model.yml"
+      - "tests/special_distributed/test_fsdp_ckpt.py"
+      - "tests/models/**"
+      - "tests/special_distributed/run_all.sh"
+
+# Declare permissions just read content.
+permissions:
+  contents: read
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+jobs:
+  model_rmpad:
+    runs-on: [L20x8]
+    timeout-minutes: 20 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+      HF_ENDPOINT: "https://hf-mirror.com"
+      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+    container:
+      image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository and upgrade to latest transformers/flash_attn
+        run: |
+          pip3 install --no-deps -e .[test]
+          pip3 install --upgrade transformers
+      - name: Running rmpad model tests on 8 L20 GPUs + flash_attn 2.5.8
+        run: |
+          pytest -s tests/models/test_transformer.py
+      - name: Running rmpad model tests on 8 L20 GPUs + latest flash_attn
+        run: |
+          pytest -s tests/models/test_transformer.py
+      - name: Running FSDP rmpad model tests on 8 L20 GPUs + latest flash_attn
+        run: |
+          STRATEGY=fsdp torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py
+      - name: Running transformers ulysses tests on 8 L20 GPUs + latest transformers
+        run: |
+          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
+      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.49.0
+        run: |
+          pip3 install transformers==4.49.0
+          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
+      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.48.0
+        run: |
+          pip3 install transformers==4.48.0
+          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
+      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.47.0
+        run: |
+          pip3 install transformers==4.47.0
+          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
+      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.46.0
+        run: |
+          pip3 install transformers==4.46.0
+          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
+      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.45.0
+        run: |
+          pip3 install transformers==4.45.0
+          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
+      - name: Run distributed test
+        run: |
+          bash tests/special_distributed/run_all.sh
+
+  # TODO: Move this back to model_rmpad once FSDP2 is stable.
+  # NOTE: List as an independent job to make rerun easier.
+  model_rmpad_fsdp2_unstable:
+    runs-on: [L20x8]
+    timeout-minutes: 20 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+      HF_ENDPOINT: "https://hf-mirror.com"
+      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+    container:
+      image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository and upgrade to latest transformers/flash_attn
+        run: |
+          pip3 install --no-deps -e .[test]
+          pip3 install --upgrade transformers
+      - name: Running FSDP2 rmpad model tests on 8 L20 GPUs + latest flash_attn
+        run: |
+          STRATEGY=fsdp2 torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py
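The test-layout comment at the top of this workflow splits unit tests between the CPU and GPU workflows purely by file-naming convention (`on_cpu.py` suffix, `special_*` folders). A minimal sketch of that selection logic, using a hypothetical file list for illustration:

```python
import fnmatch

# Hypothetical file list illustrating the tests/ naming convention described above.
test_files = [
    "tests/trainer/test_ppo_on_cpu.py",
    "tests/models/test_transformer.py",
    "tests/special_distributed/test_fsdp_ckpt.py",
]

# cpu_unit_tests.yml collects only files matching tests/**/test_*_on_cpu.py.
cpu_tests = [f for f in test_files if fnmatch.fnmatch(f, "tests/*/test_*_on_cpu.py")]

# gpu_unit_tests.yml collects the rest, minus special_* folders that run
# in their own dedicated workflows.
gpu_tests = [
    f
    for f in test_files
    if f not in cpu_tests and not fnmatch.fnmatch(f, "tests/special_*/*")
]

print(cpu_tests)  # ['tests/trainer/test_ppo_on_cpu.py']
print(gpu_tests)  # ['tests/models/test_transformer.py']
```

This mirrors the `--ignore-glob='*on_cpu.py'` and `--ignore-glob='tests/special*'` flags passed to pytest in the GPU job above.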
.github/workflows/pre-commit-full.yml ADDED
@@ -0,0 +1,30 @@
+name: pre-commit-full
+
+# Run weekly on Sunday at 00:00 UTC
+on:
+  schedule:
+    - cron: "0 0 * * 0"
+  # Allow manual triggering
+  workflow_dispatch:
+
+# Declare permissions just read content.
+permissions:
+  contents: read
+
+jobs:
+  pre-commit-full:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.12"]
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Set ruff --output-format=github
+        run: |
+          sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
+          git add .pre-commit-config.yaml
+      - uses: pre-commit/[email protected]
.github/workflows/pre-commit.yml ADDED
@@ -0,0 +1,33 @@
+# c.f. https://github.com/pre-commit/action?tab=readme-ov-file#using-this-action
+name: pre-commit
+
+# No need to avoid / cancel lightweight pre-commit jobs
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+      - v0.*
+
+# Declare permissions just read content.
+permissions:
+  contents: read
+
+jobs:
+  pre-commit:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.12"]
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Set ruff --output-format=github
+        run: |
+          sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
+          git add .pre-commit-config.yaml
+      # Check "--all-files" by default
+      - uses: pre-commit/[email protected]
.github/workflows/sanity.yml ADDED
@@ -0,0 +1,95 @@
+# # Tests layout
+
+# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+# - `tests/trainer` for testing functionality related to `verl/trainer`
+# - `tests/models` for testing functionality related to `verl/models`
+# - ...
+
+# There are a few folders with the `special_` prefix, created for special purposes:
+# - `special_distributed`: unit tests that must run with multiple GPUs
+# - `special_e2e`: end-to-end tests with training/generation scripts
+# - `special_npu`: tests for NPUs
+# - `special_sanity`: a suite of quick sanity tests
+# - `special_standalone`: a set of tests designed to run in dedicated environments
+
+# Accelerators for tests
+# - By default tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
+
+# # Workflow layout
+
+# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+# 3. End-to-end tests: `e2e_*.yml`
+# 4. Unit tests
+#   - `cpu_unit_tests.yml`, run pytest on all scripts with the file name pattern `tests/**/test_*_on_cpu.py`
+#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
+#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
+#     - a new workflow yaml is added to `.github/workflows`
+#     - new tests are added to a workflow mentioned in 2.
+
+name: sanity
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.*
+  pull_request:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "**/*.py"
+      - .github/workflows/sanity.yml
+      - "tests/special_sanity/**"
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+# Declare permissions just read content.
+permissions:
+  contents: read
+
+jobs:
+  sanity:
+    runs-on: ubuntu-latest
+    timeout-minutes: 5 # Increase this timeout value as needed
+    strategy:
+      matrix:
+        python-version: ["3.10"]
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install the current repository
+        run: |
+          pip install -e .[test]
+      - name: Run sanity test
+        run: |
+          pytest -s -x tests/special_sanity
+      - name: Run license test
+        run: |
+          python3 tests/special_sanity/check_license.py --directory .
+      - name: Assert naming convention
+        run: |
+          if grep -rIn --exclude-dir=.git --exclude-dir=.github --exclude-dir=venv --exclude-dir=__pycache__ 'veRL' .; then
+            echo "Please use verl instead of veRL in the codebase"
+            exit 1
+          fi
+      - name: Validate test folder structure
+        run: python3 tests/special_sanity/validate_structure.py
+      - name: Assert documentation requirement for functions
+        run: python3 tests/special_sanity/validate_imported_docs.py
+      - name: Assert device api usage in verl/recipe
+        run: python3 tests/special_sanity/check_device_api_usage.py --directory ./recipe
+      - name: Assert device api usage in verl/verl
+        run: python3 tests/special_sanity/check_device_api_usage.py --directory ./verl
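The "Assert naming convention" step above shells out to `grep` to reject the `veRL` spelling. The same check can be sketched in Python (a hypothetical helper, not the repo's actual implementation), which is handy when `grep` flags differ across platforms:

```python
import pathlib

# Directories the workflow's grep invocation excludes.
SKIP_DIRS = {".git", ".github", "venv", "__pycache__"}


def find_bad_naming(root: str, needle: str = "veRL") -> list[str]:
    """Return the Python files under `root` containing the forbidden spelling."""
    offenders = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        if any(part in SKIP_DIRS for part in path.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary-ish files, like grep -I
        if needle in text:
            offenders.append(str(path))
    return offenders
```

A CI script would call `find_bad_naming(".")` and exit non-zero when the returned list is non-empty, matching the `exit 1` in the workflow step.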
.github/workflows/scorecard.yml ADDED
@@ -0,0 +1,66 @@
+# This workflow uses actions that are not certified by GitHub. They are provided
+# by a third-party and are governed by separate terms of service, privacy
+# policy, and support documentation.
+
+name: Scorecard supply-chain security
+on:
+  # For Branch-Protection check. Only the default branch is supported. See
+  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#branch-protection
+  branch_protection_rule:
+  # To guarantee Maintained check is occasionally updated. See
+  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
+  schedule:
+    - cron: "27 7 * * 1"
+  push:
+    branches:
+      - main
+      - v0.*
+
+# Declare default permissions as read only.
+permissions: read-all
+
+jobs:
+  analysis:
+    name: Scorecard analysis
+    runs-on: ubuntu-latest
+    permissions:
+      # Needed to upload the results to code-scanning dashboard.
+      security-events: write
+      # Needed to publish results and get a badge (see publish_results below).
+      id-token: write
+      # Uncomment the permissions below if installing in a private repository.
+      # contents: read
+      # actions: read
+
+    steps:
+      - name: "Checkout code"
+        uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
+        with:
+          persist-credentials: false
+
+      - name: "Run analysis"
+        uses: ossf/scorecard-action@0864cf19026789058feabb7e87baa5f140aac736 # v2.3.1
+        with:
+          results_file: results.sarif
+          results_format: sarif
+          # (Optional) "write" PAT token. Uncomment the `repo_token` line below if:
+          # - you want to enable the Branch-Protection check on a *public* repository, or
+          # - you are installing Scorecard on a *private* repository
+          # To create the PAT, follow the steps in https://github.com/ossf/scorecard-action?tab=readme-ov-file#authentication-with-fine-grained-pat-optional.
+          # repo_token: ${{ secrets.SCORECARD_TOKEN }}
+
+          # Public repositories:
+          # - Publish results to OpenSSF REST API for easy access by consumers
+          # - Allows the repository to include the Scorecard badge.
+          # - See https://github.com/ossf/scorecard-action#publishing-results.
+          # For private repositories:
+          # - `publish_results` will always be set to `false`, regardless
+          #   of the value entered here.
+          publish_results: true
+
+      # Upload the results to GitHub's code scanning dashboard (optional).
+      # Commenting out will disable upload of results to your repo's Code Scanning dashboard
+      - name: "Upload to code-scanning"
+        uses: github/codeql-action/upload-sarif@9e8d0789d4a0fa9ceb6b1738f7e269594bdd67f0 # v3.28.9
+        with:
+          sarif_file: results.sarif
.github/workflows/secrets_scan.yml ADDED
@@ -0,0 +1,22 @@
+on:
+  push:
+    branches:
+      - main
+      - v0.*
+  pull_request:
+
+permissions:
+  contents: read
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
+        with:
+          fetch-depth: 0
+      - name: Secret Scanning
+        uses: trufflesecurity/trufflehog@7dc056a193116ba8d82154bf0549381c8fb8545c # v3.88.14
+        with:
+          extra_args: --results=verified,unknown
.github/workflows/sgl.yml ADDED
@@ -0,0 +1,124 @@
+# # Tests layout
+
+# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+# - `tests/trainer` for testing functionality related to `verl/trainer`
+# - `tests/models` for testing functionality related to `verl/models`
+# - ...
+
+# There are a few folders with the `special_` prefix, created for special purposes:
+# - `special_distributed`: unit tests that must run with multiple GPUs
+# - `special_e2e`: end-to-end tests with training/generation scripts
+# - `special_npu`: tests for NPUs
+# - `special_sanity`: a suite of quick sanity tests
+# - `special_standalone`: a set of tests designed to run in dedicated environments
+
+# Accelerators for tests
+# - By default tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
+
+# # Workflow layout
+
+# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+# 3. End-to-end tests: `e2e_*.yml`
+# 4. Unit tests
+#   - `cpu_unit_tests.yml`, run pytest on all scripts with the file name pattern `tests/**/test_*_on_cpu.py`
+#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
+#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
+#     - a new workflow yaml is added to `.github/workflows`
+#     - new tests are added to a workflow mentioned in 2.
+
+name: sgl
+
+on:
+  workflow_dispatch: # Manual
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.2.x
+    paths:
+      - "**/*.py"
+      - .github/workflows/sgl.yml
+  pull_request:
+    branches:
+      - main
+      - v0.2.x
+    paths:
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!tests/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      # FSDP
+      - "!verl/workers/**/*dp_*.py"
+      # Megatron
+      - "!verl/workers/**/megatron_*.py"
+      # vLLM
+      - "!**/*vllm*"
+      # Recipes
+      - "!recipe/**"
+      # Entrypoints
+      - ".github/workflows/sgl.yml"
+      - "tests/rollout/*sglang*"
+      - "tests/rollout/async_rollout_utils.py"
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+# Declare permissions just read content.
+permissions:
+  contents: read
+
+jobs:
+  sgl:
+    runs-on: [L20x8]
+    timeout-minutes: 20 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+      HF_ENDPOINT: "https://hf-mirror.com"
+      HF_HUB_ENABLE_HF_TRANSFER: 1
+      SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK: "True"
+    container:
+      image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository
+        run: |
+          pip3 install hf_transfer fastmcp
+          pip3 install -e .[test,gpu,sglang] --no-deps
+      - name: Download Model to Use
+        run: |
+          huggingface-cli download 'Qwen/Qwen2-7B-Instruct'
+          export HF_HUB_OFFLINE=1
+      - name: Test the latest SGLang
+        run: |
+          cd tests/workers/rollout
+          torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_spmd.py
+      - name: Test the latest SGLang Rollout async with tool
+        run: |
+          cd tests/workers/rollout
+          torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_async_rollout_w_tools.py
+      - name: Test the latest SGLang Rollout async with sandbox fusion tool
+        run: |
+          cd tests/workers/rollout
+          pytest -s test_sglang_async_rollout_sf_tools.py
+      - name: Test the latest SGLang Rollout async with search tool
+        run: |
+          cd tests/workers/rollout
+          pytest -s test_sglang_async_rollout_search_tools.py
+      - name: Test the latest SGLang Rollout async with mcp search tool
+        run: |
+          cd tests/workers/rollout
+          pytest -s test_sglang_async_rollout_mcp_tools.py
+# Note(haibin.lin): for any new test, please update gpu_unit_tests.yaml to avoid repeated tests
.github/workflows/type-coverage-check.yml ADDED
@@ -0,0 +1,29 @@
+name: Type Annotation and Docstring Coverage
+
+on:
+  pull_request:
+    paths:
+      - '**/*.py'
+
+jobs:
+  type-coverage-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0 # 🚨 Important: fetch full history so `origin/main` is available
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.10'
+
+      - name: Install dependencies
+        run: |
+          pip install gitpython
+          pip install -e .[sglang]
+      - name: Run type annotation coverage check
+        run: |
+          python3 tests/special_sanity/type_coverage_check.py
+      - name: Run docstring coverage check
+        run: |
+          python3 tests/special_sanity/check_api_docs.py verl
.github/workflows/vllm.yml ADDED
@@ -0,0 +1,131 @@
+# # Tests layout
+
+# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
+# - `tests/trainer` for testing functionality related to `verl/trainer`
+# - `tests/models` for testing functionality related to `verl/models`
+# - ...
+
+# There are a few folders with the `special_` prefix, created for special purposes:
+# - `special_distributed`: unit tests that must run with multiple GPUs
+# - `special_e2e`: end-to-end tests with training/generation scripts
+# - `special_npu`: tests for NPUs
+# - `special_sanity`: a suite of quick sanity tests
+# - `special_standalone`: a set of tests designed to run in dedicated environments
+
+# Accelerators for tests
+# - By default tests run with GPUs available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
+# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
+
+# # Workflow layout
+
+# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
+# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
+# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
+# 3. End-to-end tests: `e2e_*.yml`
+# 4. Unit tests
+#   - `cpu_unit_tests.yml`, run pytest on all scripts with the file name pattern `tests/**/test_*_on_cpu.py`
+#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
+#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
+#     - a new workflow yaml is added to `.github/workflows`
+#     - new tests are added to a workflow mentioned in 2.
+
+name: vllm
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+      - v0.*
+  pull_request:
+    branches:
+      - main
+      - v0.*
+    paths:
+      - "**/*.py"
+      # Other entrypoints
+      - "!examples/**"
+      - "!tests/**"
+      - "!verl/trainer/main_*.py"
+      - "!verl/trainer/fsdp_sft_trainer.py"
+      # Recipes
+      - "!recipe/**"
+      # FSDP
+      - "!verl/workers/**/*dp_*.py"
+      # Megatron
+      - "!verl/workers/**/megatron_*.py"
+      # SGLang
+      - "!**/*sglang*"
+      # Entrypoints
+      - ".github/workflows/vllm.yml"
+      - "tests/special_e2e/generation"
+      - "tests/workers/rollout"
+      - "verl/trainer/main_generation.py"
+      - "verl/trainer/config/generation.yaml"
+
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+# Declare permissions just read content.
+permissions:
+  contents: read
+
+jobs:
+  vllm:
+    runs-on: [L20x8]
+    timeout-minutes: 60 # Increase this timeout value as needed
+    env:
+      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
+      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
+      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
+      HF_ENDPOINT: "https://hf-mirror.com"
+      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
+    container:
+      image: whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
+      options: --gpus all --shm-size=10g
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: Install the current repository
+        run: |
+          pip3 install -e .[test]
+          pip3 install vllm==0.5.4
+      - name: Download Model to Use
+        run: |
+          huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct
+          huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct
+          huggingface-cli download 'Qwen/Qwen2-7B-Instruct'
+          huggingface-cli download 'deepseek-ai/deepseek-llm-7b-chat'
+          export HF_HUB_OFFLINE=1
+      # Disable requests to avoid network errors
+      - name: Running vllm tests on 8 L20 GPUs
+        run: |
+          cd tests/workers/rollout/rollout_vllm
+          torchrun --standalone --nnodes=1 --nproc_per_node=8 $(which pytest) -s test_vllm_hf_loader.py
+      - name: Test the latest vLLM
+        run: |
+          pip3 install --upgrade vllm==0.7.3
+          cd tests/workers/rollout/rollout_vllm
+          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s test_vllm_spmd.py
+      - name: Run Qwen 0.5B generation test
+        run: |
+          cd tests/special_e2e/generation
+          export OUTPUT_PATH="${HOME}/data/gen/qwen_05_gen_test.parquet"
+          MODEL_ID=Qwen/Qwen2.5-0.5B-Instruct NGPUS_PER_NODE=4 GEN_TP=2 bash ./run_gen_qwen05.sh
+          rm -rf "${OUTPUT_PATH}"
+      - name: Run Qwen 0.5B generation test when world_size == 1
+        run: |
+          cd tests/special_e2e/generation
+          export OUTPUT_PATH="${HOME}/data/gen/qwen_05_gen_test.parquet"
+          MODEL_ID=Qwen/Qwen2.5-0.5B-Instruct NGPUS_PER_NODE=1 GEN_TP=1 bash ./run_gen_qwen05.sh
+          rm -rf "${OUTPUT_PATH}"
+      - name: Running multi-turn rollout tests on 8 L20 GPUs
+        run: |
+          pip3 install --upgrade vllm==0.8.3 tensordict==0.7.2
+          pytest -svvv tests/workers/rollout/rollout_vllm/test_vllm_chat_scheduler.py
+# Note(haibin.lin): for any new test, please update gpu_unit_tests.yaml to avoid repeated tests
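The vllm workflow above relies on GitHub Actions `paths` filters, where later patterns override earlier ones and a leading `!` negates a match. A rough sketch of that "last matching pattern wins" rule (the real GitHub implementation differs in details such as `**` handling, so treat this as an approximation):

```python
from fnmatch import fnmatch


def path_triggers(changed: str, patterns: list[str]) -> bool:
    """Approximate GitHub Actions `paths` filtering: patterns are applied
    in order, a leading '!' negates, and the last match decides."""
    matched = False
    for pat in patterns:
        negate = pat.startswith("!")
        core = pat[1:] if negate else pat
        if fnmatch(changed, core):
            matched = not negate
    return matched


# Simplified subset of the filter list from the workflow above.
patterns = ["**/*.py", "!recipe/**", ".github/workflows/vllm.yml"]

print(path_triggers("verl/foo.py", patterns))               # True: matches **/*.py only
print(path_triggers("recipe/x.py", patterns))               # False: re-excluded by !recipe/**
print(path_triggers(".github/workflows/vllm.yml", patterns))  # True: explicit entrypoint
```

This is why `!tests/**` followed by `"tests/workers/rollout"` in the actual filter re-includes the rollout tests after the blanket exclusion.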
.gitignore ADDED
@@ -0,0 +1,126 @@
+
+**/*.pt
+**/checkpoints
+**/wget-log
+**/_build/
+**/*.ckpt
+**/outputs
+**/*.tar.gz
+**/playground
+**/wandb
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+dataset/*
+tensorflow/my_graph/*
+.idea/
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+tmp/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*,cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# IPython Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# dotenv
+.env
+
+# virtualenv
+venv/
+.venv/
+ENV/
+
+# Spyder project settings
+.spyderproject
+
+# Rope project settings
+.ropeproject
+
+# vscode
+.vscode
+
+# Mac
+.DS_Store
+
+# vim
+*.swp
+
+# ckpt
+*.lock
+
+# data
+*.parquet
+
+# local logs
+logs
+log
+outputs
+.history
.pre-commit-config.yaml ADDED
@@ -0,0 +1,8 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: "v0.11.4"
+    hooks:
+      - id: ruff
+        args: ["--fix", "--show-fixes", "--output-format=full"]
+        exclude: ^.*\.(ipynb)$
+      - id: ruff-format
.readthedocs.yaml ADDED
@@ -0,0 +1,19 @@
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+version: 2
+
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+    rust: "1.70"
+
+sphinx:
+  configuration: docs/conf.py
+
+python:
+  install:
+    - requirements: docs/requirements-docs.txt
+    - method: pip
+      path: .
LICENSE ADDED
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
Notice.txt ADDED
@@ -0,0 +1 @@
+ Copyright 2023-2024 Bytedance Ltd. and/or its affiliates
README.md ADDED
@@ -0,0 +1,269 @@
+ <div align="center">
+ 👋 Hi, everyone!
+ verl is an RL training library initiated by <b>ByteDance Seed team</b> and maintained by the verl community.
+ <br>
+ <br>
+ </div>
+
+ <div align="center">
+
+ [<img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" height="20"/>](https://deepwiki.com/volcengine/verl)
+ [![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)](https://github.com/volcengine/verl/stargazers)
+ [![Twitter](https://img.shields.io/twitter/follow/verl_project)](https://twitter.com/verl_project)
+ <a href="https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA"><img src="https://img.shields.io/badge/Slack-verl-blueviolet?logo=slack&amp"></a>
+ <a href="https://arxiv.org/pdf/2409.19256"><img src="https://img.shields.io/static/v1?label=EuroSys&message=Paper&color=red"></a>
+ [![Documentation](https://img.shields.io/badge/documentation-blue)](https://verl.readthedocs.io/en/latest/)
+ <a href="https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG"><img src="https://img.shields.io/badge/WeChat-green?logo=wechat&amp"></a>
+
+ </div>
+
+ ![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216)
+
+ <h1 style="text-align: center;">verl: Volcano Engine Reinforcement Learning for LLMs</h1>
+
+ verl is a flexible, efficient and production-ready RL training library for large language models (LLMs).
+
+ verl is the open-source version of the **[HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)** paper.
+
+ verl is flexible and easy to use with:
+
+ - **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex post-training dataflows. Build RL dataflows such as GRPO and PPO in a few lines of code.
+
+ - **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks such as FSDP, Megatron-LM, vLLM, SGLang, etc.
+
+ - **Flexible device mapping**: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.
+
+ - Ready integration with popular HuggingFace models
+
+ verl is fast with:
+
+ - **State-of-the-art throughput**: SOTA LLM training and inference engine integrations and SOTA RL throughput.
+
+ - **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.
+
+ ## News
+
+ - [2025/06] verl with Megatron backend enables large MoE models such as [DeepSeek-671b and Qwen3-236b](https://verl.readthedocs.io/en/latest/perf/dpsk.html).
+ - [2025/06] The verl team will share the latest project updates at [PyTorch Day China](https://www.lfasiallc.com/pytorch-day-china/) on June 7th. Meet our dev team in Beijing!
+ - [2025/05] [PF-PPO](https://arxiv.org/abs/2409.06957), accepted to ICML 2025, is now supported in verl! PF-PPO enhances policy learning efficiency and robustness by filtering potentially noisy reward signals and reusing high-quality experiences via a replay buffer.
+ - [2025/04] We will give a tutorial on the latest post-training techniques and a programming guide for verl at the [ICLR 2025 Expo](https://iclr.cc/virtual/2025/calendar?filter_events=Expo+Talk+Panel&filter_rooms=), [SCI-FM workshop](https://open-foundation-model.github.io/) and [LMSys afterparty](https://lu.ma/d23nyynm). Talk materials are available [here](https://github.com/eric-haibin-lin/verl-community/tree/main/iclr25).
+ - [2025/04] The [Seed-Thinking-v1.5](https://github.com/ByteDance-Seed/Seed-Thinking-v1.5/blob/main/seed-thinking-v1.5.pdf) tech report is released! Trained with verl, Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains.
+ - [2025/04] The [VAPO](https://arxiv.org/pdf/2504.05118) (value-based augmented PPO) paper covers our latest RL method for reasoning models. Trained from the Qwen-32B-base model, VAPO achieves 60.4 on AIME 2024, outperforming DAPO-32B.
+ - [2025/03] verl v0.3.0.post1 is released! See the [release note](https://github.com/volcengine/verl/releases/) for details. It achieves [~1.4x speedup](https://tongyx361.github.io/blogs/posts/verl-intro/#/verl-flexible-and-efficient-rl-for-llms) compared to previous versions.
+ - [2025/03] [DAPO](https://dapo-sia.github.io/) is an open-source SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is available in `recipe/dapo` now.
+ <details><summary> more... </summary>
+ <ul>
+ <li>[2025/05] verl will be presented at [A2M Shanghai](https://a2m.msup.com.cn/home/?aid=4488&city=shanghai) on 5/16 - 5/17.</li>
+ <li>[2025/05] verl will be presented at [GOSIM x PyTorch Day 2025](https://paris2025.gosim.org/). See you in Paris! </li>
+ <li>[2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [SGLang-LMSYS Org Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid-March.</li>
+ <li>[2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam!</li>
+ <li>[2025/02] verl v0.2.0.post2 is released!</li>
+ <li>[2025/02] We presented verl in the <a href="https://lu.ma/ji7atxux">Bytedance/NVIDIA/Anyscale Ray Meetup</a>. See you in San Jose!</li>
+ <li>[2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME).</li>
+ <li>[2024/12] verl is presented at Ray Forward 2024. Slides available <a href="https://github.com/eric-haibin-lin/verl-community/blob/main/slides/Ray_Forward_2024_%E5%B7%AB%E9%94%A1%E6%96%8C.pdf">here</a></li>
+ <li>[2024/12] The team presented <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">Post-training LLMs: From Algorithms to Infrastructure</a> at NeurIPS 2024. <a href="https://github.com/eric-haibin-lin/verl-data/tree/neurips">Slides</a> and <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">video</a> available.</li>
+ <li>[2024/10] verl is presented at Ray Summit. <a href="https://www.youtube.com/watch?v=MrhMcXkXvJU&list=PLzTswPQNepXntmT8jr9WaNfqQ60QwW7-U&index=37">Youtube video</a> available.</li>
+ <li>[2024/08] HybridFlow (verl) is accepted to EuroSys 2025.</li>
+ </ul>
+ </details>
+
+ ## Key Features
+
+ - **FSDP**, **FSDP2** and **Megatron-LM** for training.
+ - **vLLM**, **SGLang** and **HF Transformers** for rollout generation.
+ - Compatible with Hugging Face Transformers and Modelscope Hub: [Qwen-3](https://github.com/volcengine/verl/blob/main/examples/grpo_trainer/run_qwen3-8b.sh), Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc.
+ - Supervised fine-tuning.
+ - Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [REINFORCE++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), [DAPO](recipe/dapo/), [DrGRPO](recipe/drgrpo), [KL_Cov & Clip_Cov](recipe/entropy), etc.
+ - Supports model-based reward and function-based reward (verifiable reward) for math, [coding](https://github.com/volcengine/verl/tree/main/recipe/dapo), etc.
+ - Supports vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh) with Qwen2.5-vl, Kimi-VL
+ - [Multi-turn with tool calling](https://github.com/volcengine/verl/tree/main/examples/sglang_multiturn)
+ - LLM alignment recipes such as [Self-play preference optimization (SPPO)](https://github.com/volcengine/verl/tree/main/recipe/sppo)
+ - Flash attention 2, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [sequence parallelism](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh).
+ - Scales up to 671B models and hundreds of GPUs with [expert parallelism](https://github.com/volcengine/verl/pull/1467)
+ - Multi-GPU [LoRA RL](https://verl.readthedocs.io/en/latest/advance/ppo_lora.html) support to save memory.
+ - Experiment tracking with wandb, swanlab, mlflow and tensorboard.
+
+ ## Upcoming Features and Changes
+
+ - Roadmap https://github.com/volcengine/verl/issues/710
+ - DeepSeek 671b optimizations with Megatron v0.11 https://github.com/volcengine/verl/issues/708
+ - Multi-turn rollout and tool-use optimizations https://github.com/volcengine/verl/issues/1882
+ - Environment interactions https://github.com/volcengine/verl/issues/1172
+ - List of breaking changes since v0.3 https://github.com/volcengine/verl/discussions/943 (entropy_coeff now defaults to 0)
+ - LoRA for RL https://github.com/volcengine/verl/pull/1127
+
+ ## Getting Started
+
+ <a href="https://verl.readthedocs.io/en/latest/index.html"><b>Documentation</b></a>
+
+ **Quickstart:**
+
+ - [Installation](https://verl.readthedocs.io/en/latest/start/install.html)
+ - [Quickstart](https://verl.readthedocs.io/en/latest/start/quickstart.html)
+ - [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html)
+ - [PPO in verl](https://verl.readthedocs.io/en/latest/algo/ppo.html)
+ - [GRPO in verl](https://verl.readthedocs.io/en/latest/algo/grpo.html)
+
+ **Running a PPO example step-by-step:**
+
+ - [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
+ - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)
+ - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)
+ - [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html)
+
+ **Reproducible algorithm baselines:**
+
+ - [RL performance on coding, math](https://verl.readthedocs.io/en/latest/algo/baseline.html)
+
+ **For code explanation and advanced usage (extension):**
+
+ - PPO Trainer and Workers
+   - [PPO Ray Trainer](https://verl.readthedocs.io/en/latest/workers/ray_trainer.html)
+   - [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)
+   - [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/index.html)
+
+ - Advanced Usage and Extension
+   - [Add Models with the FSDP Backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html)
+   - [Add Models with the Megatron-LM Backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)
+   - [Multi-turn Rollout Support](https://verl.readthedocs.io/en/latest/sglang_multiturn/multiturn.html)
+   - [Search Tool Integration](https://verl.readthedocs.io/en/latest/sglang_multiturn/search_tool_example.html)
+   - [Sandbox Fusion Integration](https://verl.readthedocs.io/en/latest/examples/sandbox_fusion_example.html)
+   - [Deployment using Separate GPU Resources](https://github.com/volcengine/verl/tree/main/examples/split_placement)
+   - [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)
+   - [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)
+
+ **Blogs from the community**
+
+ - [SGLang, verl, OpenBMB and Tsinghua University: Pioneering End-to-End Multi-Turn RLHF](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/verl-multiturn-rollout-Release.md)
+ - [Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration](https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html)
+ - [veMLP x verl: Getting the Most out of RL Training](https://mp.weixin.qq.com/s/7nbqxk4knMGd-hQE9ls2tA)
+ - [Best Practices for Distributed GRPO Reinforcement Learning Training with verl](https://www.volcengine.com/docs/6459/1463942)
+ - [A Brief Analysis of the HybridFlow (verl) Paper](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md)
+ - [Up to 20x Higher Throughput! The ByteDance Doubao LLM Team Releases a New RLHF Framework, Now Open Source!](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90)
+
+ ## Performance Tuning Guide
+
+ Performance is essential for on-policy RL algorithms. We have written a detailed [performance tuning guide](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html) to help you optimize performance.
+
+ ## Upgrade to vLLM >= v0.8.2
+
+ verl now supports vLLM >= 0.8.2 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for the installation guide and more information. Please avoid vLLM 0.7.x, which contains bugs that may lead to OOMs and unexpected errors.
+
+ ## Use Latest SGLang
+
+ SGLang is fully supported with verl, and the SGLang RL Group is working extensively on building unique features, including multi-turn agentic RL, VLM RLHF, server-based RL, and partial rollout. Please refer to [this document](https://verl.readthedocs.io/en/latest/workers/sglang_worker.html) for the installation guide and more information.
+
+ ## Upgrade to FSDP2
+
+ verl is fully embracing FSDP2! FSDP2 is recommended by the PyTorch distributed team; it provides better throughput and memory usage, and is composable with other features (e.g. torch.compile). To enable FSDP2, simply use verl main and set the following options:
+ ```
+ actor_rollout_ref.ref.strategy=fsdp2
+ actor_rollout_ref.actor.strategy=fsdp2
+ critic.strategy=fsdp2
+ reward_model.strategy=fsdp2
+ ```
+ Furthermore, FSDP2 CPU offloading is compatible with gradient accumulation. You can turn it on to save memory with `actor_rollout_ref.actor.fsdp_config.offload_policy=True`. For more details, see https://github.com/volcengine/verl/pull/1026
+
+
170
+ ## AMD Support (ROCm Kernel)
171
+
172
+ verl now supports FSDP as the training engine (Megatron support coming soon) and both integrates with vLLM and SGLang as inference engines. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst) for the installation guide and more information, and [this document](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_vllm_page.rst) for the vLLM performance tuning for ROCm.
173
+
174
+
175
+ ## Citation and acknowledgement
176
+
177
+ If you find the project helpful, please cite:
178
+
179
+ - [HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)
180
+ - [A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization](https://i.cs.hku.hk/~cwu/papers/gmsheng-NL2Code24.pdf)
181
+
182
+ ```bibtex
183
+ @article{sheng2024hybridflow,
184
+ title = {HybridFlow: A Flexible and Efficient RLHF Framework},
185
+ author = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
186
+ year = {2024},
187
+ journal = {arXiv preprint arXiv: 2409.19256}
188
+ }
189
+ ```
190
+
191
+ verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and contributed by Bytedance, Anyscale, LMSys.org, [Alibaba Qwen team](https://github.com/QwenLM/), Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, ke.com, [All Hands AI](https://www.all-hands.dev/), [ModelBest](http://modelbest.cn/), OpenPipe, JD AI Lab, Microsoft Research, [StepFun](https://www.stepfun.com/), Amazon, LinkedIn, Meituan, [Camel-AI](https://www.camel-ai.org/), [OpenManus](https://github.com/OpenManus), Xiaomi, Prime Intellect, NVIDIA research, [Baichuan](https://www.baichuan-ai.com/home), [RedNote](https://www.xiaohongshu.com/), [SwissAI](https://www.swiss-ai.org/), [Moonshot AI (Kimi)](https://www.moonshot-ai.com/), Baidu, Snowflake, [IceSword Lab](https://www.iceswordlab.com), and many more.
192
+
193
+ ## Awesome work using verl
194
+
195
+ - [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks ![GitHub Repo stars](https://img.shields.io/github/stars/Jiayi-Pan/TinyZero)
196
+ - [SkyThought](https://github.com/NovaSky-AI/SkyThought): RL training for Sky-T1-7B by NovaSky AI team. ![GitHub Repo stars](https://img.shields.io/github/stars/NovaSky-AI/SkyThought)
197
+ - [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason): SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild ![GitHub Repo stars](https://img.shields.io/github/stars/hkust-nlp/simpleRL-reason)
198
+ - [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework ![GitHub Repo stars](https://img.shields.io/github/stars/hiyouga/EasyR1)
199
+ - [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL): LLM Agents RL tunning framework for multiple agent environments. ![GitHub Repo stars](https://img.shields.io/github/stars/OpenManus/OpenManus-RL)
200
+ - [rllm](https://github.com/agentica-project/rllm): async RL training with [verl-pipeline](https://github.com/agentica-project/verl-pipeline) ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/rllm)
201
+ - [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/PRIME)
202
+ - [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework ![GitHub Repo stars](https://img.shields.io/github/stars/ZihanWang314/ragen)
203
+ - [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs ![GitHub Repo stars](https://img.shields.io/github/stars/PeterGriffinJin/Search-R1)
204
+ - [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): RL Training of **Search Agent** with **Search/Retrieval Outcome** ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval)
205
+ - [ReSearch](https://github.com/Agent-RL/ReSearch): Learning to **Re**ason with **Search** for LLMs via Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Agent-RL/ReSearch)
206
+ - [Code-R1](https://github.com/ganler/code-r1): Reproducing R1 for **Code** with Reliable Rewards ![GitHub Repo stars](https://img.shields.io/github/stars/ganler/code-r1)
207
+ - [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): Skywork open reaonser series ![GitHub Repo stars](https://img.shields.io/github/stars/SkyworkAI/Skywork-OR1)
208
+ - [ToRL](https://github.com/GAIR-NLP/ToRL): Scaling tool-integrated RL ![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/ToRL)
209
+ - [verl-agent](https://github.com/langfengQ/verl-agent): A scalable training framework for **long-horizon LLM/VLM agents**, along with a new algorithm **GiGPO** ![GitHub Repo stars](https://img.shields.io/github/stars/langfengQ/verl-agent)
210
+ - [PF-PPO](https://arxiv.org/abs/2409.06957): Policy Filtration for PPO based on the reliability of reward signals for more efficient and robust RLHF.
211
+ - [GUI-R1](https://github.com/ritzz-ai/GUI-R1): **GUI-R1**: A Generalist R1-style Vision-Language Action Model For **GUI Agents** ![GitHub Repo stars](https://img.shields.io/github/stars/ritzz-ai/GUI-R1)
212
+ - [DeepResearcher](https://github.com/GAIR-NLP/DeepResearcher): Scaling deep research via reinforcement learning in real-world environments ![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/DeepResearcher)
213
+ - [VAGEN](https://github.com/RAGEN-AI/VAGEN): Training VLM agents with multi-turn reinforcement learning ![GitHub Repo stars](https://img.shields.io/github/stars/RAGEN-AI/VAGEN)
214
+ - [ReTool](https://retool-rl.github.io/): ReTool: reinforcement learning for strategic tool use in LLMs. Code release is in progress...
215
+ - [RM-R1](https://arxiv.org/abs/2505.02387): RL training of reasoning reward models ![GitHub Repo stars](https://img.shields.io/github/stars/RM-R1-UIUC/RM-R1)
216
+ - [Absolute Zero Reasoner](https://arxiv.org/abs/2505.03335): A no human curated data self-play framework for reasoning![GitHub Repo stars](https://img.shields.io/github/stars/LeapLabTHU/Absolute-Zero-Reasoner)
217
+ - [LUFFY](https://arxiv.org/pdf/2504.14945): Learning to Reason under Off-Policy Guidance![GitHub Repo stars](https://img.shields.io/github/stars/ElliottYan/LUFFY)
218
+ - [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool): An unified and easy-to-extend tool-agent training framework based on verl![GitHub Repo stars](https://img.shields.io/github/stars/TIGER-AI-Lab/verl-tool)
219
+ - [DeepMath](https://github.com/zwhe99/DeepMath): DeepMath-103K data and series models for math reasoning![GitHub Repo stars](https://img.shields.io/github/stars/zwhe99/DeepMath)
220
+ - [Entropy Mechanism of RL](https://github.com/PRIME-RL/Entropy-Mechanism-of-RL): The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/Entropy-Mechanism-of-RL)
221
+ - [LLaSA-TTS-GRPO](https://github.com/channel-io/ch-tts-llasa-rl-grpo): TTS fine-tuning with GRPO optimization based on LLASA models ![GitHub Repo stars](https://img.shields.io/github/stars/channel-io/ch-tts-llasa-rl-grpo)
222
+ - [RL-Factory](https://github.com/Simple-Efficient/RL-Factory): An easy and efficient RL post-training framework for Agentic Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Simple-Efficient/RL-Factory)
223
+ - [RACRO](https://github.com/gyhdog99/RACRO2): Build multi-modal reasoning models via decoupling it into query-conditioned captioning and text-only reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/gyhdog99/RACRO2)
224
+
+ and many more awesome works listed in [recipe](recipe/README.md).
+ ## Contribution Guide
+
+ Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/710) and [good first issues](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) to see where you can contribute.
+
+ ### Code Linting and Formatting
+
+ We use pre-commit to help improve code quality. To initialize pre-commit, run:
+
+ ```bash
+ pip install pre-commit
+ pre-commit install
+ ```
+
+ To resolve CI errors locally, you can run pre-commit manually:
+
+ ```bash
+ pre-commit run
+ ```
+
+ ### Adding CI tests
+
+ If possible, please add CI test(s) for your new feature:
+
+ 1. Find the most relevant workflow yml file, which usually corresponds to a `hydra` default config (e.g. `ppo_trainer`, `ppo_megatron_trainer`, `sft_trainer`, etc.).
+ 2. Add related path patterns to the `paths` section if not already included.
+ 3. Minimize the workload of the test script(s) (see existing scripts for examples).
+
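+ As a sketch of step 2, a GitHub Actions `paths` filter might look like the following; the file patterns below are illustrative examples, not taken from an actual verl workflow:
+
+ ```yaml
+ # Illustrative: run this workflow only when relevant files change
+ on:
+   pull_request:
+     paths:
+       - "verl/trainer/**"                        # trainer source
+       - "tests/e2e/**"                           # e2e test scripts
+       - ".github/workflows/e2e_ppo_trainer.yml"  # the workflow file itself
+ ```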
+ ## About [ByteDance Seed Team](https://team.doubao.com/)
+
+ Founded in 2023, the ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society. You can get to know ByteDance Seed better through the following channels👇
+ <div>
+ <a href="https://team.doubao.com/">
+ <img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
+ <a href="https://github.com/user-attachments/assets/469535a8-42f2-4797-acdf-4f7a1d4a0c3e">
+ <img src="https://img.shields.io/badge/WeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white"></a>
+ <a href="https://www.xiaohongshu.com/user/profile/668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D&xsec_source=pc_search">
+ <img src="https://img.shields.io/badge/Xiaohongshu-%23FF2442?style=for-the-badge&logo=xiaohongshu&logoColor=white"></a>
+ <a href="https://www.zhihu.com/org/dou-bao-da-mo-xing-tuan-dui/">
+ <img src="https://img.shields.io/badge/zhihu-%230084FF?style=for-the-badge&logo=zhihu&logoColor=white"></a>
+
+ </div>
+ ---
+
+ We are HIRING! Send us an [email](mailto:[email protected]) if you are interested in internship/FTE opportunities in RL for agents.
docker/Apptainerfile.rocm ADDED
@@ -0,0 +1,57 @@
+ Bootstrap: docker
+
+ # Support - Training: fsdp; Inference: vllm
+ # FROM: rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
+ # Support - Training: fsdp; Inference: vllm, sglang
+ FROM lmsysorg/sglang:v0.4.5-rocm630
+
+ %environment
+ export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
+
+ export HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
+ export CFLAGS="-D__HIP_PLATFORM_AMD__"
+ export CXXFLAGS="-D__HIP_PLATFORM_AMD__"
+
+ %post
+ # Create source directory
+ mkdir -p /opt/src
+
+ # Uninstall and reinstall vllm
+ pip uninstall -y vllm
+ cd /opt/src
+ git clone -b v0.6.3 https://github.com/vllm-project/vllm.git
+ cd vllm
+ MAX_JOBS=$(nproc) python3 setup.py install
+ cd /opt
+ rm -rf /opt/src/vllm
+
+ # Install dependencies
+ pip install "tensordict<0.6" --no-deps
+ pip install accelerate \
+ codetiming \
+ datasets \
+ dill \
+ hydra-core \
+ liger-kernel \
+ numpy \
+ pandas \
+ peft \
+ "pyarrow>=15.0.0" \
+ pylatexenc \
+ "ray[data,train,tune,serve]" \
+ torchdata \
+ transformers \
+ wandb \
+ orjson \
+ pybind11
+
+ # Clone and install verl from GitHub
+ cd /opt
+ git clone https://github.com/volcengine/verl.git
+ cd verl
+ # Uncomment to use a specific version
+ # git checkout v0.3.0.post0
+ pip install -e . --no-deps
+
+ # Install torch_memory_saver
+ pip install git+https://github.com/ExtremeViscent/torch_memory_saver.git --no-deps
docker/Dockerfile.awsefa ADDED
@@ -0,0 +1,53 @@
+ FROM whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3
+
+ # For aws instances with EFA net interface (Sagemaker AI Pod)
+ # install EFA driver:
+ ######## AWS EFA ############
+ ENV NCCL_VERSION=2.25.1-1
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV EFA_INSTALLER_VERSION=1.40.0
+ ENV AWS_OFI_NCCL_VERSION=1.14.2
+ ENV FI_EFA_SET_CUDA_SYNC_MEMOPS=0
+ ENV FI_PROVIDER=efa
+
+ RUN apt update && apt install -y linux-image-generic libhwloc-dev
+
+ RUN cd /tmp && \
+ curl -O https://efa-installer.amazonaws.com/aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz && \
+ tar -xf aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz && \
+ cd aws-efa-installer && \
+ ./efa_installer.sh -y -g --skip-kmod --skip-limit-conf --no-verify && \
+ ldconfig && \
+ rm -rf /tmp/aws-efa-installer /var/lib/apt/lists/*
+
+ # NCCL EFA Plugin
+ RUN cd /tmp && \
+ curl -LO https://github.com/aws/aws-ofi-nccl/archive/refs/tags/v${AWS_OFI_NCCL_VERSION}.tar.gz && \
+ tar -xzf /tmp/v${AWS_OFI_NCCL_VERSION}.tar.gz && \
+ rm /tmp/v${AWS_OFI_NCCL_VERSION}.tar.gz && \
+ mv aws-ofi-nccl-${AWS_OFI_NCCL_VERSION} aws-ofi-nccl && \
+ cd /tmp/aws-ofi-nccl && \
+ ./autogen.sh && \
+ ./configure --prefix=/opt/amazon/efa \
+ --with-libfabric=/opt/amazon/efa \
+ --with-cuda=/usr/local/cuda \
+ --enable-platform-aws \
+ --with-mpi=/opt/amazon/openmpi && \
+ make -j$(nproc) install && \
+ rm -rf /tmp/aws-ofi-nccl
+
+ # NCCL
+ RUN echo "/usr/local/lib" >> /etc/ld.so.conf.d/local.conf && \
+ echo "/opt/amazon/openmpi/lib" >> /etc/ld.so.conf.d/efa.conf && \
+ ldconfig
+
+ ENV OMPI_MCA_pml=^cm,ucx \
+ OMPI_MCA_btl=tcp,self \
+ OMPI_MCA_btl_tcp_if_exclude=lo,docker0,veth_def_agent \
+ OPAL_PREFIX=/opt/amazon/openmpi \
+ NCCL_SOCKET_IFNAME=^docker,lo,veth_def_agent \
+ FI_EFA_USE_HUGE_PAGE=0
+
+ # docker build -t whatcanyousee/verl:awsefa --label "commit=$(git rev-parse --short HEAD)" .
+ # on aws:
+ # docker run --ipc=host --privileged --name verldev --gpus all --network=host --shm-size=1800gb -itd whatcanyousee/verl:awsefa
docker/Dockerfile.ngc.vllm ADDED
@@ -0,0 +1,47 @@
+ # docker buildx build --platform linux/x86_64 -t "verlai/verl:ngc-th2.4.0-cu124-vllm0.6.3-ray2.4-te1.7-v0.0.6" -f docker/Dockerfile.ngc.vllm . --builder cloud-verlai-verl-builder --progress=plain --push
+ FROM nvcr.io/nvidia/pytorch:24.05-py3
+
+ # uninstall nv-pytorch fork
+ RUN pip3 uninstall pytorch-quantization \
+ pytorch-triton \
+ torch \
+ torch-tensorrt \
+ torchvision \
+ xgboost transformer_engine flash_attn \
+ apex megatron-core -y
+
+ RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
+
+ # =============== Megatron dependencies (optional) =================
+ # install apex, set MAX_JOBS to avoid OOMs
+ RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
+ --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
+ git+https://github.com/NVIDIA/apex
+ # =============== End of Megatron dependencies (optional) =================
+
+ RUN pip3 install --no-cache-dir \
+ accelerate \
+ codetiming \
+ datasets \
+ dill \
+ hydra-core \
+ numpy \
+ 'pandas' \
+ 'peft' \
+ 'pyarrow>=15.0.0' \
+ 'pybind11' \
+ 'pylatexenc' \
+ 'ray>=2.10' \
+ 'tensordict<0.6' \
+ 'transformers' \
+ 'vllm==0.6.3.post1' \
+ 'wandb'
+
+ # full dependencies
+ RUN pip3 install pytest pre-commit py-spy pyext liger-kernel
+
+ # =============== Megatron dependencies (optional) =================
+ # install Transformer Engine, which requires FA 2.5.8. Do it in a separate step for docker cache
+ RUN MAX_JOBS=4 NINJA_FLAGS="-j4" pip3 install flash-attn==2.5.8 --no-cache-dir --no-build-isolation
+ RUN MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 pip3 install git+https://github.com/eric-haibin-lin/[email protected]
+ # =============== End of Megatron dependencies (optional) =================
docker/Dockerfile.ngc.vllm0.8 ADDED
@@ -0,0 +1,75 @@
+ # Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
+ # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
+ FROM nvcr.io/nvidia/pytorch:24.08-py3
+
+ # Define environments
+ ENV MAX_JOBS=32
+ ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV NODE_OPTIONS=""
+ ENV PIP_ROOT_USER_ACTION=ignore
+ ENV HF_HUB_ENABLE_HF_TRANSFER="1"
+
+ # Define installation arguments
+ ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
+ ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+
+ # Set apt source
+ RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
+ { \
+ echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
+ } > /etc/apt/sources.list
+
+ # Install systemctl
+ RUN apt-get update && \
+ apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
+ apt-get clean
+
+ # Install tini
+ RUN apt-get update && \
+ apt-get install -y tini && \
+ apt-get clean
+
+ # Change pip source
+ RUN pip config set global.index-url "${PIP_INDEX}" && \
+ pip config set global.extra-index-url "${PIP_INDEX}" && \
+ python -m pip install --upgrade pip
+
+ # Uninstall nv-pytorch fork
+ RUN pip uninstall -y torch torchvision torchaudio \
+ pytorch-quantization pytorch-triton torch-tensorrt \
+ xgboost transformer_engine flash_attn apex megatron-core grpcio
+
+ # Install torch-2.6.0+cu124 + vllm-0.8.3
+ # torch-2.6.0+cu124: cxx11abi=False
+ # torch-2.6.0+cu126: cxx11abi=True
+ # see https://github.com/flashinfer-ai/flashinfer/issues/911
+ RUN pip install --no-cache-dir "vllm==0.8.3" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" "tensordict==0.6.2" torchdata \
+ "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
+ "numpy<2.0.0" "pyarrow>=15.0.0" pandas \
+ ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \
+ pytest py-spy pyext pre-commit ruff
+
+ # Install flash-attn-2.7.4.post1 (cxx11abi=False)
+ RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
+ pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
+
+ # Install flashinfer-0.2.2.post1+cu124 (cxx11abi=False)
+ # vllm-0.8.3 does not support flashinfer>=0.2.3
+ # see https://github.com/vllm-project/vllm/pull/15777
+ RUN wget -nv https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
+ pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl
+
+ # Fix packages
+ RUN pip uninstall -y pynvml nvidia-ml-py && \
+ pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
+
+ # Install verl
+ RUN pip install --no-cache-dir verl[vllm] -U
+
+ # Reset pip config
+ RUN pip config unset global.index-url && \
+ pip config unset global.extra-index-url
docker/Dockerfile.ngc.vllm0.8.sagemaker ADDED
@@ -0,0 +1,46 @@
+ # Using a pre-built image from AWS DLC which contains the current version of python (3.10) and supported cuda version (12.1)
+ FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.1.0-transformers4.36.0-gpu-py310-cu121-ubuntu20.04
+
+ # uninstall nv-pytorch fork
+ RUN pip3 uninstall -y pytorch-quantization \
+ pytorch-triton torch torch-tensorrt torchvision \
+ xgboost transformer_engine flash_attn apex megatron-core
+
+ # Define environments
+ ENV MAX_JOBS=32
+ ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV NODE_OPTIONS=""
+ ENV HF_HUB_ENABLE_HF_TRANSFER="1"
+
+ # Install systemctl
+ RUN apt-get update && \
+ apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
+ apt-get clean
+
+ # Install tini
+ RUN apt-get update && \
+ apt-get install -y tini && \
+ apt-get clean
+
+ # Install torch-2.6.0 + vllm-0.8.2
+ # Quote version specifiers so the shell does not treat ">=" as a redirection
+ RUN pip install --no-cache-dir vllm==0.8.2 torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata==0.11.0 \
+ "transformers>=4.49.0" accelerate datasets peft hf-transfer \
+ ray[default] codetiming hydra-core pandas "pyarrow>=15.0.0" pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \
+ pytest pre-commit py-spy pyext ruff
+
+ # Install flash_attn-2.7.4.post1
+ RUN pip uninstall -y transformer-engine flash-attn && \
+ pip install flash-attn==2.7.4.post1 --no-build-isolation
+
+ # Fix cv2
+ RUN pip uninstall -y pynvml nvidia-ml-py && \
+ pip install --no-cache-dir "nvidia-ml-py>=12.560.30" opencv-python-headless==4.8.0.74 fastapi==0.115.6 && \
+ pip install --no-cache-dir --upgrade "optree>=0.13.0"
+
+ # Install verl
+ RUN pip install --no-cache-dir verl[vllm] -U
+
+ # Reset pip config
+ RUN pip config unset global.index-url && \
+ pip config unset global.extra-index-url
docker/Dockerfile.rocm ADDED
@@ -0,0 +1,55 @@
+ # Build the docker in the repo dir:
+ # docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .
+ # docker images # you can find your built docker
+
+
+ # Support - Training: fsdp; Inference: vllm
+ # FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
+ # Support - Training: fsdp; Inference: vllm, sglang
+ FROM lmsysorg/sglang:v0.4.6.post5-rocm630
+
+ # Set working directory
+ # WORKDIR $PWD/app
+
+ # Set environment variables
+ ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"
+
+ ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
+ ENV CFLAGS="-D__HIP_PLATFORM_AMD__"
+ ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"
+
+ # Install vllm
+ RUN pip uninstall -y vllm && \
+ rm -rf vllm && \
+ git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \
+ cd vllm && \
+ MAX_JOBS=$(nproc) python3 setup.py install && \
+ cd .. && \
+ rm -rf vllm
+
+ # Copy the entire project directory
+ COPY . .
+
+ # Install dependencies
+ RUN pip install "tensordict==0.6.2" --no-deps && \
+ pip install accelerate \
+ codetiming \
+ datasets \
+ dill \
+ hydra-core \
+ liger-kernel \
+ numpy \
+ pandas \
+ peft \
+ "pyarrow>=15.0.0" \
+ pylatexenc \
+ "ray[data,train,tune,serve]<2.45.0" \
+ torchdata \
+ transformers \
+ wandb \
+ orjson \
+ pybind11 && \
+ pip install -e . --no-deps
+
+ # Install torch_memory_saver
+ RUN pip install git+https://github.com/ExtremeViscent/torch_memory_saver.git --no-deps
docker/Dockerfile.sglang ADDED
@@ -0,0 +1,55 @@
+ # Start from the NVIDIA official image (ubuntu-22.04 + python-3.10)
+ # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
+ FROM nvcr.io/nvidia/pytorch:24.08-py3
+
+ # Define environments
+ ENV MAX_JOBS=32
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV NODE_OPTIONS=""
+
+ # Define installation arguments
+ ARG APT_SOURCE=https://mirrors.ustc.edu.cn/ubuntu/
+
+ # Set apt source
+ RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
+ { \
+ echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
+ } > /etc/apt/sources.list
+
+ # Install systemctl
+ RUN apt-get update && \
+ apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
+ apt-get clean
+
+ # Install tini
+ RUN apt-get update && \
+ apt-get install -y tini && \
+ apt-get clean
+
+ # Change pip source
+ ARG PIP_INDEX=https://mirrors.aliyun.com/pypi/simple/
+
+ RUN pip config set global.index-url "${PIP_INDEX}" && \
+ pip config set global.extra-index-url "${PIP_INDEX}" && \
+ python -m pip install --upgrade pip
+
+ # Install sglang-0.4.6.post5 and torch-memory-saver
+ RUN pip uninstall -y cuda-python && pip install "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir
+
+ # Install torch-2.6.0
+ # Quote version specifiers so the shell does not treat ">=" as a redirection
+ RUN pip install --no-cache-dir torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata \
+ "transformers>=4.49.0" accelerate datasets peft hf_transfer \
+ ray[default] codetiming hydra-core pandas "pyarrow>=15.0.0" pylatexenc qwen-vl-utils wandb liger-kernel \
+ pytest pre-commit py-spy pyext
+
+ # Install flash_attn-2.7.4.post1
+ RUN pip uninstall -y transformer-engine flash-attn && \
+ wget -v https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
+ pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
+
+ # Fix cv2
+ RUN pip uninstall -y pynvml nvidia-ml-py && \
+ pip install --no-cache-dir "nvidia-ml-py>=12.560.30" opencv-python-headless==4.8.0.74 fastapi==0.115.6
docker/Dockerfile.vemlp.vllm.te ADDED
@@ -0,0 +1,41 @@
+ # docker buildx build --platform linux/x86_64 -t "verlai/verl:$TAG" -f docker/$FILE .
+
+ # the image in docker.io is an alias for the one in the veturbo registry
+ # FROM vemlp-cn-beijing.cr.volces.com/veturbo/pytorch:2.4-cu124
+ FROM docker.io/haibinlin/verl:v0.0.5-th2.4.0-cu124-base
+
+ # only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
+ # unset for now
+ RUN pip3 config unset global.index-url
+
+ # transformers 4.47.0 contains the following bug:
+ # AttributeError: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'
+ RUN pip3 install --no-cache-dir \
+ torch==2.4.0 \
+ accelerate \
+ codetiming \
+ dill \
+ hydra-core \
+ numpy \
+ pybind11 \
+ tensordict \
+ "transformers <= 4.46.0"
+
+ RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation
+
+ # vllm depends on ray
+ RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10
+
+ # install apex
+ RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
+ --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
+ git+https://github.com/NVIDIA/apex
+
+ # install Transformer Engine
+ # - flash-attn pinned to 2.5.3 by TransformerEngine, switch to eric-haibin-lin/[email protected] to relax version req
+ # - install with: MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 to avoid OOM
+ # - cudnn is required by TransformerEngine
+ # RUN CUDNN_PATH=/opt/conda/lib/python3.11/site-packages/nvidia/cudnn \
+ # pip3 install git+https://github.com/eric-haibin-lin/[email protected]
+ RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install flash-attn==2.5.3 --no-cache-dir --no-build-isolation
+ RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install git+https://github.com/NVIDIA/[email protected]
docker/Dockerfile.vllm.sglang.megatron ADDED
@@ -0,0 +1,124 @@
+ # Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
+ # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
+ FROM nvcr.io/nvidia/pytorch:24.08-py3
+
+ # Define environments
+ ENV MAX_JOBS=32
+ ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV NODE_OPTIONS=""
+ ENV PIP_ROOT_USER_ACTION=ignore
+ ENV HF_HUB_ENABLE_HF_TRANSFER="1"
+
+ # Define installation arguments
+ ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
+ ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+
+ # Set apt source
+ RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
+ { \
+ echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
+ echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
+ } > /etc/apt/sources.list
+
+ # Install systemctl
+ RUN apt-get update && \
+ apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
+ apt-get clean
+
+ # Install tini
+ RUN apt-get update && \
+ apt-get install -y tini aria2 && \
+ apt-get clean
+
+ # Change pip source
+ RUN pip config set global.index-url "${PIP_INDEX}" && \
+ pip config set global.extra-index-url "${PIP_INDEX}" && \
+ python -m pip install --upgrade pip
+
+ # Uninstall nv-pytorch fork
+ RUN pip uninstall -y torch torchvision torchaudio \
+ pytorch-quantization pytorch-triton torch-tensorrt \
+ xgboost transformer_engine flash_attn apex megatron-core grpcio
+
+ # Reinstall CUDA 12.4
+ RUN aria2c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
+ mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
+
+ RUN aria2c --always-resume=true --max-tries=99999 https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
+ dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
+ cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \
+ apt-get update && \
+ apt-get -y install cuda-toolkit-12-4 && \
+ rm cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
+ update-alternatives --set cuda /usr/local/cuda-12.4 && \
+ rm -rf /usr/local/cuda-12.6
+
+ # Install torch-2.6.0+cu124 + vllm-0.8.5.post1 + sglang-0.4.6.post5
+ # torch-2.6.0+cu124: cxx11abi=False
+ # torch-2.6.0+cu126: cxx11abi=True
+ # see https://github.com/flashinfer-ai/flashinfer/issues/911
+ # Install sglang-0.4.6.post5 and torch-memory-saver
+ RUN pip install "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir
+
+ RUN pip install --no-cache-dir "vllm==0.8.5.post1" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" "tensordict==0.6.2" torchdata
+
+ RUN pip install --no-cache-dir "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
+ "numpy<2.0.0" "pyarrow>=15.0.0" pandas \
+ ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile \
+ pytest py-spy pyext pre-commit ruff
+
+ # Install flash-attn-2.7.4.post1 (cxx11abi=False)
+ RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
+ pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
+
+ # Fix packages
+ RUN pip uninstall -y pynvml nvidia-ml-py && \
+ pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
+
+ # Install cudnn
+ RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
+ dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
+ cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \
+ apt-get update && \
+ apt-get -y install cudnn-cuda-12 && \
+ rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
+
+ RUN pip install --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
+
+ # Install Apex
+ RUN git clone https://github.com/NVIDIA/apex.git && \
+ cd apex && \
+ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
+
+ # Install TransformerEngine
+ RUN export NVTE_FRAMEWORK=pytorch && pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/[email protected]
+
+ # Install Megatron-LM
+ RUN pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.0
+
+ # Fix opencv
+ RUN pip install opencv-python
+
+ RUN pip install opencv-fixer && \
+ python -c "from opencv_fixer import AutoFix; AutoFix()"
+
+ # Install verl
+
+ # Reset pip config
+ RUN pip config unset global.index-url && \
+ pip config unset global.extra-index-url
+
+ RUN apt-get update && \
+ apt-get install -y aria2 libfreeimage3 libfreeimage-dev zlib1g
+
+ RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
+ apt-get update && apt-get install -y libxcb-cursor0 && \
+ dpkg -i ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
+ rm -rf /usr/local/cuda/bin/nsys && \
+ ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
+ rm -rf /usr/local/cuda/bin/nsys-ui && \
+ ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
+ rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb
docker/Dockerfile.vllm.sglang.megatron.deepseek ADDED
@@ -0,0 +1,115 @@
+ # Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
+ # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
+ FROM nvcr.io/nvidia/pytorch:24.08-py3
+
+ # Define environments
+ ENV MAX_JOBS=32
+ ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV NODE_OPTIONS=""
+ ENV PIP_ROOT_USER_ACTION=ignore
+ ENV HF_HUB_ENABLE_HF_TRANSFER="1"
+
+ # Define installation arguments
+ ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
+ ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+
+ # Set apt source
+ RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
+     { \
+     echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
+     echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
+     echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
+     echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
+     } > /etc/apt/sources.list
+
+ # Install systemctl
+ RUN apt-get update && \
+     apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
+     apt-get clean
+
+ # Install tini
+ RUN apt-get update && \
+     apt-get install -y tini aria2 && \
+     apt-get clean
+
+ # Change pip source
+ RUN pip config set global.index-url "${PIP_INDEX}" && \
+     pip config set global.extra-index-url "${PIP_INDEX}" && \
+     python -m pip install --upgrade pip
+
+ # Uninstall nv-pytorch fork
+ RUN pip uninstall -y torch torchvision torchaudio \
+     pytorch-quantization pytorch-triton torch-tensorrt \
+     xgboost transformer_engine flash_attn apex megatron-core grpcio
+
+ # Reinstall CUDA 12.4
+ RUN aria2c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
+     mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
+
+ RUN aria2c --always-resume=true --max-tries=99999 https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
+     dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
+     cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \
+     apt-get update && \
+     apt-get -y install cuda-toolkit-12-4 && \
+     rm cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
+     update-alternatives --set cuda /usr/local/cuda-12.4 && \
+     rm -rf /usr/local/cuda-12.6
+
+ # Install torch-2.6.0+cu124 + vllm-0.8.5.post1 + sglang-0.4.6.post5
+ # torch-2.6.0+cu124: cxx11abi=False
+ # torch-2.6.0+cu126: cxx11abi=True
+ # see https://github.com/flashinfer-ai/flashinfer/issues/911
+ # Install sglang-0.4.6.post5 and torch-memory-saver
+ RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install --resume-retries 999 torch-memory-saver --no-cache-dir
+
+ RUN pip install --resume-retries 999 --no-cache-dir "vllm==0.8.5.post1" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" "tensordict==0.6.2" torchdata
+
+ RUN pip install --resume-retries 999 --no-cache-dir "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
+     "numpy<2.0.0" "pyarrow>=15.0.0" pandas \
+     ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile \
+     pytest py-spy pyext pre-commit ruff
+
+ # Install flash-attn-2.7.4.post1 (cxx11abi=False)
+ RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
+     pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
+
+ # Fix packages
+ RUN pip uninstall -y pynvml nvidia-ml-py && \
+     pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
+
+ # Install cudnn
+ RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
+     dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
+     cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \
+     apt-get update && \
+     apt-get -y install cudnn-cuda-12 && \
+     rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
+
+ RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
+
+ # Install Apex
+ RUN git clone https://github.com/NVIDIA/apex.git && \
+     cd apex && \
+     pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
+
+ # Install TransformerEngine
+ RUN export NVTE_FRAMEWORK=pytorch && pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/[email protected]
+
+ # Install Megatron-LM
+ RUN pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.1
+
+ # Fix opencv
+ RUN pip install opencv-python
+
+ RUN pip install opencv-fixer && \
+     python -c "from opencv_fixer import AutoFix; AutoFix()"
+
+ # Install verl
+
+ # Reset pip config
+ RUN pip config unset global.index-url && \
+     pip config unset global.extra-index-url
+
+ RUN apt-get update && \
+     apt-get install -y aria2 libfreeimage3 libfreeimage-dev zlib1g
docs/Makefile ADDED
@@ -0,0 +1,20 @@
+ # Minimal makefile for Sphinx documentation
+ #
+
+ # You can set these variables from the command line.
+ SPHINXOPTS    =
+ SPHINXBUILD   = sphinx-build
+ SPHINXPROJ    = verl
+ SOURCEDIR     = .
+ BUILDDIR      = _build
+
+ # Put it first so that "make" without argument is like "make help".
+ help:
+ 	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+ .PHONY: help Makefile
+
+ # Catch-all target: route all unknown targets to Sphinx using the new
+ # "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+ %: Makefile
+ 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
docs/README.md ADDED
@@ -0,0 +1,19 @@
+ # verl documentation
+
+ ## Build the docs
+
+ ```bash
+ # Install dependencies.
+ pip install -r requirements-docs.txt
+
+ # Build the docs.
+ make clean
+ make html
+ ```
+
+ ## Open the docs with your browser
+
+ ```bash
+ python -m http.server -d _build/html/
+ ```
+ Launch your browser and navigate to http://localhost:8000 to view the documentation.
docs/README_vllm0.7.md ADDED
@@ -0,0 +1,73 @@
+ # Upgrading to vllm >= 0.7
+
+ Note: verl+vllm 0.8.3 is now stable. Please see ``docs/README_vllm0.8.md`` for the upgrade guide.
+
+ ## Installation
+
+ Note: At the time of writing, verl+vllm 0.7.x supports **FSDP** for training and **vLLM** for rollout.
+
+ ```
+ # Create the conda environment
+ conda create -n verl python==3.10
+ conda activate verl
+
+ # Install verl
+ git clone https://github.com/volcengine/verl.git
+ cd verl
+ pip3 install -e .
+
+ # Install the latest stable version of vLLM
+ pip3 install vllm==0.7.3
+
+ # Install flash-attn
+ pip3 install flash-attn --no-build-isolation
+
+ ```
+
+ Note that if you are installing a lower version of vLLM (0.7.0, 0.7.1, 0.7.2), you need to apply some small patches manually to vllm (/path/to/site-packages/vllm after installation) after the above steps:
+
+ - vllm/distributed/parallel_state.py: Remove the assertion below:
+
+ ```
+ if (world_size
+         != tensor_model_parallel_size * pipeline_model_parallel_size):
+     raise RuntimeError(
+         f"world_size ({world_size}) is not equal to "
+         f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
+         f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")
+
+ ```
+
+ - vllm/executor/uniproc_executor.py: change `local_rank = rank` to `local_rank = int(os.environ["LOCAL_RANK"])`
+ - vllm/model_executor/model_loader/weight_utils.py: remove the `torch.cuda.empty_cache()` in `pt_weights_iterator`
+
+ ## Features
+
+ ### Use cuda graph
+
+ After installation, the examples using FSDP as the training backend can be run. By default, `enforce_eager` is set to True, which disables the cuda graph. To enable cuda graphs and the sleep mode of vLLM>=0.7, add the following lines to the bash script:
+
+ ```
+ actor_rollout_ref.rollout.enforce_eager=False \
+ actor_rollout_ref.rollout.free_cache_engine=False \
+
+ ```
+
+ For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, the rollout generation time is 115 seconds with vLLM 0.6.3, while it is 85 seconds with vLLM 0.7.0. Enabling the cuda graph further reduces the generation duration to 62 seconds.
+
+ **Note:** Currently, if `n` is greater than 1 in `SamplingParams` in vLLM>=0.7, there is a potential performance issue affecting the stability of rollout generation time (some iterations see generation-time bursts) when using vLLM's V0 engine.
+
+ ### Use vLLM V1 Engine
+
+ Using the vLLM V1 engine can avoid instability issues and achieve additional performance improvements. To use the V1 engine, first uninstall the previously installed vLLM and then follow the steps below to install the newer version.
+
+ ```
+ git clone https://github.com/vllm-project/vllm.git
+ cd vllm
+ git checkout 2275784
+ sed -i "903a\    data_parallel_size = world_size // pipeline_model_parallel_size // tensor_model_parallel_size" ./vllm/distributed/parallel_state.py
+ VLLM_USE_PRECOMPILED=1 pip install --editable .
+ ```
+
+ Then you can enable the V1 engine by setting `export VLLM_USE_V1=1`. In some benchmark tests, the V1 engine demonstrates a 1.5x speed improvement over the vLLM V0 engine.
+ Stable support for the vLLM V1 engine is available on verl main.
docs/README_vllm0.8.md ADDED
@@ -0,0 +1,55 @@
+ # Upgrading to vLLM >= 0.8
+
+ ## Installation
+
+ Note: This version of verl+vLLM 0.8+ supports **FSDP** for training and **vLLM** for rollout.
+
+ ```bash
+ # Create the conda environment
+ conda create -n verl python==3.10
+ conda activate verl
+
+ # Install verl
+ git clone https://github.com/volcengine/verl.git
+ cd verl
+ pip3 install -e .
+
+ # Install the latest stable version of vLLM
+ pip3 install vllm==0.8.3
+
+ # Install flash-attn
+ pip3 install flash-attn --no-build-isolation
+
+ ```
+
+ We have a pre-built docker image for verl+vLLM 0.8.3. You can pull it directly with the following command:
+
+ ```bash
+ docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0
+ ```
+
+ ## Features
+
+ vLLM 0.8+ supports the cuda graph and the V1 engine by default in verl. To enable these features, remember to add the following lines to the bash script:
+
+ ```bash
+ actor_rollout_ref.rollout.enforce_eager=False \
+ actor_rollout_ref.rollout.free_cache_engine=False \
+ ```
+
+ and also **remove** the environment variable if it exists:
+
+ ```bash
+ # If you are using vllm<=0.6.3, you might need to set the following environment variable to avoid bugs:
+ # export VLLM_ATTENTION_BACKEND=XFORMERS
+ ```
+
+ ## Notes
+
+ When you upgrade directly to vllm>=0.8, some dependency packages may change versions. If you encounter the following problem:
+
+ ```bash
+ in <module> from torch.multiprocessing.reductions import ForkingPickler ImportError: cannot import name 'ForkingPickler' from 'torch.multiprocessing.reductions' (/opt/conda/lib/python3.11/site-packages/torch/multiprocessing/reductions.py)
+ ```
+
+ you need to upgrade `tensordict` to version 0.6.2 using the command `pip install tensordict==0.6.2`.
docs/_static/js/runllm-widget.js ADDED
@@ -0,0 +1,14 @@
+ document.addEventListener("DOMContentLoaded", function () {
+   var script = document.createElement("script");
+   script.type = "module";
+   script.id = "runllm-widget-script";
+   script.src = "https://widget.runllm.com";
+   script.setAttribute("version", "stable");
+   script.setAttribute("crossorigin", "true");
+   script.setAttribute("runllm-keyboard-shortcut", "Mod+j");
+   script.setAttribute("runllm-name", "verl Chatbot");
+   script.setAttribute("runllm-position", "TOP_RIGHT");
+   script.setAttribute("runllm-assistant-id", "679");
+   script.async = true;
+   document.head.appendChild(script);
+ });
docs/_static/logo.png ADDED
docs/advance/checkpoint.rst ADDED
@@ -0,0 +1,159 @@
+ Using Checkpoints to Support Fault Tolerance Training
+ =====================================================
+
+ Training errors or machine failures may occur during the whole RLHF training process,
+ so it is recommended to enable checkpoints to minimize lost work.
+
+ The API interface has already been listed in :ref:`config-explain-page`,
+ so we will not repeat it here. But there are still some technical details
+ we hope to clarify.
+
+ .. note::
+
+    Notice that the ``checkpoint.contents`` field has no effect on FSDP checkpoints except for ``hf_model``;
+    the other 3 fields are bound together for saving and loading. We recommend including all of ``model``, ``optimizer`` and ``extra``.
+
+ Checkpoint Saving Directory Structure
+ -------------------------------------
+
+ Commonly, we use the ``default_local_dir`` declared in ``ppo_trainer.yaml`` or ``ppo_megatron_trainer.yml``
+ as the prefix when saving checkpoints, which is ``checkpoints/${trainer.project_name}/${trainer.experiment_name}``.
+
+ So the inner checkpoint structure of **FSDP** is like:
+
+ .. code::
+
+    checkpoints/${trainer.project_name}/${trainer.experiment_name}
+    ├── global_steps_${i}
+    │   ├── actor
+    │   │   ├── huggingface  # by default saves config and tokenizer; saves the huggingface model if ``hf_model`` is in checkpoint.contents
+    │   │   ├── model_world_size_{self.world_size}_rank_{self.rank}.pt
+    │   │   ├── optim_world_size_{self.world_size}_rank_{self.rank}.pt
+    │   │   └── extra_state_world_size_{self.world_size}_rank_{self.rank}.pt
+    │   └── critic
+    │       ├── huggingface
+    │       ├── model_world_size_{self.world_size}_rank_{self.rank}.pt
+    │       ├── optim_world_size_{self.world_size}_rank_{self.rank}.pt
+    │       └── extra_state_world_size_{self.world_size}_rank_{self.rank}.pt
+    └── latest_checkpointed_iteration.txt
+
+ All model shards, optimizers and extra states are stored together, in a sharded and distributed way.
+
+ The current **Megatron** checkpoint structure is:
+
+ .. code::
+
+    checkpoints/${trainer.project_name}/${trainer.experiment_name}
+    ├── global_steps_${i}
+    │   ├── actor
+    │   │   ├── huggingface  # by default saves config and tokenizer; saves the huggingface model if ``hf_model`` is in checkpoint.contents
+    │   │   └── dist_ckpt    # saves sharded model/optimizer/rng_states, with the same naming as Megatron
+    │   └── critic
+    │       ├── huggingface
+    │       └── dist_ckpt
+    └── latest_checkpointed_iteration.txt
+
+ Convert FSDP and Megatron Checkpoints to HuggingFace Format Model
+ -----------------------------------------------------------------
+
+ We provide a tool to convert FSDP and Megatron checkpoints to a HuggingFace format model.
+ The tool is located in ``verl/model_merger``.
+
+ The script supports two main sub-commands: `merge` (to convert and save checkpoints) and `test` (to validate merged checkpoints against a reference model).
+ The arguments for the `merge` sub-command are as follows:
+
+ .. code:: bash
+
+    usage: python -m verl.model_merger merge [-h] --backend {fsdp,megatron} --local_dir LOCAL_DIR [--hf_model_path HF_MODEL_PATH]
+                                             [--tie-word-embedding] [--is-value-model] [--target_dir TARGET_DIR]
+                                             [--hf_upload_path HF_UPLOAD_PATH] [--private]
+
+    options:
+      -h, --help            show this help message and exit
+      --backend {fsdp,megatron}
+                            The backend of the model
+      --local_dir LOCAL_DIR
+                            Path to the saved model checkpoints
+      --hf_model_path HF_MODEL_PATH
+                            (Deprecated) Path to the original Hugging Face model for config.
+      --tie-word-embedding  Whether to tie word embedding weights (currently only Megatron supported)
+      --is-value-model      Whether the model is a value model (currently only Megatron supported)
+      --target_dir TARGET_DIR
+                            Directory to save the merged huggingface model
+      --hf_upload_path HF_UPLOAD_PATH
+                            Hugging Face repository ID to upload the model
+      --private             Whether to upload the model to a private Hugging Face repository
+
+ Example usage for merging Megatron checkpoints:
+
+ .. code:: bash
+
+    python -m verl.model_merger merge \
+        --backend megatron \
+        --tie-word-embedding \
+        --local_dir checkpoints/verl_megatron_gsm8k_examples/qwen2_5_0b5_megatron_saveload/global_step_1/actor \
+        --target_dir /path/to/merged_hf_model
+
+ Example usage for merging FSDP checkpoints:
+
+ .. code:: bash
+
+    python -m verl.model_merger merge \
+        --backend fsdp \
+        --local_dir checkpoints/verl_fsdp_gsm8k_examples/qwen2_5_0b5_fsdp_saveload/global_step_1/actor \
+        --target_dir /path/to/merged_hf_model
+
+
+ Megatron Merger details
+ -----------------------
+
+ The current implementation of decoder layers uses ``nn.ModuleList`` to store the layers,
+ so the model layers on every PP rank and VPP rank start their index from 0.
+
+ There are 3 ways to correct this behavior:
+
+ 1. Modify the decoder layer's state_dict, adding an ``offset`` to each layer's index, which means rewriting the ``nn.ModuleList`` implementation.
+ 2. Modify the layer index when saving the checkpoint and recover it when loading the checkpoint.
+ 3. Let the checkpoint merger do this work, calculating the actual ``offset`` from the ``state_dict`` alone, which is a little complex.
+
+ The current implementation uses solution 2.
+
+
+ HuggingFace to Megatron DistCheckpoint details
+ ----------------------------------------------
+
+ If your model is quite large, we recommend using the Megatron dist-checkpoint to load it.
+ The Megatron dist-checkpoint supports loading with different kinds of model parallelism,
+ and it is much faster than the original checkpoint loading.
+
+ To convert an original HuggingFace model to a Megatron dist-checkpoint,
+ you can use the ``scripts/converter_hf_to_mcore.py`` script. Large MoE models are temporarily supported with CPU initialization,
+ which is a little slower, while we are working on a better solution to support large models.
+
+ An example command to convert the model is as follows:
+
+ .. code:: bash
+
+    python scripts/converter_hf_to_mcore.py \
+        --hf_model_path Qwen/Qwen1.5-MoE-A2.7B-Chat \
+        --output_path /mnt/disk/Qwen/Qwen1.5-MoE-A2.7B-Chat \
+        --use_cpu_initialization    # Only works for MoE models
+
+
+ Original Checkpoint Utils
+ -------------------------
+
+ Original Checkpoint Utils refer to the original checkpoint implementation in ``verl/models/[model]/megatron/checkpoint_utils``.
+
+ We only need ``[model]_loader.py`` in the original checkpoint utils now, since we no longer store ``hf_model`` every time (which is not recommended for large model training; try to save only sharded models if you can).
+
+ .. note::
+
+    Note that ``[model]_loader`` only supports environments where **storage clusters are able to connect with every compute node**,
+    because it uses **sharded loading to minimize the checkpoint-loading overhead**.
+    Every rank loads its own data from a ``state_dict`` that is accessible to all of them.
+    There is also no need to broadcast among DP ranks, since the saved state_dict is produced only by DP rank 0.
+
+ For users who can **only place the huggingface model on one device**, we keep the original, costly implementation in ``[model]_loader_deprecated``. In this implementation, rank 0 broadcasts all weights to each tp and pp rank, and then dp rank 0 broadcasts to all dp ranks. There may be a risk of OOM.
+
+ To use the deprecated loader, change the import package of ``load_state_dict_to_megatron_llama``.
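
The directory layout above is easy to navigate programmatically. Below is a minimal, hypothetical helper (not part of verl; the function name and the assumption that ``latest_checkpointed_iteration.txt`` contains only the last saved global step are ours) that resolves the most recent ``global_steps_${i}`` directory from the tracker file:

```python
from pathlib import Path


def latest_checkpoint(root: str) -> Path:
    """Locate the newest checkpoint directory under ``root``.

    Assumes the layout sketched above: a ``latest_checkpointed_iteration.txt``
    tracker file whose sole content is the last saved global step, sitting
    next to the ``global_steps_{i}`` directories.
    """
    root_dir = Path(root)
    step = int((root_dir / "latest_checkpointed_iteration.txt").read_text().strip())
    ckpt_dir = root_dir / f"global_steps_{step}"
    if not ckpt_dir.is_dir():
        raise FileNotFoundError(f"tracker points at a missing checkpoint: {ckpt_dir}")
    return ckpt_dir
```

The resolved directory's ``actor`` subfolder is the kind of path one would pass as ``--local_dir`` to the model merger.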
docs/advance/dpo_extension.rst ADDED
@@ -0,0 +1,271 @@
+ Extend to other RL(HF) algorithms
+ =================================
+
+ We have already implemented the complete training pipeline of the PPO
+ algorithm. To extend to other algorithms, we analyze the high-level
+ principles of using verl and provide a tutorial for implementing the DPO
+ algorithm. Users can follow a similar paradigm to extend to other RL algorithms.
+
+ .. note:: **Key idea**: A single process drives multi-process computation and data communication.
+
+ Overall Approach
+ ----------------
+
+ Step 1: Consider what multi-machine multi-GPU computations are needed
+ for each model, such as ``generate_sequence``, ``compute_log_prob`` and
+ ``update_policy`` in the actor_rollout model. Implement distributed
+ single-program, multiple-data (SPMD) computation and encapsulate it
+ into APIs.
+
+ Step 2: Based on different distributed scenarios, including FSDP and 3D
+ parallelism in Megatron-LM, implement single-process control of data
+ interaction among multi-process computations.
+
+ Step 3: Utilize the encapsulated APIs to implement the control flow.
+
+ Example: Online DPO
+ -------------------
+
+ We use verl to implement a simple online DPO algorithm. The algorithm
+ flow of Online DPO is as follows:
+
+ 1. There is a prompt (rollout) generator which has the same weights as
+    the actor model. After a batch of prompts is fed into the generator,
+    it generates N responses for each prompt.
+ 2. Send all the prompts + responses to a verifier for scoring, which can
+    be a reward model or a rule-based function. Then sort them in pairs to
+    form a training batch.
+ 3. Use this training batch to train the actor model using DPO. During
+    the process, a reference policy is needed.
+
+ Step 1: What are the multi-machine multi-GPU computations
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ **Sample Generator**
+
+ Implementation details:
+
+ .. code:: python
+
+    from verl.single_controller.base import Worker
+    from verl.single_controller.ray import RayWorkerGroup, RayClassWithInitArgs, RayResourcePool
+    import ray
+
+    @ray.remote
+    class SampleGenerator(Worker):
+        def __init__(self, config):
+            super().__init__()
+            self.config = config
+
+        def generate_sequences(self, data):
+            pass
+
+ Here, ``SampleGenerator`` can be viewed as a group of processes launched by
+ ``torchrun``, with each process running the same code (SPMD).
+ ``SampleGenerator`` needs to implement a ``generate_sequences`` API for
+ the control flow to call. The implementation details inside can use any
+ inference engine, including vllm, sglang and huggingface. Users can
+ largely reuse the code in
+ verl/verl/workers/rollout/vllm_rollout/vllm_rollout.py and we won't
+ go into details here.
+
+ **ReferencePolicy inference**
+
+ API: compute reference log probability
+
+ .. code:: python
+
+    from verl.single_controller.base import Worker
+    import ray
+
+    @ray.remote
+    class ReferencePolicy(Worker):
+        def __init__(self):
+            super().__init__()
+            self.model = Model()
+
+        def infer(self, data):
+            return self.model(data)
+
+ **Actor update**
+
+ API: Update actor model parameters
+
+ .. code:: python
+
+    from verl.single_controller.base import Worker
+    import ray
+
+    @ray.remote
+    class DPOActor(Worker):
+        def __init__(self):
+            super().__init__()
+            self.model = Model()
+            self.model = FSDP(self.model)  # or other distributed strategy
+            self.optimizer = optim.Adam(self.model.parameters(), lr=1e-3)
+            self.loss_fn = xxx
+
+        def update(self, data):
+            self.optimizer.zero_grad()
+            logits = self.model(data)
+            loss = self.loss_fn(logits)
+            loss.backward()
+            self.optimizer.step()
+
+ **Notes: How to distinguish between control processes and distributed computation processes**
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ - Control processes are generally functions directly decorated with
+   ``@ray.remote``
+ - Computation processes are all wrapped into a ``RayWorkerGroup``.
+
+ Users can reuse most of the distributed computation logic implemented
+ in the PPO algorithm, including the FSDP and Megatron-LM backends in
+ verl/verl/trainer/ppo.
+
+ Step 2: Based on different distributed scenarios, implement single-process control of multi-process data interaction
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ **The core problem to solve here is how a single process sends data to
+ multiple processes, drives multi-process computation, and how the
+ control process obtains the results of multi-process computation.**
+ First, we initialize the multi-process ``WorkerGroup`` in the control
+ process.
+
+ .. code:: python
+
+    @ray.remote(num_cpus=1)
+    def main_task(config):
+        # construct SampleGenerator
+        resource_pool = RayResourcePool(process_on_nodes=[8] * 2)  # 16 GPUs
+        ray_cls = RayClassWithInitArgs(SampleGenerator, config=config)
+        # put SampleGenerator onto resource pool
+        worker_group = RayWorkerGroup(resource_pool, ray_cls)
+
+        # construct reference policy
+
+ As we can see, in the control process, multiple processes are wrapped
+ into a ``RayWorkerGroup``. Inside this ``WorkerGroup``, there is a
+ ``self._workers`` member, where each worker is a RayActor
+ (https://docs.ray.io/en/latest/ray-core/actors.html) of SampleGenerator.
+ ray_trainer.md also provides an implementation of
+ ``MegatronRayWorkerGroup``.
+
+ Assuming the model is distributed using FSDP, and there is a batch of
+ data on the control process, then for data parallelism the underlying
+ calling process is:
+
+ .. code:: python
+
+    data = xxx
+    data_list = data.chunk(dp_size)
+
+    output = []
+    for d in data_list:
+        # worker_group._workers[i] is a SampleGenerator
+        output.append(worker_group._workers[i].generate_sequences.remote(d))
+
+    output = ray.get(output)
+    output = torch.cat(output)
+
+ A single process calling multiple processes involves the following 3
+ steps:
+
+ 1. Split the data into DP parts on the control process.
+ 2. Send the data to the remote workers, call the remote computation through RPC, and
+    utilize multi-process computation.
+ 3. Obtain the computation results of each worker on the control process
+    and merge them.
+
+ Frequently calling these 3 steps on the controller process greatly hurts
+ code readability. **In verl, we have abstracted and encapsulated these 3
+ steps, so that the worker's method + dispatch + collect can be
+ registered into the worker_group**
+
+ .. code:: python
+
+    from verl.single_controller.base.decorator import register
+
+    def dispatch_data(worker_group, data):
+        return data.chunk(worker_group.world_size)
+
+    def collect_data(worker_group, data):
+        return torch.cat(data)
+
+    dispatch_mode = {
+        'dispatch_fn': dispatch_data,
+        'collect_fn': collect_data
+    }
+
+    @register(dispatch_mode=dispatch_mode)
+    def generate_sequences(self, data):
+        pass
+
+ In this way, we can directly call the method inside the worker through
+ the ``worker_group`` on the control (driver) process (which is a single
+ process):
+
+ .. code:: python
+
+    output = worker_group.generate_sequences(data)
+
+ This single line includes data splitting, data distribution and
+ computation, and data collection.
+
+ Furthermore, the model parallelism size of each model is usually fixed,
+ including dp, tp and pp. So for these common distributed scenarios, we have
+ pre-implemented specific dispatch and collect methods in `decorator.py <https://github.com/volcengine/verl/blob/main/verl/single_controller/base/decorator.py>`_, which can be directly used to wrap the computations.
+
+ .. code:: python
+
+    from verl.single_controller.base.decorator import register, Dispatch
+
+    @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
+    def generate_sequences(self, data: DataProto) -> DataProto:
+        pass
+
+ Here it requires the data interface to be ``DataProto``. The definition of
+ ``DataProto`` is in `protocol.py <https://github.com/volcengine/verl/blob/main/verl/protocol.py>`_.
+
+ Step 3: Main training loop
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ With the above training flows, we can implement the algorithm's control
+ flow. It is recommended that ``main_task`` also be a Ray remote process.
+
+ .. code:: python
+
+    @ray.remote(num_cpus=1)
+    def main_task(config):
+        # construct SampleGenerator
+        resource_pool = RayResourcePool(process_on_nodes=[8] * 2)  # 16 GPUs
+        ray_cls = RayClassWithInitArgs(SampleGenerator, config=config)
+        # put SampleGenerator onto resource pool
+        sample_gen = RayWorkerGroup(resource_pool, ray_cls)
+
+        # construct reference policy
+        ray_cls = RayClassWithInitArgs(ReferencePolicy)
+        ref_policy = RayWorkerGroup(resource_pool, ray_cls)
+
+        # construct actor
+        ray_cls = RayClassWithInitArgs(DPOActor)
+        dpo_policy = RayWorkerGroup(resource_pool, ray_cls)
+
+        dataloader = DataLoader()
+
+        for data in dataloader:
+            # generate data
+            data = sample_gen.generate_sequences(data)
+            # generate scores for each data
+            data = generate_scores(data)
+            # generate pairwise data using scores
+            data = generate_pairwise_data(data)
+            # generate ref_log_prob
+            data.batch['ref_log_prob'] = ref_policy.infer(data)
+            # update using dpo
+            dpo_policy.update(data)
+            # logging
+
+ Here, different ``WorkerGroups`` can be placed in the same resource pool or
+ in different resource pools using ``create_colocated_worker_cls``,
+ similarly to `ray_trainer.py <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/ray_trainer.py>`_.
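
The split → RPC → merge pattern that the dispatch/collect registration hides can be sketched without Ray at all. The following toy, self-contained Python (plain lists stand in for ``DataProto``, local callables stand in for remote workers; every name here is ours for illustration, not verl's API) shows the round trip a single controller performs:

```python
def chunk(data, n):
    """Split ``data`` into ``n`` near-equal shards (the dispatch step)."""
    k = (len(data) + n - 1) // n  # ceil(len/n) items per shard
    return [data[i * k:(i + 1) * k] for i in range(n)]


def dispatch_and_collect(data, workers):
    """Dispatch shards to workers, run them, and concatenate the results."""
    shards = chunk(data, len(workers))
    # In verl this loop would be remote calls gathered with ray.get;
    # here each worker is just a local callable.
    outputs = [w(s) for w, s in zip(workers, shards)]
    return [item for out in outputs for item in out]  # collect: concat


# Four identical "workers", mimicking one SPMD method replicated per DP rank.
workers = [lambda shard: [x * 2 for x in shard]] * 4
result = dispatch_and_collect(list(range(8)), workers)
```

The controller sees a single call that hides the splitting, per-rank computation, and merging, which is exactly what `worker_group.generate_sequences(data)` provides in verl.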