Model Overview
- Model Architecture: gpt-oss-120b
- Input: Text
- Output: Text
- Supported Hardware Microarchitecture: AMD MI350/MI355
- ROCm: 7.0
- Operating System(s): Linux
- Inference Engine: vLLM
- Model Optimizer: AMD-Quark
- Weight quantization: OCP MXFP4, Static
- Activation quantization: OCP MXFP4, Dynamic
- Calibration Dataset: Pile
This model was built with gpt-oss-120b model by applying AMD-Quark for MXFP4 quantization.
Model Quantization
The model was quantized from openai/gpt-oss-120b using AMD-Quark. The weights are quantized MXFP4 and activations were quantized to FP8.
Quantization scripts:
cd Quark/examples/torch/language_modeling/llm_ptq/
exclude_layers="*lm_head *self_attn* *router*"
python3 internal_scripts/quantize_quark.py \
--model_dir openai/gpt-oss-120b \
--quant_scheme w_mxfp4_a_fp8 \
--exclude_layers $exclude_layers \
--num_calib_data 512 \
--output_dir amd/gpt-oss120b-w-mxfp4-a-fp8 \
--model_export hf_format \
--multi_gpu
Deployment
Use with vLLM
This model can be deployed efficiently using the vLLM backend.
Evaluation
The model was evaluated on AIME25 and GPQA Diamond benchmarks with low reasoning effort.
Accuracy
| Benchmark | gpt-oss-120b | gpt-oss120b-w-mxfp4-a-fp8(this model) | Recovery |
| AIME25 | 65.25 | 67.12 | 102.87% |
| GPQA | 51.67 | 53.42 | 103.39% |
Reproduction
The results of AIME25 and GPQA Diamond were obtained using gpt_oss.evals with low effort setting, and vLLM docker rocm/vllm-private:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_gptoss_wmxfp4_afp8_20251030.
Launching server
vllm serve amd/gpt-oss120b-w-mxfp4-a-fp8 \
--tensor_parallel_size 2 \
--gpu-memory-utilization 0.90 \
--no-enable-prefix-caching \
--max-num-batched-tokens 1024
Evaluating model in a new terminal
python -m gpt_oss.evals --model /shareddata/amd/gpt-oss120b-w-mxfp4-a-fp8 --eval aime25,gpqa --reasoning-effort low --n-threads 128
License
Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.
- Downloads last month
- 1,376
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for amd/gpt-oss-120b-w-mxfp4-a-fp8
Base model
openai/gpt-oss-120b