
What are the benchmarks of the 4 bit model vs the FP8 model?

#9
by Grossor - opened

What it says in the title. I'd like to know how much we "lose" by running this particular 4-bit quant vs the FP8 model.

StepFun org

Hi @Grossor , due to time constraints, before release we only did a sanity check by running HMMT'25 Feb, a challenging math benchmark that requires long reasoning (>64K tokens in some cases). Here are the scores we got:

vllm-bf16-baseline 98.44%
step3p5_flash_Q4_K_S.gguf 97.50%

I would say there is minimal loss, and it is still (one of) the most powerful models that can run in 128GB of unified memory.
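For a rough sense of scale, the gap between the two scores above works out to under one point absolute (and under 1% relative) on this single benchmark:

```python
# Scores from the reply above (HMMT'25 Feb, single run)
bf16 = 98.44   # vllm-bf16-baseline
q4 = 97.50     # step3p5_flash_Q4_K_S.gguf

abs_drop = bf16 - q4              # absolute drop in points
rel_drop = abs_drop / bf16 * 100  # relative drop in percent

print(f"absolute drop: {abs_drop:.2f} points")  # 0.94 points
print(f"relative drop: {rel_drop:.2f}%")        # 0.95%
```

Note this is a single benchmark and a single run, so treat it as a sanity check rather than a full quantization-quality evaluation.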

thanks!
