
πŸ™ Github   |   πŸ“„ Paper

PreSINQ GGUF Quantized Qwen3-4B Model

This repository contains the official PreSINQ GGUF-quantized versions of the Qwen3-4B model. For a detailed explanation of the PreSINQ strategy, please refer to the official SINQ repository. SINQ is a fast, high-quality quantization technique designed to significantly reduce Large Language Model size while preserving accuracy.

If you find this project useful, please consider giving a ⭐ to the official SINQ repository.


Model Details

  • Model Name: Qwen3-4B-PreSINQ-GGUF
  • Base Model: Qwen/Qwen3-4B
  • Task: Text Generation
  • Framework: PyTorch / Transformers
  • License: Apache-2.0
  • Quantized By: Huawei – Computing Systems Lab

How to Obtain the PreSINQ Model

The PreSINQ Qwen3-4B models are produced using the PreSINQ GGUF script available in the official SINQ repository.

The models provided here correspond to the best-performing configurations for each quantization type.

📊 Best PreSINQ Quantization Results (Qwen3-4B)

Results below are measured on the WikiText-2 test set.

| Method            | Bits  | Size (GB) | Perplexity ↓ |
|-------------------|-------|-----------|--------------|
| Baseline (FP16)   | FP16  | 7.50      | 14.3128      |
| Baseline + Q4_K_S | 4-bit | 2.22      | 14.9756      |
| PreSINQ + Q4_K_S  | 4-bit | 2.22      | 14.7121      |
| Baseline + Q3_K_S | 3-bit | 1.76      | 19.0347      |
| PreSINQ + Q3_K_S  | 3-bit | 1.76      | 16.0734      |
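The improvement over the plain GGUF baseline can be quantified as a relative perplexity reduction. The sketch below simply recomputes it from the table values above (variable names are illustrative):

```python
# Perplexity values copied from the WikiText-2 table above.
baseline = {"Q4_K_S": 14.9756, "Q3_K_S": 19.0347}
presinq = {"Q4_K_S": 14.7121, "Q3_K_S": 16.0734}

for qtype in baseline:
    # Relative reduction in perplexity (lower perplexity is better).
    reduction = (baseline[qtype] - presinq[qtype]) / baseline[qtype]
    print(f"{qtype}: {reduction:.1%} lower perplexity with PreSINQ")
```

This works out to roughly a 1.8% reduction at 4-bit and about 15.6% at 3-bit, i.e. PreSINQ helps most where plain quantization degrades quality the most.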

However, you can generate good (though not necessarily the best) PreSINQ models faster by reducing the number of configurations explored during the PreSINQ script execution. The table below shows perplexity for different PreSINQ parameter configurations using Q3_K_S quantization.
Evaluation is performed on a 5k-line subset of the Pile validation dataset.

| Group Size | Iterations | Repetitions | Perplexity |
|------------|------------|-------------|------------|
| 32         | 2          | 1           | 11.2359    |
| 32         | 4          | 1           | 11.1062    |
| 32         | 8          | 1           | 10.7951    |
| 32         | 16         | 1           | 10.8192    |
| 32         | 32         | 1           | 10.7918    |
| 64         | 2          | 1           | 11.2507    |
| 64         | 4          | 1           | 11.0779    |
| 64         | 8          | 1           | 10.9395    |
| 64         | 16         | 1           | 10.9450    |
| 128        | 2          | 1           | 11.2318    |
| 128        | 4          | 1           | 10.9561    |
| 128        | 8          | 1           | 10.8965    |
| 128        | 16         | 1           | 10.8904    |
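To see the speed/quality trade-off concretely, the best configuration in the table can be picked programmatically. The sketch below hard-codes the table values above; the tuple keys and variable names are illustrative and not part of the PreSINQ script:

```python
# (group_size, iterations, repetitions) -> perplexity, from the table above.
results = {
    (32, 2, 1): 11.2359, (32, 4, 1): 11.1062, (32, 8, 1): 10.7951,
    (32, 16, 1): 10.8192, (32, 32, 1): 10.7918,
    (64, 2, 1): 11.2507, (64, 4, 1): 11.0779, (64, 8, 1): 10.9395,
    (64, 16, 1): 10.9450,
    (128, 2, 1): 11.2318, (128, 4, 1): 10.9561, (128, 8, 1): 10.8965,
    (128, 16, 1): 10.8904,
}

# Lowest perplexity wins.
best_cfg = min(results, key=results.get)
print(best_cfg, results[best_cfg])
```

Note that group size 32 with only 8 iterations (10.7951) is nearly as good as the overall best, group size 32 with 32 iterations (10.7918), at a quarter of the iteration cost, which is the point made above about trading exploration for speed.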

🚀 Usage

Usage Example

You can load and run the PreSINQ GGUF models using:

  • 🤗 Transformers
  • llama.cpp
  • Any GGUF-compatible inference framework
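For example, with llama.cpp the model can be downloaded and run roughly as follows. The exact `.gguf` filename below is an assumption; check this repository's file list for the actual name:

```shell
# Download one of the quantized files from this repository
# (filename is illustrative; pick the real .gguf from the "Files" tab).
huggingface-cli download huawei-csl/Qwen3-4B-PreSINQ-GGUF \
    qwen3-4b-presinq-q3_k_s.gguf --local-dir .

# Generate text with llama.cpp's CLI.
llama-cli -m qwen3-4b-presinq-q3_k_s.gguf -p "Hello, who are you?" -n 128
```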

🧾 How to Cite This Work

If you find SINQ useful in your research or applications:

  • Please give a ⭐ to the official SINQ repository
  • Cite our paper:
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights}, 
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}