PreSINQ GGUF Quantized Qwen3-4B Model
This repository contains the official PreSINQ GGUF-quantized versions of the Qwen3-4B model. For a detailed explanation of the PreSINQ strategy, please refer to the official SINQ repository.
SINQ is a fast and high-quality quantization technique designed to significantly reduce Large Language Model size while preserving accuracy.
If you find this project useful, please consider giving a ⭐ to the official SINQ repository.
Model Details
- Model Name: Qwen3-4B-PreSINQ-GGUF
- Base Model: Qwen/Qwen3-4B
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei Computing Systems Lab
How to Obtain the PreSINQ Model
The PreSINQ Qwen3-4B models are produced using the PreSINQ GGUF script available in the official SINQ repository.
The models provided here correspond to the best-performing configurations for each quantization type.
Best PreSINQ Quantization Results (Qwen3-4B)
Results below are measured on the WikiText-2 test set.
| Method | Bits | Size (GB) | Perplexity ↓ |
|---|---|---|---|
| Baseline (FP16) | FP16 | 7.50 | 14.3128 |
| Baseline + Q4_K_S | 4-bit | 2.22 | 14.9756 |
| PreSINQ + Q4_K_S | 4-bit | 2.22 | 14.7121 |
| Baseline + Q3_K_S | 3-bit | 1.76 | 19.0347 |
| PreSINQ + Q3_K_S | 3-bit | 1.76 | 16.0734 |
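Perplexity is the exponential of the mean per-token negative log-likelihood, so lower values mean the model assigns higher probability to the reference text. A minimal sketch of the metric used in the tables:

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, natural log).

    A lossier quantization raises the average NLL and therefore the
    perplexity, which is why the 3-bit rows score worse than 4-bit.
    """
    return math.exp(sum(nlls) / len(nlls))

# Example: a constant per-token NLL of ln(2) gives perplexity 2 --
# the model is as uncertain as a fair coin flip on every token.
print(perplexity([math.log(2)] * 4))  # -> 2.0 (up to float rounding)
```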
You can also generate good (though not optimal) PreSINQ models more quickly by reducing the number of configurations explored during the PreSINQ script execution.
The table below shows perplexity for different PreSINQ parameter configurations using Q3_K_S quantization.
Evaluation is performed on a 5k-line subset of the Pile validation dataset.
| Group Size | Iterations | Repetitions | Perplexity |
|---|---|---|---|
| 32 | 2 | 1 | 11.2359 |
| 32 | 4 | 1 | 11.1062 |
| 32 | 8 | 1 | 10.7951 |
| 32 | 16 | 1 | 10.8192 |
| 32 | 32 | 1 | 10.7918 |
| 64 | 2 | 1 | 11.2507 |
| 64 | 4 | 1 | 11.0779 |
| 64 | 8 | 1 | 10.9395 |
| 64 | 16 | 1 | 10.9450 |
| 128 | 2 | 1 | 11.2318 |
| 128 | 4 | 1 | 10.9561 |
| 128 | 8 | 1 | 10.8965 |
| 128 | 16 | 1 | 10.8904 |
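The grid above illustrates the speed/quality trade-off: fewer iterations finish faster but leave some perplexity on the table. A small sketch that scans the (group size, iterations) results listed above and picks the lowest-perplexity configuration:

```python
# Perplexity per (group_size, iterations) pair, copied from the table above
# (Q3_K_S, Pile validation subset; all rows use 1 repetition).
results = {
    (32, 2): 11.2359, (32, 4): 11.1062, (32, 8): 10.7951,
    (32, 16): 10.8192, (32, 32): 10.7918,
    (64, 2): 11.2507, (64, 4): 11.0779, (64, 8): 10.9395, (64, 16): 10.9450,
    (128, 2): 11.2318, (128, 4): 10.9561, (128, 8): 10.8965, (128, 16): 10.8904,
}

# The best configuration overall is the one with minimum perplexity.
best_cfg = min(results, key=results.get)
print(best_cfg, results[best_cfg])  # -> (32, 32) 10.7918
```

Note that (32, 8) is already within ~0.004 perplexity of the best configuration at a quarter of the iterations, which is the kind of shortcut the paragraph above suggests.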
Usage
You can load and run the PreSINQ GGUF models using:
- 🤗 Transformers
- llama.cpp
- Any GGUF-compatible inference framework
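As a minimal sketch of the Transformers route: recent Transformers versions can load (and dequantize) GGUF checkpoints via the `gguf_file` argument. The repository id and file name below are placeholders, not the official paths; substitute the actual values from this repository's file listing.

```python
repo_id = "<org>/Qwen3-4B-PreSINQ-GGUF"     # placeholder repo id (assumption)
gguf_file = "Qwen3-4B-PreSINQ-Q4_K_S.gguf"  # placeholder file name (assumption)

def load_model(repo: str, filename: str):
    """Fetch a GGUF file and dequantize it into a Transformers model.

    Note: Transformers dequantizes GGUF weights on load, so this path
    trades the GGUF memory savings for ecosystem convenience; use
    llama.cpp directly to keep the quantized footprint.
    """
    # Imported lazily so the sketch can be inspected without the dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(repo, gguf_file=filename)
    model = AutoModelForCausalLM.from_pretrained(repo, gguf_file=filename)
    return tok, model

# Usage (downloads the checkpoint; run manually):
# tok, model = load_model(repo_id, gguf_file)
# out = model.generate(**tok("Hello", return_tensors="pt"), max_new_tokens=32)
# print(tok.decode(out[0], skip_special_tokens=True))
```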
How to Cite This Work
If you find SINQ useful in your research or applications, please cite:
```bibtex
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}
```