PreSINQ GGUF Quantized Qwen3-4B Model
This repository contains the official PreSINQ GGUF-quantized versions of the Qwen3-4B model. For a detailed explanation of the PreSINQ strategy, please refer to the official SINQ repository.
SINQ is a fast and high-quality quantization technique designed to significantly reduce Large Language Model size while preserving accuracy.
If you find this project useful, please consider giving a ⭐ to the official SINQ repository.
Model Details
- Model Name: Qwen3-4B-PreSINQ-GGUF
- Base Model: Qwen/Qwen3-4B
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei Computing Systems Lab
How to Obtain the PreSINQ Model
The PreSINQ Qwen3-4B models are produced using the PreSINQ GGUF script available in the official SINQ repository.
The models provided here correspond to the best-performing configurations for each quantization type.
Best PreSINQ Quantization Results (Qwen3-4B)
Results below are measured on the WikiText-2 test set.
| Method | Bits | Size (GB) | Perplexity ↓ |
|---|---|---|---|
| Baseline (FP16) | FP16 | 7.50 | 14.3128 |
| Baseline + Q4_K_S | 4-bit | 2.22 | 14.9756 |
| PreSINQ + Q4_K_S | 4-bit | 2.22 | 14.7121 |
| Baseline + Q3_K_S | 3-bit | 1.76 | 19.0347 |
| PreSINQ + Q3_K_S | 3-bit | 1.76 | 16.0734 |
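Perplexity is the exponential of the mean per-token negative log-likelihood, so lower values mean the model assigns higher probability to the reference text. A minimal sketch of the metric used in the tables:

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, natural log).

    A lossier quantization raises the average NLL and therefore the
    perplexity, which is why the 3-bit rows score worse than 4-bit.
    """
    return math.exp(sum(nlls) / len(nlls))

# Example: a constant per-token NLL of ln(2) gives perplexity 2 --
# the model is as uncertain as a fair coin flip on every token.
print(perplexity([math.log(2)] * 4))  # -> 2.0 (up to float rounding)
```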
You can also generate good (though not optimal) PreSINQ models more quickly by reducing the number of configurations explored during the PreSINQ script execution.
The table below shows perplexity for different PreSINQ parameter configurations using Q3_K_S quantization.
Evaluation is performed on a 5k-line subset of the Pile validation dataset.
| Group Size | Iterations | Repetitions | Perplexity |
|---|---|---|---|
| 32 | 2 | 1 | 11.2359 |
| 32 | 4 | 1 | 11.1062 |
| 32 | 8 | 1 | 10.7951 |
| 32 | 16 | 1 | 10.8192 |
| 32 | 32 | 1 | 10.7918 |
| 64 | 2 | 1 | 11.2507 |
| 64 | 4 | 1 | 11.0779 |
| 64 | 8 | 1 | 10.9395 |
| 64 | 16 | 1 | 10.9450 |
| 128 | 2 | 1 | 11.2318 |
| 128 | 4 | 1 | 10.9561 |
| 128 | 8 | 1 | 10.8965 |
| 128 | 16 | 1 | 10.8904 |
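The grid above illustrates the speed/quality trade-off: fewer iterations finish faster but leave some perplexity on the table. A small sketch that scans the (group size, iterations) results listed above and picks the lowest-perplexity configuration:

```python
# Perplexity per (group_size, iterations) pair, copied from the table above
# (Q3_K_S, Pile validation subset; all rows use 1 repetition).
results = {
    (32, 2): 11.2359, (32, 4): 11.1062, (32, 8): 10.7951,
    (32, 16): 10.8192, (32, 32): 10.7918,
    (64, 2): 11.2507, (64, 4): 11.0779, (64, 8): 10.9395, (64, 16): 10.9450,
    (128, 2): 11.2318, (128, 4): 10.9561, (128, 8): 10.8965, (128, 16): 10.8904,
}

# The best configuration overall is the one with minimum perplexity.
best_cfg = min(results, key=results.get)
print(best_cfg, results[best_cfg])  # -> (32, 32) 10.7918
```

Note that (32, 8) is already within ~0.004 perplexity of the best configuration at a quarter of the iterations, which is the kind of shortcut the paragraph above suggests.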
Usage
You can load and run the PreSINQ GGUF models using:
- 🤗 Transformers
- llama.cpp
- Any GGUF-compatible inference framework
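As a minimal sketch of the Transformers route: recent Transformers versions can load (and dequantize) GGUF checkpoints via the `gguf_file` argument. The repository id and file name below are placeholders, not the official paths; substitute the actual values from this repository's file listing.

```python
repo_id = "<org>/Qwen3-4B-PreSINQ-GGUF"     # placeholder repo id (assumption)
gguf_file = "Qwen3-4B-PreSINQ-Q4_K_S.gguf"  # placeholder file name (assumption)

def load_model(repo: str, filename: str):
    """Fetch a GGUF file and dequantize it into a Transformers model.

    Note: Transformers dequantizes GGUF weights on load, so this path
    trades the GGUF memory savings for ecosystem convenience; use
    llama.cpp directly to keep the quantized footprint.
    """
    # Imported lazily so the sketch can be inspected without the dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(repo, gguf_file=filename)
    model = AutoModelForCausalLM.from_pretrained(repo, gguf_file=filename)
    return tok, model

# Usage (downloads the checkpoint; run manually):
# tok, model = load_model(repo_id, gguf_file)
# out = model.generate(**tok("Hello", return_tensors="pt"), max_new_tokens=32)
# print(tok.decode(out[0], skip_special_tokens=True))
```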
How to Cite This Work
If you find SINQ useful in your research or applications, please cite:
```bibtex
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}
```