---
license: apache-2.0
base_model:
- DavidAU/Qwen3-Zero-Coder-Reasoning-0.8B
language:
- en
pipeline_tag: text-generation
tags:
- merge
- programming
- code generation
- code
- codeqwen
- coding
- coder
- qwen3
- chat
- qwen
- qwen-coder
- reasoning
- thinking
- r1
- cot
- deepseek
- 40k context
- general usage
- problem solving
- brainstorming
- solve riddles
---
Qwen3-Zero-Coder-Reasoning-0.8B-NEO-EX-GGUF
This is a coder/programming model with full reasoning, built on the Qwen 3 platform.
It contains 42 layers and 464 tensors - very dense for a model of this size.
The GGUFs have been augmented with the NEO Imatrix dataset, including the Q8s, F16s, and BF16s (NEO2, NEO3).
There are THREE versions of NEO GGUFs in this repo as well, to take advantage of the unique properties of this model.
As odd as this sounds, lower to mid quants work best because of the stronger Imatrix effect in these quants.
At these quants the model can code better, seems to make decisions more readily (rather than hesitating a lot), and sometimes
generates SMALLER reasoning blocks.
Likewise, lower quants often come up with "outside the box" solutions.
Higher quants work well too; they may generate longer reasoning blocks, but can in some cases come up with better solutions (relative to smaller quants).
For these reasons I suggest you download at least 2 quants and compare operations for your use case(s).
IQ3_M will work well for many use cases, at over 150 T/S; IQ4s/Q4s offer the best balance of Imatrix effect and "bits".
For NEO GGUFs: standard NEO Imatrix GGUFs. The Q8, F16, and BF16 files are NOT imatrixed and contain no imatrixed tensors/elements.
For NEO2 GGUFs: GGUFs imatrixed AND the output tensor imatrixed. For Q8, F16, BF16: the output tensor is set at Q6_K (imatrixed).
For NEO3 GGUFs: GGUFs imatrixed AND the output tensor imatrixed, set at IQ4_XS for all quants (including Q8, F16, BF16).
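The three variants can be summarized side by side. A small illustrative table in Python (it simply restates the descriptions above; the key names are mine, not part of the repo):

```python
# Summary of the three NEO GGUF variant sets described above.
# "q8_f16_bf16_imatrixed" refers to whether the Q8/F16/BF16 files
# of that set carry imatrixed elements; "output_tensor" describes
# how the output tensor is quantized in those files.
NEO_VARIANTS = {
    "NEO":  {"imatrixed": True, "q8_f16_bf16_imatrixed": False,
             "output_tensor": "standard (not imatrixed)"},
    "NEO2": {"imatrixed": True, "q8_f16_bf16_imatrixed": True,
             "output_tensor": "Q6_K (imatrixed)"},
    "NEO3": {"imatrixed": True, "q8_f16_bf16_imatrixed": True,
             "output_tensor": "IQ4_XS (imatrixed)"},
}
```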
Operation:
This model will generate "reasoning block(s)" to solve your coding problem.
Good directions, with "dos" and "don'ts" will yield the best results.
I suggest 2-3 generations for best results.
That being said, this model can repeat code blocks from time to time, and/or may need to be stopped manually.
These issues are present in other Qwen models of this size.
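Because repeated code blocks can occur, a simple post-processing check can flag them before you accept an output. A minimal sketch (the fence-matching regex is a simplification and assumes plain triple-backtick fences):

```python
import re

def repeated_code_blocks(text):
    """Return bodies of fenced code blocks that appear more than once."""
    # Capture the body between an opening ```lang line and the closing ```.
    blocks = re.findall(r"```[^\n]*\n(.*?)```", text, flags=re.DOTALL)
    seen, dupes = set(), []
    for body in blocks:
        key = body.strip()
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes
```

If this returns a non-empty list, regenerate or trim the duplicated block.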
Quant Advice:
Although usually the advice is to use the biggest quant you can, in this case smaller quants - IQ3_M, Q4s, IQ4s - may yield better
results in some use cases.
This is due in part to the NEO Imatrix dataset (the dataset's effect is STRONGER at smaller quant sizes).
Q8s, F16, BF16
There are three each of these.
The first set is standard; the second set (NEO2) has the output tensor set at Q6_K (also imatrixed); and the third set (NEO3) has the output tensor
set at IQ4_XS (also imatrixed).
Also, due to the configuration of the NEO2 and NEO3 quants, these will be SMALLER than the standard quants and will therefore operate at higher speed.
Settings:
This model requires:
- Jinja (embedded) or CHATML template
- Max context of 40k.
- Suggest min context of 8k to 16k.
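If your runtime does not read the embedded Jinja template, you can build a ChatML prompt by hand. A minimal sketch (the special tokens are the standard ChatML markers used by Qwen models):

```python
def chatml_prompt(user_message, system=None):
    """Format a single-turn prompt in the ChatML template."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>")
    parts.append(f"<|im_start|>user\n{user_message}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "\n".join(parts)
```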
Settings used for testing (suggested):
- Temp .3 to .7
- Rep pen 1.05 to 1.1
- Topp .8 , minp .05
- Topk 20
- No system prompt.
Settings used for testing #2 (suggested):
- Temp .55
- Rep pen 1.05
- Topp .95 , minp .05
- Topk 100
- No system prompt.
Settings used for testing #3 (suggested - my fav):
- Temp .6
- Rep pen 1.1
- Topp .95 , minp .0
- Topk 20
- No system prompt.
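The three suggested presets map directly onto common sampler parameters. Sketched here as Python dicts using llama-cpp-python keyword names (`temperature`, `repeat_penalty`, `top_p`, `min_p`, `top_k`); preset #1 gives ranges, so midpoints are used, and the model filename in the usage comment is hypothetical:

```python
# Suggested sampler presets from above, in llama-cpp-python keyword naming.
PRESETS = {
    "preset_1": dict(temperature=0.5,  repeat_penalty=1.07, top_p=0.80, min_p=0.05, top_k=20),   # midpoints of ranges
    "preset_2": dict(temperature=0.55, repeat_penalty=1.05, top_p=0.95, min_p=0.05, top_k=100),
    "preset_3": dict(temperature=0.6,  repeat_penalty=1.1,  top_p=0.95, min_p=0.0,  top_k=20),   # author's favourite
}

# Usage sketch (filename is hypothetical):
# from llama_cpp import Llama
# llm = Llama("Qwen3-Zero-Coder-Reasoning-0.8B-NEO-IQ4_XS.gguf", n_ctx=16384)
# out = llm(prompt, max_tokens=2048, **PRESETS["preset_3"])
```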
This model responds well both to detailed instructions and to step-by-step refinement and additions to code.
As this is an instruct model, it will also benefit from a detailed system prompt.
For simpler coding problems, lower quants will work well; for complex/multi-step problem solving I suggest Q6 or Q8.
With this model, use explicit statements about what you want and what you want to disallow, to help keep it on track.
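One way to phrase those "do" and "don't" statements consistently is a small prompt builder. A purely illustrative helper (any phrasing of the constraints works; the function name is mine):

```python
def coding_prompt(task, dos=(), donts=()):
    """Build a coding prompt with explicit allow/disallow statements."""
    lines = [task]
    if dos:
        lines += ["", "Do:"] + [f"- {d}" for d in dos]
    if donts:
        lines += ["", "Don't:"] + [f"- {d}" for d in donts]
    return "\n".join(lines)
```

For example: `coding_prompt("Write a CSV parser in Python.", dos=["use only the standard library"], donts=["use pandas"])`.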
---
For more information / other Qwen/Mistral Coders / additional settings see:
[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]
---
Help, Adjustments, Samplers, Parameters and More
---
CHANGE THE NUMBER OF ACTIVE EXPERTS:
See this document:
https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts
Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:
In "KoboldCpp", "oobabooga/text-generation-webui", or "Silly Tavern", set the "Smoothing_factor" to 1.5:
- in KoboldCpp -> Settings -> Samplers -> Advanced -> "Smooth_F"
- in text-generation-webui -> parameters -> lower right.
- in Silly Tavern this is called "Smoothing"
NOTE: For "text-generation-webui"
-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
Source versions (and config files) of my models are here:
https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be
OTHER OPTIONS:
- Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")
- If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
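Quadratic sampling ("smoothing") reshapes the logits before softmax with a parabola pinned at the top logit. A sketch of one common formulation (this mirrors the smoothing-factor transform used in several backends, but treat the exact formula as an assumption rather than a specification):

```python
def quadratic_smooth(logits, smoothing_factor=1.5):
    """Transform each logit x to m - f*(m - x)**2, where m is the max logit.
    The top token is unchanged; lower-ranked tokens are pushed down
    quadratically, taming the long tail while preserving the relative
    order of candidates."""
    m = max(logits)
    return [m - smoothing_factor * (m - x) ** 2 for x in logits]
```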
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This is a "Class 1" model:
For all settings used for this model (including specifics for its "class"), example generations, and an advanced settings guide (which often addresses model issues) - including advanced parameters and samplers, and methods to improve performance for all use cases as well as chat, roleplay, and others - please see:
[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]