The GGUFs have been augmented with the NEO Imatrix dataset - including the Q8s.
There are THREE versions of NEO GGUFs in this repo as well, to take advantage of the unique properties of this model.

As odd as this sounds, lower to mid quants work best because of the stronger Imatrix effect at these quant levels.

The model codes better, seems to make better decisions (rather than hesitating a lot), and sometimes generates SMALLER reasoning blocks.

Likewise, lower quants often come up with "outside the box" solutions.

Higher quants work well too, but may generate longer reasoning blocks; in some cases they can come up with better solutions (relative to smaller quants).

For these reasons I suggest you download at least two quants and compare how they operate for your use case(s).

IQ3_M will work well for many use cases, at over 150 T/S; IQ4s/Q4s are the best of Imatrix and "bits".