DavidAU committed · d36f439 · verified · 1 Parent(s): 7c31cc3

Update README.md

Files changed (1): README.md +10 -3
README.md CHANGED
@@ -44,9 +44,16 @@ The GGUFs have been augmented with the NEO Imatrix dataset- including the Q8s, F
 
 There are THREE versions of NEO GGUFs in this repo as well, to take advantage of the unique properties of this model.
 
-As odd as this sounds, lower to mid quants work best because of the stronger Imatrix effect in these quants. Model can
-code better, and seems to make better decisions (rather than hesitating a lot). Higher quants work well, but make generate
-longer reasoning blocks, but can in some cases come up with better solutions (relative to smaller quants).
+As odd as this sounds, lower to mid quants work best because of the stronger Imatrix effect in these quants.
+
+The model codes better, seems to make better decisions (rather than hesitating a lot), and sometimes
+generates SMALLER reasoning blocks.
+
+Likewise, lower quants often come up with "outside the box" solutions.
+
+Higher quants work well but may generate longer reasoning blocks; in some cases they arrive at better solutions (relative to smaller quants).
+
+For these reasons I suggest you download at least two quants and compare them for your use case(s).
 
 IQ3_M will work well for many use cases, at over 150 T/S ; IQ4s/Q4s are the best of Imatrix and "bits".
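The "download at least two quants and compare" advice above can be sketched with `huggingface_hub` — a minimal, hedged example; the repo id and GGUF filenames below are hypothetical placeholders, so check the repo's file listing for the real names:

```python
# Hedged sketch: fetch two quants of the same model so they can be
# A/B-tested side by side, per the README's suggestion.
from huggingface_hub import hf_hub_download

REPO_ID = "DavidAU/some-neo-gguf-repo"  # hypothetical placeholder, not the real repo id
CANDIDATE_QUANTS = [
    "model-IQ3_M.gguf",   # lower quant: stronger Imatrix effect, faster
    "model-Q4_K_M.gguf",  # mid quant: "best of Imatrix and bits"
]

def download_quants(repo_id: str = REPO_ID, filenames=CANDIDATE_QUANTS) -> list[str]:
    """Download each candidate quant and return the local file paths."""
    return [hf_hub_download(repo_id=repo_id, filename=f) for f in filenames]
```

Each downloaded file can then be loaded in your usual GGUF runner (e.g. llama.cpp) to compare reasoning-block length and solution quality on your own prompts.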