Brainwaves

- Gemma-3-27b-it-HERETIC-Gemini-Deep-Reasoning (q8): 0.596, 0.748, 0.881, 0.779, 0.458, 0.819, 0.751
- Gemma3-27B-it-vl-HERETIC-GLM4.7-2200x-cp515 (q8): 0.566, 0.737, 0.876, 0.745, 0.420, 0.805, 0.759
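The seven scores per model are not labeled on this card, so a simple mean is the only aggregate available. A minimal sketch for comparing the two quants, with the caveat that the underlying benchmarks are unnamed and may not deserve equal weight:

```python
# Average the seven reported scores per model as a rough single-number
# summary. Treat the result as indicative only: the benchmark names are
# not given here, and an unweighted mean is an assumption on our part.
scores = {
    "Gemma-3-27b-it-HERETIC-Gemini-Deep-Reasoning": [0.596, 0.748, 0.881, 0.779, 0.458, 0.819, 0.751],
    "Gemma3-27B-it-vl-HERETIC-GLM4.7-2200x-cp515": [0.566, 0.737, 0.876, 0.745, 0.420, 0.805, 0.759],
}
for name, vals in scores.items():
    print(f"{name}: mean={sum(vals) / len(vals):.3f}")
```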
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Download (or load from the local cache) the quantized model and tokenizer.
model, tokenizer = load("Gemma3-27B-it-vl-HERETIC-GLM4.7-2200x-cp515-q8-mlx")

prompt = "hello"

# If the tokenizer ships a chat template, wrap the prompt in the
# expected chat format before generating.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
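For interactive use, mlx-lm also provides a streaming API. A minimal sketch, assuming a recent mlx-lm version in which `stream_generate` yields response chunks whose `.text` attribute holds the newly generated segment (`max_tokens` is an optional cap chosen here for illustration):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Gemma3-27B-it-vl-HERETIC-GLM4.7-2200x-cp515-q8-mlx")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=False,
)

# Print tokens as they are produced instead of waiting for the full reply.
for chunk in stream_generate(model, tokenizer, prompt, max_tokens=256):
    print(chunk.text, end="", flush=True)
print()
```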
Quantization: 8-bit
Base model: google/gemma-3-27b-pt
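As a rough sizing guide (simple arithmetic, not a measured figure): at 8 bits per weight, the 27B parameters alone occupy about 27 GB, and the KV cache and runtime overhead come on top of that, so plan for a machine with comfortably more unified memory than the weight footprint.

```python
# Back-of-the-envelope weight footprint for an 8-bit 27B model.
# Weights only; KV cache and activations add more at runtime.
params = 27e9
bytes_per_param = 1  # 8-bit quantization
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")
```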