Instructions for using Nohobby/Q2.5-Qwetiapin-32B with libraries and local apps.

Libraries

How to use Nohobby/Q2.5-Qwetiapin-32B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Nohobby/Q2.5-Qwetiapin-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Nohobby/Q2.5-Qwetiapin-32B")
model = AutoModelForCausalLM.from_pretrained("Nohobby/Q2.5-Qwetiapin-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Nohobby/Q2.5-Qwetiapin-32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Nohobby/Q2.5-Qwetiapin-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nohobby/Q2.5-Qwetiapin-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
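
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are made up for illustration, and the default `base_url` assumes the vLLM server started above is listening on port 8000 (pass `http://localhost:30000/v1` for an SGLang server on port 30000).

```python
import json
import urllib.request

MODEL_ID = "Nohobby/Q2.5-Qwetiapin-32B"


def build_chat_request(prompt, model=MODEL_ID):
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST the request to a running server and return the reply text.

    Hypothetical helper: it assumes a server is already running at base_url.
    """
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # OpenAI-style responses carry the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]


# Usage (needs a running server):
# print(chat("What is the capital of France?"))
```

Keeping the payload builder separate from the network call makes it easy to inspect the request body without a GPU server running.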

Use Docker

docker model run hf.co/Nohobby/Q2.5-Qwetiapin-32B

SGLang

How to use Nohobby/Q2.5-Qwetiapin-32B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Nohobby/Q2.5-Qwetiapin-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nohobby/Q2.5-Qwetiapin-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Nohobby/Q2.5-Qwetiapin-32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nohobby/Q2.5-Qwetiapin-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Nohobby/Q2.5-Qwetiapin-32B with Docker Model Runner:
```
docker model run hf.co/Nohobby/Q2.5-Qwetiapin-32B
```

Qwetiapin

There's no 'I' in 'brain damage'

Overview

An attempt to make QwentileSwap write even better by merging it with RP-Ink. And DeepSeek, because why not. However, I screwed up the first merge step by accidentally setting an extremely high epsilon value (0.4 instead of the intended 0.04). Step2 wasn't planned, but a wonky tensor size mismatch error kept me from merging Step1 into QwentileSwap with sce, so I just threw in some random model. And that did, in fact, solve the issue.

The result? Well, it's usable, I guess. The slop is reduced and more details are brought up, but said details sometimes get messed up. A few swipes fix it, and there's a chance it's caused by my sampler settings, but uhh, I'll just leave them as they are.

Prompt format: ChatML
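
For frontends that need the prompt assembled by hand, ChatML wraps every turn in `<|im_start|>` and `<|im_end|>` markers. The sketch below uses a hypothetical helper, `to_chatml`, to show the layout. The chat template bundled with the tokenizer is the authoritative version and may add things this sketch does not, such as a default system message.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```

With Transformers, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` does this for you using the model's own template.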

Settings: This kinda works but I'm weird

Quants

Static | Imatrix

Merge Details

Merging Steps

Step1

dtype: bfloat16
tokenizer_source: base
merge_method: della_linear
parameters:
  density: 0.5
  epsilon: 0.4  # was supposed to be 0.04
  lambda: 1.1
base_model: allura-org/Qwen2.5-32b-RP-Ink
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1]
        - filter: up_proj
          value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        - value: 0
  - model: allura-org/Qwen2.5-32b-RP-Ink
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0]
        - filter: up_proj
          value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        - value: 1
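
Two things about the Step1 config can be checked with a few lines of Python. First, the per-filter weight lists for the two models mirror each other: wherever DeepSeek gets 1, RP-Ink gets 0, and vice versa. Second, as I read the mergekit documentation, della-style methods assign probabilities in the range density - epsilon to density + epsilon, so the epsilon typo widens that window a lot. The helper names are made up for illustration, and the reading of epsilon is an assumption, not something taken from mergekit's source.

```python
# Per-filter weight gradients copied from the Step1 config above.
DEEPSEEK = {
    "v_proj":    [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "o_proj":    [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1],
    "up_proj":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "gate_proj": [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "down_proj": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
RP_INK = {
    "v_proj":    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "o_proj":    [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    "up_proj":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "gate_proj": [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "down_proj": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}


def is_complementary(a, b):
    """True if the two weight tables sum to 1 at every filter and position."""
    return all(
        x + y == 1
        for name in a
        for x, y in zip(a[name], b[name])
    )


def probability_window(density, epsilon):
    """Assumed della probability range: density - epsilon to density + epsilon."""
    return (round(density - epsilon, 10), round(density + epsilon, 10))


print(is_complementary(DEEPSEEK, RP_INK))  # True
print(probability_window(0.5, 0.4))        # (0.1, 0.9)   value actually used
print(probability_window(0.5, 0.04))       # (0.46, 0.54) value that was intended
```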

Step2

models:
  - model: Aryanne/QwentileSwap
    parameters:
      weight: [1.0, 0.9, 0.8, 0.9, 1.0]
  - model: Daemontatox/Cogito-Ultima
    parameters:
      weight: [0, 0.1, 0.2, 0.1, 0]
merge_method: nuslerp
parameters:
  nuslerp_row_wise: true
dtype: bfloat16
tokenizer_source: base
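
The five-element `weight` lists in Step2 are layer gradients: mergekit spreads them across the depth of the model. Here is a rough sketch of that idea using plain linear interpolation. The function name `expand_gradient` is hypothetical, the layer count of 64 is an assumption about the Qwen2.5-32B architecture, and mergekit's exact interpolation may differ in detail.

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate a short anchor list to one value per layer."""
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] + frac * (anchors[lo + 1] - anchors[lo]))
    return values


NUM_LAYERS = 64  # assumed depth of a Qwen2.5-32B model

qwentile = expand_gradient([1.0, 0.9, 0.8, 0.9, 1.0], NUM_LAYERS)
cogito = expand_gradient([0, 0.1, 0.2, 0.1, 0], NUM_LAYERS)

for layer in (0, 16, 32, 48, 63):
    print(layer, round(qwentile[layer], 3), round(cogito[layer], 3))
```

Under this reading, Cogito-Ultima contributes nothing at the first and last layers and peaks at roughly a 20% share in the middle of the stack.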

Step3

models:
  - model: Step2
  - model: Step1
merge_method: sce
base_model: Step2
parameters:
  select_topk:
    - value: [0.3, 0.35, 0.4, 0.35, 0.2]
dtype: bfloat16
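
As I understand it, `select_topk` in sce is the fraction of positions with the highest variance across the models' deltas that get kept, with the rest zeroed out before merging. Below is a toy illustration of that selection step on plain Python lists. It is not mergekit's implementation, and `select_topk_mask` is a made-up name.

```python
from statistics import pvariance


def select_topk_mask(deltas, topk):
    """Return a keep-mask marking the top `topk` fraction of positions by variance.

    deltas: one equal-length list of delta values per model (toy stand-in
    for flattened weight tensors).
    """
    size = len(deltas[0])
    variances = [pvariance([d[i] for d in deltas]) for i in range(size)]
    keep = max(1, int(size * topk))
    ranked = sorted(range(size), key=lambda i: variances[i], reverse=True)
    kept = set(ranked[:keep])
    return [i in kept for i in range(size)]


toy_deltas = [
    [0.0, 0.5, 0.0, 0.1],
    [0.0, -0.5, 0.2, 0.1],
]
mask = select_topk_mask(toy_deltas, topk=0.5)
print(mask)  # [False, True, True, False]
```

The Step3 gradient `[0.3, 0.35, 0.4, 0.35, 0.2]` would then mean keeping roughly 30% of positions in the earliest layers, up to 40% in the middle, and 20% at the end.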