Instructions for using ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3 with libraries and local apps.

Libraries

How to use ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3")

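# Load the model directly (AutoModelForCausalLM includes the language-modeling head needed for text generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3", dtype="auto")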

Local Apps

vLLM

How to use ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
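
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and it assumes the server is reachable at the host and port shown above and returns the usual OpenAI-style completions response (`choices[0].text`).

```python
import json
import urllib.request

MODEL_ID = "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Requires a running server; assumes an OpenAI-style response body.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` sends the same payload as the curl command. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.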

Use Docker

docker model run hf.co/ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3

SGLang

How to use ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3 with Docker Model Runner:
```
docker model run hf.co/ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3
```

EXL3 Quants of soob3123/GrayLine-Qwen3-8B

EXL3 quants of soob3123/GrayLine-Qwen3-8B, quantized with exllamav3.

Quants

Quant (Revision)	Bits per Weight	Head Bits
2.5_H6	2.5	6
3.0_H6	3.0	6
3.5_H6	3.5	6
4.0_H6	4.0	6
4.5_H6	4.5	6
5.0_H6	5.0	6
6.0_H6	6.0	6
8.0_H8	8.0	8
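
As a rough guide to choosing a quant, the weight size scales linearly with bits per weight. The sketch below is a back-of-the-envelope estimate only: it assumes a nominal 8 billion parameters (the "8B" in the model name taken at face value), and it ignores the separately quantized output head (the Head Bits column), file metadata, and runtime memory such as the KV cache. The function name `estimate_weight_gb` is hypothetical.

```python
# Assumption: "8B" taken at face value; the exact parameter count differs.
NOMINAL_PARAMS = 8e9


def estimate_weight_gb(bits_per_weight, n_params=NOMINAL_PARAMS):
    """Approximate weight size in GB (10^9 bytes): params * bits / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Bits-per-weight values from the table above.
    for bpw in (2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0):
        print(f"{bpw} bpw: about {estimate_weight_gb(bpw):.1f} GB of weights")
```

Treat the result as a lower bound when checking whether a quant fits in GPU memory.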

Downloading quants with huggingface-cli


Install huggingface-cli:

pip install -U "huggingface_hub[cli]"

Download a quant by targeting its revision (branch):

huggingface-cli download ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3 --revision "5.0bpw_H6" --local-dir ./
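
The same download can be scripted from Python with `snapshot_download` from `huggingface_hub`. This is a minimal sketch: the helper names `download_kwargs` and `download_quant` are hypothetical, and the revision string is copied from the command above, so substitute the branch name of the quant you want.

```python
REPO_ID = "ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3"


def download_kwargs(revision, local_dir="./"):
    """Collect the arguments for snapshot_download for one quant revision (branch)."""
    return {"repo_id": REPO_ID, "revision": revision, "local_dir": local_dir}


def download_quant(revision, local_dir="./"):
    """Download every file from the given revision and return the local path.

    Needs network access and `pip install -U huggingface_hub`.
    """
    # Imported lazily so the helper above can be used without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(revision, local_dir))


if __name__ == "__main__":
    # Revision string taken from the huggingface-cli example above.
    print(download_quant("5.0bpw_H6", local_dir="./"))
```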


Model tree for ArtusDev/soob3123_GrayLine-Qwen3-8B-EXL3

Base model: Qwen/Qwen3-8B-Base
Finetuned: Qwen/Qwen3-8B
Finetuned: soob3123/GrayLine-Qwen3-8B
Quantized (8): this model
