Need help getting started with image generation

I'm totally new to all this AI stuff, so I'll get straight to the point.
I want to generate images locally, offline, on my personal desktop; I've got an AMD GPU.
I tried out Stable Diffusion on a website and was stunned by how good the results were.
I soon realized that to generate more detailed images I might want to use an LLM to enhance prompts.
So these are the two things I want to get running locally: a model for image generation (that would be something like Stable Diffusion, correct me if I'm wrong) and a large language model to enhance image prompts. I haven't tried out any of the many AI models so far since I avoided all the hype.
What I'm trying to say is: if you can recommend which models best suit my purpose, that would be helpful.
Also, I prefer running only open-source AI.
I can't code in any programming language, so a simpler setup, or at least a step-by-step guide to follow, would be very welcome. I tried getting Stable Diffusion running but failed; PowerShell on Windows 10 kept throwing errors that I tried to solve but couldn't.


When using open-source generative AI models, there are still some limitations with AMD GPUs. While things have improved significantly in Linux and Windows 11 + WSL2 environments today, options remain quite limited on Windows 10.


What you’re setting up (two separate local apps)

  • Image generation: Stable Diffusion 1.5 “weights” + a GUI that runs locally (you open it in your browser at 127.0.0.1).
  • Prompt enhancement: a small local text model that turns “an idea” into POSITIVE / NEGATIVE / SETTINGS you copy/paste into the image GUI.

Keeping them separate is the simplest “offline + no-coding” workflow.


The most realistic Windows 10 + AMD path (no WSL2)

Best first-success route

SD.Next + ONNX Runtime + DirectML (DmlExecutionProvider)
SD.Next explicitly supports ONNX Runtime and notes you can select DmlExecutionProvider by installing onnxruntime-directml, and that DirectX 12 is required. (GitHub)

Alternatives (only if you want them later)

  • AUTOMATIC1111 + Microsoft DirectML extension: uses ONNX Runtime + DirectML, but requires models optimized via Olive (more moving parts). (GitHub)
    AMD’s own guide for that extension calls it “preview” and (in that guide) states only SD 1.5 is supported. (AMD)
  • A1111 main repo on Windows+AMD: not officially supported; their wiki points to DirectML-focused forks/approaches instead. (GitHub)
  • SD.Next + ZLUDA: can be a speed/compatibility upgrade on some AMD cards, but it’s an “after you already work” option. SD.Next documents launching it with --use-zluda and notes HIP SDK version constraints. (GitHub)

Step-by-step: SD 1.5 image generation with SD.Next (Windows 10 + AMD)

0) Put it in an easy folder

Use something like:

  • C:\AI\sdnext\

Avoid OneDrive/Desktop/Program Files. (This prevents many permissions/path problems.)
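If you want to sanity-check a folder before installing, a small script like the following can flag the usual trouble spots. This is illustrative only; the risky-folder list is an assumption based on the advice above, not an official rule:

```python
from pathlib import PureWindowsPath

# Folder names that commonly cause permission/path problems for local SD
# installs. Illustrative only -- this list is an assumption, not an official rule.
RISKY_PARTS = {"onedrive", "desktop", "program files", "program files (x86)"}

def check_install_path(path: str) -> list:
    """Return a list of warnings for a proposed install folder."""
    p = PureWindowsPath(path)
    lowered = {part.lower() for part in p.parts}
    warnings = [f"path goes through '{name}'" for name in sorted(RISKY_PARTS & lowered)]
    if " " in str(p):
        warnings.append("path contains spaces")
    if not str(p).isascii():
        warnings.append("path contains non-ASCII characters")
    return warnings

print(check_install_path(r"C:\AI\sdnext"))             # [] -> looks safe
print(check_install_path(r"C:\Users\me\OneDrive\sd"))  # flags OneDrive
```

An empty list means the folder avoids the common pitfalls; anything else is worth fixing before you install.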

1) Install the basics (one-time)

  • Latest AMD GPU driver + reboot
  • Git for Windows
  • Python (many SD Windows setups are happiest on Python 3.10.x)

2) Install + start SD.Next (use cmd.exe, not PowerShell)

Open Command Prompt and run:

cd C:\AI
git clone https://github.com/vladmandic/sdnext.git
cd sdnext
webui.bat --debug

SD.Next documents launching on Windows with webui.bat --debug. (GitHub)

When it finishes starting, it prints a local URL (often http://127.0.0.1:7860). Open that in your browser.

3) Add an SD 1.5 model file (the “weights”)

A common starter SD 1.5 checkpoint is:

  • v1-5-pruned-emaonly.safetensors (license shown as creativeml-openrail-m) (Hugging Face)

Place the .safetensors file into SD.Next’s model folder (SD.Next “Getting Started” covers the basic “generate with a few clicks” workflow and model handling). (GitHub)

4) Turn on AMD GPU acceleration (ONNX Runtime + DirectML)

In SD.Next, switch to the ONNX Runtime pipeline and choose DmlExecutionProvider (DirectML). SD.Next notes:

  • DML EP becomes available by installing onnxruntime-directml
  • DirectX 12 is required (GitHub)

Why this matters: ONNX Runtime’s DirectML EP has specific constraints (for example, it does not support memory-pattern optimizations or parallel execution in ORT sessions). (ONNX Runtime)

5) First “known-stable” test settings (prove it works)

Start conservative:

  • 512×512
  • Steps: 20
  • CFG: ~7
  • Batch size: 1

Test prompts:

  • Positive: portrait photo, soft studio lighting, sharp focus
  • Negative: lowres, blurry, watermark, text, bad anatomy, extra fingers

Once you can generate one image reliably, then raise resolution/complexity.


Quick troubleshooting (the fastest fixes)

A) Start in “safe mode” to remove extension problems

webui.bat --debug --safe

--safe disables user extensions and is recommended for troubleshooting. (GitHub)

B) UI acts broken / buttons don’t work

SD.Next recommends deleting ui-config.json if it’s bloated (old settings can override new defaults and break the UI). (GitHub)

C) DirectML crashes / weird ORT errors

DirectML EP requires certain ORT options (mem-pattern + parallel execution) to be disabled; enabling them can cause errors. (ONNX Runtime)
If you see errors like 80070057, they’re commonly associated with those constraints; ONNX Runtime has issue reports in this area. (GitHub)


Prompt enhancement (offline, GUI-first)

Pick one “local chat” app

Option 1: Jan (desktop GUI, open source, offline)

Jan is presented as an open-source ChatGPT-like app for running models locally. (GitHub)

Option 2: KoboldCpp (single EXE + browser UI; good AMD hint)

KoboldCpp releases explicitly recommend the Vulkan option in the nocuda build for AMD. (GitHub)

Option 3: Ollama (simple installer)

Ollama’s Windows docs state it does not require Administrator and installs in your home directory by default. (Ollama Official Document)

Good beginner prompt-enhancer models (small + practical)

Specialized prompt optimizers (often best for SD prompting):

  • TIPO-200M (prompt optimization for text-to-image workflows). (Hugging Face)
  • DART v2 (generates Danbooru-style tags; useful if you like tag prompts). (Hugging Face)

General small instruct model (good at structured output):

  • SmolLM2-1.7B-Instruct (compact “run on-device” class model). (Hugging Face)

Copy/paste template for your prompt enhancer

Use this once as your “system prompt” (or first message).

You write prompts for Stable Diffusion 1.5.

Return exactly these sections:

POSITIVE:
NEGATIVE:
SETTINGS:
VARIATIONS:

Rules:
- POSITIVE: 1–2 lines. Include subject, environment, lighting, camera/framing, style/medium.
- NEGATIVE: comma-separated. Include common artifacts: lowres, blurry, watermark, text, deformed hands, extra fingers.
- SETTINGS: suggest resolution (start 512x512), steps (20–30), CFG (6–8).
- VARIATIONS: 5 short alternate POSITIVE prompts that keep the same idea but change lighting/camera/mood.

User idea: <paste your idea here>

Workflow:

  1) Write your idea → 2) copy POSITIVE/NEGATIVE/SETTINGS → 3) paste into SD.Next → 4) generate.
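If copying the four sections by hand gets tedious, a minimal sketch like this splits an enhancer reply into its parts, assuming the model followed the template headings exactly:

```python
import re

# Minimal sketch: split the enhancer's reply into its four sections so the
# POSITIVE/NEGATIVE lines can be copied into SD.Next without hand-searching.
# Assumes the model followed the template headings exactly.
SECTIONS = ("POSITIVE", "NEGATIVE", "SETTINGS", "VARIATIONS")

def split_sections(reply: str) -> dict:
    pattern = re.compile(rf"^({'|'.join(SECTIONS)}):", re.MULTILINE)
    matches = list(pattern.finditer(reply))
    result = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(reply)
        result[m.group(1)] = reply[m.end():end].strip()
    return result

reply = """POSITIVE: portrait photo, soft studio lighting, sharp focus
NEGATIVE: lowres, blurry, watermark, text
SETTINGS: 512x512, steps 20, CFG 7
VARIATIONS: same portrait at golden hour"""

print(split_sections(reply)["NEGATIVE"])  # lowres, blurry, watermark, text
```

If a section is missing from the output, the model drifted from the template; just re-send the system prompt.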

Thanks for the fast reply!
Is there no way to install WSL2 on Windows 10?
I also have CachyOS Linux if you recommend that over Windows, but I'm very new to it, so to be honest, without detailed instructions I probably won't be able to handle things.
Maybe I should also have mentioned that my hardware is relatively capable: 16GB VRAM, 32GB RAM.
I know this isn't enough for some AI models, but it could be worse.
I would like to use “high-end” AI models rather than beginner models that produce poor or limited output.


WSL2 can be installed on most existing Windows 10 systems. However, I don’t think WSL2 is beginner-friendly (unless you’re already familiar with programming or Linux)…
If you do use WSL2, ComfyUI is available, so you can directly apply many online guides. It probably supports the most models among GUI options.
That said, ComfyUI itself is quite difficult to operate, so I don't think it's beginner-friendly either…

However, with AMD GPUs, using WSL2 or Linux significantly expands what you can do, so it might be worth the challenge despite the difficulty.


Yes — WSL2 works on Windows 10 (if your build is new enough)

WSL2 is supported on Windows 10. The key constraint is your Windows version/build:

  • Easiest path (“one command install”): Windows 10 version 2004+ (Build 19041+) supports wsl --install. (Microsoft Learn)
  • Older-but-still-supported path: Windows 10 version 1903/1909 can run WSL2 if you’re on Build 18362/18363 with minor build 1049+ (backport). (Microsoft Learn)

Step 1 — check your Windows build

  1. Press Win + R → type winver → Enter

  2. Note Version and OS Build

    • If you’re below the supported builds above, update Windows 10 first.

What changes when you use WSL2 (for AMD + “high-end” image models)

On Windows-only AMD setups, you usually rely on DirectML / ONNX acceleration. It can work well, but it tends to lag behind Linux-first tooling in model compatibility and features.

With WSL2 + Ubuntu + ROCm, you’re much closer to the “main” ecosystem:

  • ComfyUI, modern pipelines, and broader model support
  • Better chances of running newer “high-end” checkpoints (depending on ROCm support and VRAM)

AMD explicitly supports ROCm on WSL with Ubuntu 22.04 / 24.04. (AMD ROCm Documentation)


Windows 10 + WSL2 + AMD ROCm: realistic, supported route (recommended if you want newer models)

Phase A — install/enable WSL2

If you’re on Windows 10 2004+:

  1. Open PowerShell (Admin)

  2. Run:

    wsl --install
    
  3. Reboot when asked. (Microsoft Learn)

If you’re on an older supported build (1903/1909 backport), follow the manual steps (enable features, set WSL2, install kernel update). (Microsoft Learn)

After install:

wsl -l -v

You want your distro to show VERSION 2.
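If you're unsure how to read the table, a small sketch like this pulls the version number for a named distro. The column layout here is an assumption based on typical `wsl -l -v` output, and note that on Windows the real command prints UTF-16, so decode accordingly if you pipe it into a script:

```python
# Sketch: pull the VERSION column for a named distro out of `wsl -l -v`
# output. The column layout is an assumption based on typical output.

def wsl_version(listing, distro):
    for line in listing.splitlines():
        cleaned = line.replace("*", " ").strip()  # "*" marks the default distro
        if cleaned.startswith(distro):
            fields = cleaned.split()
            if fields and fields[-1].isdigit():
                return int(fields[-1])
    return None  # distro not found (or version column unreadable)

sample = """  NAME            STATE           VERSION
* Ubuntu-24.04    Running         2"""

print(wsl_version(sample, "Ubuntu-24.04"))  # 2
```

Anything other than 2 means the distro is still on WSL1; `wsl --set-version <distro> 2` converts it.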

Phase B — confirm your AMD GPU is supported for ROCm-on-WSL

AMD publishes a WSL GPU support matrix (for example, it includes RX 7900 series, RX 7800 XT, and other listed cards). Compare your GPU name from Device Manager → Display adapters against the list. (AMD ROCm Documentation)

Phase C — install the AMD WSL driver + ROCm inside Ubuntu (WSL)

AMD’s current WSL instructions are:

  1. On Windows: install the AMD “Adrenalin Edition for WSL2” driver (and reboot). (AMD ROCm Documentation)

  2. In Ubuntu (WSL): install amdgpu-install, then run:

    amdgpu-install -y --usecase=wsl,rocm --no-dkms
    
  3. Verify:

    rocminfo
    

AMD documents this exact flow and verification output. (AMD ROCm Documentation)


If you already have CachyOS Linux: should you use it instead?

For AMD “AI stack” work, Linux native is often the best-performing and least-restricted once installed correctly.

The catch: AMD ROCm’s officially supported distros are things like Ubuntu 22.04/24.04 and certain RHEL versions (rolling Arch-based distros are not the primary supported target). (AMD ROCm Documentation)

Practical recommendation (beginner-friendly):

  • Keep CachyOS for daily use if you like it, but for AI, use Ubuntu 22.04 or 24.04 either:

    • Dual-boot, or
    • As your WSL2 distro on Windows, or
    • As a separate SSD/partition install

This aligns with AMD’s documentation and reduces “unknown distro” friction.


“High-end” image models that fit your hardware (16GB VRAM, 32GB RAM)

1) SDXL (strong baseline, lots of community models)

  • Very widely supported in UIs and workflows.
  • If VRAM becomes tight, the official model card notes using CPU offload as a VRAM workaround in Diffusers-based setups. (Hugging Face)
  • With 16GB VRAM, SDXL is generally a comfortable target for 1024-ish workflows (plus upscaling/refine).

2) Stable Diffusion 3.5 Medium (newer architecture, better prompt following)

Stability AI states SD 3.5 Medium “only requires ~9.9GB VRAM (excluding text encoders)” to run at full performance. That fits your 16GB VRAM class well. (Stability AI)

3) FLUX family (high quality, heavier; often needs quantization to fit)

Flux models can be extremely VRAM-hungry at full precision; community reports commonly rely on 8-bit / fp8 / 4-bit variants to run on smaller cards. (Expect slower speed and more setup complexity than SDXL.) (GitHub)

Bottom line: for “high-end within your environment” with the least drama:

  • Start with SDXL and/or SD 3.5 Medium on Linux/WSL2+ROCm.

The easiest “modern” image-generation UI once you’re on ROCm: ComfyUI

AMD provides a ROCm-focused ComfyUI install guide and even points to using Templates inside the UI (e.g., SDXL Turbo template). (AMD ROCm Documentation)

Why ComfyUI fits your situation:

  • You still use a browser GUI
  • It’s currently one of the best paths for newer models + advanced workflows (upscale, inpaint, ControlNet-style conditioning, etc.)

Local prompt-enhancer (offline) that does NOT require coding

Option A (GUI-first): Jan

  • Jan is an open-source desktop chat app designed to run models locally/offline. (GitHub)
  • Pair it with a small, capable instruct model like Mistral-7B-Instruct (Apache-2.0). (Hugging Face)

A simple, stable output format for prompt enhancement is to have the LLM always emit:

  • POSITIVE:
  • NEGATIVE:
  • SETTINGS:
  • VARIATIONS:

Option B (very simple, strong AMD support): KoboldCpp

  • Single app + browser UI; releases explicitly recommend Vulkan (nocuda build) for AMD. (GitHub)

Recommended “do-this-first” plan (minimizes dead ends)

  1. Get WSL2 working on Windows 10 (update Windows if needed). (Microsoft Learn)
  2. Install Ubuntu 22.04 in WSL2 (AMD explicitly supports it). (AMD ROCm Documentation)
  3. Install AMD Adrenalin for WSL2 + ROCm and verify rocminfo. (AMD ROCm Documentation)
  4. Install ComfyUI and run SDXL (then try SD 3.5 Medium after you have a working baseline). (AMD ROCm Documentation)
  5. Add a prompt enhancer with Jan + a small instruct model, using the fixed POSITIVE/NEGATIVE/SETTINGS format. (GitHub)

This path is the most realistic way to get “high-end” local image generation on AMD hardware without getting trapped in Windows-only acceleration limitations.

If you can get PyTorch installed appropriately once, the rest isn’t too difficult, but that’s the tough part.
For example, how to install WSL2, ComfyUI, and FLUX.1:


Overview: what “FLUX.1 (GGUF) in ComfyUI” actually means

  • ComfyUI is the node-based image generation UI/server. You run it locally and open it in a browser. (ComfyUI Official Document)
  • FLUX.1 GGUF files (from common community conversions) typically contain only the diffusion/UNet part, so you still need the text encoders and VAE files that Flux workflows expect. (Hugging Face)
  • On Windows 10 + AMD, the most practical “high-end” route is:
    WSL2 (Ubuntu) → AMD ROCm in WSL → PyTorch ROCm → ComfyUI → ComfyUI-GGUF → Flux workflow. (Microsoft Learn)

With 16GB VRAM / 32GB RAM, you’re in a good spot for Flux Schnell and many Flux Dev GGUF quantizations, but you’ll want to be deliberate about which encoder variants you use (FP8 vs FP16). (comfyanonymous.github.io)


Phase 0 — Compatibility and prerequisites (do this first)

0.1 Confirm your Windows 10 build supports the simple WSL install

Microsoft’s “wsl --install” flow requires Windows 10 version 2004+ (Build 19041+). (Microsoft Learn)

  • Press Win + R → type winver → check Version and OS Build
  • If you’re older than that, use the manual install path. (Microsoft Learn)

0.2 Confirm your AMD GPU is supported for ROCm on WSL

ROCm on WSL does not support all AMD GPUs. Check AMD’s WSL support matrix for your exact GPU model. (ROCm Documentation)

If your GPU is not listed, ComfyUI + ROCm will usually be easier on a native Linux install than on WSL.


Phase 1 — Install WSL2 + Ubuntu (Windows side)

Open PowerShell (Admin):

wsl --install

This installs WSL and defaults new distros to WSL2. (Microsoft Learn)

Then install Ubuntu (example):

wsl --install -d Ubuntu-24.04

Verify:

wsl -l -v

You want Ubuntu showing VERSION 2. (Microsoft Learn)

If wsl --install is not available on your Windows build, follow Microsoft’s manual steps (enable features, kernel update, set default version to 2). (Microsoft Learn)


Phase 2 — Install AMD ROCm support for WSL (Windows + Ubuntu/WSL)

2.1 Install the AMD Windows driver that matches ROCm-on-WSL

AMD’s ROCm-on-WSL docs call out AMD Software: Adrenalin Edition 26.1.1 for WSL2 (and to reboot after installing). (ROCm Documentation)

2.2 Install ROCm in Ubuntu (WSL)

Inside Ubuntu (WSL terminal), run the AMD-recommended install command:

sudo apt update
sudo apt install -y amdgpu-install
sudo amdgpu-install -y --usecase=wsl,rocm --no-dkms

This is AMD’s documented approach for the WSL usecase. (ROCm Documentation)

Verify ROCm sees the GPU:

rocminfo

If rocminfo fails or shows no usable GPU, go back to the AMD WSL compatibility matrix. (ROCm Documentation)


Phase 3 — Install PyTorch ROCm (Ubuntu in WSL)

AMD recommends using their wheels from repo.radeon.com rather than random PyTorch nightly wheels, and notes:

  • NumPy 2.0 is incompatible with the wheels in this configuration (use numpy==1.26.4)
  • For Ubuntu 24.04 wheels shown in their doc, Python 3.12 is required (ROCm Documentation)

Option A (recommended): use a virtual environment

sudo apt update
sudo apt install -y python3-venv python3-pip wget
python3 -m venv ~/venvs/torch-rocm
source ~/venvs/torch-rocm/bin/activate

pip install --upgrade pip wheel
pip install numpy==1.26.4

Now install AMD’s wheels (Ubuntu 24.04 example shown in AMD docs; use the 22.04 block if you chose Ubuntu 22.04). (ROCm Documentation)

wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torch-2.9.1%2Brocm7.2.0.lw.git7e1940d4-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torchvision-0.24.0%2Brocm7.2.0.gitb919bd0c-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/triton-3.5.1%2Brocm7.2.0.gita272dfa8-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torchaudio-2.9.0%2Brocm7.2.0.gite3c6ee2b-cp312-cp312-linux_x86_64.whl

pip uninstall -y torch torchvision triton torchaudio
pip install \
  torch-2.9.1+rocm7.2.0.lw.git7e1940d4-cp312-cp312-linux_x86_64.whl \
  torchvision-0.24.0+rocm7.2.0.gitb919bd0c-cp312-cp312-linux_x86_64.whl \
  torchaudio-2.9.0+rocm7.2.0.gite3c6ee2b-cp312-cp312-linux_x86_64.whl \
  triton-3.5.1+rocm7.2.0.gita272dfa8-cp312-cp312-linux_x86_64.whl

Verify (AMD’s verification commands): (ROCm Documentation)

python3 -c "import torch; print(torch.cuda.is_available())"
python3 -c "import torch; print('device name [0]:', torch.cuda.get_device_name(0))"

You want True and a Radeon device name.


Phase 4 — Install and run ComfyUI (Ubuntu in WSL)

Use ComfyUI’s manual-install steps (venv + clone + requirements + start). (ComfyUI Official Document)

sudo apt update
sudo apt install -y git
source ~/venvs/torch-rocm/bin/activate

cd ~
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

pip install -r requirements.txt
python main.py

Open ComfyUI from Windows

In many setups you can open:

  • http://localhost:8188

If localhost networking behaves unexpectedly, Microsoft documents the differences between WSL networking modes and when localhost works vs when you may need the WSL IP. (Microsoft Learn)
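A quick way to tell the two cases apart is to probe the port directly from Windows. This stdlib-only sketch assumes ComfyUI's usual default port 8188:

```python
import socket

# Sketch: probe a host/port to tell "localhost reaches ComfyUI" apart from
# "need the WSL IP instead". 8188 is ComfyUI's usual default port.

def port_open(host, port, timeout=0.5):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If this prints False while ComfyUI is running inside WSL, get the WSL
# address with `wsl hostname -I` and probe that IP instead of localhost.
print(port_open("127.0.0.1", 8188))
```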


Phase 5 — Add GGUF support to ComfyUI (required for FLUX.1 GGUF)

Install the GGUF custom node pack:

source ~/venvs/torch-rocm/bin/activate
cd ~/ComfyUI/custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF
pip install --upgrade gguf

Notes from the GGUF project:

  • ComfyUI must be “recent enough” to support the required custom ops.
  • You will use “Unet Loader (GGUF)” and place .gguf files under models/unet. (GitHub)

Restart ComfyUI after installing custom nodes.


Phase 6 — Download the actual FLUX.1 GGUF model file(s)

You have two main choices:

FLUX.1 Schnell (recommended to start)

  • Designed for very low steps (1–4 steps).
  • Released under Apache-2.0 (generally OK for commercial use). (Hugging Face)

FLUX.1 Dev (higher quality, but licensing restrictions)

  • Higher quality than Schnell, but released under a non-commercial license, so check the terms before any commercial use.

Where to place the GGUF file

For the common Flux GGUF conversions, put the .gguf file in ComfyUI/models/unet.

Important gotcha: some GGUF loaders only detect files placed directly in models/unet (not in subfolders). (GitHub)

Which quantization to pick (16GB VRAM guidance)

From the Flux Dev GGUF listing, file sizes scale with quantization level (example sizes shown there for Q4/Q5/Q6/Q8). (Hugging Face)

Practical starting point:

  • Start with Q5 or Q6 (good balance).
  • If you run out of memory, drop to Q4.
  • If you have headroom and want quality, try higher (at higher memory cost).
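As a rough sanity check before downloading, you can do the arithmetic explicitly. The quant sizes below are ballpark figures from the model card listings; the ~5GB FP8 t5xxl size and the 1GB headroom are assumptions, not measured values:

```python
# Rough fit check: GGUF file size + text encoder + headroom vs VRAM.
# Sizes are approximate -- actual usage also depends on resolution and
# the rest of the workflow, so treat this as a rule of thumb only.

QUANT_GB = {"Q4": 6.8, "Q5": 8.3, "Q6": 9.8, "Q8": 12.7}  # approx file sizes

def fits(quant, encoder_gb, vram_gb, headroom_gb=1.0):
    return QUANT_GB[quant] + encoder_gb + headroom_gb <= vram_gb

for q in QUANT_GB:  # 16GB card with an FP8 t5xxl encoder (~5GB)
    print(q, fits(q, encoder_gb=5.0, vram_gb=16.0))
```

On these numbers, Q4 through Q6 fit a 16GB card while Q8 is over budget, which matches the "start with Q5 or Q6, drop to Q4 if OOM" advice above.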

Phase 7 — Install the other required Flux files (text encoders + VAE)

Flux workflows typically require:

  • clip_l.safetensors (text encoder)
  • t5xxl encoder (FP16 or FP8 variant)
  • VAE file

ComfyUI’s official Flux examples page calls out:

  • clip_l.safetensors and t5xxl_fp16.safetensors in ComfyUI/models/text_encoders/
  • Optional: use an FP8 t5xxl variant to reduce memory
  • VAE goes in ComfyUI/models/vae/ (comfyanonymous.github.io)

Create folders:

mkdir -p ~/ComfyUI/models/text_encoders
mkdir -p ~/ComfyUI/models/vae
mkdir -p ~/ComfyUI/models/unet

Place the files into those folders per the Flux example docs. (comfyanonymous.github.io)
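Before launching, you can verify the files landed in the right places. This preflight sketch assumes the file names from the official Flux examples (clip_l + FP16 t5xxl); adjust if you downloaded FP8 or other variants:

```python
from pathlib import Path

# Preflight sketch: confirm the Flux pieces are where the workflow expects
# them before launching ComfyUI. File names follow the official Flux examples;
# adjust if you downloaded FP8 or other variants.

def missing_flux_files(comfy_root):
    root = Path(comfy_root)
    expected = [
        root / "models" / "text_encoders" / "clip_l.safetensors",
        root / "models" / "text_encoders" / "t5xxl_fp16.safetensors",
    ]
    missing = [str(p) for p in expected if not p.is_file()]
    # The GGUF UNet must sit directly in models/unet (no subfolders).
    if not list((root / "models" / "unet").glob("*.gguf")):
        missing.append("models/unet/*.gguf")
    if not list((root / "models" / "vae").glob("*.safetensors")):
        missing.append("models/vae/*.safetensors")
    return missing

print(missing_flux_files(str(Path.home() / "ComfyUI")))
```

An empty list means everything the workflow needs is in place; otherwise the output tells you exactly which file to fetch next.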


Phase 8 — Load a known-good Flux workflow, then switch it to GGUF

8.1 Load the official workflow

Go to the official Flux examples page and drag one of the example images into ComfyUI to auto-load a working graph. (comfyanonymous.github.io)

8.2 Swap the diffusion model loader to GGUF

The GGUF project’s instructions are explicit:

  • Use your normal Flux workflow
  • Replace the standard diffusion model loader node with “Unet Loader (GGUF)” (GitHub)

In practice:

  1. Find the node that loads the Flux diffusion model (often “Load Diffusion Model”).
  2. Delete/disable it.
  3. Add Unet Loader (GGUF) (from the GGUF node pack).
  4. Select your flux1-*.gguf from models/unet.

Keep the rest (text encoders + VAE nodes) pointing to the files you installed.


Phase 9 — First run settings (stable “it works” baseline)

If you start with Flux Schnell

Flux Schnell is intended for very low step counts. (ComfyUI Official Document)

Recommended first test:

  • Resolution: 1024×1024
  • Steps: 4
  • Prompt: simple, concrete, one subject + one environment

Notes about CFG / negative prompts

  • Some official Flux checkpoint workflows call out special settings (e.g., setting CFG to a specific value for certain checkpoints). (comfyanonymous.github.io)
  • Flux workflows often rely less on “negative prompts” than older SD-style prompting; if a workflow doesn’t include negatives, don’t force them.

Troubleshooting (most common failures)

1) ComfyUI runs but is extremely slow (CPU fallback)

Verify PyTorch sees the GPU in WSL (AMD’s verification commands). (ROCm Documentation)

2) GGUF model doesn’t appear in the dropdown

  • Confirm the file is in ComfyUI/models/unet
  • Avoid subfolders under models/unet (GitHub)

3) “Missing node types” for GGUF loader

Usually means the custom node didn’t load:

  • confirm the repo is in ComfyUI/custom_nodes
  • confirm pip install --upgrade gguf
  • restart ComfyUI (GitHub)

4) Workflow complains about missing encoders/VAE

That’s expected if only the GGUF UNet was installed. Flux workflows require separate encoders and VAE. (Hugging Face)


What to use first (recommended)

  • Start with FLUX.1 Schnell GGUF to validate the whole pipeline end-to-end (faster, fewer steps, generally easier). (Hugging Face)
  • Once stable, try FLUX.1 Dev GGUF (keeping licensing constraints in mind). (ComfyUI Official Document)

Best “authoritative” guides (recommended reading order)

1) AMD ROCm-on-WSL2 + ComfyUI (Windows 10 + AMD-specific)

  • AMD blog: “Running ComfyUI in Windows with ROCm on WSL” — end-to-end setup: drivers, WSL environment, PyTorch, ComfyUI. This is the most directly “your environment” aligned. (ROCm Blog)
  • AMD docs: Install Radeon software for WSL with ROCm — the canonical ROCm-on-WSL install command and verification flow. (ROCm Documentation)
  • AMD docs: Install PyTorch for ROCm (WSL) — official wheel-based PyTorch install steps for ROCm-on-WSL. (ROCm Documentation)
  • ComfyUI issue thread: WSL2 with AMD ROCm guide — practical community-maintained steps + pitfalls, useful when the “happy path” breaks. (GitHub)

2) Official ComfyUI FLUX.1 workflow (explains the model pieces you must have)

  • ComfyUI docs: “Flux.1 Text-to-Image Workflow Example” — explains Flux workflows and the required components (dual text encoders + VAE) at a workflow level. (ComfyUI Official Document)
  • ComfyUI official Flux examples page — downloadable/drag-and-drop workflow images (a reliable way to avoid wiring mistakes). (comfyanonymous.github.io)
  • ComfyUI discussion pointing to the official Flux text encoders repo — quick reference for “where do I get the encoders” when you hit missing-file errors. (GitHub)
  • ComfyUI built-in node doc: CLIPTextEncodeFlux — helps you understand what the Flux dual-encoder node is doing (useful when debugging). (ComfyUI Official Document)

3) GGUF-specific (this is the core for “FLUX.1 (gguf)”)

  • ComfyUI-GGUF (city96) GitHub README — the primary reference for enabling GGUF in ComfyUI, including the loader node and the exact folder (models/unet). (GitHub)
  • FLUX.1-dev-gguf model card (city96) — model placement notes + quantization sizes (handy for choosing Q5/Q6/Q8). (Hugging Face)
  • FLUX.1-schnell-gguf model card (city96) — same idea, but for Schnell; also clearly states it’s meant to be used with ComfyUI-GGUF. (Hugging Face)
  • ComfyUI-GGUF issue about subfolders not being read — useful “gotcha”: put .gguf directly in models/unet, not in a subdirectory. (GitHub)

Good community walkthroughs (optional, but often easier to follow)

  • ComfyUI-Wiki Flux.1 Dev guide — Windows-oriented explanation + workflow overview; not official, but often readable for beginners. (comfyui-wiki.com)
  • YouTube: “Flux GGUF and Custom Nodes” (ComfyUI Tutorial) — visual step-by-step for installing custom nodes and using Flux GGUF. (youtube.com)
  • Medium: run Flux GGUF with LoRA in ComfyUI — practical “from zero to working graph” structure; note it’s third-party. (Medium)

(There are also Japanese-language walkthroughs if that’s helpful; examples exist on Qiita/note.com, but treat them as secondary references compared to the docs above.) (Qiita)


That is a lot to read. I appreciate the help, really. But can I be honest for a moment?
I would rather either do this directly on Windows without the Linux subsystem, or do it on Linux directly.
I've got CachyOS (Arch-based) and Windows 10. Using a VM is not an option since I can't use my GPU there.

FLUX Schnell sounds good to me, if I can get it running somehow.
I will read your full reply later on; it's a lot to read and I have some things to do!
But thank you!


This,

AMD ROCm-on-WSL2 + ComfyUI (Windows 10 + AMD-specific)

  • AMD blog: “Running ComfyUI in Windows with ROCm on WSL” — end-to-end setup: drivers, WSL environment, PyTorch, ComfyUI. This is the most directly “your environment” aligned. (ROCm Blog)

sounds great. I should just try using WSL. I'm surprised this is all so much work. Given how popular AI appears to be, how are there no scripts or easy one-click setups?


Getting error 0x8000ffff when I run the command to install Ubuntu with WSL.
I think it depends on the Windows Store and Windows Update, both of which I removed entirely.

Do you have a guide/instructions like the one you shared (Running ComfyUI in Windows with ROCm on WSL — ROCm Blogs), but for Linux directly, on CachyOS (Arch-based)?


I'm surprised this is all so much work. Given how popular AI appears to be, how are there no scripts or easy one-click setups?

Yeah. Currently, AI generation-related projects (especially OSS) change so rapidly that the situation can shift within days. Because of this, I think Python (with only the core parts written in high-speed C++, C, or Rust) is often used, prioritizing maintainability and multi-platform support over speed and ease of use. It’s still largely at the developer-focused stage.

However, one-click solutions (though currently mainly for NVIDIA) are also starting to emerge.

For similar reasons, Linux often offers more opportunities to utilize newer AI backends compared to Windows, Mac, or mobile OSes. If you don’t have an aversion to Linux, using it whenever possible tends to be more cost-effective.


High-quality guides worth using alongside this setup

  • ROCm official support scope (important for Arch/CachyOS expectations) (ROCm Documentation)
  • ComfyUI manual install basics (venv/conda, dependencies, start command) (ComfyUI)
  • ComfyUI official Flux.1 workflow + exact file placement for encoders/VAE (ComfyUI)
  • ComfyUI-GGUF install + “GGUF Unet loader”, where GGUF files go (GitHub)
  • Flux.1 GGUF model cards (file placement + license info) (Hugging Face)
  • ComfyUI AMD/ROCm PyTorch install commands + AMD-specific env vars (GitHub)

First: what you’re trying to do (and what “ROCm on Arch” really means)

  • ComfyUI is a local browser UI that runs a Python backend.
  • FLUX.1 (GGUF) is a quantized format that ComfyUI loads via a custom node (ComfyUI-GGUF). (GitHub)
  • AMD acceleration on Linux is generally best via ROCm / HIP. However, AMD’s official ROCm support for many consumer Radeon cards is tied to specific distros (often Ubuntu/RHEL). CachyOS (Arch-based) can still work, but you should expect occasional friction. (ROCm Documentation)

Given your situation (WSL blocked; GPU passthrough VM not possible), installing on CachyOS directly is reasonable, with a clear fallback plan if ROCm/PyTorch behaves badly on Arch.


Model choice for your hardware (16GB VRAM)

You said “high-end models” and you want FLUX.1:

  • FLUX.1-schnell (Apache-2.0): easiest legally + fast (commonly ~4 steps) (ComfyUI)
  • FLUX.1-dev (non-commercial license): higher quality but has non-commercial restrictions (ComfyUI)

For 16GB VRAM, start with:

  • schnell GGUF at Q4_K_S or Q5_K_S (leaves VRAM headroom). The model cards list sizes by quant level (Q4 ≈ 6–7GB, Q5 ≈ 8–9GB, Q8 ≈ 12.7GB). (Hugging Face)
  • For the T5 text encoder, use FP8 instead of FP16 (ComfyUI Flux tutorial explicitly recommends FP16 only when VRAM > 32GB). (ComfyUI)
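The FP8-vs-FP16 rule can be written out directly. The FP8 file name below (t5xxl_fp8_e4m3fn.safetensors) is the commonly distributed variant; treat it as an assumption if your download is named differently:

```python
# The tutorial's rule written out: use the FP16 t5xxl encoder only when
# VRAM exceeds 32GB, otherwise take the FP8 variant. The FP8 file name is
# the commonly distributed one -- an assumption if yours differs.

def pick_t5_encoder(vram_gb):
    if vram_gb > 32:
        return "t5xxl_fp16.safetensors"
    return "t5xxl_fp8_e4m3fn.safetensors"

print(pick_t5_encoder(16))  # t5xxl_fp8_e4m3fn.safetensors
```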

Part A — CachyOS setup (native) for ComfyUI + FLUX.1 GGUF

A1) Install base tools

Open a terminal and run:

sudo pacman -Syu
sudo pacman -S --needed git python python-pip

If python -m venv fails on your setup, install:

sudo pacman -S --needed python-virtualenv

A2) Create a clean ComfyUI install

Pick a folder (example: ~/ai) and install:

mkdir -p ~/ai
cd ~/ai
git clone https://github.com/Comfy-Org/ComfyUI.git
cd ComfyUI

python -m venv venv
source venv/bin/activate

pip install -U pip setuptools wheel

A3) Install PyTorch for AMD (ROCm/HIP)

You have two practical choices on CachyOS:

Option 1 (most common): install PyTorch ROCm wheels via pip

ComfyUI’s README provides ROCm pip commands (stable and nightly). (GitHub)

Stable ROCm build:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.1

Nightly ROCm build (if stable is slow/buggy):

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.2

If you’re on RX 7000 (RDNA3) and the above doesn’t detect your GPU, ComfyUI also documents AMD nightlies by GPU target (RDNA3 / gfx110X). (GitHub)

pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/

Option 2 (Arch-packaged PyTorch ROCm)

Arch provides a ROCm-enabled PyTorch build (python-pytorch-rocm). (Arch Linux)
This can be useful if pip wheels are troublesome on your specific CachyOS stack.

deactivate  # leave venv
sudo pacman -S --needed python-pytorch-rocm

If you choose Option 2, recreate your venv using system site packages so it can “see” the system torch:

cd ~/ai/ComfyUI
rm -rf venv
python -m venv venv --system-site-packages
source venv/bin/activate
pip install -U pip setuptools wheel

A4) Verify PyTorch sees the GPU

With the venv activated:

python -c "import torch; print('torch:', torch.__version__); print('hip:', torch.version.hip); print('cuda_is_available:', torch.cuda.is_available()); print('device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)"

Notes:

  • On ROCm, PyTorch still uses torch.cuda.* APIs, and torch.cuda.is_available() should become True when HIP is working. (PyTorch)

If cuda_is_available is False, skip ahead to Troubleshooting.

A5) Install ComfyUI Python dependencies

Still inside ~/ai/ComfyUI with venv active:

pip install -r requirements.txt

This matches ComfyUI’s manual install flow (install GPU deps first, then requirements.txt, then run). (ComfyUI)

A6) First launch

python main.py

Open the URL it prints (usually http://127.0.0.1:8188).
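If the page doesn't load, you can check from a second terminal whether the server is actually listening. ComfyUI exposes a small JSON status endpoint (response fields may vary by version):

```shell
# Quick liveness check while `python main.py` runs in the other terminal
curl -s http://127.0.0.1:8188/system_stats
# a JSON blob means the server is up; "connection refused" means it never started
```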


Part B — Add FLUX.1 GGUF support (ComfyUI-GGUF)

B1) Install the ComfyUI-GGUF custom node

From the ComfyUI folder with venv active:

cd ~/ai/ComfyUI
cd custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF
cd ..
pip install --upgrade gguf

That is the official install method described in the repo. (GitHub)
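To confirm the `gguf` Python package landed in the right environment (a quick check, run with the venv active):

```shell
# If this errors with ModuleNotFoundError, pip installed into the wrong environment
python -c "import gguf; print('gguf OK')"
```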

B2) Confirm the GGUF loader exists

Restart ComfyUI (Ctrl+C, then python main.py again).
In the UI’s node search, you should find “Unet Loader (GGUF)” under the bootleg category. (GitHub)


Part C — Download and place the actual models (what goes in which folder)

C1) Pick your Flux GGUF model

  • FLUX.1-schnell GGUF (recommended first): Apache-2.0 (Hugging Face)
  • FLUX.1-dev GGUF: non-commercial license (Hugging Face)

Both model cards state:

  • They are used with ComfyUI-GGUF
  • Put the GGUF file in ComfyUI/models/unet (Hugging Face)

So:

mkdir -p ~/ai/ComfyUI/models/unet

Move your downloaded *.gguf there.
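For example, if your browser saved it to ~/Downloads (the filename below is hypothetical; use whichever quant you actually downloaded):

```shell
mv ~/Downloads/flux1-schnell-Q4_K_S.gguf ~/ai/ComfyUI/models/unet/
```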

C2) Get the required text encoders + VAE (safetensors)

Even when the UNet is GGUF, you still typically use the same Flux encoders + VAE layout described in the official Flux tutorial. (ComfyUI)

For FLUX.1-schnell the tutorial lists:

  • clip_l.safetensors
  • t5xxl_fp8_e4m3fn.safetensors (FP8 recommended for lower VRAM)
  • ae.safetensors
  • (and flux1-schnell.safetensors as the diffusion model — but you are replacing that with GGUF) (ComfyUI)

Create folders:

mkdir -p ~/ai/ComfyUI/models/text_encoders
mkdir -p ~/ai/ComfyUI/models/vae

Place:

  • clip_l.safetensors and t5xxl_fp8_e4m3fn.safetensors → models/text_encoders/
  • ae.safetensors → models/vae/

(If you later try dev, the same applies, except FP16 T5 is only recommended when VRAM > 32GB. (ComfyUI))
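When everything is in place, the model folders should look roughly like this (encoder/VAE filenames per the tutorial; the GGUF name depends on the quant you chose):

```
ComfyUI/models/
├── unet/
│   └── flux1-schnell-Q4_K_S.gguf
├── text_encoders/
│   ├── clip_l.safetensors
│   └── t5xxl_fp8_e4m3fn.safetensors
└── vae/
    └── ae.safetensors
```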


Part D — Run an actual Flux workflow (and swap in the GGUF loader)

D1) Load the official Flux Schnell workflow

The Flux tutorial provides workflow images you can drag into ComfyUI, or open via the menu (Workflows -> Open). (ComfyUI)

Do that for Flux.1 Schnell.

D2) Replace the “Load Diffusion Model” node with “Unet Loader (GGUF)”

In the loaded graph:

  1. Delete (or disconnect) the node that loads flux1-schnell.safetensors
  2. Add Unet Loader (GGUF) (bootleg category) (GitHub)
  3. Select your FLUX.1-schnell-*.gguf file (it must be in models/unet) (GitHub)
  4. Connect the GGUF loader’s output to wherever the original diffusion model output went.

Restart ComfyUI if the GGUF dropdown is empty; the loader only lists files it can see under the expected folder. (GitHub)

D3) Make sure encoders and VAE are loaded

The Flux tutorial tells you exactly what should be loaded in:

  • DualCLIPLoader (T5 + CLIP-L)
  • Load VAE (ae.safetensors) (ComfyUI)

Flux generally doesn’t require negative prompts in the basic workflow. (ComfyUI)


Recommended “known-good” starting settings (16GB VRAM)

Use the workflow defaults first. If you need a baseline:

  • Model: FLUX.1-schnell GGUF (Q4_K_S or Q5_K_S) (Hugging Face)
  • Text encoder: t5xxl_fp8_e4m3fn.safetensors (ComfyUI)
  • Steps: schnell commonly uses ~4 steps (workflow will set this) (ComfyUI)
  • Resolution: start 1024×1024; if OOM, drop to 768×768 or 832×832.

Troubleshooting (the common failure points on CachyOS)

1) torch.cuda.is_available() is False

Most common causes:

  • Installed the CPU torch by accident (wrong pip command / missing ROCm index URL).
  • GPU not visible to ROCm/HIP on your stack.

Fix:

  • Reinstall torch using the ComfyUI README ROCm command. (GitHub)
  • If you have RDNA2/older and ROCm is finicky, ComfyUI documents HSA_OVERRIDE_GFX_VERSION=... overrides (use only if needed). (GitHub)

2) “Unet Loader (GGUF)” dropdown shows nothing

  • Confirm the GGUF file is in ComfyUI/models/unet (GitHub)
  • Restart ComfyUI
  • Ensure ComfyUI is updated enough to support custom ops for UNet-only loading (ComfyUI-GGUF explicitly warns about needing a recent version). (GitHub)

3) It runs but is slow on AMD

ComfyUI documents a ROCm performance knob:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention

(GitHub)

4) You keep hitting weird ROCm breakage on CachyOS

This is where the “official support” reality matters: AMD’s ROCm docs list supported OSes and (for many Radeon GPUs) call out Ubuntu/RHEL-only support. (ROCm Documentation)

If that’s your case, your most reliable non-WSL solution is: install Ubuntu 22.04/24.04 on bare metal (dual boot) and repeat the same ComfyUI steps there.


Fallback plan (still on CachyOS): run ComfyUI inside a ROCm Docker container (not a VM)

If your host Arch userspace keeps fighting you, a container can be more reproducible. A CachyOS thread shows the ROCm device mapping pattern (notably /dev/kfd and /dev/dri) for AMD GPU containers. (CachyOS Forum)

High-level idea:

  1. Install/enable Docker on CachyOS
  2. Run a ROCm-enabled image with the device mounts
  3. Install/run ComfyUI inside the container and expose port 8188

(If you want, you can follow that exact device-mapping structure; it’s the key part for AMD GPU access in containers.) (CachyOS Forum)
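A minimal sketch of steps 2–3, assuming the `rocm/pytorch` image from Docker Hub and the /dev/kfd + /dev/dri mapping from that thread (adjust the image tag to your ROCm version):

```shell
# Expose the AMD GPU to the container via /dev/kfd (compute) and /dev/dri (render),
# and publish ComfyUI's default port 8188 to the host.
docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  -p 8188:8188 \
  rocm/pytorch:latest \
  bash
# Inside the container: clone ComfyUI, pip install -r requirements.txt,
# then launch with: python main.py --listen 0.0.0.0
```

`--listen 0.0.0.0` is needed because inside the container 127.0.0.1 would not be reachable from your host browser.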


About the WSL error (0x8000ffff) in one sentence

If you removed Windows Store/Windows Update components, WSL distro installs often fail because the normal distro delivery path relies on them; your current best route is native Linux (CachyOS or Ubuntu dual boot) for reliable AMD acceleration.