Stable Diffusion XL
Stability AI · Stable Diffusion family · Released July 2023
Stability AI's July 2023 SDXL — ~3.5B params, 1024×1024 native, CreativeML OpenRAIL++-M license, backbone of the open image generation ecosystem.
Model card
| Developer | Stability AI |
|---|---|
| Family | Stable Diffusion |
| License | CreativeML OpenRAIL++-M |
| Modality | image-gen |
| Parameters (B) | 3.5 |
| Context window | n/a (image model) |
| Release date | July 2023 |
| Primary languages | en |
| Hugging Face | stabilityai/stable-diffusion-xl-base-1.0 |
| Ollama | Not on Ollama registry |
SDXL 1.0 ships: Stability AI’s biggest open image model yet
Stability AI just released Stable Diffusion XL 1.0 (SDXL) — a 3.5B-parameter base model plus a refiner that brings the full ensemble pipeline to 6.6B parameters, making it one of the largest open-access image generation models to date. The announcement went up today, with weights on Hugging Face, the code on GitHub, and hosted access through Stability’s Clipdrop and DreamStudio products. The license is CreativeML OpenRAIL++-M — permissive, commercial use allowed with standard responsible-use clauses.
SDXL 1.0 follows a late-June research preview (SDXL 0.9) that gave the community a four-week head start to test and build tooling. Today the production weights drop — and the claim is a substantial quality step over Stable Diffusion 1.5 (October 2022) and Stable Diffusion 2.1 (December 2022) across color fidelity, lighting, composition, and prompt adherence. Below: what’s in the model, how the image-gen community is receiving it today, and what SDXL means for a sovereign pleb running local image generation on home hardware.
What’s in the weights
SDXL 1.0 is a latent diffusion model — same family as the earlier Stable Diffusion releases, same basic research line. The lineage: Latent Diffusion Models (Rombach et al., CompVis + LMU Munich, 2022) → Stable Diffusion 1.4 / 1.5 (August / October 2022) → Stable Diffusion 2.0 / 2.1 (November / December 2022) → Stable Diffusion XL 0.9 preview (July 2023) → SDXL 1.0 today. The architectural bones are the familiar U-Net denoiser + VAE autoencoder + text encoder stack, scaled up and refined.
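To make that stack concrete, here is a toy, dependency-free skeleton of the latent diffusion sampling loop. The real U-Net, noise scheduler, and classifier-free guidance are all stubbed out — this only shows the control flow, not SDXL's actual update rule:

```python
def sample(unet, timesteps, text_emb, latent):
    """Skeleton of latent-diffusion sampling: at each timestep the U-Net
    predicts the noise in the current latent, and the latent is nudged
    toward the clean image.  The VAE decodes the final latent to pixels
    (not shown).  The update here is a crude stand-in for a real scheduler."""
    for t in timesteps:
        eps = unet(latent, t, text_emb)  # predicted noise at this step
        latent = [x - e / len(timesteps) for x, e in zip(latent, eps)]
    return latent

# toy stand-in "U-Net": always predicts the current latent itself as noise
toy_unet = lambda lat, t, emb: lat
out = sample(toy_unet, range(4), None, [1.0, -2.0])
```

In the real model the `unet` call is a 1024-resolution conditioned denoiser and the scheduler (Euler, DPM++, etc.) decides the step sizes; the loop shape is the same.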
Key specs:
- Base model: 3.5B parameters total; the U-Net denoiser alone is ~2.6B (up from SD 1.5’s ~860M U-Net — roughly 3x the denoiser size)
- Ensemble pipeline: 6.6B parameters total with the refiner loaded (base + refiner together); the refiner runs a second-stage denoising pass on the later, low-noise timesteps
- Native resolution: 1024×1024 (a major jump from SD 1.5’s 512×512 native — you don’t need highres-fix to get 1MP output anymore)
- Text encoders: two encoders used in tandem — OpenCLIP ViT-bigG/14 + CLIP ViT-L/14 — for richer prompt conditioning than SD 1.5’s single CLIP
- VAE: retrained autoencoder with the same 4-channel latent layout as SD 1.x, but noticeably better reconstruction fidelity (fewer artifacts in faces and fine texture)
- Conditioning: size and crop conditioning at inference — you can pass target dimensions explicitly, avoiding the squashed-subject failure mode of SD 1.5
- License: CreativeML OpenRAIL++-M, commercial use permitted
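The size/crop conditioning in the spec list is just a handful of integers the model embeds alongside the timestep. A toy illustration of the conditioning vector as described in the SDXL report — the helper name is ours, not a diffusers API:

```python
def sdxl_micro_conditioning(original_size, crop_top_left, target_size):
    """SDXL's extra conditioning inputs: (h, w) of the original training
    image, the (top, left) crop offset, and the requested output (h, w).
    The six integers are Fourier-embedded and added to the timestep
    embedding, so the model knows what framing you asked for."""
    return [*original_size, *crop_top_left, *target_size]

# ask for an uncropped, full-resolution 1024x1024 result
cond = sdxl_micro_conditioning((1024, 1024), (0, 0), (1024, 1024))
```

Passing a zero crop offset at inference is what suppresses the cropped-subject look; a nonzero offset reproduces it on purpose.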
The two-stage pipeline (base → refiner) is the architectural novelty worth understanding. The base model handles the majority of the denoising steps; the refiner is specialized for the final steps, adding fine detail and cleaning up artifacts. You can run base-only for faster inference at slightly reduced quality, or the full ensemble for production-grade output. The refiner is optional in the daily workflow, which is good news for plebs watching VRAM.
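A minimal sketch of how the handoff is typically parameterized. The kwarg names follow diffusers' SDXL ensemble convention (`denoising_end` / `denoising_start`); treat the exact API as an assumption and check the current diffusers docs:

```python
BASE_REPO = "stabilityai/stable-diffusion-xl-base-1.0"
REFINER_REPO = "stabilityai/stable-diffusion-xl-refiner-1.0"
HANDOFF = 0.8  # base denoises the first 80% of steps; refiner finishes the rest

def ensemble_kwargs(prompt: str, steps: int = 40) -> tuple:
    """Build call kwargs for a base -> refiner run: the base stops early
    and hands off latents; the refiner picks up at the same noise level."""
    base = dict(prompt=prompt, num_inference_steps=steps,
                denoising_end=HANDOFF, output_type="latent")
    refiner = dict(prompt=prompt, num_inference_steps=steps,
                   denoising_start=HANDOFF)
    return base, refiner

base_kwargs, refiner_kwargs = ensemble_kwargs("a lighthouse at dusk, volumetric light")
# With diffusers (assumed API -- verify against the docs):
#   latents = base_pipe(**base_kwargs).images
#   image   = refiner_pipe(**refiner_kwargs, image=latents).images[0]
```

Setting `HANDOFF` to 1.0 is the base-only mode from the paragraph above: faster, slightly softer detail.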
On-release reception in the image-gen community
SDXL had an unusually long preview period — SDXL 0.9 landed in late June, and the community spent a full month generating, comparing, and building LoRAs and tools before today’s production weights. That means today’s release isn’t landing cold. Early community sentiment, as reflected on the Stability blog and in the big open Discord and Reddit threads:
- Color and lighting: widely reported as a clear step up over SD 1.5. More vibrant palettes, better contrast, more believable shadow behavior.
- Prompt adherence: the dual text encoder is doing real work. Complex prompts with multiple subjects, spatial relationships, and style modifiers track better than in SD 1.5. Not yet at the level of the closed Midjourney v5.2 line, but notably closer.
- Native 1MP generation: the biggest workflow change for plebs. No highres-fix pass, no tiled-upscale compositing for 1024×1024 outputs. That simplifies a lot of ComfyUI and Automatic1111 pipelines.
- Hands and anatomy: improved over SD 1.5, but still imperfect. Six-finger hands are rarer, not extinct.
- Text in images: modest improvement — readable short text is sometimes possible, though not reliably.
- Style flexibility: the community has already produced dozens of SDXL LoRAs from the 0.9 preview, demonstrating the model responds well to style fine-tuning.
The comparison plebs will care about most is SDXL vs Midjourney v5.2, the current closed-source leader. Stability’s position, implicit in the release: SDXL is the best open model available and closes a meaningful chunk of the gap to Midjourney, especially at 1024 and above. Whether it closes the whole gap is a matter of taste and workload.
Sovereign pleb implications
SDXL is the most capable open image model plebs can run locally today — but it’s also meaningfully heavier than SD 1.5 was. The VRAM math matters:
- SDXL base at fp16: about 7GB of VRAM for the model, plus ~1–2GB for activations during generation. Comfortable on a 12GB card (3060 12GB, 3080 12GB, 4070) and very comfortable on a 16GB+ card.
- SDXL base + refiner ensemble: about 13GB of VRAM loaded simultaneously. Fits on a 16GB card with headroom, tight on 12GB. Most ComfyUI workflows swap base and refiner sequentially to lower peak VRAM — that works fine on 12GB cards but slows things down.
- SDXL at fp8 / int8 quantized: community quantization tools are emerging (see the ComfyUI and stable-diffusion-webui forks today); fp8 runs comfortably on 8GB cards at slight quality cost.
- SDXL on an 8GB card: possible with aggressive VAE tiling, sequential model loading, and the `--lowvram` flag in Auto1111 — usable but slow, 30–60 seconds per 1024×1024 image.
- SDXL on CPU: 5–10 minutes per image. Not a daily workflow.
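The back-of-envelope math behind those numbers is simple — parameter count times bytes per weight, with activation overhead on top. A quick calculator (the figures are assumptions consistent with the bullets above):

```python
def weights_vram_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """VRAM needed just to hold the weights: params x bytes-per-weight.
    fp16 = 2 bytes/param, fp8 or int8 = 1 byte/param.
    Activations add roughly 1-2 GB on top during generation."""
    return params_billion * bytes_per_param

print(weights_vram_gb(3.5))        # SDXL base at fp16  -> 7.0 GB
print(weights_vram_gb(6.6))        # base + refiner loaded together -> 13.2 GB
print(weights_vram_gb(3.5, 1.0))   # base at fp8 -> 3.5 GB, friendly to 8GB cards
```

The same arithmetic explains why sequential base/refiner swapping helps 12GB cards: peak load drops from 13.2 GB to the larger single model, 7 GB.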
What this replaces in a pleb image-gen stack: SDXL is the clear upgrade path from SD 1.5 and SD 2.1 if your hardware can handle it. The 1024×1024 native output eliminates a lot of workflow friction — no more highres-fix, no more tiled-diffusion gymnastics for 1MP outputs. For plebs on 8GB cards, SD 1.5 remains a reasonable daily driver for speed; run SDXL for final renders when quality matters.
For plebs using image generation as part of a broader self-hosted AI stack, SDXL on a second GPU (the “second 3090” next to your LLM card) is a natural split: one card for LLM work, one card for image work, total rig cost still under a Midjourney + ChatGPT annual subscription. A used 3090 with 24GB is comfortable overkill for SDXL alone, which leaves room to run ControlNet, LoRAs, and multiple concurrent workflows.
For inference-as-heater rigs, SDXL’s longer per-image compute time (compared to chat-token generation) means GPUs spend more time at sustained load — a better thermal profile than chat, if the goal is space heating. For small operators thinking about hosted image gen, SDXL is the first open model that’s credible as a Midjourney alternative for paying customers — the quality is close enough that workflow and cost become the decisive factors.
How to run it today
Weights for SDXL base and refiner are on Hugging Face at stabilityai/stable-diffusion-xl-base-1.0 and stabilityai/stable-diffusion-xl-refiner-1.0.
The cleanest way for a pleb to run SDXL locally is ComfyUI — the node-based workflow editor supports the two-stage base+refiner pipeline natively, has good VRAM management, and the community has already published dozens of SDXL-optimized workflow graphs. Our ComfyUI for plebs guide walks through installing ComfyUI on Windows, macOS, or Linux, loading the SDXL checkpoints, and wiring up the base+refiner workflow.
AUTOMATIC1111 / stable-diffusion-webui also supports SDXL as of this week — you’ll need the latest dev branch or the 1.5.x release that lands within days. InvokeAI and Fooocus (a new SDXL-first UI) are additional options. For plebs who want hosted access without local install, Clipdrop and DreamStudio both run SDXL behind their respective APIs as of today. If you’re building a multi-model local setup (SDXL for images + a chat model like Mistral 7B on a second GPU), the rig math in our Ollama quickstart covers the chat-side setup.
For troubleshooting — VRAM errors, slow generation, bad output — our self-hosted AI troubleshooting guide has the usual suspects, and the ComfyUI Discord is where the SDXL-specific workflow questions are getting fast answers today.
What comes next
Expect a wave of SDXL fine-tunes and LoRAs on Hugging Face and Civitai over the next two weeks — the 0.9 preview gave trainers a month of runway, so many finished models will land on release-day-plus. ControlNet for SDXL is in active development and will likely reach production within a few weeks. Stability has also hinted at SDXL-Turbo variants (distilled models for 1–4-step generation) as a research direction.
Bigger picture: open image generation just took a large step. SDXL is the first open model that a pleb running it at home can credibly compare to Midjourney for quality, at least in most categories. The combination of permissive license, same-day weights, 1024-native output, and solid community tooling makes local image generation a sovereign workflow for more plebs than it was yesterday. Pull the weights, run ComfyUI, own your pixels. See the Sovereign AI Manifesto for the broader case.
Further reading: The same pleb-grade infrastructure that runs local inference also runs a Bitcoin space heater. Many readers arrive from the mining side — see From S19 to Your First AI Hashcenter for the bridge.
Recommended hardware
Runs on 12 GB VRAM — 3060 12GB / 4070 / M2 territory. Sweet spot for home rigs.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud.
2. Give it a web UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Understand quantization → GGUF Q4/Q8/FP16 — which weights fit your GPU, explained.
