
FLUX.1 schnell

Black Forest Labs · FLUX family · Released August 2024

Black Forest Labs' August 2024 Apache 2.0 FLUX variant: a 12B model distilled to 1–4 steps for fast, commercially open image generation.

Model card

Developer: Black Forest Labs
Family: FLUX
License: Apache-2.0
Modality: image-gen
Parameters (B): 12
Context window: 0
Release date: August 2024
Primary languages: en
Hugging Face: black-forest-labs/FLUX.1-schnell
Ollama: Not on Ollama registry

FLUX.1 schnell drops: Apache-2.0, 4 steps, commercially unrestricted

Black Forest Labs just released FLUX.1 schnell — the 4-step distilled sibling of FLUX.1 dev, at the same 12B parameter count, under the Apache 2.0 license. “Schnell” is German for “fast,” and the name is the pitch: generate near-FLUX-quality images in 1 to 4 sampling steps, with no license friction for commercial use. It ships alongside FLUX.1 dev and the closed-weight FLUX.1 pro as the commercial-friendly member of the initial FLUX family, announced today on the Black Forest Labs launch blog.

The reason schnell matters more than its per-image quality might suggest: it’s the first truly production-grade open image model that operators can deploy commercially with zero license asterisks. SDXL’s OpenRAIL++ license is permissive but imposes use-case restrictions. SD 3’s original license required a paid tier for anyone serious. FLUX.1 dev is non-commercial by default. Apache 2.0 is what an actual pleb running an actual business needs — no revenue thresholds, no use-case carve-outs, no negotiation. Weights are on black-forest-labs/FLUX.1-schnell as of today.

What’s in the weights

FLUX.1 schnell is the same base architecture as FLUX.1 dev — a 12B parameter rectified-flow transformer built on the hybrid MMDiT + parallel attention backbone that the Black Forest Labs team developed after leaving Stability AI. The lineage: latent diffusion (CompVis/LMU Munich, 2022) → SDXL (2023) → Stable Diffusion 3 research (same founding team) → FLUX.1 family today. Schnell’s specific contribution is the distillation technique that drops inference step count from 20–50 to 1–4 while preserving most of dev’s quality.
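
To see why a rectified-flow objective tolerates so few sampling steps, here's a toy 1-D sketch (our illustration, not BFL's code): rectified flow trains the model to predict a velocity along a straight path from noise to data, and straight paths cost Euler integration almost nothing.

```python
# Toy rectified-flow sketch (illustrative 1-D stand-in, not BFL's code).
# Rectified flow defines a straight path x_t = (1 - t) * x0 + t * x1 from
# noise x0 to data x1 and trains a model to predict the velocity
# dx/dt = x1 - x0. Because the ideal path is a straight line, Euler
# integration with very few steps is nearly exact.

def sample(x0, velocity, steps):
    """Euler-integrate dx/dt = velocity(x, t) from t=0 to t=1."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + velocity(x, t) * dt
    return x

noise, data = 0.0, 3.7
# With a perfectly learned (here: constant) velocity field, a single Euler
# step lands exactly on the data point.
v = lambda x, t: data - noise
print(sample(noise, v, steps=1))             # 3.7 in one step
print(round(sample(noise, v, steps=50), 6))  # 3.7 -- extra steps add nothing
```

In a real model the learned velocity field is only approximately straight, which is why schnell targets 4 steps rather than 1 for full quality.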

Pleb translation — distillation in image models: A big slow “teacher” model (FLUX.1 pro or dev) generates high-quality images through many denoising steps. Distillation trains a “student” model to mimic the teacher’s output in far fewer steps. The student learns the whole denoising trajectory rather than each step individually, effectively compressing the iterative refinement process into a much shorter sequence. You lose some fine detail and prompt adherence, but generation becomes dramatically faster. Schnell is the student; the teacher is the closed FLUX.1 pro variant.
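
The teacher/student idea can be shown with a deliberately tiny 1-D toy (nothing like the real LADD objective, which trains a full image model against an adversarial critic on latents): a teacher that takes 20 small denoising steps, and a student that learns a single step size matching the teacher's endpoint.

```python
# Toy distillation sketch (illustrative only -- real latent adversarial
# distillation is far more involved than this 1-D fit).

def teacher(x, target, steps=20, rate=0.1):
    """'Teacher': 20 small denoising steps toward a target."""
    for _ in range(steps):
        x = x + rate * (target - x)
    return x

def student(x, target, k):
    """'Student': one jump with a learned step size k."""
    return x + k * (target - x)

# Fit k by gradient descent on squared error against the teacher output.
k = 0.0
for _ in range(2000):
    x, target = 5.0, 1.0
    err = student(x, target, k) - teacher(x, target)
    grad = 2 * err * (target - x)
    k -= 1e-3 * grad

# The teacher shrinks the gap by 0.9 per step, so the optimal one-step
# size is k* = 1 - 0.9**20, roughly 0.878 -- the student recovers it.
print(round(k, 3))  # 0.878
```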

Schnell’s specific distillation is latent adversarial diffusion distillation (LADD), which Stability AI published in early 2024 as the latent-space successor to the adversarial diffusion distillation (ADD) technique behind SDXL-Turbo in late 2023. Black Forest Labs applies LADD to the FLUX rectified-flow backbone. The result is a model that hits respectable quality at 4 sampling steps, and usable output at just 1 step for prompt-exploration workflows.

Key specs:

  • 12B parameters, shared architecture with FLUX.1 dev
  • Rectified flow objective, 1–4 sampling steps (vs 20–50 for dev)
  • MMDiT + parallel attention hybrid transformer backbone
  • Dual text encoders: T5-XXL + CLIP, same as dev
  • Native resolution: 1024×1024, flexible aspect ratios via size conditioning
  • License: Apache 2.0 — fully permissive, commercial use unrestricted

Training data: Black Forest Labs cites “large-scale web image-text data” in the release notes without enumerating sources, consistent with industry practice. That’s a caveat plebs should track but isn’t uniquely FLUX’s problem.

Inference speed — the whole point

Step count is the dominant factor in image-gen latency. FLUX.1 dev at 50 steps produces 1024×1024 output in roughly 30–60 seconds on a single RTX 3090 at FP16. FLUX.1 schnell at 4 steps produces equivalent-resolution output in roughly 2–4 seconds on the same hardware. That’s not a linear 12.5x speedup (the per-step cost is slightly different between distilled and non-distilled models), but it’s close enough that schnell feels like a different product. On a 4090, schnell generation is effectively instant for single images.
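
Plugging the ballpark figures above into quick arithmetic (assumed midpoints, not benchmarks; measure on your own card):

```python
# Back-of-envelope latency math using the figures above. The totals are
# assumed illustrative points within the quoted ranges, not measurements.
dev_steps, dev_total = 50, 45.0        # s, within the 30-60 s dev range
schnell_steps, schnell_total = 4, 4.0  # s, top of the 2-4 s schnell range

per_step_dev = dev_total / dev_steps              # 0.9 s/step
per_step_schnell = schnell_total / schnell_steps  # 1.0 s/step

speedup = dev_total / schnell_total
print(f"wall-clock speedup: {speedup:.2f}x")
# Not the naive 50/4 = 12.5x, because the distilled model's per-step
# cost differs from the teacher's.
```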

The 1-step mode is worth calling out. FLUX.1 schnell can generate usable — not great, but usable — images in a single sampling step. Quality is meaningfully below 4-step, but for prompt exploration (“which variation of this prompt do I want to refine?”) the 1-step latency of well under a second enables workflows that 20-step models simply can’t support. You can iterate prompts faster than most people can type new ones.

Sovereign pleb implications — VRAM and commercial use

VRAM requirements mirror FLUX.1 dev since it’s the same architecture. The math:

  • FLUX.1 schnell FP16: ~24GB VRAM with text encoders loaded. Tight on a single 24GB card (3090 or 4090) — workable with careful T5 offload between passes.
  • FLUX.1 schnell FP8: ~12–14GB VRAM. Comfortable on a 16GB 4080, runs on a 12GB card with CPU offload. This is the most common pleb configuration.
  • FLUX.1 schnell GGUF Q4/Q5: community GGUF quants via ComfyUI-GGUF land in the 6–8GB range for the model weights. Enables 12GB and even 8GB cards with aggressive offload.
  • FLUX.1 schnell GGUF Q3: usable on 8GB cards for experimentation. Quality degrades noticeably.
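
The per-precision figures above follow from simple bytes-per-parameter arithmetic. A quick sketch for the 12B transformer weights alone (text encoders, VAE, and activations come on top, which is why the real-world numbers run higher):

```python
# Rough VRAM floor for the 12B transformer weights by precision.
# These are weight sizes only -- T5-XXL, CLIP, VAE, and activation
# memory are extra, so real usage is higher than these figures.
params = 12e9
bytes_per_param = {
    "FP16": 2.0,
    "FP8": 1.0,
    "Q5 (GGUF)": 5 / 8,   # ~5 bits per weight
    "Q4 (GGUF)": 4 / 8,   # ~4 bits per weight
}

for fmt, bpp in bytes_per_param.items():
    gb = params * bpp / 1024**3
    print(f"{fmt:>10}: ~{gb:.1f} GB")
```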

See the GGUF quantization guide for the full quality/size trade-off discussion. FP8 is the sweet spot for schnell if your card can hold it — the quality cost versus FP16 is minor and the VRAM savings open the model to a much broader pleb audience.

The Apache 2.0 license is the practical differentiator for commercial work:

  • Batch product imagery: e-commerce plebs generating thousands of product shots can run schnell without licensing concerns at any scale
  • SaaS image generation: building a paid tool on schnell is legal and fee-free — no revenue thresholds, no percentage-of-revenue carve-outs
  • White-label inference: Hashcenter operators running paid image-gen for other plebs can use schnell as the backbone without downstream license friction
  • Derivative models: fine-tunes, LoRAs, and distilled variants of schnell are Apache-2.0 by inheritance. That’s the foundation for a real open ecosystem rather than a licensed one.

For plebs running inference-as-heater workflows, schnell’s short per-image latency at sustained batch generation means GPUs stay at near-full utilization continuously — which produces more consistent heat output than longer dev generations with idle gaps between batches. It’s a slightly better thermal profile for the space-heater use case. For ASIC-to-AI Hashcenter conversions, schnell is the image-workload backbone in mixed LLM + image stacks — commercial-licensable, fast, and capable enough for most paying plebs’ needs.

Sampler and workflow notes for ComfyUI

Schnell has specific workflow requirements that differ from FLUX.1 dev:

  • Sampler: euler with simple scheduling is the reference. The distillation was done against this sampler path.
  • Step count: 4 is the target. 1-step works for exploration. More than 4 doesn’t improve quality — schnell wasn’t trained for it and will often produce worse output at higher step counts.
  • CFG (guidance scale): schnell is guidance-distilled, which means it bakes the classifier-free-guidance behavior into the model itself. Keep CFG at 1.0. Running schnell at CFG > 1.0 produces artifacts, not better output.
  • Negative prompts: largely ignored due to CFG=1 — schnell workflows should use positive prompts only and lean on the T5 encoder’s ability to parse what you want rather than relying on what-to-avoid instructions.
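
Collected as a settings sketch (field names follow ComfyUI's KSampler node; diffusers users pass the equivalents num_inference_steps=4 and guidance_scale=0.0 instead, since schnell is guidance-distilled):

```python
# Schnell-specific sampler settings, expressed as a plain dict for
# reference. Field names mirror ComfyUI's KSampler node inputs.
SCHNELL_SAMPLER = {
    "sampler_name": "euler",  # reference sampler the distillation targeted
    "scheduler": "simple",
    "steps": 4,               # 1 for fast exploration; >4 degrades output
    "cfg": 1.0,               # must stay 1.0 -- guidance is baked in
    "negative_prompt": "",    # ignored at CFG 1.0; positive prompts only
}

print(SCHNELL_SAMPLER["sampler_name"],
      SCHNELL_SAMPLER["steps"],
      SCHNELL_SAMPLER["cfg"])
```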

Workflow graphs on day one: the ComfyUI team added a schnell-specific reference workflow to the example graphs within hours of release, and Forge and SwarmUI support schnell as of today. Migrating an existing ComfyUI setup from dev to schnell is a matter of swapping the checkpoint node, dropping CFG to 1.0, and setting steps to 4; everything else carries over.

How to run it today

Weights are on black-forest-labs/FLUX.1-schnell. No license acceptance required — it’s Apache 2.0, you just download. Drop the weights into ComfyUI’s models/checkpoints directory, grab the reference workflow from the ComfyUI examples repo, and you’re generating within minutes.

Our ComfyUI for plebs guide covers installation. For plebs who prefer Forge (the pleb-favored A1111 fork), schnell support is in the current release. Diffusers library support is in version 0.30+. For multimodal pleb stacks that pair schnell with an LLM for prompt crafting, the pleb self-hosted AI guide covers the LLM side and the 10-minute Ollama install guide gets you the text model side in one session. Troubleshooting: the self-hosted AI troubleshooting guide.

What comes next

Black Forest Labs has been explicit that FLUX is the first product line in a longer roadmap. Video generation is on the roadmap, and the founding team’s rectified-flow research lineage means whatever video model comes next will likely apply similar architectural ideas. Expect schnell-style distilled variants of whatever larger models Black Forest Labs releases going forward — the LADD approach generalizes, and the commercial appetite for 4-step open weights is clearly massive given the day-one reception.

Bigger picture: schnell is the first Apache-2.0 image model that can credibly compete with closed-weight offerings on quality. SDXL came close on openness but trailed on quality. Stable Diffusion 3’s initial release came close on quality but stumbled on licensing. FLUX.1 schnell is the first release where plebs don’t have to choose — the model is both permissive and capable. That’s one more layer of decentralization in the image-generation stack, which matters for the same reason it matters in the LLM stack: sovereign operators need tools they can actually deploy without calling a lawyer first. See the Sovereign AI for Bitcoiners Manifesto for the case, FLUX.1 dev and SD 3.5 for the model-comparison context, and Bitcoin space heater for the hardware side of the setup. Pull the weights — Apache 2.0, your weights, your pixels, your business.

Recommended hardware

Runs on 16 GB VRAM (RTX 4080 or M3 Pro). Quantized GGUF Q4 fits comfortably on 12 GB cards.

Buying guide: used RTX 3090 for LLMs (2026) →

Get it running

  1. Install Ollama →

     Ten-minute local LLM runtime. One binary, zero cloud.

  2. Give it a web UI →

     Open-WebUI turns Ollama into a self-hosted ChatGPT.

  3. Understand quantization →

     GGUF Q4/Q8/FP16 — which weights fit your GPU, explained.

Further reading: the Sovereign AI for Bitcoiners Manifesto for why sovereign inference matters, and From S19 to Your First AI Hashcenter for repurposing your mining rack into a Hashcenter that runs models like this one.