AI in Asia

Stable Diffusion Advanced: Model Fine-Tuning and LoRA

Train custom LoRA models to encode your art style, create consistent characters, or specialise for specific aesthetics without needing a degree in ML.

AI Snapshot

  • Train LoRA models on your images to teach Stable Diffusion your unique style or consistent character without expensive GPU time or ML expertise
  • Use trained LoRA models with minimal VRAM overhead; combine multiple LoRAs to create complex, highly specific results
  • Share your trained models on Civitai for community use or keep them private; build repeatable generation pipelines with your visual identity

Why This Matters

Stable Diffusion generates images but defaults to generic output. Fine-tuning through LoRA (Low-Rank Adaptation) training teaches the model your specific style. Instead of fighting the model to produce your aesthetic, the model learns your aesthetic. An artist whose style is distinctive can train a LoRA on 50 of their paintings; thereafter, every generation automatically incorporates their style.

This is transformative for creators with established visual identities. A designer with a recognisable aesthetic can generate unlimited on-brand content. A character designer can train a model on their design language and produce endless character variations in that signature style. Comic book artists, concept artists, illustrators: all benefit from style consistency at scale.

For creative entrepreneurs scaling production, LoRA training is a competitive advantage. Teams shipping multiple projects each month can maintain visual consistency by training one LoRA per style. This systematisation of creative output doesn't replace creativity; it amplifies it by removing tedious consistency work.

How to Do It

1. Collect 30-100 images representing your style or subject. For a style LoRA: around 50 images of your artwork or reference images. For a character LoRA: 50-100 images of the character in different poses and lighting. Prepare the images: resize to 512x512 or 768x768, check for consistent quality, and remove corrupted files. Write a caption for each image describing its contents. Use a tool like CLIP Interrogator to auto-caption, then refine by hand.
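The preparation step above can be automated. Here is a minimal sketch using Pillow; the folder names and the one-`.txt`-caption-per-image convention (which Kohya SS expects) are the only assumptions:

```python
from pathlib import Path
from PIL import Image

def prepare_dataset(src_dir: str, out_dir: str, size: int = 512) -> int:
    """Resize images to a square training resolution and create an empty
    .txt caption stub per image (fill the stubs manually or with CLIP
    Interrogator afterwards). Returns the number of images processed."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for img_path in sorted(src.iterdir()):
        if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        try:
            img = Image.open(img_path).convert("RGB")
        except OSError:  # unreadable/corrupted file: drop it from the set
            continue
        img = img.resize((size, size), Image.LANCZOS)
        img.save(out / f"{img_path.stem}.png")
        # Caption file: same basename, .txt extension (Kohya convention).
        (out / f"{img_path.stem}.txt").touch()
        count += 1
    return count
```

Note this naive square resize will distort non-square images; for a real dataset, crop to square first or use Kohya's bucketing feature instead.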
2. Use Kohya SS (Kohya's GUI for LoRA training) or the underlying sd-scripts. Kohya SS is the most user-friendly option: download it from GitHub, install the requirements, and run the GUI. You'll need a decent GPU (8GB VRAM minimum; 12GB+ recommended). Initial setup takes about 30 minutes.
3. In Kohya SS, configure the training data folder, output folder, epochs (typically 10-100 depending on dataset size), learning rate (0.0001 is a standard starting point), and batch size (4-8 depending on VRAM). For most users, keep the defaults and adjust only the epochs and dataset path. LoRA training is forgiving; you can iterate and retrain if results aren't satisfactory.
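How long a run these settings produce follows from simple arithmetic. The sketch below assumes Kohya-style "repeats" (each image is shown several times per epoch, set via the dataset folder name); the default values are illustrative, not prescriptive:

```python
import math

def total_training_steps(num_images: int, repeats: int = 10,
                         batch_size: int = 4, epochs: int = 10) -> int:
    """Estimate total optimiser steps for a Kohya-style LoRA run:
    each epoch shows every image `repeats` times, grouped into batches."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

# 50 images x 10 repeats / batch 4 = 125 steps/epoch; 10 epochs = 1250 steps
print(total_training_steps(50))  # -> 1250
```

If the estimate runs into many thousands of steps on a mid-range GPU, reduce epochs or repeats rather than batch size, which is constrained by VRAM.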
4. Click 'Train' in Kohya SS. Training takes 10 minutes to 2 hours depending on dataset size and GPU power. The output is a .safetensors file (your trained LoRA, roughly 50-500MB depending on network rank). Place this file in Automatic1111's LoRA folder; no additional setup is needed.
5. In Automatic1111, select your base model, then reference the LoRA in the prompt: 'a portrait of a woman wearing a red dress <lora:my_style_lora:0.8> watercolour style, high quality'. The :0.8 is the strength (0-1 scale): lower values apply the LoRA subtly, higher values apply it strongly. Experiment with strength to find your preference.
6. Stack LoRAs for complex results: 'portrait <lora:character_lora:0.7> <lora:style_lora:0.6> <lora:lighting_lora:0.5>'. Start with 2-3 LoRAs; too many compete and degrade output. Adjust the strength values until you reach the desired balance.
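If you generate batches programmatically, the `<lora:name:strength>` tag syntax is easy to templatise. A minimal helper (the function name and signature are this guide's invention, not part of any API):

```python
def build_prompt(base: str, loras: dict[str, float], extras: str = "") -> str:
    """Compose an Automatic1111-style prompt with <lora:name:strength>
    tags. Dict insertion order controls tag order in the prompt."""
    tags = " ".join(f"<lora:{name}:{weight}>" for name, weight in loras.items())
    return " ".join(part for part in (base, tags, extras) if part)

print(build_prompt(
    "portrait",
    {"character_lora": 0.7, "style_lora": 0.6, "lighting_lora": 0.5},
    "cinematic lighting",
))
# -> portrait <lora:character_lora:0.7> <lora:style_lora:0.6> <lora:lighting_lora:0.5> cinematic lighting
```

Keeping strengths in one dict makes it easy to sweep values systematically when balancing stacked LoRAs.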

Prompt Templates

Use trained LoRA: '{subject/scene} <lora:trained_style:0.8> {additional descriptors}'. Adjust strength (0.5-1.0) to control how strongly the LoRA style applies.
Use character LoRA: '{character_name} <lora:character_lora:0.9> {action/pose/setting}'. Vary actions and settings to generate consistent character in diverse contexts.
Combine LoRAs: '{subject} <lora:style_lora:0.7> <lora:character_lora:0.6> <lora:lighting_lora:0.5>'. Balance strength values; too many strong LoRAs conflict.

Common Mistakes

⚠ Training LoRA on too few images (less than 30) or poor quality images

⚠ Using LoRA strength too high (above 0.9) or stacking too many LoRAs

⚠ Poor image captions during LoRA training

Recommended Tools

Kohya SS (Kohya's Stable Diffusion GUI)

Most user-friendly LoRA training interface. Handles all technical details; requires only data preparation.

CLIP Interrogator

Auto-generates captions for training images. Saves time on manual captioning.

Civitai.com

Community platform to share, discover, and download trained LoRAs.

FAQ

How long does LoRA training take?
Typically 10-60 minutes depending on GPU and dataset size. RTX 3060 trains in 20-30 minutes. Slower GPUs or larger datasets take longer. You can pause training anytime and resume later.
What's the difference between LoRA training and full fine-tuning?
LoRA freezes the base model and trains only a small set of low-rank adapter matrices (hence 'Low-Rank Adaptation'). Full fine-tuning updates every parameter, requiring roughly an order of magnitude more computation and disk space. For most uses, LoRA is superior: faster, cheaper, and easier to share.
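The size difference follows directly from the low-rank factorisation: for one weight matrix W of shape d x k, LoRA trains factors B (d x r) and A (r x k) instead of all of W. The dimensions below are illustrative, not Stable Diffusion's actual layer sizes:

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters for one d x k weight matrix:
    full fine-tuning updates all d*k entries; LoRA updates only
    the low-rank factors B (d x r) and A (r x k)."""
    full = d * k
    lora = d * r + r * k
    return full, lora

full, lora = lora_param_counts(768, 768, 8)  # illustrative dims, rank 8
print(full, lora, full // lora)  # -> 589824 12288 48
```

At rank 8 this single matrix needs 48x fewer trainable parameters, which is why LoRA files are megabytes while full checkpoints are gigabytes.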
Can I sell images generated with my trained LoRA?
Yes. You own the images you generate. However, check Stable Diffusion's licence and your jurisdiction's copyright law. Most commercial use is allowed, but consult legal guidance for your specific region.

Next Steps

Prepare 50-60 images representing your style or subject. Train one LoRA and test the results. Adjust caption quality and retrain if needed. Once satisfied, train a second LoRA for a different style or subject, then experiment with combining them. After 2-3 trained models, you'll be fully proficient.