TRAIN YOUR OWN STABLE DIFFUSION LORA IN 2026: COMPLETE GUIDE

You have 20 photos of a person and want AI-generated portraits in any style — cyberpunk, Renaissance painting, anime. The base model doesn’t know what that face looks like. A LoRA fixes that.

LoRA (Low-Rank Adaptation) trains a small set of adapter weights, saved as a ~40-150MB file, from 15-30 images. The adapter lets you generate a specific character in any context without retraining the entire model. Training takes 20-60 minutes on a free Google Colab GPU, and often less on a fast local GPU. The technique is well established: the r/StableDiffusion training primer (January 2026) and the no-nonsense character LoRA guide (February 2026) represent the current community consensus on best practices.

This guide covers the full workflow — dataset preparation, tool selection, training parameters, and inference — based on documented community practices and official tool documentation.

This article covers character LoRA training specifically. For a broader overview of all LoRA training tools and techniques, see the comprehensive LoRA training guide.

Who Is This Guide For?

You want to generate AI images of a specific person in different styles and settings. You’ve used Stable Diffusion WebUI or ComfyUI for inference but haven’t trained a model. You may not own a high-end GPU and want to use free cloud options. You need practical, verified steps — not theory.

By the End of This, You’ll Know

  • How to prepare 15-30 training images for optimal character LoRA results
  • Which training tool to pick based on your base model and hardware
  • Starting training parameters for Kohya SS, FluxGym, and ComfyUI workflows
  • How to test and troubleshoot your LoRA using community-validated methods
  • Where to find additional resources for advanced techniques

Dataset Preparation

The quality of your training images determines the quality of your LoRA more than any parameter setting. The ZSky AI LoRA training guide (May 2026) emphasizes this point: “The quality of your training dataset directly determines your LoRA’s performance.”

Image Selection

Quantity: 15-30 images, with 20-25 as the community-vetted sweet spot for character LoRAs. The r/StableDiffusion primer (71+ replies) recommends starting at 20 and adding more only if the model fails to generalize. Below 15, the model struggles to learn a consistent likeness. Above 30 without proportional diversity, you risk overfitting.
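Before cropping anything, it can help to count what you have. The function below is a hypothetical helper, not part of any training tool: a minimal POSIX-shell sketch, assuming a flat folder of images, that counts lowercase .jpg, .jpeg, and .png files and reports where the total falls against the 15-30 range and the 20-25 sweet spot described above.

```shell
# Hypothetical helper: count the images in a folder and compare the total
# with the 15-30 range (20-25 sweet spot) recommended above.
# Usage: check_dataset_size DIR
check_dataset_size() {
  local dir="$1" n=0 f
  for f in "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.png; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    n=$((n + 1))
  done
  if [ "$n" -lt 15 ]; then
    echo "$n: too few (minimum 15)"
  elif [ "$n" -gt 30 ]; then
    echo "$n: above 30, check diversity to avoid overfitting"
  elif [ "$n" -ge 20 ] && [ "$n" -le 25 ]; then
    echo "$n: in the 20-25 sweet spot"
  else
    echo "$n: within 15-30"
  fi
}
```

Run it as check_dataset_size path/to/photos. It only counts files, so the diversity of angles, lighting, and expressions still has to be judged by eye.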

Diversity: Include front-facing, 45-degree profile, and true profile shots. Mix lighting conditions. Include different expressions. Every angle you omit is a gap in what the model can generate. ViewComfy’s training guide recommends reviewing each image and asking: “Would I be happy if the model produced an image like that?”

Quality: Well-lit, sharp, high-resolution images. The subject’s face should occupy 40-60% of the frame. Too small and the model won’t learn features. Too large and generations will crop awkwardly.

Background: Simple or removable backgrounds produce better results. The model should learn the face, not the background.

Processing Pipeline

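  1. Crop and center on the face. Square crops are the simplest option, though Kohya SS also supports aspect-ratio bucketing (--enable_bucket) for non-square images. Crop to your base model’s training resolution: 1024x1024 for SDXL and FLUX, 512x512 for SD 1.5. ImageMagick can batch-process a folder, but -gravity center crops around the image center rather than the face, so re-frame off-center shots by hand first. Writing to a separate folder with -path keeps your originals intact:

    mkdir -p cropped
    mogrify -path cropped -resize '1024x1024^' -gravity center -extent 1024x1024 *.jpg
    
  2. Remove backgrounds. A plain background helps the model learn the subject instead of the scenery. rembg processes a whole folder (input folder, then output folder) and writes PNGs with transparency; flattening them onto white with ImageMagick gives the trainer a plain background instead of an undefined transparent one:

    mkdir -p nobg flat
    rembg p cropped/ nobg/
    mogrify -path flat -format jpg -background white -alpha remove -alpha off nobg/*.png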
    
  3. Caption each image. Create a .txt file per image containing your trigger word plus a description of everything else in the frame (framing, clothing, background, lighting). Details you describe are less likely to be baked into the trigger word. For a character LoRA using the trigger word “sanj”:

    photo of sanj, professional headshot, detailed face, studio lighting
    
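  4. Organize into folders. Structure your dataset as your training tool expects. For Kohya SS, put the images in a subfolder named <repeats>_<name>: Kohya reads the repeat count from the number before the underscore (so 5_sanj shows each image five times per epoch) and skips subfolders without one:

    training_images/
    └── 5_sanj/
        ├── photo_001.jpg
        ├── photo_001.txt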
        └── ...
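Steps 3 and 4 can be partly scripted. This hypothetical helper (the name make_kohya_dataset and its arguments are illustrative) copies images into a Kohya-style <repeats>_<trigger> subfolder and writes a stub caption of “photo of <trigger>” for any image that has no .txt yet. It is a sketch assuming one flat source folder; the stubs are a starting point to extend with real descriptions, not finished captions.

```shell
# Hypothetical helper: copy images into a Kohya-style "<repeats>_<trigger>"
# folder and write a stub caption for every image that has none.
# Usage: make_kohya_dataset SRC_DIR DEST_ROOT REPEATS TRIGGER
make_kohya_dataset() {
  local src="$1" dest_root="$2" repeats="$3" trigger="$4"
  local dest="$dest_root/${repeats}_${trigger}" img base stem
  mkdir -p "$dest"
  for img in "$src"/*.jpg "$src"/*.jpeg "$src"/*.png; do
    [ -e "$img" ] || continue        # skip patterns that matched nothing
    cp "$img" "$dest/"
    base=${img##*/}                  # file name without the directory
    stem=${base%.*}                  # file name without the extension
    if [ -f "$src/$stem.txt" ]; then
      cp "$src/$stem.txt" "$dest/"   # keep a caption you already wrote
    else
      # Stub caption with only the trigger word; extend it by hand.
      printf 'photo of %s\n' "$trigger" > "$dest/$stem.txt"
    fi
  done
}
```

For example, make_kohya_dataset ./flat ./training_images 5 sanj produces training_images/5_sanj/ with one caption file per image.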
    

Auto-captioning tools such as the BLIP and WD14 taggers included in Kohya SS can generate initial captions, but manual review is essential. Auto-captioners don’t know your trigger word, so you have to add it yourself, and they frequently describe irrelevant background details. The Hugging Face community discussion on perfect LoRA training parameters for human characters stresses that caption quality directly impacts character consistency.
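Because auto-captioners never see your trigger word, a pass that adds it where missing is a useful first step of the manual review. The hypothetical ensure_trigger function below is a sketch assuming captions sit next to their images with the same base name: it prepends “TRIGGER, ” to any caption that lacks the word and lists images without a caption file. It edits captions in place, so run it on a copy.

```shell
# Hypothetical helper: make sure every caption in DIR mentions TRIGGER.
# Captions without it get "TRIGGER, " prepended; images with no caption
# file are listed so you can write one by hand.
# Usage: ensure_trigger DIR TRIGGER
ensure_trigger() {
  local dir="$1" trigger="$2" img txt text
  for img in "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.png; do
    [ -e "$img" ] || continue
    txt="${img%.*}.txt"
    if [ ! -f "$txt" ]; then
      echo "missing caption: ${img##*/}"
      continue
    fi
    # -w: whole-word match ("sanj," counts, "sanjay" does not); -F: literal text.
    if ! grep -qwF -- "$trigger" "$txt"; then
      text=$(cat "$txt")
      printf '%s, %s\n' "$trigger" "$text" > "$txt"
      echo "added trigger: ${txt##*/}"
    fi
  done
}
```

After it runs, still read every caption: it guarantees the trigger word is present, not that the description is accurate.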

Training Tool Options

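Three main tools dominate character LoRA training in 2026: Kohya SS, FluxGym, and the ComfyUI Flux Trainer nodes. Free Google Colab notebooks cover the case where you have no local GPU. Your choice depends on your base model and hardware.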

Google Colab (Free, No GPU Required)

For SD 1.5 and SDXL training without local GPU hardware. The fast-stable-diffusion Colab notebook by TheLastBen handles dependency installation and xFormers setup automatically. Switch the runtime to a GPU under Runtime > Change runtime type: the T4 is available on the free tier, while the L4 requires a paid plan. Training takes 20-60 minutes, which usually fits in a free session, although free-tier GPU access is not guaranteed. Upload your prepared dataset to Google Drive, mount it in the notebook, and point the training tab at your image folder.

Kohya SS (Local GUI, Full Control)

Kohya SS has been the standard tool for LoRA training since 2023. It supports SD 1.5, SDXL, and FLUX models with the full range of training parameters, and includes built-in BLIP and WD14 captioning, sample generation during training, and support for LoRA+ and other advanced optimizations.

VRAM requirements by model:

  • SD 1.5: 8GB minimum, 12GB recommended. Batch size 1-2.
  • SDXL: 12GB minimum, 16GB+ recommended.
  • FLUX.1: 16GB minimum, 24GB+ recommended. FluxGym is a more practical option for lower VRAM.

FluxGym (Web UI, FLUX-Focused)

Designed specifically for FLUX LoRA training with explicit low-VRAM support. FluxGym wraps Kohya’s sd-scripts (the sd3 branch, which contains the FLUX training code) in a web UI, with presets for 12GB, 16GB, and 20GB of VRAM. It can be installed manually or run via Docker:

git clone https://github.com/cocktailpeanut/fluxgym
cd fluxgym
git clone -b sd3 https://github.com/kohya-ss/sd-scripts
docker compose up -d --build

Access the UI at http://localhost:7860. FluxGym automatically downloads base models when selected and includes automatic sample generation during training to monitor progress. The Advanced tab exposes all Kohya training options for users who need fine-grained control.

ComfyUI Flux Trainer (Node-Based Workflow)

For users already running ComfyUI, the Flux Trainer custom nodes provide a node-based training workflow inside ComfyUI. This option is documented in the ComfyUI Flux LoRA training tutorial (May 2025). Install the Flux Trainer nodes (ComfyUI Manager is the easiest route); training then runs entirely within the ComfyUI environment, which suits users already familiar with the platform.

Training Parameters

These settings are common starting points for character LoRAs across all tools, not guarantees. They align with the Hugging Face LoRA training documentation and community-validated parameters from r/StableDiffusion.

Network dimension: 32-64 for most character LoRAs. Higher values (128+) capture more detail but increase overfitting risk. Start at 64 and drop to 32 if the LoRA overfits. The Hugging Face discussion on human character training notes that lower dimensions generalize better with small datasets.

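Network alpha: Half of network dimension (16 for dim 32, 32 for dim 64). Kohya’s LoRA implementation multiplies the learned update by alpha/dim, so a half-dim alpha scales it by 0.5; a lower alpha relative to dim dampens each update, roughly like a lower learning rate.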
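The dim/alpha rule and the resulting scale factor are easy to check. In Kohya’s networks.lora module the LoRA output is multiplied by alpha/dim, so these hypothetical helpers (default_alpha and lora_scale are illustrative names) compute the half-dim alpha and that factor. awk does the division because POSIX shell arithmetic is integer-only.

```shell
# Hypothetical helpers for the dim/alpha rule of thumb.
# default_alpha DIM     -> half the network dimension
# lora_scale DIM ALPHA  -> alpha / dim, the factor applied to the LoRA update
default_alpha() {
  echo "$(( $1 / 2 ))"
}

lora_scale() {
  awk -v dim="$1" -v alpha="$2" 'BEGIN { print alpha / dim }'
}
```

lora_scale 64 32 prints 0.5, while setting alpha equal to dim (lora_scale 64 64) prints 1, doubling the weight each update carries.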

Learning rate: 1e-4 to 2e-4. Use cosine with restarts scheduler. 2e-4 is the most commonly reported starting point across community guides.

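Training steps: 1500-2500, depending on dataset size. Kohya SS counts steps as images × repeats × epochs ÷ batch size, so choose repeats and epochs that land in this range. Save a checkpoint every epoch (--save_every_n_epochs=1) and watch preview images during training. When samples stop improving and start reproducing the training images, you have passed the optimal point regardless of step count; use the last checkpoint saved before that happened.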
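Since the Kohya command line sets epochs rather than steps, it helps to convert between them. This sketch assumes all images share one aspect-ratio bucket (with bucketing, batches are formed per bucket and the count can come out slightly higher) and ignores regularization images; total_steps is a hypothetical name.

```shell
# Hypothetical helper: estimate the total step count Kohya SS will run.
# steps per epoch = ceil(images * repeats / batch); total = per epoch * epochs
# Usage: total_steps IMAGES REPEATS EPOCHS BATCH
total_steps() {
  local images="$1" repeats="$2" epochs="$3" batch="$4"
  local per_epoch="$(( (images * repeats + batch - 1) / batch ))"
  echo "$(( per_epoch * epochs ))"
}
```

With 20 images in a 5_sanj folder, 15 epochs, and batch size 1, total_steps 20 5 15 1 prints 1500, the low end of the 1500-2500 range; 25 images give 1875.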

Batch size: 1 for 8GB VRAM, 2-4 for 12GB+. Larger batches stabilize training at the cost of memory, and they reduce the number of steps per epoch (batch size 2 halves it).

Resolution: Match your base model’s native resolution — 512x512 for SD 1.5, 1024x1024 for SDXL and FLUX.

Clip skip: 2 for SD 1.5, 1 for SDXL and FLUX.

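Regularization images: 200-500 images generated from the base model with your class prompt (e.g., “photo of a person”), an approach the ZSky AI guide (May 2026) recommends. They preserve the base model’s general idea of the class, so the model is less likely to render every “person” as your subject. In Kohya SS, pass them with --reg_data_dir, using the same <repeats>_<name> folder naming (e.g., 1_person).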

Training Workflow

Step 1 — Prepare your dataset. Follow the image selection and processing steps above. Aim for 20-25 images with face-centered crops and background removal.

Step 2 — Choose your tool based on hardware and target model.

  • No local GPU, want SD 1.5 or SDXL: Google Colab (free T4 GPU)
  • Local GPU with 8-12GB: Kohya SS for SD 1.5, or for SDXL once you have 12GB
  • Local GPU with 12-20GB, want FLUX: FluxGym (explicitly supports these configurations)
  • Already using ComfyUI: ComfyUI Flux Trainer nodes
  • Want full control over every parameter: Kohya SS command line
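The decision list above can be condensed into a small function for setup scripts. pick_tool is hypothetical, and its thresholds come from the VRAM minimums listed earlier (8GB for SD 1.5, 12GB for SDXL and for FluxGym’s smallest FLUX preset). It does not cover the ComfyUI case, which depends on what you already run rather than on hardware.

```shell
# Hypothetical helper mirroring the Step 2 list.
# Usage: pick_tool VRAM_GB MODEL   (MODEL: sd15 | sdxl | flux; VRAM_GB 0 = no local GPU)
pick_tool() {
  local vram="$1" model="$2"
  case "$model" in
    sd15)
      if [ "$vram" -ge 8 ]; then echo "Kohya SS"; else echo "Google Colab"; fi ;;
    sdxl)
      # SDXL needs about 12GB locally; below that, train on Colab.
      if [ "$vram" -ge 12 ]; then echo "Kohya SS"; else echo "Google Colab"; fi ;;
    flux)
      if [ "$vram" -ge 12 ]; then echo "FluxGym"; else echo "not enough VRAM for FLUX"; fi ;;
    *)
      echo "unknown model: $model" >&2
      return 1 ;;
  esac
}
```

On NVIDIA hardware, nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits reports total VRAM in MiB; divide by 1024 for the first argument. For example, pick_tool 16 sdxl prints Kohya SS.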

Step 3 — Configure and launch training. For Kohya SS command line:

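# Run from the kohya-ss/sd-scripts folder; SDXL uses sdxl_train_network.py instead.
# The model ID is the community mirror of the removed runwayml/stable-diffusion-v1-5 repo.
# --train_data_dir is the parent of the <repeats>_<name> subfolder (e.g. 5_sanj).
accelerate launch --num_cpu_threads_per_process 8 train_network.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
  --train_data_dir="./training_images" \
  --output_dir="./output" \
  --output_name="sanj" \
  --caption_extension=".txt" \
  --resolution="512,512" \
  --train_batch_size=1 \
  --max_train_epochs=15 \
  --save_every_n_epochs=1 \
  --learning_rate=2e-4 \
  --lr_scheduler="cosine_with_restarts" \
  --network_module="networks.lora" \
  --network_dim=64 \
  --network_alpha=32 \
  --mixed_precision="fp16" \
  --sample_every_n_steps=200 \
  --sample_prompts="./sample_prompts.txt"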

For FluxGym and ComfyUI, set the same values through the UI controls; most parameters appear under similar, though not always identical, names.

Step 4 — Monitor training. Check generated samples every 100-200 steps. Stop when output quality peaks.
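If you train with Kohya’s sd-scripts, it can render these samples itself when given --sample_every_n_steps and a --sample_prompts file. The sketch below writes such a file; the prompts, the seed, and the file name sample_prompts.txt are illustrative choices. Each line is one prompt followed by per-prompt options in the sd-scripts sample format: --n negative prompt, --w and --h image size, --d seed, --l CFG scale, --s sampling steps.

```shell
# Write a sample-prompt file for Kohya's --sample_prompts option.
# One prompt per line; a fixed seed (--d) keeps previews comparable
# from one checkpoint to the next.
cat > sample_prompts.txt <<'EOF'
photo of sanj, professional headshot, studio lighting --n blurry, deformed --w 512 --h 512 --d 42 --l 7 --s 28
photo of sanj as a Renaissance oil painting --n blurry, deformed --w 512 --h 512 --d 42 --l 7 --s 28
EOF
```

The 512x512 size matches SD 1.5; use --w 1024 --h 1024 for SDXL. Including one prompt in a style absent from your dataset shows early whether the LoRA still follows the prompt.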

Step 5 — Test your LoRA. Place the output .safetensors file in your inference UI’s LoRA folder (models/Lora/ for AUTOMATIC1111, models/loras/ for ComfyUI). Generate test images with your trigger word at varying strengths (0.6-1.0).
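Copying the file into place is the only installation step. The hypothetical install_lora helper below is a sketch that assumes the default folder layouts, models/Lora for AUTOMATIC1111 and models/loras for ComfyUI, under the UI’s root directory; adjust the paths if you have moved your model folders.

```shell
# Hypothetical helper: copy a trained LoRA into an inference UI's LoRA folder.
# Usage: install_lora FILE UI_ROOT UI   (UI: a1111 | comfyui)
install_lora() {
  local file="$1" root="$2" ui="$3" target
  case "$ui" in
    a1111)   target="$root/models/Lora" ;;    # AUTOMATIC1111 WebUI
    comfyui) target="$root/models/loras" ;;   # ComfyUI
    *) echo "unknown UI: $ui" >&2; return 1 ;;
  esac
  mkdir -p "$target"
  cp "$file" "$target/"
  echo "$target/${file##*/}"                  # print where the file landed
}
```

Refresh the UI’s LoRA list after copying so the new file shows up.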

Prompting Your Trained LoRA

Structure prompts this way for consistent results:

photo of [trigger word], [style], [setting], [lighting], [camera details] <lora:filename:0.8>

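The strength value (0.8 here) controls LoRA influence. 0.6-0.8 works for most character LoRAs. Higher values increase likeness but reduce prompt adherence; lower values improve prompt following but may lose the character. The <lora:filename:strength> tag is AUTOMATIC1111 syntax; in ComfyUI, set the strength on the Load LoRA node instead.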
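To run the strength test systematically, generate one prompt per strength and keep everything else fixed. The strength_sweep function is a hypothetical sketch that prints prompts in the structure above using AUTOMATIC1111’s <lora:name:strength> tag; generate each with the same seed so only the strength changes.

```shell
# Hypothetical helper: print one test prompt per LoRA strength (0.6-1.0).
# Usage: strength_sweep TRIGGER LORA_NAME DESCRIPTION
strength_sweep() {
  local trigger="$1" lora="$2" desc="$3" s
  for s in 0.6 0.7 0.8 0.9 1.0; do
    printf 'photo of %s, %s <lora:%s:%s>\n' "$trigger" "$desc" "$lora" "$s"
  done
}
```

For example, strength_sweep sanj sanj "cyberpunk street, neon lighting, 35mm" prints five prompts ending in <lora:sanj:0.6> through <lora:sanj:1.0>; the lowest strength that still holds the likeness is usually the best default.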

Negative prompt (for SD 1.5 and SDXL; FLUX.1 [dev] is normally run without one):

deformed, distorted, disfigured, poorly drawn, bad anatomy, extra limb, mutation, blurry, out of focus

Troubleshooting Common Issues

Inconsistent likeness across generations. The dataset lacks diversity — too many similar angles or expressions. Add 5-10 images with different poses and lighting. If the dataset is already diverse, increase training steps by 500. The no-nonsense character LoRA guide specifically recommends at least 30% profile shots in your dataset.

Overfitting — outputs look identical to training images. Reduce training steps or network dimension. Add 200-500 regularization images generated from the base model using the class prompt. Decrease LoRA strength during inference to 0.5-0.6. ZSky AI notes: “An overtrained LoRA produces images that look exactly like the training data regardless of the prompt.”

Distorted faces or anatomical errors. Check for inconsistent images in the dataset. One poorly cropped or misaligned image can derail training. Verify all images are consistently centered, properly cropped, and match the base model’s native resolution.

Poor quality or blurry outputs. Use a higher-quality base model: SDXL and FLUX produce sharper results than SD 1.5 at the cost of higher VRAM. Verify that dataset images are sharp and free of heavy JPEG compression artifacts.

What You Can Actually Use Today

  • Google Colab: Free T4 GPU for SD 1.5 and SDXL LoRA training. 20-60 minute training runs within the free tier.
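  • Kohya SS: Current release supports SD 1.5, SDXL, FLUX, and LoRA+ optimizations. GUI: https://github.com/bmaltais/kohya_ss (training scripts: https://github.com/kohya-ss/sd-scripts).
  • FluxGym: Web UI for FLUX LoRA with 12GB/16GB/20GB VRAM support. GitHub: https://github.com/cocktailpeanut/fluxgym
  • ComfyUI Flux Trainer: Node-based training workflow within ComfyUI. Nodes: https://github.com/kijai/ComfyUI-FluxTrainer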
  • AUTOMATIC1111 WebUI: Standard inference interface. Place .safetensors in models/Lora/.
  • ComfyUI: Node-based inference with full LoRA support. Place .safetensors in models/loras/. Better for complex workflows.
  • Civitai: Repository for downloading and sharing LoRAs. Review training parameters used by popular character LoRAs for reference.

For a complete overview of all LoRA training tools, model-specific advice, and advanced techniques, see the comprehensive LoRA training guide. For advanced character LoRA techniques, dataset optimization, and multi-concept training, read Advanced LoRA Self-Portraits.

Need help with AI model training?

I advise engineering teams on ML infrastructure, model training pipelines, and AI deployment strategy. If you’re building production systems around custom models, let’s talk.