LoraTag — AI Image Captioning for LoRA Training
LoraTag is an AI-powered image captioning tool built for LoRA training datasets. It uses GPT-4 Vision to generate natural language captions for image datasets used to train LoRA models for Stable Diffusion, FLUX, and SDXL. Upload your images, choose a detail level, and download training-ready .txt caption files in minutes. Free tier: 50 images/month, no credit card required.
How LoraTag Works
LoraTag replaces hours of manual image captioning with AI-powered batch processing. The workflow is simple:
- Upload images — Drag and drop your training images or select a folder. Supports JPG, PNG, WEBP, and most common image formats.
- Choose detail level — Select brief (10-20 words), standard (30-50 words), or detailed (80-150 words) captions depending on your training needs.
- Generate captions — GPT-4 Vision analyzes each image and writes natural language descriptions covering composition, subjects, style, colors, lighting, and context.
- Download .txt files — Each image gets a matching .txt caption file, ready for kohya_ss, EveryDream, SimpleTuner, or any LoRA training tool.
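That one-to-one pairing (image.jpg next to image.txt, sharing the same basename) is the convention kohya_ss and most other trainers expect. As a quick local sanity check, here is a minimal Python sketch (the dataset folder name is hypothetical) that flags any image still missing its caption file:

```python
# Sketch: flag images that have no matching .txt caption file.
# Assumes the standard basename pairing (img_001.jpg -> img_001.txt)
# used by kohya_ss and similar trainers; the folder name is hypothetical.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def find_uncaptioned(dataset_dir: str) -> list[Path]:
    """Return image files with no sibling .txt caption."""
    return [
        img for img in sorted(Path(dataset_dir).iterdir())
        if img.suffix.lower() in IMAGE_EXTS
        and not img.with_suffix(".txt").exists()
    ]

for img in find_uncaptioned("my_lora_dataset"):
    print(f"missing caption: {img.name}")
```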
Key Features
- GPT-4 Vision captions — Natural language descriptions that understand context, composition, and artistic style, not just tags
- Batch processing — Caption hundreds of images in minutes instead of hours of manual work
- Three detail levels — Brief (trigger words), standard (training descriptions), or detailed (comprehensive scene analysis)
- Standard .txt output — Compatible with kohya_ss, EveryDream, SimpleTuner, ai-toolkit, and other popular LoRA trainers
- Directory support — Organize images in folders; LoraTag preserves your directory structure
- Custom prompts — Add custom instructions to guide the captioning style for your specific use case
- Token counting — See estimated token counts for each caption to stay within CLIP limits (see the sketch after this list)
- Edit before download — Review and refine captions in-browser before exporting
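On the CLIP limit behind the token-counting feature: Stable Diffusion's CLIP text encoders accept at most 77 tokens per caption (including start/end tokens), so very long detailed captions can be truncated by some trainers. A minimal sketch, assuming downloaded caption files and the Hugging Face CLIP tokenizer, for double-checking counts offline:

```python
# Sketch: count CLIP tokens per caption to spot ones that would be truncated
# at the 77-token context limit of Stable Diffusion's text encoder.
# The folder name is hypothetical; requires the `transformers` package.
from pathlib import Path
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for caption_file in Path("my_lora_dataset").glob("*.txt"):
    text = caption_file.read_text().strip()
    n_tokens = len(tokenizer(text)["input_ids"])   # includes BOS/EOS tokens
    if n_tokens > tokenizer.model_max_length:      # 77 for CLIP
        print(f"{caption_file.name}: {n_tokens} tokens (would be truncated)")
```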
LoraTag vs WD14 Tagger vs Manual Captioning
Most LoRA creators manually caption their datasets or use WD14 Tagger for booru-style tags. Here is how LoraTag compares:
- LoraTag: Natural language captions from GPT-4 Vision. Understands context, composition, and artistic style. 10-150 words per image, depending on detail level. Best caption quality for LoRA training. Cost: free to $29/month.
- WD14 Tagger: Booru-style comma-separated tags. Fast and free, but limited to a fixed tag vocabulary. No understanding of composition or context. Good for anime/illustration datasets.
- Manual captioning: Highest quality but extremely time-consuming. 2-5 minutes per image. Not scalable for datasets over 50 images.
- BLIP/BLIP-2: Free open-source alternative. Shorter, less detailed captions than GPT-4 Vision. Good for basic descriptions but misses nuance.
LoraTag gives you the quality of manual captioning at the speed of automated tagging. Users report 80-90% less time spent on dataset preparation compared to manual captioning.
Who Uses LoraTag
LoraTag is used by AI artists, LoRA creators, fine-tuning researchers, and anyone training custom Stable Diffusion or FLUX models. Common use cases include character LoRAs (consistent character training), style LoRAs (artistic style transfer), concept LoRAs (teaching new concepts), product photography datasets, and architectural visualization training sets.
Supported Models and Trainers
LoraTag generates captions compatible with all major LoRA training tools: kohya_ss (sd-scripts), EveryDream 2.0, SimpleTuner, ai-toolkit, OneTrainer, and Dreambooth extensions. Captions work with Stable Diffusion 1.5, SDXL, Stable Diffusion 3, and FLUX model architectures.
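For a concrete example of where the caption files go: kohya_ss (sd-scripts), in its DreamBooth-style layout, reads image/caption pairs from folders whose names encode a repeat count. A typical structure (names are illustrative) looks like:

```
train_data/
  10_mychar/           # "10" = repeats per epoch, "mychar" = instance name
    img_001.png
    img_001.txt        # caption file generated by LoraTag
    img_002.png
    img_002.txt
```

Other trainers such as EveryDream and SimpleTuner use the same sidecar-.txt convention with their own folder configuration.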
Pricing
- Free: 50 images/month, no credit card required, all features included
- Pro ($9/month): 500 images/month, priority processing, faster queue
- Unlimited ($29/month): Unlimited images, API access, bulk download, priority support
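The Unlimited tier's API access makes it possible to script captioning into a larger pipeline. The sketch below is illustrative only: the endpoint URL, field names, and response shape are assumptions, not LoraTag's documented API; consult the actual API reference for the real interface.

```python
# Hypothetical sketch only: the endpoint, auth header, and field names are
# assumptions for illustration, not LoraTag's documented API.
import requests

API_URL = "https://loratag.example/api/v1/caption"  # placeholder URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder auth

with open("img_001.png", "rb") as f:
    resp = requests.post(
        API_URL,
        headers=HEADERS,
        files={"image": f},
        data={"detail": "standard"},  # brief | standard | detailed
        timeout=60,
    )
resp.raise_for_status()
print(resp.json())  # assumed shape, e.g. {"caption": "..."}
```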
Getting Started with LoraTag
Setting up LoraTag takes under a minute. Create a free account, upload your training images, choose a detail level (brief, standard, or detailed), and click Generate. LoraTag processes your entire dataset in parallel and outputs one .txt caption file per image — ready to drop into your training folder.
For best results, use images that are already cropped and cleaned for training. LoraTag analyzes each image independently, so consistent subject framing produces more consistent captions. You can regenerate individual captions, edit them inline, or download the full set as a ZIP archive.
LoraTag works entirely in the browser — no software to install, no Python dependencies, no GPU required. Upload from any device, caption your dataset, and download the results. Your training workflow stays simple and fast.
Frequently Asked Questions
What is LoraTag?
LoraTag is a web-based AI captioning tool specifically designed for LoRA training datasets. It uses GPT-4 Vision to generate natural language descriptions of images, producing training-ready .txt caption files that work with kohya_ss, EveryDream, and other popular LoRA trainers. It replaces hours of manual captioning work.
How is LoraTag different from WD14 Tagger?
WD14 Tagger generates booru-style comma-separated tags (e.g., "1girl, blue_hair, standing"). LoraTag uses GPT-4 Vision to write natural language captions that understand context, composition, lighting, and artistic style (e.g., "A young woman with blue hair standing in a sunlit garden, soft bokeh background, warm afternoon lighting"). Natural language captions generally produce better LoRA training results because they capture relationships between elements.
Is LoraTag free to use?
Yes. The free tier includes 50 images per month with all features — no credit card required, no watermarks, no limitations on detail level. Pro ($9/month) adds 500 images and priority processing. Unlimited ($29/month) removes all limits and adds API access.
What image formats does LoraTag support?
LoraTag supports JPG, JPEG, PNG, WEBP, GIF, BMP, and TIFF, which covers the formats most commonly used in LoRA training datasets.
Can I use LoraTag for FLUX model training?
Yes. LoraTag generates captions compatible with FLUX, Stable Diffusion 1.5, SDXL, and SD3 model architectures. The .txt output format works with all major LoRA training tools regardless of the target model.
How long does captioning take?
A typical dataset of 100 images takes 3-5 minutes to caption, depending on detail level and server load. Brief captions are faster; detailed captions take slightly longer per image. Processing happens in parallel for maximum speed.
Are my images stored or shared?
Images are processed in memory and not permanently stored on our servers. They are sent to OpenAI's GPT-4 Vision API for analysis and then deleted. We do not use your images for training or share them with third parties.