This paper presents IMAGGarment-1, a fine-grained garment generation (FGG) framework that enables high-fidelity garment synthesis with precise control over silhouette, color, and logo placement. Unlike existing methods that are limited to single-condition inputs, IMAGGarment-1 addresses the challenges of multi-conditional controllability in personalized fashion design and digital apparel applications. Specifically, IMAGGarment-1 employs a two-stage training strategy to separately model global appearance and local details, while enabling unified and controllable generation through end-to-end inference. In the first stage, we propose a global appearance model that jointly encodes silhouette and color using a mixed attention module and a color adapter. In the second stage, we present a local enhancement model with an adaptive appearance-aware module that injects user-defined logos and spatial constraints, enabling accurate placement and visual consistency. To support this task, we release GarmentBench, a large-scale dataset comprising over 180K garment samples paired with multi-level design conditions, including sketches, color references, logo placements, and textual prompts. Extensive experiments demonstrate that our method outperforms existing baselines, achieving superior structural stability, color fidelity, and local controllability. The code and model are available at https://github.com/muzishen/IMAGGarment-1.
Our framework comprises two components: a global appearance model (stage I) and a local enhancement model (stage II), which explicitly disentangle and jointly control global appearance and local details under multi-conditional guidance, enabling accurate synthesis of garment silhouette, color, and logo placement. The global appearance model first generates a latent representation of the coarse garment conditioned on the textual prompt, garment silhouette, and color palette. The local enhancement model then refines this latent representation by integrating a user-defined logo and its spatial constraint, producing the final high-fidelity garment image with fine-grained controllability.
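For concreteness, the sketch below mirrors this two-stage inference flow in code. It is a minimal structural illustration only: the class, function, and field names are assumptions made for this example and do not correspond to the repository's actual API.

```python
# Minimal structural sketch of the two-stage pipeline described above.
# All classes and functions here are illustrative placeholders, not the
# repository's actual entry points; consult the code release for those.
from dataclasses import dataclass
from typing import Any

@dataclass
class DesignConditions:
    prompt: str   # textual description of the garment
    sketch: Any   # silhouette condition (sketch image)
    color: Any    # color palette condition (color reference)
    logo: Any     # local detail condition (logo image)
    mask: Any     # spatial placement constraint (binary mask)

def global_appearance_model(prompt: str, sketch: Any, color: Any) -> Any:
    # Stage I placeholder: in the paper, a mixed attention module and a
    # color adapter jointly encode silhouette and color into a coarse latent.
    return {"coarse_latent": (prompt, sketch, color)}

def local_enhancement_model(latent: Any, logo: Any, mask: Any) -> Any:
    # Stage II placeholder: in the paper, an adaptive appearance-aware module
    # injects the logo at the masked location and decodes the final image.
    return {"garment_image": (latent, logo, mask)}

def generate_garment(c: DesignConditions) -> Any:
    # End-to-end inference: stage I produces the global appearance latent,
    # stage II refines it with the local logo and placement constraints.
    latent = global_appearance_model(c.prompt, c.sketch, c.color)
    return local_enhancement_model(latent, c.logo, c.mask)
```

The structure illustrates the disentanglement described above: stage I never sees the logo or mask, and stage II operates only on the latent produced by stage I.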
GarmentBench is the first publicly available dataset that pairs garments with textual descriptions, sketches, color references, logos, and logo locations. In total, GarmentBench contains 189,966 garment samples.
The dataset includes garment images (in folder './cloth'), sketches (in folder './sketch'), logos (in folder './logo'), masks (in folder './mask'), color references (in folder './color'), test data (in folder './test_data'), sketch annotations (in file './sketch_pair.json'), logo annotations (in file './logo_pair.json'), and color annotations (in file './color_pair.json').
Each annotation file adheres to the following JSON format, with field names and corresponding content examples:
{
"cloth": "cloth_path",
"sketch": "sketch_path",
"color" : "color_path",
"logo": "logo_path",
"mask": "mask_path",
"caption": "caption"
}
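For illustration, a minimal loader for these annotations might look like the following sketch. It assumes each *_pair.json file is a JSON list of records following the schema above; both this assumption and the hypothetical GarmentBench root path should be verified against the released files.

```python
import json
from pathlib import Path

# Hypothetical root of a local GarmentBench copy; adjust as needed.
DATASET_ROOT = Path("./GarmentBench")

def load_pairs(annotation_file: str):
    """Yield annotation records with asset paths resolved against the root.

    Assumes the file is a JSON list of records following the schema above
    (cloth/sketch/color/logo/mask paths plus a caption).
    """
    with open(DATASET_ROOT / annotation_file, encoding="utf-8") as f:
        records = json.load(f)
    for rec in records:
        yield {
            key: value if key == "caption" else DATASET_ROOT / value
            for key, value in rec.items()
        }

# Example usage: inspect the first sketch-annotation record.
for sample in load_pairs("sketch_pair.json"):
    print(sample["caption"], sample["cloth"])
    break
```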
@article{shen2025IMAGGarment-1,
  title={IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design},
  author={Shen, Fei and Yu, Jian and Wang, Cong and Jiang, Xin and Du, Xiaoyu and Tang, Jinhui},
  journal={arXiv preprint arXiv:2504.13176},
  year={2025}
}