Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing (CVPR 2024)
This code was tested with Python 3.8 and PyTorch 1.11, using pre-trained models from Hugging Face Diffusers. Our method is implemented on top of Stable Diffusion; it can also be applied to other Stable Diffusion based models such as Realistic-V2, Deliberate, and Anything-V4, in which case you need to update diffusers to 0.21.1. Additional required packages are listed in the requirements file.
conda env create -f environment.yaml
conda activate FPE
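If you plan to run the method on Realistic-V2, Deliberate, or Anything-V4, the diffusers version pin mentioned above can be applied inside the activated environment, for example:

```bash
pip install diffusers==0.21.1
```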
edit_fake is for synthesized image editing. edit_real is for real image editing. null_text_w_FPE performs real image editing using Null-Text Inversion to reconstruct the image. edit_fake uses self-attention map control built on the prompt-to-prompt code (see the sketch after the parameter below).
self_replace_steps: specifies the fraction of denoising steps during which the self-attention maps are replaced.
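As a rough illustration, the minimal sketch below (our own simplification, not the repository's actual implementation; the class name `SelfAttnReplace` and its interface are hypothetical) shows how a prompt-to-prompt style controller might use self_replace_steps to decide for how many denoising steps the source image's self-attention maps are re-injected:

```python
class SelfAttnReplace:
    """Hypothetical controller: stores self-attention maps from the source
    pass and re-injects them during the first `self_replace_steps` fraction
    of denoising steps of the edited pass."""

    def __init__(self, num_steps, self_replace_steps=0.6):
        self.replace_until = int(num_steps * self_replace_steps)
        self.cur_step = 0
        self.stored = {}  # layer name -> self-attention map from source pass

    def __call__(self, attn_map, layer_name, is_source_pass):
        if is_source_pass:
            self.stored[layer_name] = attn_map  # record source attention
            return attn_map
        if self.cur_step < self.replace_until:
            return self.stored[layer_name]      # inject source self-attention
        return attn_map                         # keep edited attention afterwards

    def next_step(self):
        self.cur_step += 1  # advance once per denoising step
```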
cd gradio_app
python app.py --model_path "<path to your stable diffusion model>"
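For reference, here is a minimal sketch of how such an entry point might wire --model_path to a Diffusers pipeline (an assumption about app.py's internals, not its actual code):

```python
import argparse
from diffusers import StableDiffusionPipeline

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", required=True,
                    help="path or Hub id of the Stable Diffusion model")
args = parser.parse_args()

# Load the pre-trained pipeline; the gradio UI would then call this pipeline.
pipe = StableDiffusionPipeline.from_pretrained(args.model_path)
```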
@misc{liu2024understanding,
      title={Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing},
      author={Bingyan Liu and Chengyu Wang and Tingfeng Cao and Kui Jia and Jun Huang},
      year={2024},
      eprint={2403.03431},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}