References
[1] Stability AI. Stable Diffusion 3. https://stability.ai/news/stable-diffusion-3. Accessed: 2024-08-16.
[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, pages 214–223. PMLR, 2017.
[3] Omri Avrahami, Dani Lischinski, and Ohad Fried. Blended diffusion for text-driven editing of natural images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18208–18218, 2022.
[4] Tim Brooks, Aleksander Holynski, and Alexei A. Efros. InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[5] Rewon Child. Very deep VAEs generalize autoregressive models and can outperform them on images. In International Conference on Learning Representations, 2021.
[6] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, pages 8780–8794. Curran Associates, Inc., 2021.
[7] Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, and Yaniv Taigman. Make-A-Scene: Scene-based text-to-image generation with human priors. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XV, 2022.
[8] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2014.
[10] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017.
[11] Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross attention control. 2022.
[12] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[13] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
[14] Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res., 23(1), 2022.
[15] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022, 2022.
[16] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.
[17] Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6007–6017, 2023.
[18] Gwanghyun Kim, Taesung Kwon, and Jong Chul Ye. DiffusionCLIP: Text-guided diffusion models for robust image manipulation. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2416–2425, 2022.
[19] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
[20] Zhifeng Kong and Wei Ping. On fast sampling of diffusion probabilistic models. In ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2021.
[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2012.
[22] Black Forest Labs. Black Forest Labs official website. https://blackforestlabs.ai/. Accessed: 2024-08-16.
[23] Sungbin Lim, Eun Bi Yoon, Taehyun Byun, Taewon Kang, Seungwoo Kim, Kyungjae Lee, and Sungjoon Choi. Score-based generative modeling through stochastic evolution equations in Hilbert spaces. In Advances in Neural Information Processing Systems, pages 37799–37812. Curran Associates, Inc., 2023.
[24] Lars Mescheder. On the convergence properties of GAN training. 2018.
[25] OpenAI. Sora. https://openai.com/index/sora/.
[26] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4195–4205, 2023.
[27] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. In The Twelfth International Conference on Learning Representations, 2024.
[28] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In 4th International Conference on Learning Representations, ICLR 2016, 2016.