Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

Times cited: 0
Authors
Yamac, Aylin [1]
Genc, Dilan [1]
Zaman, Esra [1]
Gerschner, Felix [1]
Klaiber, Marco [1]
Theissler, Andreas [1]
Affiliations
[1] Aalen Univ Appl Sci, Aalen, Germany
Source
2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024 | 2024
Keywords
text-to-image; open-source; weaknesses
DOI
10.1109/COMPSAC61105.2024.00261
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image models, which convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, they still show weaknesses when generating images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics CLIPScore, Fréchet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and the identified weaknesses were confirmed.
Pages: 1659-1664
Page count: 6
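
The abstract names CLIPScore as one of the alignment metrics. As a rough illustration only, and not the authors' evaluation code, the sketch below uses the commonly cited definition of CLIPScore: the cosine similarity between a text embedding and an image embedding, clipped at zero and scaled by a weight w = 2.5. The embedding vectors are hypothetical toy values; in practice they would come from a CLIP text and image encoder, which this record does not describe.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(text_emb, image_emb, w=2.5):
    """CLIPScore-style value: w * max(cos(text, image), 0)."""
    return w * max(cosine_similarity(text_emb, image_emb), 0.0)


# Hypothetical toy embeddings (assumption: real ones come from a CLIP model).
prompt_emb = [0.2, 0.7, 0.1]
matching_image_emb = [0.25, 0.65, 0.05]
unrelated_image_emb = [-0.6, 0.1, 0.9]

print(clip_score(prompt_emb, matching_image_emb))
print(clip_score(prompt_emb, unrelated_image_emb))
```

A higher value indicates closer agreement between prompt and image under this definition; negative cosine similarities are clipped to zero, so the score never drops below 0.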