Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

Cited by: 0
Authors
Yamac, Aylin [1]
Genc, Dilan [1]
Zaman, Esra [1]
Gerschner, Felix [1]
Klaiber, Marco [1]
Theissler, Andreas [1]
Affiliations
[1] Aalen Univ Appl Sci, Aalen, Germany
Source
2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024 | 2024
Keywords
text-to-image; open-source; weaknesses;
DOI
10.1109/COMPSAC61105.2024.00261
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-image models, which aim to convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, there are still weaknesses in the generation of images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics ClipScore, Frechet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and the identified weaknesses were confirmed.
Pages: 1659-1664
Number of pages: 6