Comparison of AI text-to-image tools

How do different AI tools handle non-figurative prompts?

Since OpenAI more or less opened its tool DALL-E to the public, AI text-to-image generation has been on everyone’s lips.

I don’t subscribe to the notion of the creative industry’s demise at this point. But as a creative with a technical background, I definitely want to get to grips with the new possibilities that these tools offer. What I’m particularly interested in, however, is quite specific:

  • How comparable are the results of popular AI tools when fed with the exact same prompts?
  • How do the systems cope with prompts that don’t refer to something representational/figurative? I find this question particularly interesting: the examples on the vendors’ websites mostly show images created from prompts in the style of “A clown made of cheese is sitting on a baby elephant in space.” As absurd and grotesque as these inputs are, they clearly depict something tangible/imaginable. But how do the tools handle requests that reference abstract concepts that can only be captured by artistic means?

Having just worked as an artist on an exhibition about the Heavenly Jerusalem, I gave each of the AI tools the following prompts:

Prompt 1: uptopian heavenly paradise atmospheric mythos
Prompt 2: imagined ideal space as utopian environment
Prompt 3: afterlife world environment ideal space 4k cinematic dreamy atmospheric wallpaper

The following tools were used for comparison:

Tool              License                  Price
OpenAI DALL-E 2   commercial               approx. $0.13/generation
Midjourney        commercial               unclear (in beta phase)
Stable Diffusion  open source/commercial   free

While the first two tools run on the providers’ servers, with requests made via a web interface (DALL-E 2) or via Discord (Midjourney), Stable Diffusion uses the computing power of the user’s own PC. This calls for the most powerful graphics card available and a lot of VRAM. For the results presented here, an NVIDIA RTX 3080 Ti was used, whose resources were fully utilized. Although this hardware seems fairly potent, it ran out of memory as soon as images larger than 512 x 512 pixels were to be created. The biggest advantages of this approach are the independence from any commercial provider and the fact that it is free (apart from the electricity), whereas Midjourney and OpenAI charge a small fee for each creation.
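To get a feeling for why memory runs out so quickly above 512 x 512 pixels, here is a rough back-of-the-envelope sketch in Python. It is not a measurement of Stable Diffusion itself. It assumes that the model works on a latent image downsampled by a factor of 8, uses 8 attention heads and 16-bit floats, and that the largest single item is one full self-attention matrix. All of these values, and the helper name attention_matrix_bytes, are illustrative assumptions of mine.

```python
# Rough, illustrative estimate of self-attention memory versus image size.
# Assumptions (not taken from any measurement): latent downsampling factor 8,
# 8 attention heads, 16-bit floats (2 bytes per value), one image per batch.

def attention_matrix_bytes(width, height, downsample=8, heads=8, bytes_per_value=2):
    """Size in bytes of one full self-attention score matrix at latent resolution."""
    tokens = (width // downsample) * (height // downsample)
    # Every token attends to every other token, so the matrix is tokens x tokens.
    return tokens * tokens * heads * bytes_per_value

for side in (512, 768, 1024):
    gib = attention_matrix_bytes(side, side) / 2**30
    print(f"{side} x {side}: {gib:.2f} GiB per attention matrix")
```

Under these assumptions, doubling the edge length quadruples the number of latent tokens and multiplies the size of the attention matrix by 16. The real memory footprint depends on the implementation and on any memory-saving options, so the numbers should be read as an order of magnitude at best.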

Results from prompt 1

uptopian heavenly paradise atmospheric mythos

DALL-E 2

Midjourney

Stable Diffusion

Results from prompt 2

imagined ideal space as utopian environment

DALL-E 2

Midjourney

Stable Diffusion

Results from prompt 3

afterlife world environment ideal space 4k cinematic dreamy atmospheric wallpaper

DALL-E 2

Midjourney

Stable Diffusion

Please note: I am not affiliated in any way with the developers/distributors of the presented tools. These images are NOT published under a Creative Commons license. All systems are under heavy development, so outputs may change significantly in the future. The images displayed here were created on September 7 and 8, 2022.


Useful links & tools


last update of this post: 05-10-2022