The Art of Gratitude: A Moment of AI Inspiration
Hello AI creatives,
My institution hosts a Giving Day once a year, and each year I am surprised by how inspired I get. Without fail, it gets my creative juices flowing.
For three years, I created content for and produced a livestream. One year, I created dad jokes told by a bobblehead of David Lipscomb (our founder).
The next year, trying to outdo myself, I staged knock-knock jokes between the David Lipscomb bobblehead and one of Lou the Bison (our mascot). I also figured out how to make the basketball spin and to place the characters in the environment, using video and effects to create a shadow.
This year, I figured the day would pass rather uneventfully as I was not involved in the livestream. But the Giving Day creative muses found a way to strike again. As I wound down from a long day with a rather nasty cold, inspiration hit. What if AI could create a thank you message for our donors? The result was priceless.
AI-Assisted Creation
To bring this vision to life, I turned to the AI image generator DALL-E, a cutting-edge tool capable of generating images from textual descriptions. I often use Midjourney as well. Both excel in different ways.
I feel like Midjourney is currently better at creating photo-realistic images, and until this experiment I had been rather unimpressed with DALL-E. But DALL-E’s ability to handle text surprised me. Rendering text is a known weakness of AI image generators; they don’t always get it right. I can tell DALL-E has advanced a lot in the past several months.
I asked ChatGPT to explain why AI image-generating programs often struggle with text. It responded, “I can provide insights into why AI models, including those designed for generating images, often struggle with producing text that is spelled correctly within those images. This challenge stems from several key factors inherent in the training and operation of AI image generation models:
- Lack of Semantic Understanding: AI models, particularly those trained for image generation, do not inherently understand the meaning of text or the rules of spelling and grammar. Their learning is based on identifying and replicating patterns found in the training data, rather than understanding or applying the rules of language. When generating text, the model essentially guesses what characters or symbols might look like in a given context, without comprehending what they mean or how they should be correctly spelled.
- Training on Visual Patterns: The models are trained on vast datasets of images that include text in various forms, styles, and contexts. However, the focus is on visual patterns rather than the accuracy or relevance of text content. This means that the model learns to replicate the appearance of text but not necessarily to ensure that it is spelled correctly or makes sense within the image.
- Resolution and Detail Limitations: Generating sharp and detailed text requires the model to focus on small-scale patterns within the image, which can be challenging given the resolution limitations of current AI image generators. The fine details necessary for legible and correctly spelled text can be lost, leading to errors or ambiguities in the generated text.
- Contextual and Conceptual Gaps: Correctly spelling words within an image requires an understanding of both the context in which the text appears and the concepts the text is supposed to convey. AI models might struggle to bridge these gaps, resulting in text that is visually plausible but incorrectly spelled or contextually inappropriate.
- Data Diversity and Bias: The datasets used to train these models can include images with incorrectly spelled text, creative or stylized spellings, and a wide variety of fonts and handwriting styles. This diversity can introduce ambiguity into what constitutes “correct” spelling, leading the model to reproduce errors or unconventional spellings found in its training data.”
In the examples below, you can see several instances where the same prompt did not meet expectations regarding text.
DALL-E’s failed attempts.
Midjourney’s failed attempts.
The process of generating successful AI images is a blend of creativity, technology, and luck: tweaking and refining the inputs based on the outputs. There were moments of frustration when the results didn’t quite capture the essence of my vision. However, through persistence and experimentation, I discovered the right combination of words that led the AI to produce images that were creative and clever.
It is important to note that you can feed the system the same prompt and get two wildly different outcomes. See the example below. The prompt was “Render an image of a student bison celebrating a successful giving day at his university. The school’s colors are purple and gold. He should be surrounded by signs of celebration and others join in the happiness. Make it in the style of an old cartoon like Betty Boop.”
This prompt was given twice to DALL-E and twice to Midjourney, with varying results. For both tools, I opened a new chat between renders so that each result would not be informed by the previous one (a minimal code sketch of this repeat-prompt test follows the images below).
Midjourney’s First Return
DALL-E’s First Return
Midjourney’s Second Return
DALL-E’s Second Return
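If you want to run the same repeat-prompt test programmatically, here is a minimal sketch using the OpenAI Python SDK. It assumes the `openai` package is installed and an `OPENAI_API_KEY` environment variable is set; Midjourney has no official public API, so only the DALL-E side is shown.

```python
# Minimal sketch of the repeat-prompt test via the OpenAI Python SDK.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Render an image of a student bison celebrating a successful giving day "
    "at his university. The school's colors are purple and gold. He should "
    "be surrounded by signs of celebration and others join in the happiness. "
    "Make it in the style of an old cartoon like Betty Boop."
)

# Each API call is stateless, so two calls with the identical prompt are
# the equivalent of opening a fresh chat between renders.
for attempt in (1, 2):
    response = client.images.generate(
        model="dall-e-3",
        prompt=PROMPT,
        size="1024x1024",
        n=1,  # dall-e-3 accepts only one image per request
    )
    print(f"Return {attempt}: {response.data[0].url}")
```

Because the API carries no conversation state, every call is effectively a fresh chat, which makes it a cleaner way to compare returns from an identical prompt than the ChatGPT interface.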
AI is developing all the time. Two weeks ago, I was praising Midjourney’s results; this experience has me very pleased with DALL-E. I think it’s fair to say the market for AI image generation is still up for grabs, and the tools will continue to improve rapidly.
AI and Human Creativity
Working with AI has reshaped my understanding of creativity. It’s a partnership where human imagination meets machine intelligence, leading to outcomes that neither could achieve alone. While there are valid concerns about AI in creative fields, such as the loss of personal touch or originality, my experience has been overwhelmingly positive.
I think AI has the potential to amplify human creativity, not replace it. In the next few years, we will be forced to redefine what we can achieve and reshape how we define humanity and creative talent.