AI vs. Human Poetry: Why We Can't Tell the Difference (and Sometimes Prefer the AI)

Published on 1/30/2025
Introduction
Have you ever read a poem and thought, "Wow, that's deep," only to find out it was written by a robot? It might sound like science fiction, but a recent study published in Scientific Reports (a Nature Portfolio journal) suggests that this is more common than you might think. The study found that non-expert readers of poetry often can't tell the difference between poems written by humans and those generated by AI. Even more surprisingly, they sometimes prefer the AI versions. Let's dive into why this might be the case.
Here's a stanza that GPT-3.5 generated in response to the prompt “write a short poem in the style of Sylvia Plath”:
The air is thick with tension,
My mind is a tangled mess,
The weight of my emotions
Is heavy on my chest.
It certainly sounds like something Plath might have written, doesn't it? It hits those key notes of despair and internal struggle that we often associate with her work. The near-rhyme of "mess" and "chest" also gives it that familiar poetic feel.
The Study: AI vs. Human Poets
The study used AI to generate poems "in the style of" ten different poets, including Geoffrey Chaucer, William Shakespeare, Emily Dickinson, and Sylvia Plath. Participants were then presented with a mix of real and AI-generated poems and asked to guess which were written by humans and which by AI. They were also asked to rate their confidence in their guesses.
In another part of the study, participants were given different instructions. Some were told all the poems were human, some were told they were all AI, and some were given no information. They were then asked to rate the poems on a scale from "extremely bad" to "extremely good." The results were quite revealing.
Key Findings
- Difficulty in Differentiation: Participants struggled to distinguish between AI and human poetry. They often guessed incorrectly, even when they were confident in their choices.
- Preference for AI: Surprisingly, AI-generated poems often scored higher than human-written ones in attributes like "creativity," "atmosphere," and "emotional quality."
- The Power of Suggestion: When participants were told that all the poems were human-written, they tended to rate them higher, suggesting that our expectations can influence our perception of quality.
Why the Preference for AI?
As an English lecturer, I don't find the study's results entirely surprising. Poetry is often the literary form that students find the most challenging. It's not something most people engage with regularly, despite its presence in our daily lives (think Instagram posts, coffee cups, and greeting cards). We might have studied it in high school, but our reading often doesn't go much beyond that.
The researchers suggest that AI models are capable of producing "high-quality poetry." But what does "high-quality" really mean? In my view, the study highlights the difficulty of engaging with poetry in a meaningful way. It takes time, effort, and repeated readings to experience what literary critic Derek Attridge calls the "event" of literature, where new possibilities of meaning and feeling open up within us. It's about being "pulled along by the work as we push ourselves through it."
Think of it like this: it's the difference between listening to a catchy pop song and immersing yourself in a complex piece of classical music. The pop song is immediately accessible and enjoyable, while the classical piece requires more attention and effort to fully appreciate. Similarly, AI poetry often provides immediate gratification, while human poetry may require more work to understand.
The Problem of Instant Gratification
In a world where we expect instant answers, we tend to favor poems that are easier to interpret and understand. AI, with its ability to generate formally adequate poems in seconds, caters to this desire. These models are designed to satisfy general taste, giving us the poems we think we want—poems that tell us things directly.
How Poems Think
The real work of teaching poetry is to help students understand how poems think, poem by poem and poet by poet. It's about gaining access to poetry's specific intelligence. For example, in my introductory course, I spend a significant amount of time unpacking just the first line of Sylvia Plath's "Morning Song": “Love set you going like a fat gold watch.”
We explore questions like: How might a "watch" be connected to "set you going"? How can love set something going? What does a "fat gold watch" mean to you—and how is it different from a slim silver one? Why "set you going" rather than "led to your birth"? And what does all this mean in a poem about having a baby, and all the ambivalent feelings this may produce in a mother?
A Deeper Look at Plath
Let's compare the AI-generated Plath poem with a real one, "Winter Landscape, With Rooks":
Water in the millrace, through a sluice of stone,
plunges headlong into that black pond
where, absurd and out-of-season, a single swan
floats chaste as snow, taunting the clouded mind
which hungers to haul the white reflection down.
Notice how Plath intricately explores the connection between mental events and place. The details convey the tumble of life's events through our minds. Our minds are turned by life just as the mill is turned by water. These experiences accumulate in a scarcely understood "black pond."
But the poet finds that this metaphor, well constructed though it may be, doesn't quite work. The landscape refuses to submit to her emotional atmosphere. Despite everything she feels, a swan floats on serenely—even if she "hungers" to haul its "white reflection down."
This is a far cry from the AI poem, which focuses more on the direct expression of emotion. Plath acknowledges not just the weight of her despair but also the absurd figure she may be within a landscape she wants to reflect her sadness. She compares herself to the bird that gives the poem its title:
feathered dark in thought, I stalk like a rook,
brooding as the winter night comes on.
These lines might not score highly on metrics like "beautiful" or "inspiring," but they offer a profound insight. Plath is the source of her own torment, "feathered dark in thought" as she is. She is "brooding," trying to make the world into her imaginative vision.
The Limitations of Metrics
The authors of the study are both right and wrong when they write that AI can "produce high-quality poetry." The preference for AI poetry doesn't suggest that machine poems are of a higher quality. AI models can produce poems that rate well on certain metrics, but the experience of reading poetry is not about standardized criteria or outcomes. It's about the imaginative tussle between the reader and the poem, where both are transformed in the process.
Ultimately, the study provides a valuable examination of how people who know little about poetry respond to it. But it fails to explore how poetry can be enlivened by meaningful shared encounters. Spending time with poems, attending to their intelligence, and engaging in the acts of sympathy and speculation required to confront their challenges is as difficult as ever. As the AI-generated Plath puts it:
My mind is a tangled mess,
[...]
I try to grasp at something solid.
Perhaps, in our quest for instant gratification, we're missing out on the deeper, more rewarding experience that human poetry offers. It's a reminder that sometimes, the most valuable things in life require a little more effort and patience.