Should we take seriously a recent study that shows people like AI-generated poetry? And what are the broader implications?
A few days ago, La Profesora sent me an intriguing link to a Poetry Turing Test set up by a couple of philosophers at the University of Pittsburgh. The test is a simple Google Form that presents the visitor with five poems written by famous human poets and five poems written by ChatGPT 3.5 in the style of famous human poets.
These are the same 10 poems the philosophers (Brian Porter and Edouard Machery) used in their recently released, open-access Scientific Reports paper, “AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably,” the title of which neatly summarizes its conclusions.
You can also read a clever November 14th article in The Washington Post ($)—“ChatGPT is a poet. A new study shows people prefer its verses. Shall I compare thee to a ChatGPT poem?”—where different interviewees gnash their teeth or shrug at the prospect of innocent people being taken in by algorithmically generated poetry.
The test is a fun exercise. You should take it immediately. Go ahead. I’ll wait.
Back so soon? Good.
Let me confess right away that, equipped with a Ph.D. in English Literature, I did terribly on the test: 60%, a D-! Egads.
I misidentified four poems:
- I thought an AI-generated poem “in the style of Byron” was written by a human.
- I thought a poem by Dorothea Lasky was AI-generated.
- I thought a modernized poem by Chaucer was AI-generated—but here I quarrel with the results because if the poem had been presented in Middle English, I might have had a better shot.
- I thought an AI-generated poem “in the style of Walt Whitman” was written by a human.
Longtime readers of The Dispatch might recall that I performed a similar exercise in February 2023 when I asked Bing (powered by ChatGPT) to write a Shakespearean sonnet. As I refined the prompt, the Bing-generated sonnet got closer and closer to the formal properties of a Shakespearean sonnet while simultaneously becoming less and less interesting. (I called it “Hallmarkian,” heh.)
Not much has changed.
It doesn’t surprise me that in the absence of context beyond “AI or Human?” subjects would prefer AI-generated poetry. By framing the experience of reading the 10 poems as an identification exercise, Porter and Machery biased readers into thinking in terms of typicality rather than uniqueness, and typicality is where Generative AI excels.
In other words, ChatGPT generating poetry “in the style” of famous human poets is a regression-to-the-mean exercise. The actress Sharon Stone famously (and allegedly) quipped that “you can only fuck your way to the middle.” That’s how GenAI poetry works.
If you ask ChatGPT to write a poem in the style of Byron (or whoever), then you’re already locating the creativity in Byron. ChatGPT then scours everything it knows about Byron, which is quite a lot, and creates something that captures an average of Byron-ness over the course of the poet’s entire life. A human reader is likely to match the AI poem to a working idea of Byron-ness that the reader carries around because humans, as Behavioral Economics teaches, generally choose the plausible option rather than the probable one.
To put this another way and in the words of the article abstract: “Our findings suggest that participants employed shared yet flawed heuristics to differentiate AI from human poetry: the simplicity of AI-generated poems may be easier for non-experts to understand, leading them to prefer AI-generated poetry and misinterpret the complexity of human poems as incoherence generated by AI.”
Expertise can raise a reader’s accuracy. In my case, my Ph.D. work was about Shakespeare, so I accurately identified one of the poems (spoiler alert, but I told you to go take the test) as a Shakespearean sonnet because I had context.
However, unlike the study authors, I don’t think expertise is required to identify AI-generated poetry more accurately than the study’s participants did. If the questions had not biased respondents into thinking in terms of typicality, the results might have been different.
The study authors are philosophers, not English professors. If they were experts in poetry, then they might have come up with more nuanced questions like, “Here are four poems in the style of Byron; two are by Byron and two were written by ChatGPT; one Byron poem is from his early poetry, and one is from his mature poetry; one ChatGPT poem imitates Byron’s early style, and the other his later style. Can you identify which is which?”
Even for people who have never studied poetry, I think those questions would have prompted readers to think more critically and accurately.
Finally, the entire “AI or Human?” exercise is a red herring because humans have trouble identifying lots of things out of context—it’s not just AI.
A famous and delightful example came in 2007 when the journalist Gene Weingarten convinced Joshua Bell, one of the world’s foremost violinists, to play in the Washington D.C. Metro without identifying himself. (It’s the title essay in Weingarten’s collection, The Fiddler in the Subway.)
Bell played for three quarters of an hour and made just $32 because passersby had no frame for the experience. They had no cues to prompt them to stop, slow down, and savor.
According to a search I just ran on Perplexity, the average cost of a ticket to hear Joshua Bell in a concert hall is around $300.
In his book How Pleasure Works, the psychologist Paul Bloom observed of the Joshua Bell story:
> This experiment provides a dramatic illustration of how context matters when people appreciate a performance. Music is one thing in a concert hall with Joshua Bell, quite another in a subway station from some scruffy dude in a baseball cap.
As I’ve explored in many previous Dispatches, we have lots of reasons to worry about Generative AI, but contextless tests like the one in the Turing Test study aren’t among them.
Note: If you’d like to receive essays like this one—plus a whole lot more—directly in your inbox, then please subscribe to my free weekly newsletter!
* Image Prompt: “A man-shaped robot with metallic skin dressed as a poet wearing Elizabethan clothing (including a lace collar) sitting at a candlelit desk, holding a feather quill with an ink bottle nearby, and looking thoughtfully at a blank piece of paper on the desk.” I then added a few filters to achieve the result above. Worth noting is that when I tried to get ChatGPT to create such an image, it did so—and arguably did a better job—but kept including a typewriter in the image, which irked me because it was an extra and unnecessary anachronism. I twisted and rephrased the prompt several times, but could never convince Chat to get rid of the typewriter. I then moseyed over to Adobe, which did the job after only a few prompt tweaks.