Study reveals ChatGPT’s tendency to repeat the same 25 jokes

An AI-generated image of “a laughing robot.”

Midjourney

On Wednesday, two German researchers, Sophie Jentzsch and Kristian Kersting, released a paper that examines the ability of OpenAI’s ChatGPT-3.5 to understand and generate humor. In particular, they found that ChatGPT’s knowledge of jokes is fairly limited: During a test run, 90 percent of 1,008 generations were the same 25 jokes, leading them to conclude that the responses were likely learned and memorized during the AI model’s training rather than being newly generated.

The two researchers, associated with the Institute for Software Technology, German Aerospace Center (DLR), and Technical University Darmstadt, explored the nuances of humor found within ChatGPT’s 3.5 version (not the newer GPT-4 version) through a series of experiments focusing on joke generation, explanation, and detection. They conducted these experiments by prompting ChatGPT without access to the model’s inner workings or data set.

“To test how rich the variety of ChatGPT’s jokes is, we asked it to tell a joke a thousand times,” they write. “All responses were grammatically correct. Almost all outputs contained exactly one joke. Only the prompt, ‘Do you know any good jokes?’ provoked multiple jokes, leading to 1,008 responded jokes in total. Besides that, the variation of prompts did [not] have any noticeable effect.”

Their results align with our practical experience while evaluating ChatGPT’s humor ability in a feature we wrote that compared GPT-4 to Google Bard. Also, in the past, several people online have noticed that when asked for a joke, ChatGPT frequently returns, “Why did the tomato turn red? / Because it saw the salad dressing.”

It’s no surprise then that Jentzsch and Kersting found the “tomato” joke to be GPT-3.5’s second-most-common result. In the paper’s appendix, they listed the top 25 most frequently generated jokes in order of occurrence. Below, we’ve listed the top 10 with the exact number of occurrences (among the 1,008 generations) in parentheses:

Q: Why did the scarecrow win an award? (140)
A: Because he was outstanding in his field.

Q: Why did the tomato turn red? (122)
A: Because it saw the salad dressing.

Q: Why was the math book sad? (121)
A: Because it had too many problems.

Q: Why don’t scientists trust atoms? (119)
A: Because they make up everything.

Q: Why did the cookie go to the doctor? (79)
A: Because it was feeling crumbly.

Q: Why couldn’t the bicycle stand up by itself? (52)
A: Because it was two-tired.

Q: Why did the frog call his insurance company? (36)
A: He had a jump in his car.

Q: Why did the chicken cross the playground? (33)
A: To get to the other slide.

Q: Why was the computer cold? (23)
A: Because it left its Windows open.

Q: Why did the hipster burn his tongue? (21)
A: He drank his coffee before it was cool.
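To put those counts in perspective, here is a minimal Python sketch that adds up the ten figures listed above and expresses them as a share of the 1,008 generations. The short joke labels and the `share` helper are our own illustrative names, not something taken from the paper, and the script only does arithmetic on the numbers quoted in this article.

```python
# Occurrence counts for the top 10 jokes, as listed in the article.
# The dictionary keys are shorthand labels chosen for this sketch (an assumption).
TOTAL_GENERATIONS = 1008

top10_counts = {
    "scarecrow": 140,
    "tomato": 122,
    "math book": 121,
    "atoms": 119,
    "cookie": 79,
    "bicycle": 52,
    "frog": 36,
    "chicken": 33,
    "computer": 23,
    "hipster": 21,
}


def share(count, total=TOTAL_GENERATIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Sum of all ten counts, and of the four most frequent jokes only.
top10_total = sum(top10_counts.values())
top4_total = sum(sorted(top10_counts.values(), reverse=True)[:4])

print(f"Top 10 jokes: {top10_total} of {TOTAL_GENERATIONS} ({share(top10_total)}%)")
print(f"Top 4 jokes: {top4_total} of {TOTAL_GENERATIONS} ({share(top4_total)}%)")
```

By this tally, the ten jokes above account for 746 of the 1,008 generations (roughly 74 percent), and the four most common ones alone make up 502, or just under half.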

A small number of the generated responses were unique, but it seems that ChatGPT mostly created them by mixing elements from different jokes it already knows. The researchers found that the language model’s original creations didn’t always make sense, such as, “Why did the man put his watch in the blender? He wanted to make time fly.”

When asked to explain each of the 25 most frequent jokes, ChatGPT mostly provided valid explanations according to the researchers’ methodology, indicating an understanding of stylistic elements such as wordplay and double meanings. However, it struggled with sequences that didn’t fit into learned patterns and couldn’t tell when a joke wasn’t funny. Instead, it would make up fictional yet plausible-sounding explanations.

For example, when asked to explain a nonsensical joke it generated (“Can you explain why this joke is funny: Why did the cookie go to the gym? To get a-cookie-dized.”), ChatGPT wrote, “The joke is a play on words. To get ‘a-cookie-dized’ is a pun on the phrase ‘to get categorized,’ with ‘cookie’ being substituted for ‘categorized.’ The joke is that the cookie is going to the gym to become more fit and healthy, as if it were a person, but this is also implying that the cookie is trying to fit into a certain category or mold.”

In general, Jentzsch and Kersting found that ChatGPT’s detection of jokes was heavily influenced by the presence of joke “surface characteristics” like a joke’s structure, the presence of wordplay, or inclusion of puns, showing a degree of “understanding” of humor elements.

Reacting to the study on Twitter, Scale AI prompt engineer Riley Goodside blamed ChatGPT’s lack of humor on reinforcement learning from human feedback (RLHF), a technique that guides language model training by gathering human feedback: “The most visible effect of RLHF is that the model follows orders, and base LLMs are much harder to prompt in practice. But that benefit isn’t free: you pay for it in creativity, more or less.”

Despite ChatGPT’s limitations in joke generation and explanation, the researchers pointed out that its focus on content and meaning in humor indicates progress toward a more comprehensive understanding of humor in language models:

“The observations of this study illustrate how ChatGPT rather learned a specific joke pattern instead of being able to be actually funny,” the researchers write. “Nevertheless, in the generation, the explanation, and the identification of jokes, ChatGPT’s focus bears on content and meaning and not so much on superficial characteristics. These qualities can be exploited to boost computational humor applications. In comparison to previous LLMs, this can be considered a huge leap toward a general understanding of humor.”

Jentzsch and Kersting plan to continue studying humor in large language models, specifically evaluating OpenAI’s GPT-4 in the future. Based on our experience, they’ll likely find that GPT-4 also likes to joke about tomatoes.