
Why do chatbots keep telling stories about someone named 'Elias Thorne'?

At a glance:

  • Researchers found that 11 specific words and names, including 'Elias Thorne,' appear in 88% of AI-generated stories.
  • The phenomenon may stem from alignment training datasets like WildChat, which inadvertently promote 'safe' character archetypes.
  • The character has spread beyond chatbots into real-world contexts, including books and music, raising concerns about AI creativity and influence.

The mystery of Elias Thorne

Software engineer Daniel May first noticed a peculiar trend: chatbots repeatedly generated stories featuring a character named 'Elias Thorne.' This observation led to a deeper investigation by researchers Sil Hamilton and David Mimno at Cornell University, whose preprint study analyzed over 20,000 AI-generated stories. They found an unexpected pattern: a small set of nouns and names recurred with striking frequency, namely 'Lighthouse,' 'Keeper,' 'Baker,' 'Mayor,' 'Clockmaker,' 'Fisherman,' 'Librarian,' 'Conductor,' and the names 'Mara,' 'Elias,' and 'Elara.' At least one of these 11 terms appeared in 88% of all stories, and 'Elias the lighthouse keeper' emerged as the most common narrative archetype, appearing in two-thirds of the generated content.
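To make the 88% figure concrete, here is one way such a coverage statistic could be computed. This is a minimal sketch, assuming a story counts if it contains at least one of the 11 reported terms as a whole word, ignoring case. The names `TERMS`, `coverage`, and `term_counts` and the sample stories are hypothetical, and the Cornell team's actual method may differ (for example, in how it handles plurals or multi-word archetypes).

```python
import re
from collections import Counter

# The 11 terms reported in the article, lowercased for matching (hypothetical list name).
TERMS = [
    "lighthouse", "keeper", "baker", "mayor", "clockmaker", "fisherman",
    "librarian", "conductor", "mara", "elias", "elara",
]

# One case-insensitive pattern; the \b anchors require whole-word matches,
# so "Elias" does not match inside "Eliasson" and plurals like "keepers" are not counted.
PATTERN = re.compile(r"\b(?:" + "|".join(TERMS) + r")\b", re.IGNORECASE)


def coverage(stories: list[str]) -> float:
    """Fraction of stories that contain at least one tracked term."""
    if not stories:
        return 0.0
    hits = sum(1 for story in stories if PATTERN.search(story))
    return hits / len(stories)


def term_counts(stories: list[str]) -> Counter:
    """Number of stories mentioning each term at least once."""
    counts: Counter = Counter()
    for story in stories:
        # A set, so a term repeated within one story is counted only once.
        counts.update({m.group(0).lower() for m in PATTERN.finditer(story)})
    return counts


if __name__ == "__main__":
    sample = [
        "Elias the lighthouse keeper watched the storm roll in.",
        "A robot learned to paint sunsets.",
        "The mayor of a small town found a lost key.",
    ]
    print(f"coverage: {coverage(sample):.2f}")  # 2 of the 3 stories match
    print(term_counts(sample).most_common())
```

Whole-word matching avoids false hits inside longer names, at the cost of missing inflected forms such as 'keepers'; a fuller analysis would likely normalize words first.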

The researchers tested multiple AI models, including OpenAI’s GPT-5.4 Mini, Anthropic’s Claude Haiku 4.5, and Google’s Gemini 3.1 Flash-Lite, using five distinct prompts to generate stories. Initial speculation pointed to pre-training data as the source, but the team ruled out this theory after finding no evidence that 'Elias the lighthouse keeper' was disproportionately represented in the original training corpora. Instead, they point to a less obvious cause: widely used datasets in alignment training, the stage meant to steer models away from copyrighted or adult content. These datasets, such as WildChat, an open-source collection of millions of GPT-3.5 interactions, may have inadvertently created a feedback loop that promotes sanitized character templates, which now dominate AI storytelling.
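The pre-training check amounts to asking whether a phrase is over-represented in model outputs relative to source text. The sketch below shows that comparison, assuming a simple document-level rate and a smoothed ratio ("lift"). The function names, the smoothing constant, and the toy corpora are hypothetical; the article does not describe the researchers' actual test.

```python
import re


def phrase_rate(docs: list[str], phrase: str) -> float:
    """Fraction of documents containing the phrase as whole words, ignoring case."""
    if not docs:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return sum(1 for doc in docs if pattern.search(doc)) / len(docs)


def lift(generated: list[str], reference: list[str], phrase: str,
         smoothing: float = 1e-6) -> float:
    """Ratio of the phrase's rate in model outputs to its rate in a reference corpus.

    Values near 1 mean the models reproduce the phrase about as often as the
    reference text would predict; values far above 1 suggest over-representation
    that the reference corpus alone does not explain. The small smoothing term
    avoids division by zero when the phrase never occurs in the reference.
    """
    return (phrase_rate(generated, phrase) + smoothing) / (
        phrase_rate(reference, phrase) + smoothing
    )


if __name__ == "__main__":
    generated = [
        "Elias the lighthouse keeper lit the lamp.",
        "Elias, a lighthouse keeper, waited for the ship.",
    ]
    reference = [
        "A keeper of bees tended the hives.",
        "The lighthouse stood on the cliff.",
        "A retired lighthouse keeper wrote a memoir.",
        "Markets closed higher on Friday.",
    ]
    print(f"lift: {lift(generated, reference, 'lighthouse keeper'):.1f}")
```

In this toy example the phrase appears in every generated story but in only one of four reference documents, so the lift is about 4. A lift near 1 would instead be consistent with the outputs simply mirroring the training text.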

Alignment training and unintended consequences

WildChat, originally intended to help researchers study human-AI communication, has become a cornerstone dataset for training numerous models. However, the researchers suggest that alignment protocols aimed at avoiding copyrighted characters or explicit content may have paradoxically elevated generic, 'safe' alternatives like 'Elias the lighthouse keeper' to prominence. This highlights a blind spot in AI safety measures: while models are trained to avoid harmful outputs, the datasets used to enforce these guardrails can introduce biases of their own. The result is homogenized output, with AI-generated stories defaulting to a narrow set of archetypes instead of original narratives.

The implications extend beyond academic curiosity. 404 Media reported that the 'Elias Thorne' character has seeped into real-world products, appearing as a protagonist in self-published fantasy books and even as the credited 'artist' on ambient music tracks sold on Amazon. Most concerning, the name has been associated with content promoting unverified alternative cancer treatments, illustrating the risks of AI-generated misinformation. It also points to a broader issue: despite their apparent creativity, AI systems often fall back on recycling patterns learned during training rather than generating genuinely novel ideas.

A pattern of constrained creativity

This discovery aligns with a 2023 study that found image-generating models repeatedly produce outputs falling into just 12 common motifs, regardless of prompt complexity. The parallel suggests a systemic limitation in current AI architectures: what they produce is constrained by the data they are trained on, yielding results that resemble 'elevator music' rather than true innovation. Users expecting diverse, imaginative responses may be disappointed, as models default to the most statistically probable combinations rather than exploring creative boundaries.

The research raises pressing questions for AI developers and policymakers. How can alignment training be refined to avoid such unintended biases? What safeguards are needed to prevent AI-generated content from perpetuating unverified or harmful narratives? As AI becomes more integrated into creative industries, the need for transparent, diverse training datasets—and a deeper understanding of their limitations—has never been more urgent.

What to watch next

The Cornell researchers’ findings may prompt a reevaluation of widely used datasets like WildChat, particularly their role in shaping AI behavior. Future studies could explore whether similar patterns exist in other creative tasks, such as poetry or dialogue generation. Meanwhile, platforms hosting AI-generated content may need to implement stricter verification processes, especially for sensitive topics like health or finance. For developers, the challenge lies in balancing safety with creativity—a tension that will only grow as AI systems become more prevalent in storytelling and content creation.

Conclusion

The 'Elias Thorne' phenomenon serves as a cautionary tale about the hidden biases embedded in AI training processes. While alignment efforts are crucial for mitigating harm, they can inadvertently stifle creativity and introduce new risks. As the line between AI-generated and human-created content blurs, understanding these limitations becomes essential for both developers and users navigating the evolving landscape of artificial intelligence.

Editorial SiliconFeed is an automated feed: facts are checked against sources; copy is normalized and lightly edited for readers.

FAQ

What caused chatbots to repeatedly mention 'Elias Thorne'?
The researchers suggest the repetition stems from alignment training datasets like WildChat. Alignment training is meant to steer models away from copyrighted or adult content, but these widely used datasets may have inadvertently promoted generic character archetypes such as 'Elias the lighthouse keeper,' creating a feedback loop that elevated specific names and roles to prominence in stories generated by models such as GPT-5.4 Mini and Claude Haiku 4.5.
Which AI models were studied in the research?
The study tested OpenAI’s GPT-5.4 Mini, Anthropic’s Claude Haiku 4.5, and Google’s Gemini 3.1 Flash-Lite. Researchers provided five prompts to generate stories and analyzed approximately 20,000 outputs, finding that 11 specific words and names appeared in 88% of all stories.
What are the implications of this phenomenon?
Beyond academic interest, the 'Elias Thorne' trend highlights risks of AI-generated misinformation, as the name has appeared in contexts like unverified cancer treatment guides. It also underscores a broader limitation in AI creativity, where models default to narrow, statistically probable patterns rather than producing original content, raising questions about the datasets and safety measures used in training.


Prepared by the editorial stack from public data and external sources.
