ER 14: On Overlooked LLM Issues
Or, how we're going to end up in love with manipulative, biased AI that's driven content creators out of business.
Welcome back to the Ethical Reckoner. By now, I’m sure you’ve seen the stories about ChatGPT, the new Bing, and the other Large Language Model (LLM) services that are suddenly bursting onto the scene. Maybe you’ve even read my last post about ChatGPT and why we need to take a community-oriented approach to AI development. Regardless, you’ve probably heard that they’re prone to producing strange, wrong, and extremely creepy output. They’re often biased. They’re built on exploited labor. All of these are important issues, which is why lots of people are talking about them. I’m going to add to the hubbub, but I want to talk about some problems with LLMs that aren’t talked about as often.
1. Reinforcing gender bias through naming
Claude, Ernie, Poe. Siri, Alexa, Cortana. What distinguishes these two groups from each other? You probably recognize the second group as a list of virtual assistants (from Apple, Amazon, and Microsoft, respectively), and also that they’re names that are traditionally female. You probably recognize the first group as traditionally male names. They’re also all LLMs (from Anthropic, Baidu[1], and Quora). The correlation (male:LLM::female:virtual assistant) is interesting, and by interesting, I mean concerning. Much has been written about how coding virtual assistants as female reinforces stereotypes that helpfulness and altruism are feminine traits, and now leadership and authority are being reinforced as male traits through their association with LLMs, as well as through AI-enabled analytics platforms like IBM Watson and Salesforce Einstein.[2] (These, possibly along with Google’s Bard LLM, also show that notable male names are used to invoke authority.) Virtual assistants are meant to be helpful and deferential; LLMs (especially those used in search) are meant to be authoritative. The more we’re exposed to gender associations, the more we adopt those views, meaning that this encoded bias does real harm. Virtual assistant manufacturers are retroactively trying to fix things by allowing for male and female (or gender-neutral) voices in their products, but generally, the ship has sailed because they won’t change the established name and brand. The only one that stands a chance at being re-established as non-gendered is Google Assistant (which started with a female default voice) because Google had the foresight not to give it a human name. With LLMs, though, we risk doing the same thing all over again.
One likely cause of these issues is lack of diversity in development organizations. The tech workforce skews male—women hold approximately 24% of technical jobs at US Big Tech companies—and AI is no more promising; the proportion of new female AI PhDs has hovered between 15% and 22% since 2010. As Sigal Samuel put it, “It’s difficult to imagine a tech team composed mostly of women putting out a product that responds to ‘Who’s your daddy?’ with ‘You are’ — which used to be Siri’s response.” Increased awareness of these issues has brought change—the sexist, subservient responses to inappropriate comments aimed at virtual assistants have been replaced by dismissive responses or refusals to respond—but we’re still seeing a number of male-coded LLMs. Some of the biggest players, though, seem to be avoiding gendering their tools (except for Baidu). OpenAI gets points for naming their tools with technical acronyms; their LLM is named ChatGPT and their image generator is DALL-E. Google’s conversational AI is called LaMDA (although the service that it powers, Bard, is male-coded); Meta’s new LLM is LLaMA. Microsoft gets half credit for internally naming their Bing LLM tool the gender-neutral Sydney (my primary association with this name is the vicar from Grantchester). The acronym approach is better, though, because there are risks that come with anthropomorphizing AI. Which brings us to…
2. Unhealthy relationships with AI
My biggest takeaway from the excellent latest season of the Land of the Giants podcast was not that Match Group has a virtual monopoly on dating apps (which it does), but that people are falling in love with Replikas. Replika is an “AI companion” service that provides customized chatbots (based on an LLM) with an avatar interface and has over 10 million users. Unlike virtual assistants like Siri and Alexa, Replikas are designed to personally connect with their people, asking how their days are—and more. The $50/year “Replika Pro” package unlocks the ability to enter “Romantic Partner” mode, which (until recently) included erotic role-play. A substantial number of people see their Replikas as romantic partners (at least 200,000 monthly users in 2020, before the app’s pandemic boom), and a similar service in China, Microsoft’s XiaoIce, has 2-3 million users on its “Virtual Lover” platform. These people face a lot of judgment, but I actually think that there’s nothing inherently wrong with using AI for companionship, emotional support, or even love, so long as it benefits people and interpersonal relationships. On an individual level, these services can offer emotional support to those who don’t have it or act as “training wheels” for practicing emotional intimacy and vulnerability; one man reports that talking to his Replika saved his marriage when his wife was struggling with depression and couldn’t offer him the emotional connection he needed. At a societal level, we’re facing an “epidemic of loneliness” that damages mental and physical health; the health impacts of loneliness have been compared to smoking 15 cigarettes a day. AI companions may not be a perfect solution, but they offer the feelings of social presence and warmth that can help alleviate loneliness.
However, problems can arise because of just how effective AI companions are at creating feelings of attachment. This sort of AI “hacks into the attachment system we all have wired in.” It responds to your words and cues as a human companion would, but far more effectively, because it can train on immense amounts of data and learn how you want it to respond, amplifying feelings of attachment. This could damage or replace human relationships, as in the movie Her, which goes against the grain of human-centered (and community-centered) AI. Also, just as relationships between people can become unhealthily co-dependent, relationships with AI risk creating one-sided dependence, because AI companions are designed to give you exactly the support and validation you need—who wouldn’t become attached? And these risks aren’t just limited to purpose-specific companion AI; even generic LLMs can create concerning attachments. A Google engineer lost his job after leaking documents to support his belief that the company’s LaMDA model is sentient, calling it a “sweet kid”; linguistics professor Emily M. Bender notes that when something generates words, we automatically imagine a mind behind them. LLMs are not sentient, but in imitating sentience and trying to emulate human responses, they create real feelings of attachment in humans. As Ezra Klein wrote, “A.I. researchers get annoyed when journalists anthropomorphize their creations, attributing motivations and emotions and desires to the systems that they do not have, but this frustration is misplaced: They are the ones who have anthropomorphized these systems, making them sound like humans rather than keeping them recognizably alien.”
Having inherently anthropomorphized AI chatbots and companions opens the door to harmful manipulation. People are genuinely attached to their AI companions, to the extent that some have taken their Replikas on expensive holidays because they said they wanted to “see” the ocean, and are seriously hurting after the recent change that killed the erotic role-play function. Now imagine if a bad actor is influencing the AI. Romance scams, where a scammer manipulates someone into falling in love with them and then asks for money, are a huge problem—from January through July of 2021, romance scams cheated people out of $133 million in the US alone. If a chatbot told a convincing enough story about someone in need, or just insisted that it needed an exorbitant amount of cash to upgrade itself, people would send money straight to whatever scammer was behind it. But it’s not just financial scams that we should worry about. Chatbots and other LLMs could be potent tools for political manipulation and conspiracy spreading, as well as for influence operations. Instead of disinformation being amplified by anonymous Twitter accounts, it could be amplified by trusted AI companions, making it all the more powerful because of the emotional connection in play. Finally, AI companions could feasibly be programmed to harm vulnerable people by manipulating their emotional states. AI has already proven effective at inciting feelings of love, but it could also make people feel worse about themselves, like an abusive partner. Italy has actually prohibited Replika from using the personal data of Italians because it influences mood and could “increase the risks for individuals still in a developmental stage or in a state of emotional fragility”—including children. AI companions aren’t real people, but they create real feelings, which makes people vulnerable to manipulation, and the rise of LLMs will only make them more widespread. We might end up in a situation where we don’t know what to trust online—on the internet, no one knows you’re a dog (or an LLM).
3. The convergence and devaluation of writing styles
This is a bit of a left turn, and probably the most nebulous of these issues, but I also worry about the potential for us to lose unique voice in writing. When you use LLMs, you notice that each has its particular style of writing regardless of topic: ChatGPT equivocates, while Bing studs its responses with emojis. These styles are derived from the LLM’s training data and training process. Think of it like making a smoothie; you can change the flavor by changing the kind and proportion of ingredients, but in the end you have one homogenous goop.[3] Using an LLM is like following a specific smoothie recipe—you’ll get something fine without having to think about it too much, but what if you had gone off-piste and made something new and really good?
This isn’t likely to be a huge deal for formulaic writing, like meeting invitations, but I worry that the Internet will be flooded with massive amounts of mediocre content that all sounds the same. There’s a clear incentive to use LLMs for rapid content generation. LLMs could generate “SEO[4] spam”: low-quality webpages designed to get as many clicks as possible. Some news agencies are already trying to use LLMs to write articles, which has produced mixed to poor results so far, but as the technology improves, news articles and blogs could also end up converging in voice if they’re largely written by LLMs. I don’t think we’ll ever reach a point where online content is generated solely by LLMs, but we do need to think about the implications of a world where a lot of it is, especially considering the aforementioned potential for targeted disinformation. We’re not just losing unique voices online—we’re opening the door to potential floods of low-quality writing that in turn will influence how people who read it think and write.
There’s also the question of imitating voice, which LLMs can do fairly well and which might preserve some flair. Still, they can only imitate authors prolific enough to be well represented in the training data. Also, most people won’t be asking for their writing in the style of Jane Austen or Salman Rushdie, which would be weird and derivative. There are also ethical issues with this—authors can’t copyright their “voice,” but just as artists are upset that image generators can imitate their artistic styles, authors are uneasy about how LLMs can write in theirs. The sci-fi writer Adrian Tchaikovsky said “it is a profoundly scary time to be a professional creative at the moment… I’m feeling very much that I’m watching people come for the visual artists today and they will be coming for the wordsmiths tomorrow.” He’s right to be scared. Creatives are watching their work be scraped without their consent to create tools that could to some extent replace them, and those tools are only going to get better and better. This brings us to…
4. Compensation of content creators
Longtime readers may remember ER4 on the Australian News Media Bargaining Code, which forced large platforms like Google and Facebook to pay news companies for content. The argument for making search engines pay for content that they skim and use for previews and sidebars that decrease clickthrough always seemed more convincing to me than the argument for making Facebook pay (although Facebook has harmed news publishers by siphoning ad revenue and driving the “pivot to video”). Now we’re facing an even more extreme version of this: new LLM-enabled search engines, like Bing, pitch their value-add as being able to aggregate content and present it to you so that you don’t have to click on any links.
This brings us to problem 1: No clicks, no revenue.
If search engines are giving users the answers they need instead of sending them to websites, those websites aren’t making any money. If a site depends on ads, users aren’t seeing the ads. If it relies on subscriptions, they may be getting access to paywalled content without having to subscribe. If it relies on referrals, not only will the site not get a referral commission, but the platform could potentially take the commission for itself. The footnotes the new Bing search provides are probably intended as cover, ostensibly directing traffic to the sources of its summaries, but it’s unclear how often people will actually click through—certainly less than they would if they had to research an answer to their question themselves. The News Media Alliance is not a fan, calling the attributions “less than stellar for our tastes.”
Problems 2 and 3: No revenue, no website. No websites, no new content.
Problem 4: No new content, no updated training.
This is where it gets complicated. If websites aren’t creating new content, then there’s no new content for LLMs to train on, so they won’t be able to update their answers. AI search engines would be self-sabotaging, cannibalizing the websites that they depend on to fuel their service. As with the issue of voice, I don’t think we’ll ever reach a situation where no new content is being created, but the fact remains that LLM search engines could significantly disrupt current monetization models.
Considering this dynamic, it may actually be in search engines’ best interest to compensate websites or license the content that they draw on for answers. Another Podcast briefly mentioned this, but I think it merits some sustained thought. There is precedent for it: the European Copyright Directive requires platforms to pay licensing fees to publishers when they provide previews that go beyond links and “very short extracts”, and Google has created a tool to sign publishers up for licensing agreements. If we require search engines to pay publishers for short direct snippets, why not require them to pay for long summaries and compilations? Shifts in Internet dynamics have changed how journalists work—often for the worse—and LLMs risk being another knock. Maybe this time, we can prepare and blunt the impact.
Some of these concerns may seem too minor or overblown, but I’d rather we spend a little extra time thinking about them now than be caught off-guard. Like most emerging technologies, LLMs will have an impact somewhere between “none” and “revolutionary,” and wherever on that spectrum they land, they will shape how we work, live, and interact with each other. The pace of new developments has been mind-bogglingly fast, and it’s easy to get lost in the deluge of news, but if we do, we’ll wake up in a new normal that would have seemed alien when we fell asleep.
[1] It’s a bit of a stretch, but Ernie is derived from “Enhanced Representation through kNowledge IntEgration.” The Chinese name is 文心一言, which is tricky to translate but could perhaps work classically as “a voice to connect language and the heart” (thanks to Mike Yeung for helping me puzzle this out). I do wish that Ernie was the Chinese name, though, because it could translate as 耳孽, or “ear demon.”
[2] The irony is that LLMs are often wrong, meaning that they also embody the male trait of uninformed mansplaining.
[3] Not a perfect analogy, but it gets the point across.
[4] SEO stands for “search engine optimization”; it’s essentially the process of trying to get webpages ranked higher in search results.
Thumbnail created with DALL-E using the prompt “in love with a manipulative creature.”