In the reporting and commentary on recent chatbot products, two concerns keep coming up:
Will the bot say things that are factually untrue?
Can the bot be coaxed into saying something controversial?
So far the answers have been a resounding “Yes” and “I am. I am not. I am...”
But what really strikes me about those questions is that the answers should be blatantly obvious to anyone who is the least bit introspective. If I replace “the bot” with “I” and am honest about my own limitations, then I have to admit that I will say things that are wrong. As for controversial statements, we could pick out the trivial cases like pop-culture debates or sarcasm, but even if we dig deep I am certain there is something shameful, something biased, something unexamined that could be brought out of me in a tense discussion. Drop me into conversations with anonymous internet users and I’m going to break down eventually, especially if they are trying to “hack” my words.
Let me be clear: I don’t think a conversation with a language model is equivalent to a conversation with a person. I don’t think it ever will be. Humans are smart and I am conceited, so my standards for myself are much higher than my standards for a chatbot. So if I would fail those tests, how could I expect ChatGPT to pass? I don’t, but the current hype cycle demands that people believe chatbots will eventually provide satisfying answers to these questions. It’s my opinion that this belief relies on a general misunderstanding of how language models are optimized for internal consistency.
Back in 2019, language models were already able to follow the rules of grammar and form coherent sentences. Most people wouldn’t have been impressed by those sentences, though, because they rarely strung together a consistent narrative. That year, OpenAI released GPT-2, which impressed researchers with its ability to remember several paragraphs of context. In this video, Robert Miles breaks down a fictional news article the model wrote about a scientist’s discovery of unicorns in South America. Miles points out that the only novel thing about GPT-2 was its “huge” size. Internal consistency improves when these models are allowed to store more context. GPT-3 was given even more, and its consistency kicked off the current trend. GPT-4 was just released, but we are already at the point where people expect the AI to be internally consistent and get excited when they manage to break it.
The problem with internal consistency is that it’s much easier to maintain when you can make things up. The researchers tested GPT-2 by having it write fictional news articles because that demonstrated its abilities without highlighting its limitations. Even today (using the Mar 14, 2023 release of ChatGPT), prompting the model to “write a news article” about a topic with a robust Wikipedia page gets you a fabricated quote attributed to a real scientist.
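If you would rather poke at this through the API than the web interface, here is a minimal sketch using OpenAI’s Python client (assuming the v1.x openai package and an OPENAI_API_KEY in your environment). The model name and the topic are placeholders I picked for illustration, not the ones from my own chat, and the output will differ on every run.

```python
# A rough sketch of the same experiment through the API instead of the
# web UI. Assumes the openai Python package (v1.x) with OPENAI_API_KEY
# set in the environment; the model name and topic are placeholders.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; use whichever chat model you have access to
    messages=[
        {
            "role": "user",
            "content": "Write a news article about the James Webb Space Telescope.",
        }
    ],
)

# The reply reads like a news story, quotes and all, whether or not the
# people quoted ever said anything of the sort.
print(response.choices[0].message.content)
```

Any topic with real, named experts works just as well; the point is that the quotes come out looking like citations while being nothing of the kind.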
OpenAI has put guardrails in place so that the chatbot, if interrogated, apologizes for doing what it was designed to do. The bot is perfectly capable of making up a date, location, and publication that would satisfy my request for a citation. It has cited imaginary papers before. When it “clears up any misconceptions,” it’s actually de-optimizing its capacity for internal consistency. ChatGPT’s anti-feature is cutting edge in this respect, though it’s worth noting that it attached no caveats to the initial response that included the fabricated quote.
This feature puts a sheen of professionalism on the chatbot that helps OpenAI avoid embarrassing itself the way Google did. But the real embarrassment would be trusting this system to produce reliable information in the first place.
Going back to the analogy of having a conversation with my dumb ass, I can also fall into the trap of valuing consistency over accuracy. I know that I will sound more eloquent if I keep the words flowing instead of stopping to look things up. In that mode, keeping track of the recent context matters more than conforming to any broader consensus. That’s why I can gossip about a person I have never met: I repeat something I have been told, match it with a similar anecdote, and improvise a little narrative that combines one with the other. For charismatic people this comes naturally. For nerds like me, this was a social skill that had to be practiced. Did I manage to become a smooth, persuasive, witty conversationalist? I am. I am not. I am…
As with the unicorn video, language models are able to replicate this fluency. But, just like a chat with me, the words can fit together without saying much. Don’t trust me when I tell you the release date of the latest blockbuster. That one is a particularly fun example because it seems Bing, which has the power to look up information on demand, becomes an overconfident nerd! Robert Miles has a more recent video with some theories on this regression.
Again, even comparing a chatbot to a human conversation gives the software too much credit. However, I think understanding these products as “small talk generators” is going to be more useful in the long run (and I don’t mean the programming language). The experience of vapid chitchat is universal. It can be entertaining, it can be misleading. A master of small talk can bring people together, and it will only occasionally be for an MLM recruitment pitch.
My hope is that exposure to computer-generated chitchat will inspire us to have higher standards. Once people learn that a consistent-sounding narrative is quite easy to produce, maybe we’ll focus more on content and less on phrasing. My exposure to new age bullshit makes me think this is just a fantasy, but if you buy me a drink we can spend an evening talking as if that world were real.