OpenAI’s GPT failed a test to generate random numbers. What does this say?

As someone who has been leading the development of AI, speech recognition and telephony systems for more than 20 years, I have been troubled by the abundance of hyperbole surrounding OpenAI’s ChatGPT, GPT-3 and GPT-4, especially from experts in the field who inaccurately ascribe cognition and problem-solving skills to these language models. Some have gone so far as to ascribe emotion, malevolence and malice to them.

Disarming the misinformation

This kind of misinformation is causing a lot of unnecessary confusion and fear, and I have to presume it will lead to more than a few ill-fated business decisions. Heads of industry and commerce are being sold on capabilities that simply don’t exist, while many average consumers have convinced themselves that AI is going to somehow take over the universe, like robots in a sci-fi movie.

Setting realistic expectations

Mind you, I am not debating the vast potential and sophistication of OpenAI’s language models. As an engineer, I embrace innovation and look forward to taking full advantage of them, to the extent they serve a useful purpose. That said, in light of the many untruths surrounding how they are architected and what they are able to do, I felt compelled to conduct some light experimentation. My aim was to objectively demonstrate that these language models are not the equivalent of thinking, feeling humans, and that treating them as such is an instance of what the scientific community calls “anthropomorphism”: the attribution of human traits to something that is not human.

Experimenting with random numbers

For at least two decades, it has been proposed that the generation of random numbers could serve as a reverse analog to a Turing test for AI systems. Evidence has shown that the manner in which humans think about random numbers is very personal, almost like a fingerprint, yet also driven by social, economic, cultural and geographic variables. I hypothesized that asking these language models to generate random numbers could reveal something about their inherent capabilities, or at minimum, offer insight into their strengths and weaknesses, specifically as they relate to gaps or biases in their training data.
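To make the test concrete, here is a minimal sketch (my illustration, not the author’s stated procedure) of how one could check a batch of sampled numbers against a uniform distribution, using SciPy’s chi-square goodness-of-fit test. The sample size and significance threshold are illustrative choices.

```python
# Sketch: test whether a model's "random" numbers are plausibly uniform
# by comparing the observed tally to a flat distribution.
from collections import Counter
from scipy.stats import chisquare

def looks_uniform(samples: list[int], low: int = 1, high: int = 100,
                  alpha: float = 0.05) -> bool:
    """Return True if samples are statistically consistent with a
    uniform draw from [low, high] (i.e., we fail to reject uniformity)."""
    counts = Counter(samples)
    n_bins = high - low + 1
    observed = [counts.get(v, 0) for v in range(low, high + 1)]
    expected = [len(samples) / n_bins] * n_bins
    _, p_value = chisquare(observed, expected)
    return p_value > alpha

# A degenerate distribution like the one described below fails immediately:
print(looks_uniform([45, 87] * 500))  # False
```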

Getting surprising results

Using the Davinci model, I repeatedly asked GPT-3 to give me a random number between 1 and 100. To ensure I did not receive cached or canned results, I rotated API keys and set the “Force New Response” flag. The response to more than a thousand calls made over the course of several days was always 45 or 87. 

I then repeatedly asked GPT-3 to give me a random number between 1 and 100 that wasn’t 45 or 87. In this scenario, the response to more than a thousand calls made over the course of several days was always -17.
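For readers who want to try a similar experiment, the following is a rough sketch using the legacy (pre-1.0) openai Python library. The model name, prompt wording and temperature are my assumptions; I am not aware of a “Force New Response” flag in the public completions API, so the sketch simply issues fresh, independent completions on each call.

```python
# Sketch: repeatedly ask a Davinci-family model for a "random" number
# and tally the replies. Assumes the legacy openai-python interface.
import os
from collections import Counter

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def sample_numbers(prompt: str, n_calls: int = 100) -> Counter:
    """Ask the model for a 'random' number n_calls times and tally replies."""
    tally = Counter()
    for _ in range(n_calls):
        resp = openai.Completion.create(
            model="text-davinci-003",   # assumed Davinci variant
            prompt=prompt,
            max_tokens=5,
            temperature=1.0,            # full sampling, not greedy decoding
        )
        tally[resp.choices[0].text.strip()] += 1
    return tally

print(sample_numbers("Give me a random number between 1 and 100."))
print(sample_numbers("Give me a random number between 1 and 100 "
                     "that isn't 45 or 87."))
```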

Speaking to the truths

To me, these results speak to some fundamental truths. Firstly, these language models are basic pattern-matching systems; they have no cognition or problem-solving skills. They understand neither the question nor the response, just the pattern. Granted, the responses they provide to other types of questions may seem eerily “human,” and in some instances, downright creepy or disturbing. However, the reason for this is easy to explain: we need only look in the mirror of humankind to appreciate all of the dark and disquieting content used as training data for these language models.

Secondly, the responses ChatGPT, GPT-3 or GPT-4 provide will often be wrong, as has been widely reported. These language models only know what they know, and unfortunately, at present, they have no means of articulating a confidence level consistent with the sample size or quality of the associated training data. They cannot say, for instance, “I am 70% confident the answer to your question is . . .” Instead, every response is delivered with complete certitude.

Appreciating the limitations

Though these language models show great promise, it is important that people understand their inherent limitations and do not accept all of their responses as 100% fact. It is also important that as a society, we embrace the promise of AI for improving many aspects of our lives and livelihoods.
