A new era for ChatGPT? GPT-4 may have passed the famous Turing Test

Researchers evaluated the ability of the GPT-4 language model and concluded that it passed the Turing test.

A group of researchers claims that GPT-4 passed the Turing Test, managing to fool people into thinking it was human.

Language models such as the famous ChatGPT are increasingly common in society, and they keep getting better. Since ChatGPT's release, OpenAI has tested updated versions that outperform the older ones. Recently, the company made GPT-4 available, which performs better still than the previous versions.

These models are so good at conversation and at answering questions that they can often give the impression that we are talking to another human being. The test of whether an artificial intelligence can impersonate a human being and deceive other humans is called the Turing test. It was proposed by Alan Turing in his famous 1950 paper discussing thinking machines.

A new study by researchers at the University of California San Diego concluded that the GPT-4 language model passes the Turing test: it was able to impersonate a human being well enough to deceive other humans. The group tested three AI systems, ELIZA, GPT-3.5 and GPT-4, by having participants chat for five minutes with either one of these systems or a real person, without knowing which.

1950 article

In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence", which became one of the most important articles in Computer Science. It opens with the question of whether machines can think, starting a discussion about the possibility of artificial intelligence. The article is seen as one of the starting points in the search for artificial intelligence.

In the paper, Turing refers to machines that could do the same things as humans as "thinking machines".

One of the key points of the article is Turing's discussion of whether a machine could think, which he approaches in a deeply philosophical way. The article also stands out for its discussion of how machines could learn through patterns, an idea that is the basis of machine learning today.

The Turing Test

The article also proposes a test, which Turing called the "imitation game" and which became known as the Turing Test. The idea is to evaluate whether a machine can impersonate a human being and deceive another: an interrogator exchanges text messages with it and must decide whether they are talking to a human. If the machine succeeds in making the interrogator believe it is human, it could be considered intelligent.

Alan Turing's article was revolutionary for Computer Science and introduced concepts that are still important today.

The test gained momentum in the following decades, and several competitions were created in which researchers and companies put their machines to the test. One application of the Turing Test idea is the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), used on websites to detect bots.

Language models

A language model is a model that has been trained to understand, generate or otherwise work with text. A well-known example is models that learn to translate from one language to another. The language model receives an input, which may or may not be text, and returns a response in the form of text. The field that studies these models is called natural language processing (NLP).
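To make the "input in, text out" idea a little more concrete, here is a minimal sketch of the simplest kind of language model: a bigram model that counts which word tends to follow which, then predicts the next word from those counts. This is an illustrative assumption, not how GPT-4 works; modern models use neural networks trained on vastly more text, and the corpus and function names below are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative only: a count-based bigram model, NOT how GPT-style models work.


def train_bigram_model(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training sentences."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word_probabilities(model: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the follow-counts for `word` into a probability distribution."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(model: dict[str, Counter], start: str, max_words: int = 5) -> str:
    """Greedily extend `start` by repeatedly picking the most likely next word."""
    words = [start.lower()]
    for _ in range(max_words):
        probs = next_word_probabilities(model, words[-1])
        if not probs:
            break  # no known continuation for this word
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# Tiny, made-up training corpus (hypothetical, for illustration only).
corpus = [
    "the machine can think",
    "the machine can learn",
    "the machine can learn patterns",
    "people can learn patterns",
]
model = train_bigram_model(corpus)
print(next_word_probabilities(model, "can"))
print(generate(model, "the"))
```

The model only "knows" the patterns present in its training text, which is the same basic reason large models need enormous training corpora.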

Today's language models are advanced enough to hold a conversation, answer questions and perform tasks such as summarising a text. Most of them are based on neural networks using an architecture called the Transformer. They are trained on large amounts of text, from which they learn patterns that let them interpret texts and perform tasks.
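The core operation of a Transformer is usually described as scaled dot-product attention, softmax(QKᵀ/√d)·V: each token's new representation is a weighted average of the other tokens' values, weighted by how well its query matches their keys. The pure-Python sketch below shows only that single step under simplifying assumptions: no learned projection matrices, no multiple attention heads, and tiny hand-made vectors.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries, keys and values are lists of vectors (one vector per token).
    Returns one output vector per query.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Three hypothetical 2-dimensional token vectors; in self-attention the same
# vectors serve as queries, keys and values (real models first project them).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens):
    print([round(x, 3) for x in row])
```

Real Transformers stack many such layers with learned weights, but the weighted-average idea above is what lets each word's representation depend on its context.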

Which models pass the test?

With current language models in mind, a group of researchers from the University of California San Diego set out to put GPT-3.5 and GPT-4 through the Turing Test. Participants chatted via text for five minutes with a conversation partner, who could be either a real person or one of the AI systems being tested, and then judged whether they had been talking to a human.
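Each game ends with the interrogator's verdict, and headline figures such as "judged to be human 54% of the time" are the share of games in which a given kind of witness was judged human. A minimal sketch of that bookkeeping, assuming each game is recorded as a (witness type, verdict) pair; the records and the function name are hypothetical and are not the study's data or code.

```python
from collections import defaultdict

# Hypothetical game records: (witness type, interrogator's verdict).
# These are NOT the study's data; they only illustrate the tallying.
games = [
    ("GPT-4", "human"),
    ("GPT-4", "AI"),
    ("Human", "human"),
    ("ELIZA", "AI"),
    ("GPT-3.5", "human"),
    ("Human", "AI"),
    ("GPT-4", "human"),
    ("ELIZA", "AI"),
]


def judged_human_rate(records):
    """Return, for each witness type, the fraction of games judged 'human'."""
    totals = defaultdict(int)
    judged_human = defaultdict(int)
    for witness, verdict in records:
        totals[witness] += 1
        if verdict == "human":
            judged_human[witness] += 1
    return {w: judged_human[w] / totals[w] for w in totals}


for witness, rate in sorted(judged_human_rate(games).items()):
    print(f"{witness:8s} judged human in {rate:.0%} of games")
```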

New Preprint: People cannot distinguish GPT-4 from a human in a Turing test.

In a pre-registered Turing test we found GPT-4 is judged to be human 54% of the time.

On some interpretations this constitutes the most robust evidence to date that any system passes the Turing test
— Cameron Jones @NAACL (@camrobjones) May 15, 2024

In the paper, the researchers found that GPT-4 fooled people about half the time (54%). As a baseline, the group also tested ELIZA, a chatbot from the 1960s, which was judged to be human only 22% of the time. The study therefore concludes that GPT-4 passes the Turing Test, although real humans were still recognised as human more often (67% of the time).
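One way to read the 54% figure is against the 50% an interrogator would score by simply guessing. A standard tool for asking whether an observed rate differs from chance is the exact binomial test, sketched below. The sample size of 100 games per witness is a hypothetical assumption chosen for illustration, not the study's actual number of games, so these p-values say nothing about the real study's statistics.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value.

    Sums the probabilities of every outcome that is no more likely than the
    observed one, assuming each game is judged 'human' with probability p.
    """
    def pmf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    observed = pmf(successes)
    # Small relative tolerance so floating-point ties count as "as extreme".
    return min(1.0, sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-7)))


# Hypothetical sample size: 100 games per witness type (NOT the study's n).
n_games = 100
for label, rate in [("GPT-4", 0.54), ("ELIZA", 0.22)]:
    k = round(rate * n_games)
    print(label, k, "of", n_games, "-> p =", f"{binomial_two_sided_p(k, n_games):.3g}")
```

Under this assumed sample size, 54 of 100 is not distinguishable from coin-flipping, whereas 22 of 100 is far below chance; with a different number of games the conclusions could change.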

GPT-4

GPT-4 is an updated version of the language models that power ChatGPT. These models were created by OpenAI and are now in their fourth generation. Recently, OpenAI announced that ChatGPT would be powered by an even more optimised version of GPT-4 called GPT-4o. One of GPT-4's biggest improvements over earlier versions is its ability to create longer, more cohesive texts.

ChatGPT also has a feature that can search the internet and answer questions about current events. This is an advance over GPT-3.5, the model ChatGPT was launched with in 2022, which could only draw on its training data, ending in 2021.

News reference:

Jones, C., & Bergen, B. (2024). People cannot distinguish GPT-4 from a human in a Turing test. arXiv.