ChatGPT-4 and Other LLMs in the Turing Test: A Critical Analysis

Giunti, Marco

doi:10.1007/s11023-026-09760-5

This paper critically examines the recent publication “ChatGPT-4 in the Turing Test” by Restrepo Echavarría (Minds and Machines 35:8, 2025), challenging its central claims regarding the absence of minimally serious test implementations and its conclusion that ChatGPT-4 fails the Turing Test. The analysis reveals that the criticisms based on rigid criteria and limited experimental data are not fully justified. More importantly, the paper makes several constructive contributions that enrich our understanding of Turing Test implementations. It demonstrates that two distinct formats, the three-player and two-player tests, are both valid, each with its own methodological implications. It distinguishes between absolute criteria for passing the test (the machine’s probability of incorrect identification equals or exceeds the human’s probability of correct identification) and relative criteria (which measure how closely a machine’s performance approximates that of a human), offering a more nuanced evaluation framework. Furthermore, the paper clarifies the probabilistic underpinnings of both test types by modeling them as Bernoulli experiments, correlated in the three-player version and uncorrelated in the two-player version. This formalization allows for a rigorous separation between the theoretical criteria for passing the test, defined in probabilistic terms, and the experimental data, which require robust statistical methods for proper interpretation. In doing so, the paper not only refutes key aspects of the criticized study but also lays a solid foundation for future research on objective measures of how closely an AI’s behavior aligns with, or deviates from, that of a human.
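The absolute criterion stated in the abstract can be illustrated with a minimal Python sketch. Everything named here (`Counts`, `passes_absolute`, `three_player_counts`) is hypothetical and not taken from the paper, and the numbers in the usage lines are made up. The sketch also assumes that a three-player trial is a forced choice, so that picking the machine as the human is the same event as misidentifying the human; under that assumption the two counts are complementary and the criterion reduces to the machine being picked in at least half of the trials. It compares point estimates only and is not a substitute for the statistical treatment the abstract calls for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Outcome counts for one kind of witness (hypothetical container)."""
    judged_human: int  # trials in which the judge labelled this witness "human"
    trials: int        # total trials involving this kind of witness

    @property
    def rate(self) -> float:
        # Observed frequency, used as a point estimate of the probability.
        return self.judged_human / self.trials


def passes_absolute(machine: Counts, human: Counts) -> bool:
    """Point-estimate check of the absolute criterion.

    The machine's rate of being judged human (incorrect identification)
    must equal or exceed the human's rate of being judged human
    (correct identification). Cross-multiplying avoids float comparison.
    """
    return machine.judged_human * human.trials >= human.judged_human * machine.trials


def three_player_counts(machine_picked: int, trials: int) -> tuple[Counts, Counts]:
    """Build both counts from forced-choice three-player trials.

    Assumption: the judge picks exactly one of the two witnesses as human,
    so the human is correctly identified in every trial where the machine
    is not picked.
    """
    return Counts(machine_picked, trials), Counts(trials - machine_picked, trials)


# Two-player format: the two rates come from separate sets of trials.
print(passes_absolute(Counts(40, 100), Counts(70, 100)))  # made-up counts

# Three-player format: the criterion reduces to machine_picked / trials >= 1/2.
machine, human = three_player_counts(50, 100)
print(machine.rate, human.rate, passes_absolute(machine, human))
```

In the two-player case the two rates can vary independently, which is why both counts have to be supplied separately; in the three-player case a single count determines both.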


2026-01-01


Year of publication

2026

Keywords

Turing Test; Two-player test vs. three-player test; Large Language Models; Criteria for passing the Turing Test; Degree of humanness; Statistical methods applied to the Turing Test


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/470085

Note: the data displayed have not been validated by the university.

UNICA IRIS Institutional Research Information System
