Tom Kochuyt
Apr 28, 2023 · 2 min read


Hi Dan,

It is an interesting question you raise: what is expected, and what is needed to judge whether that expectation is met? Please bear with me (this might get long) as I try to address it.

Indeed, you and I can both string words (or symbols) together into sentences which are ‘technically coherent’, by which I mean: the sentences comply with the rules of the language used (grammar, usage, syntax, …) sufficiently for someone familiar with those rules to interpret them.

Most, if not all, humans are capable of doing this with varying degrees of proficiency, at least from a certain age onwards and in their native language.

And yes, GPT (and other similar LLMs) is also capable of doing this, at an impressive level of coherence at that. Which is what one could expect, since this is what GPT was designed / trained to do.

I am not sure, however, that this is the expectation Russell is testing here. Assuming I interpret his post correctly, he set out to test whether GPT was able to solve problems, specifically a logic problem. Something GPT was, to my knowledge, not designed / trained to do.

To be clear, I’m not saying one cannot expect GPT to be capable of solving problems and/or using logic. However, given it was not trained to do so, it would be quite unexpected if it exhibited such emergent behavior.

I believe this is also Russell’s view, and I quote: “The fact that GPT-4 could not only understand the problem perfectly, but also undertake the correct logical process towards finding a solution, is absolutely amazing!”

Russell does not only state that he is amazed, though; he also makes it clear what he is amazed about. He claims that GPT-4 understood the problem perfectly and undertook the correct logical process. That is a pretty strong claim and, I hope you agree, strong claims require strong evidence.

As I said in my reply to Russell’s post, I don’t think GPT-4’s reply is the strong evidence needed to support these claims.

Did GPT-4 ‘understand the problem perfectly’? It is unclear to me what Russell’s definition of ‘understand’ is, but whatever it may be, GPT-4 did not understand the problem ‘perfectly’, as evidenced by GPT-4 writing that “liars will answer ‘no’ (lying)” when liars are asked ‘are you a truth-teller’.
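To make that point concrete: a liar who is asked “Are you a truth-teller?” would lie about not being one and answer ‘yes’, exactly as a truth-teller would, so that question cannot distinguish the two. Here is a minimal sketch of that reasoning (my own illustration, not part of Russell’s transcript or GPT-4’s output):

```python
def answer(is_truth_teller: bool, fact_is_true: bool) -> str:
    """A truth-teller reports the fact as-is; a liar reports its negation."""
    truthful_answer = fact_is_true
    spoken_answer = truthful_answer if is_truth_teller else not truthful_answer
    return "yes" if spoken_answer else "no"

for kind, is_tt in [("truth-teller", True), ("liar", False)]:
    # The fact being asked about is "you are a truth-teller",
    # which is true exactly when the speaker is a truth-teller.
    print(f"{kind} asked 'Are you a truth-teller?': {answer(is_tt, fact_is_true=is_tt)}")

# Output:
#   truth-teller asked 'Are you a truth-teller?': yes
#   liar asked 'Are you a truth-teller?': yes
```

Both kinds of islander answer ‘yes’, which is why GPT-4’s claim that liars would answer ‘no’ is a genuine logical error rather than a slip of phrasing.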

But more importantly, it is also possible to explain GPT-4’s response considering only what it was designed / trained to do. GPT-4 is designed / trained to respond in the tone, style and structure of the prompt it is given, echoing in its response terms found in the prompt and using words related to those terms. That is exactly what it is doing in this case.

So what level of evidence would I expect? Strong evidence.
