
I asked the following question of both ChatGPT 4 and Bard to see whether they could get a simple matrix calculation right (after all, Bill Gates said he was impressed by ChatGPT's math ability).

So I asked,

explain to me the result of d in the following numpy code

```python
import numpy as np

a = np.array([[[1, 2, 3], [2, 1, 3]]])  # shape (1, 2, 3)
b = np.array([[[2, 1, 1], [2, 4, 0]]])  # shape (1, 2, 3)
b1 = b.reshape(1, 3, 2)                 # shape (1, 3, 2)
d = a @ b1                              # batched matmul, shape (1, 2, 2)
```
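For reference, here is what running the snippet above actually produces: reshape reads b's six elements in row-major order, so b1[0] is [[2, 1], [1, 2], [4, 0]]:

```python
>>> b1[0]            # rows of the reshaped matrix
array([[2, 1],
       [1, 2],
       [4, 0]])
>>> d[0]             # row i of a[0] dotted with column j of b1[0]
array([[16,  5],
       [17,  4]])
>>> 1*2 + 2*1 + 3*4  # d[0, 0, 0], worked by hand
16
```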

Both ChatGPT and Bard got it totally wrong no matter how many times I tried. Each attempt gave me a slightly different result, so I am pasting just one of them.

This one is from GPT-4, and as you can see, 2 out of 4 calculations are wrong: (1*2)+(2*1)+(3*1) = 7, not 5, and (2*2)+(1*1)+(3*1) = 8, not 7.

But why is that?

(screenshot: ChatGPT 4's answer)

Bard is no different. The second mistake they both made is getting the reshape wrong, but I am only concerned with the matrix calculation here.
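The screenshots don't show exactly how the models mis-reshaped b, but one plausible confusion (my guess, not something the models stated) is between reshape, which keeps NumPy's row-major element order, and a transpose of the last two axes, which produces a different matrix:

```python
import numpy as np

b = np.array([[[2, 1, 1], [2, 4, 0]]])  # shape (1, 2, 3)

# reshape keeps row-major element order: 2, 1, 1, 2, 4, 0
print(b.reshape(1, 3, 2)[0])
# [[2 1]
#  [1 2]
#  [4 0]]

# transposing the last two axes instead gives a different matrix
print(b.transpose(0, 2, 1)[0])
# [[2 2]
#  [1 4]
#  [1 0]]
```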

(screenshot: Bard's answer)

I let Bard try many times, and the closest it got was 3 out of 4 calculations right. d_10 = [[2, 1, 3]] @ [[2, 1, 1]] = 2*2 + 1*1 + 3*1 = 7 is wrong (that sum is 8):

(screenshot: Bard's answer, another attempt)

---- update for ChatGPT 4 Turbo ----

I tried ChatGPT 4 Turbo. Although it still got the reshape wrong (a silly mistake I really don't understand), this time it finally got the 4 arithmetic calculations right.

(screenshot: ChatGPT 4 Turbo's wrong reshape)

The last row should be [4, 0]!

Finally, the 4 arithmetic calculations are right (but applied to the wrong reshaped matrix):

(screenshot: all 4 arithmetic calculations right)

---- update for Claude v2 & Llama v2 ----

I decided to let Claude v2 and Llama v2 try it. They did not even come close, which really surprised me. I have pasted just Llama's answer here.

(screenshot: Llama's answer)


1 Answer


In my understanding, LLMs are, very roughly speaking, probabilistic solvers. Math problems such as matrix multiplication are, on the other hand, deterministic in nature. Thus, using an LLM to solve a math problem mostly comes down to using the wrong tool for the task.

Of course, when it comes to word problems, LLMs should be beneficial: they can "understand" language and should therefore be able to do precisely the task in question, translating a problem from language into mathematics. However, this does not work on more complex tasks (many of them apparently would not pass 6th grade in China: https://arxiv.org/pdf/2306.16636.pdf).

An extensive discussion of LLMs and their obstacles with math problems can be found here: https://arxiv.org/abs/2301.09723

However, some of the shortcomings of LLMs can be circumvented. Instead of asking an LLM "How much is 3 + 2?", one should ask "Please write a program which adds two numbers, use it to add 3 and 2, and show me the result" - and the same idea scales to much more sophisticated problems, as can be seen here: https://www.pnas.org/doi/abs/10.1073/pnas.2123433119
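As a toy illustration of that pattern (hypothetical code, just mirroring the example in the previous paragraph): the model's job is reduced to emitting a program, and the interpreter then does the arithmetic deterministically.

```python
# Sketch of the kind of program an LLM would be asked to write:
# instead of answering "How much is 3 + 2?" directly, it emits code,
# and the Python interpreter computes the result deterministically.
def add(x: int, y: int) -> int:
    """Return the sum of two numbers."""
    return x + y

print(add(3, 2))  # 5
```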

As a simple thought experiment, you can try to set up a feedforward network which adds two numbers (preferably ones that sum to less than ten). Even when setting up a perfect network by hand (and there is no guarantee that learning would actually produce it), there are a number of obstacles to face. The idea comes from the book "Deep Learning with Python, Second Edition" by Francois Chollet (which I haven't read in a while and no longer have access to).
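A minimal sketch of that thought experiment, using a single linear layer trained by plain gradient descent in NumPy (an assumption for illustration, not Chollet's exact setup): the perfect weights w = [1, 1] exist, but the learned weights only approximate them, so the network's answer to 3 + 2 is only approximately 5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training pairs whose sums stay below ten, as in the thought experiment
X = rng.uniform(0, 5, size=(1000, 2))
y = X.sum(axis=1, keepdims=True)

# One linear layer, y_hat = X @ w. The perfect "network" is w = [1, 1],
# but gradient descent only approaches it approximately.
w = rng.normal(size=(2, 1))
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(X)  # gradient of the mean squared error (up to a factor of 2)
    w -= 0.01 * grad

print(w.ravel())                        # close to [1. 1.], but not exactly
print((np.array([3., 2.]) @ w).item())  # very close to 5, but typically not exactly 5
```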