I have an example of a generated image containing words, as well as several red arrows pointing to certain characters.

I need to get these characters from GPT, but when I ask "what characters do the red arrows point to?" it gives the wrong letters, although it can correctly recognize all the text in the image and send it as a message.
Maybe there is another way to explain it to him? Or could he handle it in other versions of the API?