Are the window size and the context length of a language model one and the same thing?
I am trying to understand how a GPT model is trained, and this question came to mind. I searched on Google but couldn't find an answer, so I am asking here.