Do Large Language Models learn world models or just surface statistics?

published on 2023/01/22

How do these models achieve this kind of performance? Do they merely memorize training data and recite it back, or are they picking up the rules of English grammar and the syntax of the C language? Are they building something like an internal world model, that is, an understandable model of the process that produces the sequences?

The Gradient