How do LLM's solve math exactly?
I'm watching this video by andrej karpathy and he mentions that after training we use reinforcement learning for the model . But I don't understand how it can work on newer data , when all the model is technically doing is predicting the next word in the sequence .Even though we do feed it questions and ideal answers how is it able to use that on different questions .
Now obviously llms arent super amazing at math but they're pretty good even on problems they probably haven't seen before . How does that work?
p.s you probably already guessed but im a newbie to ml , especially llms , so i'm sorry if what i said is completely wrong lmao