A great interview with Ted Chiang:
Predicting the most likely next word is different from having correct information about the world, which is why LLMs are not a reliable way to get the answers to questions, and I don’t think there is good evidence to suggest that they will become reliable. Over the past couple of years, there have been some papers published suggesting that training LLMs on more data and throwing more processing power at the problem provides diminishing returns in terms of performance. They can get better at reproducing patterns found online, but they don’t become capable of actual reasoning; it seems that the problem is fundamental to their architecture. And you can bolt tools onto the side of an LLM, like giving it a calculator it can use when you ask it a math problem, or giving it access to a search engine when you want up-to-date information, but putting reliable tools under the control of an unreliable program is not enough to make the controlling program reliable. I think we will need a different approach if we want a truly reliable question answerer.