Predicting the most likely next word is different from having correct information about the world, which is why LLMs are not a reliable way to get the answers to questions, and I don’t think there is good evidence to suggest that they will become reliable. Over the past couple of years, there have been some papers published suggesting that training LLMs on more data and throwing more processing power at the problem provides diminishing returns in terms of performance. They can get better at reproducing patterns found online, but they don’t become capable of actual reasoning; it seems that the problem is fundamental to their architecture. And you can bolt tools onto the side of an LLM, like giving it a calculator it can use when you ask it a math problem, or giving it access to a search engine when you want up-to-date information, but putting reliable tools under the control of an unreliable program is not enough to make the controlling program reliable. I think we will need a different approach if we want a truly reliable question answerer.
In this piece published a year ago, Ted Chiang pours cold water on the idea of a bootstrapping singularity.
How much can you optimize for generality? To what extent can you simultaneously optimize a system for every possible situation, including situations never encountered before? Presumably, some improvement is possible, but the idea of an intelligence explosion implies that there is essentially no limit to the extent of optimization that can be achieved. This is a very strong claim. If someone is asserting that infinite optimization for generality is possible, I’d like to see some arguments besides citing examples of optimization for specialized tasks.