Back in May, 30 of the world’s best mathematicians held a secret meeting. Their goal was to see whether they could devise mathematical questions so difficult that artificial intelligence (AI) could not solve them.
During the meeting, these 30 mathematicians took on a “reasoning” chatbot, the large language model (LLM) o4-mini. The project, FrontierMath, had previously created mathematical questions that only a small group of people in the world would be capable of developing, according to a report by Scientific American.
“FrontierMath comprises 350 original mathematics problems spanning from challenging university-level questions to problems that may take expert mathematicians days to solve, covering a wide variety of topics,” the researchers running the project wrote on their website. “The questions demand creative insight, connecting disparate concepts, and sophisticated reasoning.
“The benchmark is organized into tiers of increasing difficulty. The most challenging tier (Tier 4) contains 50 extremely difficult problems developed as short-term research projects by mathematics professors and postdoctoral researchers. Solving these tasks would provide evidence that AI can perform the complex reasoning needed for scientific breakthroughs in technical domains.”
To ensure that the LLM couldn’t “cheat,” the mathematicians at this secret meeting had to sign a nondisclosure agreement and communicate only through the messaging app Signal, because the LLM could potentially scan a traditional e-mail and be trained on the questions before the test even began. As a bonus, each question the mathematicians came up with that stumped o4-mini would earn them a $7,500 reward.
For two days in May, the 30 scholars were broken up into six groups and tasked with coming up with questions that were solvable by them, but not by o4-mini. They ended up creating just 10 questions that stumped the artificial intelligence chatbot.
“I was not prepared to be contending with an LLM like this,” said Ken Ono, a mathematician at the University of Virginia and a leader of the meeting. “I’ve never seen that kind of reasoning before in models. That’s what a scientist does. That’s frightening.”
So what do the results of the meeting mean for the future of mathematicians? The group said that if artificial intelligence is ever able to correctly answer “tier five” mathematical questions – questions that even the best mathematicians can’t solve – it could cause a seismic change, one in which mathematicians go from solving problems to merely posing questions to reasoning AI bots in order to discover new mathematical truths.
“I don’t want to add to the hysteria, but in some ways these large language models are already outperforming most of our best graduate students in the world,” said Ono.