Modeling and Reasoning Math

13hon MSN

AI models are starting to crack high-level math problems

“I was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they ...

Forbes

AI Models Still Struggle With Reasoning — And Here’s Why

Forbes contributors publish independent expert analyses and insights. I write about the economics of AI. What looks like intelligence in AI models may just be memorization. A closer look at benchmarks ...

NextBigFuture

OpenAI o1 Model Sets New Math and Complex Reasoning Records

OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding ...

TII’s Falcon H1R 7B can out-reason models up to 7x its size — and it’s (mostly) open

According to TII’s technical report, the hybrid approach allows Falcon H1R 7B to maintain high throughput even as response ...

Forbes

OpenAI Unveils O1 - 10 Key Facts About Its Advanced AI Models

OpenAI has introduced the o1 series, its most sophisticated AI models to date, which are designed to excel at complex reasoning and problem-solving tasks. The o1 models, which use reinforcement ...

ExtremeTech

Microsoft's Phi-4-Reasoning Models Bring AI Math and Logic Skills to Smaller Devices

Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...

Dagens.com on MSN

Even the best AI models can’t reliably do simple math

A new study digs into why modern AI models stumble over multi-digit multiplication and what kind of training finally makes ...

Business Insider

This DeepSeek demo shows how good the Chinese AI model is at math and reasoning

You're currently following this author! Want to unfollow? Unsubscribe via the link in your email. Follow Alistair Barr Every time Alistair publishes a story, you’ll get an alert straight to your inbox ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results