Gemini 3.5 Flash might be fast enough for gen AI to make sense

1966woodenghost, May 20, 2026

According to Doshi, the team made numerous improvements in pre-training with Gemini 3.5 Flash, but insights gleaned from how devs use Gemini models are really paying off.

“With post-training, we’re really starting to unlock some of the value of the feedback we’re getting from users, for example, from Antigravity,” said Doshi. “That’s really what you’re seeing play out in terms of the code performance and the tool use performance. And then, the hope is that you’ll continue to see the step change where 3.5 Pro will be better, and the next Flash meets Pro performance with that series.”

Google is focused on code generation with the new model, a core capability for agentic AI. Both Terminal Bench and SWE-Bench Pro tests show substantial improvements: 3.5 Flash clobbers older Flash models and shows a small but measurable improvement versus Gemini 3.1 Pro. Its scores are in the same neighborhood as OpenAI’s much larger and more expensive GPT 5.5.

A major barrier in agentic workflows is getting generative models to use interfaces designed for humans. It’s not an easy problem to solve, Doshi said. “Certain things like UI control are expensive to do because the model has to search the page, it has to know where to click, it has to act through multiple steps. I think Flash is able to do that well because of that combination of quality and cost.”
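To make the cost point concrete, here is a rough back-of-the-envelope sketch of why multi-step UI control adds up. It assumes the model re-reads the page at every step and is billed per token. The `StepCost` and `episode_cost` names, the token counts, and the prices are all hypothetical placeholders for illustration, not Google's pricing or measurements.

```python
from dataclasses import dataclass


@dataclass
class StepCost:
    """Token usage for one UI step (hypothetical numbers only)."""
    observation_tokens: int  # input tokens spent reading the page or screenshot
    action_tokens: int       # output tokens spent deciding where to click


def episode_cost(steps, price_in_per_mtok, price_out_per_mtok):
    """Return the dollar cost of a multi-step task.

    Prices are given in dollars per million tokens. Every step pays for
    its own observation, so cost grows with the number of steps.
    """
    total_in = sum(s.observation_tokens for s in steps)
    total_out = sum(s.action_tokens for s in steps)
    return (total_in * price_in_per_mtok + total_out * price_out_per_mtok) / 1_000_000


# A made-up five-step task: each step reads 10,000 tokens and emits 100.
steps = [StepCost(observation_tokens=10_000, action_tokens=100) for _ in range(5)]
cheap = episode_cost(steps, price_in_per_mtok=1.0, price_out_per_mtok=2.0)
pricey = episode_cost(steps, price_in_per_mtok=10.0, price_out_per_mtok=20.0)
print(f"cheaper model: ${cheap:.3f}, pricier model: ${pricey:.3f}")
```

Under these assumptions, the per-token price multiplies across every step, which is why a cheaper model that is still accurate enough to click the right thing can make this kind of task practical.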

Google’s AI evaluations demonstrate these improvements, too. Among Google’s current collection of benchmarks is OSWorld-Verified, which tests how models handle general tasks in real computing environments. The results mirror the coding benchmarks: Gemini 3.5 Flash substantially outperforms older Flash models and even scores a bit higher than Gemini 3.1 Pro. It’s essentially tied with GPT 5.5.

Google’s new Flash model is, again, a little better than the last-gen Pro.

Credit: Google

Gemini 3.5 Flash has been rolled out internally at Google, and Doshi noted that it’s having a big impact. “We have a set of internal metrics we’ve been evaluating that measures how Googlers code, so looking at our own code bases and how well the models perform on that,” Doshi said. “And you can see a massive, massive jump between where 3.1 Pro was and where 3.5 Flash is.”

Source: arstechnica.com…