Google Cuts Token Usage 45% With New Gemini 3.5 Flash Low

May 25, 20263 min read

TL;DR

Google's new Gemini 3.5 Flash Low tier generates 45% fewer tokens than the standard model, addressing developer quota frustrations in Antigravity.

Google shipped a new model variant Monday designed to solve one specific problem: too many tokens burned on simple tasks. Gemini 3.5 Flash Low generates roughly 45% fewer tokens than its predecessor, now relabeled Gemini 3.5 Flash Medium, a structural change that reorganizes the product line and directly addresses weeks of developer frustration with quota limits in Antigravity, the company's AI coding assistant.

The crunch had been building. After Google quietly tightened the AI Pro plan alongside the original Gemini 3.5 Flash launch, developers using Antigravity for software engineering work blew through their quota on routine operations. Google had already raised Antigravity's token ceiling by 9x across two separate increases, Android Authority reported. The fixes held briefly, then the complaints resumed.

Varun Mohan, a director at Google DeepMind overseeing Antigravity, acknowledged the core issue publicly: simple operations were consuming a disproportionate share of the token budget, making the product feel constrained even for paying subscribers. The Low variant targets that exact scenario: lightweight, repetitive work where output volume does not compromise the result.

On benchmarks, Google positions the Low model as generally outperforming the older Gemini 3 Flash, now labeled the High variant, on software engineering tasks. The hedge "generally" leaves interpretive room, but the direction is clear. For the use cases Antigravity handles most, Google argues that cheaper output and better baseline performance are compatible goals, according to the report.

Token efficiency

Alongside the launch, Google reset token quotas on all Gemini plans, paid and free, giving developers a clean allocation heading into the week. The reset came without much announcement but followed a clear logic: the original configuration underestimated real-world consumption, and the fix required both a new model tier and an administrative reset.

One developer challenged Mohan on whether Google had done adequate internal testing before the original release, noting it felt like users were serving as beta testers. Mohan acknowledged the sentiment without a detailed rebuttal, per Android Authority. That response reads as an implicit admission that live usage revealed something pre-launch testing missed.

Shipping fast and calibrating on real-world signal has become standard practice in artificial intelligence product development. The cost is specific when the product is a coding assistant. Developers build workflows around quota assumptions and model behavior; when those shift without warning, pipelines break and trust erodes.

The new hierarchy

Gemini 3.5 Flash is now a three-tier family: Low, Medium, and High. The structure mirrors how competing AI providers have organized their lineups, moving away from single flagships toward a spectrum of capability and cost. It also gives Google more room to differentiate pricing over time without launching entirely new product lines.

Token economics have become a central competitive axis in the broader artificial intelligence market, where providers compete on cost per task as much as raw capability. Forbes identified this race across cost, speed, and capability as structural among leading AI providers heading into 2026, driven partly by enterprise customers seeking predictable spending. A 45% reduction in token output is significant enough to change the unit economics of multi-agent workflows, where consumption compounds across steps. Whether the Low variant preserves sufficient reasoning depth for complex engineering tasks beyond edits and lookups is something developers will verify quickly in production, where benchmark numbers are a starting point, not a verdict.

The quota reset buys goodwill, but it does not answer the harder question. Two increases to Antigravity's limits followed by a new model variant in quick succession suggest the original launch made assumptions about developer usage patterns that reality did not support. Android Authority's coverage points to a candid acknowledgment from Mohan that Google is still calibrating what users actually need at the velocity artificial intelligence development now demands. The next adjustment will say more about whether this iteration solved the problem or deferred it.

---

FAQ

What is Gemini 3.5 Flash Low?
A new variant of Google's Gemini 3.5 Flash model designed to produce around 45% fewer tokens than the standard version, now called Gemini 3.5 Flash Medium. It targets simple and repetitive tasks in tools like the Antigravity coding assistant.

How does the Low tier compare to Medium and High?
Google now offers three Gemini 3.5 Flash tiers. Low generates the fewest tokens, Medium is the relabeled original release, and High refers to the older Gemini 3 Flash. Google says the Low model generally outperforms the High tier on software engineering benchmarks.

What is Antigravity?
Antigravity is Google's AI-powered coding assistant available to Gemini subscribers. Persistent complaints about tight token quotas in Antigravity drove both repeated limit increases and the development of the new Low model variant.

Did Google reset token quotas alongside this launch?
Yes. Google reset quotas across all Gemini plans, paid and free, as part of the Gemini 3.5 Flash Low rollout, ensuring developers have a fresh token allocation for software engineering work this week.