Google resets Gemini 3.5 Flash quotas and fixes output quality

June 3, 2026 · 3 min read
TL;DR

Google's refreshed Gemini 3.5 Flash model addresses quality regressions introduced by its Low-effort variant, and the company has reset usage counters to zero for free and paid users.

Google wiped quota counters to zero for every Gemini user on Tuesday and shipped a refreshed version of its 3.5 Flash model, addressing a quality regression that had been quietly frustrating developers for weeks.

Varun Mohan, a director at Google DeepMind working on the company's internal Antigravity platform, announced the deployment on X. The update delivers what Mohan described as "much less" token generation alongside "significantly higher endurance" on harder software engineering tasks, per Android Authority. The quota reset covers both free and paid tiers.

The problem it's fixing

The trouble began with an earlier iteration. Google had introduced a "Low-effort" variant of Gemini 3.5 Flash to stop the model from over-processing simple requests and draining developer quotas on basic coding queries. That variant cut token generation by roughly 45% compared to the original, now retroactively labeled the "Medium" variant.
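To make the 45% figure concrete, here is a rough back-of-the-envelope sketch in Python. The per-request token count and the daily quota are made-up illustrative numbers, not figures published by Google; only the 45% reduction comes from the reporting above.

```python
# Illustrative arithmetic only: MEDIUM_TOKENS and DAILY_QUOTA are hypothetical
# values chosen for round numbers, not Google's actual limits.

REDUCTION_PERCENT = 45       # reported cut in token generation (Low vs. Medium)
MEDIUM_TOKENS = 2_000        # hypothetical tokens generated per request (Medium)
DAILY_QUOTA = 1_000_000      # hypothetical daily token quota


def tokens_after_reduction(tokens: int, reduction_percent: int) -> int:
    """Tokens per request once generation is cut by the given percentage."""
    return tokens * (100 - reduction_percent) // 100


def requests_per_quota(quota_tokens: int, tokens_per_request: int) -> int:
    """How many whole requests fit inside a token quota."""
    return quota_tokens // tokens_per_request


low_tokens = tokens_after_reduction(MEDIUM_TOKENS, REDUCTION_PERCENT)
medium_requests = requests_per_quota(DAILY_QUOTA, MEDIUM_TOKENS)
low_requests = requests_per_quota(DAILY_QUOTA, low_tokens)

print(f"Medium: {MEDIUM_TOKENS} tokens/request, {medium_requests} requests per quota")
print(f"Low:    {low_tokens} tokens/request, {low_requests} requests per quota")
```

Under these assumed numbers, the same quota stretches from 500 requests to 909, or roughly 80% more, which is why the Low-effort variant was attractive for simple coding queries.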

Developers noticed the model struggled once tasks crossed a complexity threshold. A request that looked routine but required deeper analysis would yield inconsistent output and structural problems. Google had traded endurance for efficiency, leaving a gap precisely where reliability matters most. Whether the new fix applies to the Low or Medium effort variant specifically remains unclear.

The broader context

Rate limit resets are a familiar Google gesture, typically deployed alongside model updates to let developers stress-test new behavior without running into prior quota ceilings. The reset carries no cost for users but signals that Google expects usage patterns to shift.

What the company is navigating here is a tension running through the entire artificial intelligence industry: smaller, faster models optimized for throughput degrade on edge cases, while larger models burn tokens on tasks that don't need them. For developer tools, unpredictable output quality is more damaging than raw speed.

Gemini is also expanding on other fronts this week. PCWorld reported that Google's AI avatar tool, which allows paid subscribers to generate personalized videos using the Gemini app, is entering general availability. It is another sign that the platform's commercial surface area is growing alongside its model revisions.

Industry backdrop

Google isn't alone in iterating under scrutiny. Anthropic this week expanded enterprise access to its Mythos-class artificial intelligence models, increasing the number of companies under Project Glasswing from 50 to 150, according to Yahoo Finance. The Mythos models have drawn regulatory attention for their ability to surface vulnerabilities in legacy software systems.

That capability shaped a now-shelved White House executive order. President Trump canceled the planned signing on May 21, citing concerns about domestic competition and U.S. advantage over China. Wired reported that internal deliberations since then have been widely described as chaotic, and a revised order remains possible but uncertain.

What to watch

The iteration pace inside Antigravity reflects a faster, more public feedback loop than Google DeepMind typically ran two years ago: multiple variants, rapid fixes, individual team leads posting updates on X. Whether this reflects genuine developer responsiveness or strategic visibility, the answer is probably both.

For engineers and product teams building on Gemini, the takeaway is concrete: the quality issues first seen with the Low-effort variant should now be addressed, usage counters are reset, and the updated model is positioned for tasks that blur the line between lightweight and complex. The open question is whether Google will consolidate its effort tiers or publish benchmarks that let developers anticipate which tier a task demands.

Frequently asked questions

What is Gemini 3.5 Flash?
Gemini 3.5 Flash is a mid-tier model in Google's Gemini family, built for speed and efficiency. Google has been iterating on it inside Antigravity, its internal testing platform, with variants tuned for different workloads.

Why did Google reset Gemini rate limits?
The quota reset accompanied the new model deployment. Google typically zeroes counters after significant updates to give all users equal footing to test the new version.

What is the difference between the Low and Medium effort variants?
The Low-effort variant cut token generation by roughly 45% versus the original, now called Medium, to reduce costs on simple tasks. The tradeoff was lower output quality on moderately complex requests, which the latest update aims to correct.

How does this affect developers using the Gemini API?
Developers should see improved output consistency on tasks that previously triggered quality drops. The rate limit reset means usage counters restart from zero, giving more headroom to evaluate the updated model.
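One way a team might use that headroom is to run the same prompt several times and measure how stable the results are. The sketch below is a minimal illustration and makes several assumptions: `generate` stands in for whatever client call a team already uses (no real Gemini API is invoked here), `fake_model` is a hypothetical stub, and "structural problems" are approximated as output that fails to parse as JSON.

```python
import json
from collections import Counter
from typing import Callable


def consistency_report(generate: Callable[[str], str], prompt: str, runs: int = 5) -> dict:
    """Call `generate` repeatedly and summarize how stable the outputs are."""
    outputs = [generate(prompt) for _ in range(runs)]

    # Count outputs that are structurally valid (here: parseable as JSON).
    valid_json = 0
    for text in outputs:
        try:
            json.loads(text)
            valid_json += 1
        except json.JSONDecodeError:
            pass

    # Share of runs that returned the single most common output verbatim.
    most_common_count = Counter(outputs).most_common(1)[0][1]

    return {
        "runs": runs,
        "valid_json_rate": valid_json / runs,
        "agreement_rate": most_common_count / runs,
    }


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; always returns the same JSON."""
    return '{"answer": 4}'


report = consistency_report(fake_model, "What is 2 + 2? Reply as JSON.", runs=4)
print(report)
```

Swapping `fake_model` for a real client call, and running the report before and after a model update on prompts that previously misbehaved, gives a crude but repeatable signal of whether consistency actually improved.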