Google Launches Gemma 4: Open AI Models Under Apache 2.0 License

April 03, 2026 · 4 min read

Google DeepMind has officially released Gemma 4, its most powerful family of open AI models to date, built on the same technology underpinning the company's flagship Gemini 3 system. The launch, announced on April 2, represents a strategic inflection point for Google's approach to open-source AI: for the first time, the Gemma models ship under the Apache 2.0 license, granting developers full commercial freedom and what the company describes as "complete digital sovereignty" over their data, infrastructure, and models.

The Gemma 4 family spans four model sizes designed to cover workloads from cloud-scale inference to edge deployment. The largest, a 31-billion-parameter dense model, has claimed the number three position among all open models on the Arena AI Text Leaderboard — a remarkable achievement given that it outperforms models up to 20 times its size. A 26-billion-parameter Mixture of Experts variant, ranked sixth on the same leaderboard, activates only 3.8 billion parameters during inference, delivering competitive performance at a fraction of the computational cost. Rounding out the lineup are the E4B and E2B edge models, purpose-built for mobile devices and IoT hardware.
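The efficiency claim for the Mixture of Experts variant comes down to simple arithmetic: per-token compute scales with the number of *active* parameters, not the total stored. A back-of-envelope sketch, using only the parameter counts reported above:

```python
# Rough comparison of dense vs. Mixture-of-Experts inference cost,
# using the parameter counts reported for the Gemma 4 family.

DENSE_PARAMS = 31e9        # 31B dense model: all parameters active per token
MOE_TOTAL_PARAMS = 26e9    # 26B MoE model: total parameters stored
MOE_ACTIVE_PARAMS = 3.8e9  # parameters actually activated per token

# Fraction of the MoE model's weights used on any single forward pass
active_fraction = MOE_ACTIVE_PARAMS / MOE_TOTAL_PARAMS

# Approximate per-token compute relative to the dense model
# (transformer FLOPs scale roughly linearly with active parameters)
compute_vs_dense = MOE_ACTIVE_PARAMS / DENSE_PARAMS

print(f"Active fraction per token: {active_fraction:.1%}")          # ~14.6%
print(f"Per-token compute vs. 31B dense: {compute_vs_dense:.1%}")   # ~12.3%
```

In other words, each token touches roughly a seventh of the MoE model's weights, which is where the "fraction of the computational cost" claim comes from; the full 26B parameters must still fit in memory.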

The technical ambitions of Gemma 4 extend well beyond raw language modeling. The models are natively multimodal and designed for agentic workflows, capable of processing text, images, video, and — in the smaller edge variants — audio. Language coverage spans more than 140 languages, and the larger models support a 256,000-token context window, with the edge models handling up to 128,000 tokens. These specifications position Gemma 4 as a versatile foundation for developers building applications that require long-context reasoning and multi-format understanding.

Perhaps the most striking numbers come from the edge models. The E2B variant runs in less than 1.5 gigabytes of memory on supported devices, processing 4,000 input tokens across two distinct capabilities in under three seconds. On a Raspberry Pi 5 — an $80 single-board computer — the model achieves 133 tokens per second during prefill and 7.6 tokens per second during decoding. These benchmarks underscore Google's push to make capable AI models practical on consumer and embedded hardware, not just data center GPUs.
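Because prefill and decoding run at very different rates, end-to-end latency on a device like the Pi 5 can be estimated by treating the two phases separately. A minimal sketch, assuming latency is simply prompt tokens over the prefill rate plus output tokens over the decode rate (ignoring model load time and sampling overhead):

```python
# Back-of-envelope latency estimate from the reported Raspberry Pi 5 numbers.
# Prefill (prompt ingestion) and decode (token generation) are separate phases
# with very different throughputs.

PREFILL_TPS = 133.0  # tokens/s during prefill (reported figure)
DECODE_TPS = 7.6     # tokens/s during decoding (reported figure)

def estimated_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Approximate wall-clock seconds for one request, phases summed."""
    return prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS

# Example: a 500-token prompt producing a 100-token reply
print(f"{estimated_latency_s(500, 100):.1f} s")  # ≈ 16.9 s
```

The asymmetry is typical of edge inference: ingesting a long prompt is comparatively cheap, while generated output dominates wall-clock time.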

Availability is broad and immediate. The models can be downloaded from Hugging Face, Kaggle, and Ollama, or accessed through Google AI Studio and AI Edge Gallery. NVIDIA has announced targeted optimizations for running Gemma 4 on its RTX consumer GPUs, and the models support 2-bit and 4-bit quantization for further memory savings. Platform support covers Android, iOS, Windows, Linux, macOS, browser-based deployment via WebGPU, and embedded devices including Raspberry Pi 5 boards and Qualcomm IQ8 NPU chips.
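The memory savings from low-bit quantization are easy to approximate: weight storage is parameters times bits per parameter, divided by eight. A hypothetical sketch, assuming "E2B" denotes roughly 2 billion parameters (the exact count is not stated in the release) and ignoring activation buffers and KV cache:

```python
# Approximate weight memory at different quantization levels.
# Weight bytes = parameter count * (bits per parameter) / 8.
# NOTE: the ~2B parameter count for "E2B" is an assumption for illustration.

def weight_memory_gb(params: float, bits: int) -> float:
    """Weight storage in GB for a model quantized to `bits` bits/param."""
    return params * bits / 8 / 1e9

E2B_PARAAMS = None  # placeholder removed below
E2B_PARAMS = 2e9  # assumed, not confirmed by the announcement

for bits in (16, 4, 2):
    print(f"{bits}-bit: {weight_memory_gb(E2B_PARAMS, bits):.2f} GB")
# 16-bit: 4.00 GB, 4-bit: 1.00 GB, 2-bit: 0.50 GB
```

Under these assumptions, 4-bit weights for a ~2B-parameter model fit in about 1 GB, which is consistent with the sub-1.5 GB on-device footprint cited above once runtime buffers are added.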

The licensing shift to Apache 2.0 may prove to be the most consequential element of the release. Previous Gemma models shipped under a proprietary Google license that imposed restrictions on usage and redistribution, limiting adoption among enterprises and researchers wary of vendor lock-in. Apache 2.0 removes those barriers entirely, allowing unrestricted commercial use, modification, and redistribution — putting Gemma 4 on equal legal footing with truly open-source projects and distinguishing it from Meta's Llama models, which carry their own custom license terms.

The timing of the release is no coincidence. Google faces intensifying competition in the open model space from both Western and Chinese rivals. Meta's Llama 4 continues to command significant developer mindshare, while Alibaba's Qwen 3.6 has gained traction, particularly in multilingual and Asian-market deployments. By combining strong benchmark performance, genuinely permissive licensing, and an unusually wide hardware compatibility matrix, Google appears to be betting that developer trust and ecosystem breadth will prove more valuable than maintaining proprietary control over its model weights.

For the broader AI ecosystem, Gemma 4's arrival signals that the competition among tech giants is increasingly being fought on the open frontier. As the performance gap between open and proprietary models continues to narrow, the terms under which models are released — licensing, hardware requirements, and deployment flexibility — are becoming as important as the benchmarks themselves. Google's decision to embrace Apache 2.0 suggests the company believes the next phase of AI adoption will be won not by those who build the best models behind closed doors, but by those who put the most capable tools directly into developers' hands.