{"id":95562,"date":"2026-06-06T17:38:21","date_gmt":"2026-06-06T17:38:21","guid":{"rendered":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/"},"modified":"2026-06-06T17:38:21","modified_gmt":"2026-06-06T17:38:21","slug":"google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory","status":"publish","type":"post","link":"https:\/\/youzum.net\/zh\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/","title":{"rendered":"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the Gemma 4 family. The release targets local deployment on edge devices and consumer GPUs. It follows the Gemma 4 launch in April and a 12B model two days earlier.<\/p>\n<p class=\"wp-block-paragraph\">We compared the available Gemma 4 edge-model formats using only published numbers. The goal was simple. Show what each precision level costs in memory. Then show what QAT actually changes.<\/p>\n<h2 class=\"wp-block-heading\"><strong>What QAT actually does<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Quantization shrinks a model by lowering weight precision. Standard Post-Training Quantization (PTQ) compresses a finished model. That often degrades quality. QAT instead simulates quantization during training. The model learns to compensate for the precision loss.<\/p>\n<p class=\"wp-block-paragraph\">Google\u2019s AI team states its QAT results yield higher overall quality than standard PTQ baselines. Google did not publish Gemma 4 QAT benchmark scores in the announcement. For context, Gemma 3 QAT cut the Q4_0 perplexity drop by 54% using llama.cpp evaluation. 
We cite that only as prior-generation precedent.<\/p>\n<h2 class=\"wp-block-heading\"><strong>The comparison task<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Compare Gemma 4 E2B and E4B across three formats. The formats are BF16, Q4_0 QAT, and the new mobile QAT schema. Rank them on memory footprint, quality preservation, and on-device accessibility. Use published figures only.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Memory results<\/strong><\/h2>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Format<\/th>\n<th>E2B<\/th>\n<th>E4B<\/th>\n<th>Basis<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>BF16 (16-bit)<\/td>\n<td>9.6 GB<\/td>\n<td>15 GB<\/td>\n<td>Official Gemma 4 docs<\/td>\n<\/tr>\n<tr>\n<td>Q4_0 (4-bit, QAT)<\/td>\n<td>3.2 GB<\/td>\n<td>5 GB<\/td>\n<td>Official Gemma 4 docs<\/td>\n<\/tr>\n<tr>\n<td>Mobile (QAT, E2B)<\/td>\n<td>~1 GB<\/td>\n<td>\u2014<\/td>\n<td>QAT announcement<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">The Q4_0 figures match the footprint of PTQ Q4_0. QAT does not change the size of a given format. It improves quality at that size. The new mobile schema delivers the additional reduction.<\/p>\n<p class=\"wp-block-paragraph\">Using that mobile schema, Google reduced Gemma 4 E2B to about 1 GB. Developers can go lower still. A text-only deployment without Per-Layer Embeddings, which drops the audio and vision encoders, needs under 1 GB.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Per-format breakdown<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">BF16 is the quality baseline. E2B needs 9.6 GB and E4B needs 15 GB. It is the reference point, not a phone deployment target.<\/p>\n<p class=\"wp-block-paragraph\">Q4_0 QAT is the general-purpose local format. E2B drops to 3.2 GB and E4B to 5 GB. QAT preserves more quality here than PTQ at the same size. This format fits consumer GPUs. 
Earlier testing also ran E2B on a Raspberry Pi 5 at INT4.<\/p>\n<p class=\"wp-block-paragraph\">The mobile format is the edge-specialized schema. It brings E2B to about 1 GB. It uses static activations, channel-wise quantization, and targeted 2-bit compression.<\/p>\n<h2 class=\"wp-block-heading\"><strong>How the mobile schema works<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Google\u2019s AI team engineered four techniques for mobile hardware. Static activations precompute scaling factors during training, which reduces on-device work. Channel-wise quantization fits the design of mobile accelerators. Targeted 2-bit quantization compresses only the token-generation layers. Embedding and KV cache optimization shrinks the active memory footprint.<\/p>\n<p class=\"wp-block-paragraph\">Core reasoning layers stay at higher precision. That protects capability while cutting storage. Developers can also deploy a text-only model and drop the audio and vision encoders. That trims memory further for use cases that need no multimodality.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Dimension breakdown<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Scores are a qualitative ranking of the formats for on-device use. Memory is the only hard-measured axis. Quality reflects Google\u2019s disclosed design, not measured Gemma 4 numbers. 
Each score has a one-line basis.<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Dimension<\/th>\n<th>BF16<\/th>\n<th>Q4_0 QAT<\/th>\n<th>Mobile QAT<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Memory footprint<\/td>\n<td>1 \u2014 heaviest, 9.6 GB E2B<\/td>\n<td>4 \u2014 3.2 GB E2B<\/td>\n<td>5 \u2014 ~1 GB E2B<\/td>\n<\/tr>\n<tr>\n<td>Quality preservation<\/td>\n<td>5 \u2014 full-precision baseline<\/td>\n<td>4 \u2014 QAT-preserved, near baseline<\/td>\n<td>3 \u2014 2-bit token layers, core kept higher<\/td>\n<\/tr>\n<tr>\n<td>Decode speed<\/td>\n<td>2 \u2014 no quantization speedup<\/td>\n<td>4 \u2014 4-bit accelerates decode<\/td>\n<td>5 \u2014 mobile-optimized static activations<\/td>\n<\/tr>\n<tr>\n<td>Deployment breadth<\/td>\n<td>4 \u2014 loadable but heavy<\/td>\n<td>5 \u2014 llama.cpp, Ollama, LM Studio, vLLM, MLX<\/td>\n<td>3 \u2014 LiteRT-LM, Transformers.js, edge-focused<\/td>\n<\/tr>\n<tr>\n<td>On-device accessibility<\/td>\n<td>1 \u2014 needs large GPU<\/td>\n<td>4 \u2014 consumer GPU, Raspberry Pi 5<\/td>\n<td>5 \u2014 runs on phones<\/td>\n<\/tr>\n<tr>\n<td><strong>Total (\/25)<\/strong><\/td>\n<td><strong>13<\/strong><\/td>\n<td><strong>21<\/strong><\/td>\n<td><strong>21<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h2 class=\"wp-block-heading\"><strong>Winner<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The result is a tie by design. Q4_0 QAT and mobile QAT both score 21, but for different hardware. For phones, the mobile format leads. It reaches about 1 GB on E2B and targets mobile accelerators directly. For laptops and consumer GPUs, Q4_0 QAT is the practical default. BF16 stays the quality reference, not a practical local choice.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Methodology and limits<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Memory figures come from Google\u2019s Gemma 4 documentation. The ~1 GB E2B figure comes from the QAT announcement. 
Quality rankings rest on Google\u2019s stated claim. No independent Gemma 4 QAT quality numbers were published at release. We did not run the models locally for this comparison. Developers should test their chosen quantization format on their own workload before building.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>Q4_0 QAT cuts Gemma 4 E2B to 3.2 GB and E4B to 5 GB, from 9.6 GB and 15 GB at BF16.<\/li>\n<li>A new mobile QAT schema brings E2B to about 1 GB; text-only without PLE goes under 1 GB.<\/li>\n<li>QAT changes quality at a given size, not the size itself; the mobile format drives the extra memory cut.<\/li>\n<li>Google claims higher quality than PTQ but published no Gemma 4 QAT benchmark numbers at release.<\/li>\n<li>Weights ship today on Hugging Face with llama.cpp, Ollama, LM Studio, vLLM, MLX, and LiteRT-LM support.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div>\n<div class=\"g4-track\">\n<div class=\"g4-slide is-on\">\n      <span class=\"g4-kicker\">Marktechpost \u00b7 Benchmark<\/span>\n<h2>Gemma 4 QAT: Comparing Q4_0 and the New Mobile Format<\/h2>\n<p>Google DeepMind released Quantization-Aware Training checkpoints for Gemma 4. We compared three edge-model formats on published numbers.<\/p>\n<div class=\"g4-card\">\n<p class=\"g4-sub\"><strong>Formats compared<\/strong><\/p>\n<p class=\"g4-sub\">BF16 (16-bit) \u00a0\u00b7\u00a0 Q4_0 QAT (4-bit) \u00a0\u00b7\u00a0 Mobile QAT<\/p>\n<p class=\"g4-meta\">June 5, 2026<\/p>\n<\/div>\n<\/div>\n<div class=\"g4-slide\">\n      <span class=\"g4-kicker\">The Comparison Task<\/span>\n<h3>What we ranked<\/h3>\n<pre><code>$ compare gemma-4 --models E2B,E4B \n    --formats BF16,Q4_0-QAT,MOBILE-QAT \n    --rank memory,quality,accessibility \n    --source published-only --no-self-run<\/code><\/pre>\n<p class=\"g4-sub\">Memory from official Gemma 4 docs. Quality from Google\u2019s stated claim. 
No models run locally.<\/p>\n<\/div>\n<div class=\"g4-slide\">\n      <span class=\"g4-kicker\">Format 1 of 3 \u00b7 Reference<\/span>\n<h3>BF16 (16-bit)<\/h3>\n<div class=\"g4-score\">13<span> \/ 25<\/span><\/div>\n<div class=\"g4-card\">\n<p>The full-precision quality baseline. E2B needs 9.6 GB and E4B needs 15 GB.<\/p>\n<p class=\"g4-meta\">Top observation: a reference point, not a phone or laptop deployment target.<\/p>\n<\/div>\n<\/div>\n<div class=\"g4-slide\">\n      <span class=\"g4-kicker\">Format 2 of 3 \u00b7 Laptop \/ GPU<\/span>\n<h3>Q4_0 QAT (4-bit)<\/h3>\n<div class=\"g4-score\">21<span> \/ 25<\/span><\/div>\n<div class=\"g4-card\">\n<p>The general-purpose local format. E2B drops to 3.2 GB and E4B to 5 GB.<\/p>\n<p class=\"g4-meta\">Top observation: QAT preserves more quality than PTQ at the same 4-bit size.<\/p>\n<\/div>\n<\/div>\n<div class=\"g4-slide\">\n      <span class=\"g4-kicker\">Format 3 of 3 \u00b7 Mobile<\/span>\n<h3>Mobile QAT<\/h3>\n<div class=\"g4-score\">21<span> \/ 25<\/span><\/div>\n<div class=\"g4-card\">\n<p>The edge-specialized schema. 
Brings E2B to about 1 GB.<\/p>\n<p class=\"g4-meta\">Top observation: 2-bit on token layers, reasoning layers kept at higher precision.<\/p>\n<\/div>\n<\/div>\n<div class=\"g4-slide\">\n      <span class=\"g4-kicker\">Leaderboard<\/span>\n<h3>Full ranking<\/h3>\n<table>\n<thead>\n<tr>\n<th>Dimension<\/th>\n<th>BF16<\/th>\n<th>Q4_0 QAT<\/th>\n<th>Mobile QAT<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Memory footprint<\/td>\n<td>1<\/td>\n<td>4<\/td>\n<td class=\"g4-win\">5<\/td>\n<\/tr>\n<tr>\n<td>Quality preservation<\/td>\n<td class=\"g4-win\">5<\/td>\n<td>4<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>Decode speed<\/td>\n<td>2<\/td>\n<td>4<\/td>\n<td class=\"g4-win\">5<\/td>\n<\/tr>\n<tr>\n<td>Deployment breadth<\/td>\n<td>4<\/td>\n<td class=\"g4-win\">5<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>On-device accessibility<\/td>\n<td>1<\/td>\n<td>4<\/td>\n<td class=\"g4-win\">5<\/td>\n<\/tr>\n<tr>\n<td><strong>Total<\/strong><\/td>\n<td><strong>13<\/strong><\/td>\n<td class=\"g4-win\"><strong>21<\/strong><\/td>\n<td class=\"g4-win\"><strong>21<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p class=\"g4-sub\">Tie by design: Q4_0 wins laptops and GPUs; mobile wins phones.<\/p>\n<\/div>\n<div class=\"g4-slide\">\n      <span class=\"g4-kicker\">Key Takeaways<\/span>\n<h3>What developers should know<\/h3>\n<ul>\n<li>Q4_0 QAT cuts E2B to 3.2 GB and E4B to 5 GB, from 9.6 GB and 15 GB at BF16.<\/li>\n<li>A new mobile QAT schema brings E2B to about 1 GB; text-only without PLE goes under 1 GB.<\/li>\n<li>QAT changes quality at a given size; the mobile format drives the extra memory cut.<\/li>\n<li>Google claims higher quality than PTQ but published no Gemma 4 QAT numbers.<\/li>\n<li>Weights ship today on Hugging Face with llama.cpp, Ollama, vLLM, and MLX support.<\/li>\n<\/ul><\/div>\n<\/div>\n<div class=\"g4-nav\">\n    <button class=\"g4-btn\">\u2039 Prev<\/button>\n<div class=\"g4-dots\"><\/div>\n<p>    <button class=\"g4-btn\">Next \u203a<\/button>\n  <\/p><\/div>\n<div 
class=\"g4-foot\">Memory: official Gemma 4 documentation. ~1 GB E2B: QAT announcement (mobile format); text-only without PLE is under 1 GB. Quality: Google\u2019s stated claim \u2014 no independent Gemma 4 QAT scores at release.<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<\/p><p class=\"wp-block-paragraph\">\n<\/p><p class=\"wp-block-paragraph\">Check out\u00a0the\u00a0<strong>Model weights <\/strong>(<strong><a href=\"https:\/\/huggingface.co\/collections\/google\/gemma-4-qat-q4-0\">Q4_0 QAT collection<\/a>, <a href=\"https:\/\/huggingface.co\/collections\/google\/gemma-4-qat-mobile\">Mobile QAT collection<\/a>)<\/strong> and <a href=\"https:\/\/blog.google\/innovation-and-ai\/technology\/developers-tools\/quantization-aware-training-gemma-4\/\" target=\"_blank\" rel=\"noreferrer noopener\">Google blog (QAT release)<\/a><strong>.\u00a0<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/06\/05\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\">Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the Gemma 4 family. The release targets local deployment on edge devices and consumer GPUs. It follows the Gemma 4 launch in April and a 12B model two days earlier. We compared the available Gemma 4 edge-model formats using only published numbers. The goal was simple. Show what each precision level costs in memory. Then show what QAT actually changes. What QAT actually does Quantization shrinks a model by lowering weight precision. Standard Post-Training Quantization (PTQ) compresses a finished model. That often degrades quality. QAT instead simulates quantization during training. The model learns to compensate for the precision loss. Google\u2019s AI team states its QAT results yield higher overall quality than standard PTQ baselines. Google did not publish Gemma 4 QAT benchmark scores in the announcement. For context, Gemma 3 QAT cut the Q4_0 perplexity drop by 54% using llama.cpp evaluation. We cite that only as prior-generation precedent. 
<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-95562","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/zh\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/zh\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-06T17:38:21+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\"},\"author\":{\"name\":\"admin 
NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory\",\"datePublished\":\"2026-06-06T17:38:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\"},\"wordCount\":1185,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\",\"url\":\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\",\"name\":\"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-06-06T17:38:21+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/zh\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/zh\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/","og_locale":"zh_CN","og_type":"article","og_title":"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/zh\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-06-06T17:38:21+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin NU","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"6 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device 
Memory","datePublished":"2026-06-06T17:38:21+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/"},"wordCount":1185,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"zh-Hans","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/","url":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/","name":"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-06-06T17:38:21+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/google-deepmind-releases-gemma-4-qat-checkpoints-q4_0-and-a-new-mobile-format-cut-on-device-memory\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device 
Memory"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/zh\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/zh\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/zh\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a 
href=\"https:\/\/youzum.net\/zh\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/zh\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the Gemma 4 family. The release targets local deployment on edge devices and consumer GPUs. It follows the Gemma 4 launch in April and a 12B model two days earlier. We compared the available Gemma 4 edge-model formats using only published numbers. The goal was simple. Show&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/95562","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/comments?post=95562"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/posts\/95562\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/media?parent=95562"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/categories?post=95562"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/zh\/wp-json\/wp\/v2\/tags?post=95562"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}