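Gemma 4 Local RL Training and Agentic Vision: 9GB of VRAM Is Enough, Fully Offline

Two pieces of Gemma 4 news this week share one theme: no cloud required, everything runs on your own machine.

One is about training, the other about inference.

Unsloth: Gemma 4 RL training in 9GB of VRAM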

You can now train Gemma 4 with RL in our free notebook! You just need 9GB VRAM to RL Gemma 4 locally! Gemma 4 will learn to solve sodoku autonomously via GRPO. RL Guide: unsloth.ai/docs/get-start… GitHub: github.com/unslothai/unsl… Gemma 4 Colab: colab.research.google.com/github/unsloth…

2:53 PM · Apr 15, 2026


Unsloth has released a free notebook for running reinforcement learning on Gemma 4 locally with GRPO. Minimum requirement: 9GB of VRAM.

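GRPO (Group Relative Policy Optimization) is one of the most widely used RL algorithms for LLMs today, and it is the one DeepSeek-R1 was trained with. Unsloth's optimizations cut GRPO's VRAM usage by 80% compared with the unoptimized baseline, which is why a single consumer GPU is enough.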
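The "group relative" part of GRPO can be sketched in a few lines. For each prompt, several completions are sampled and scored, and each completion's advantage is its reward normalized against the rest of its own group, so no separate value model is needed. This is a minimal illustration, not Unsloth's code: the function name, the `eps` term, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds the scores of several completions sampled for the SAME prompt.
    `eps` (an assumption for this sketch) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt: two solved the task, two did not.
advantages = group_relative_advantages([0.0, 1.0, 0.0, 1.0])

# A group where every completion scored the same carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that beat their group's average get a positive advantage and are reinforced; those below it get a negative one.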

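The notebook's demo task is teaching Gemma 4 to solve Sudoku on its own: the model starts out barely able to do it, and through repeated RL attempts its solve rate keeps rising.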
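RL on a task like this needs a reward function that can check an answer automatically. The sketch below shows one plausible shape for such a check; it is a hypothetical example, not the reward used in the Unsloth notebook. The names `is_valid_solution` and `sudoku_reward`, the all-or-nothing 0/1 scoring, and the convention that `0` marks an empty cell are all assumptions.

```python
def is_valid_solution(grid):
    """Return True if a 9x9 grid has digits 1-9 exactly once per row, column, and box."""
    target = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    return all(unit == target for unit in rows + cols + boxes)


def sudoku_reward(puzzle, candidate):
    """Hypothetical reward: 1.0 for a valid grid that keeps the given clues, else 0.0.

    `puzzle` uses 0 for empty cells (an assumption of this sketch).
    """
    if len(candidate) != 9 or any(len(row) != 9 for row in candidate):
        return 0.0
    keeps_clues = all(
        puzzle[r][c] in (0, candidate[r][c]) for r in range(9) for c in range(9)
    )
    return 1.0 if keeps_clues and is_valid_solution(candidate) else 0.0
```

A binary reward like this is sparse; a real setup might add partial credit for well-formed output, but that is a design choice outside what the source describes.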


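Unsloth's core techniques are 4-bit quantization plus custom GPU kernels, which make it considerably more memory-efficient than the stock Hugging Face implementation.

Chart: VRAM usage for GRPO RL training (relative values, lower is better)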
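The effect of 4-bit quantization on weight memory is simple arithmetic: bytes needed equal the parameter count times bits per weight, divided by 8. The sketch below uses a hypothetical 4-billion-parameter model purely for illustration; it is not a statement about any Gemma 4 variant. It also ignores quantization scale overhead, activations, optimizer state, and the KV cache, all of which add to real usage.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical parameter count, chosen only to make the comparison concrete.
HYPOTHETICAL_PARAMS = 4_000_000_000

for bits in (16, 8, 4):
    gib = weight_memory_gib(HYPOTHETICAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gib:.2f} GiB")
```

Going from 16-bit to 4-bit weights cuts weight storage to a quarter, which leaves room for the sampling and gradient state that RL training needs.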

Google Gemma: offline agentic visual reasoning

Google Gemma (@googlegemma)

We are in the era of local AI orchestration Gemma 4 evaluates a scene, reasons about what to ask, and calls a segmentation model to execute the vision tasks: 🚗 "Segment all vehicles." ➔ 64 found 🚙 "Now just the white ones." ➔ 23 found All happening offline on a laptop.


7:39 PM · Apr 14, 2026


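The scenario shown by the official Google Gemma account: a single laptop, fully offline, where Gemma 4 takes natural-language instructions, decides which vision model to call, and integrates the results.

The demo flow:

"Segment all vehicles." → Gemma 4 interprets the instruction and calls the segmentation model → 64 vehicles found
"Now just the white ones." → the condition is added and the task is re-run → narrowed down to 23

This combines tool use with visual reasoning. Gemma 4 acts as the orchestrator, and the segmentation model is a tool it calls. The whole pipeline runs on the local machine, with no API calls at all.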

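What makes this possible in Gemma 4's design is that multimodal capability is built in starting from E2B, the smallest variant, together with native function calling support, so a local agentic workflow does not need a separate orchestration framework.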
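The orchestrator pattern described above can be sketched as a small dispatch loop: the language model emits a structured tool call, and local code routes it to the matching tool. Everything here is a hypothetical stand-in. `SCENE` replaces a real image, `segment` replaces a real segmentation model, and the JSON call format is an assumption for illustration, not Gemma 4's actual function-calling format. The tool calls are hand-written where a model's output would go.

```python
import json

# Hypothetical stand-in for an image: a list of already-detected objects.
SCENE = [
    {"label": "vehicle", "color": "white"},
    {"label": "vehicle", "color": "red"},
    {"label": "vehicle", "color": "white"},
    {"label": "tree", "color": "green"},
]


def segment(label, color=None):
    """Stub segmentation tool: return objects matching a label and optional color."""
    return [
        obj
        for obj in SCENE
        if obj["label"] == label and (color is None or obj["color"] == color)
    ]


# Registry of tools the orchestrating model is allowed to call.
TOOLS = {"segment": segment}


def dispatch(tool_call_json):
    """Parse a JSON tool call (as a model might emit it) and run the named tool."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    return tool(**call["arguments"])


# Hand-written calls mirroring the two prompts in the demo.
all_vehicles = dispatch('{"name": "segment", "arguments": {"label": "vehicle"}}')
white_vehicles = dispatch(
    '{"name": "segment", "arguments": {"label": "vehicle", "color": "white"}}'
)
print(len(all_vehicles), len(white_vehicles))
```

The follow-up request only changes the arguments of the call, which is why the second prompt can refine the first result without any new tool.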

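What the two add up to

On the training side, RL fits in 9GB of VRAM. On the inference side, agentic vision tasks run offline on a laptop. Gemma 4 is moving work that used to require the cloud onto local consumer hardware.

The implication for independent developers is direct: you can train a model locally for your own task, then deploy it on the same machine to run an agentic workflow, with no API key and no usage-based billing at any point.

The Unsloth Colab notebook is free and ready to run: unsloth.ai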