FunctionGemma performance on Pixel 7
- Google’s FunctionGemma, highlighted in a May 22 video, was presented as a 270 million-parameter model running nearly 2,000 prefill tokens per second on Pixel 7.
- The most telling figure was the accuracy spread: 46% out of the box on app intents, rising to 90% on eight of ten functions after fine-tuning.
- Google’s developer docs, model page and Gemma cookbook now provide the next step for developers testing on-device function-calling workflows.
Google’s FunctionGemma is being pitched as a small model for a narrow job: turning natural-language requests into function calls that can run locally on a device. A YouTube video posted in the last few days said the 270 million-parameter model processes nearly 2,000 prefill tokens per second on a Pixel 7 and reached 46% accuracy out of the box on a fixed set of app intents. After fine-tuning on a synthetic dataset, the video said, the model cleared 90% on eight of ten functions.

Those numbers matter because FunctionGemma is not being sold as a general chatbot. Google’s developer documentation says it is a specialized version of Gemma 3 270M tuned for function calling, aimed at custom, fast, private local agents that map text into executable API actions. Google announced the model in December 2025 and published a training recipe for developers who want to adapt it to their own tools or app actions. (youtube.com)

### Why does the Pixel 7 number stand out?

The Pixel 7 figure stands out because it puts a concrete device and throughput number on edge inference. The YouTube video summary says FunctionGemma processes nearly 2,000 prefill tokens per second on that phone, which is a speed claim tied to local execution rather than a cloud API round trip. Google’s own materials frame the model for “the edge,” but they do not present it as a universal benchmark winner. (ai.google.dev)

The company says FunctionGemma is a base for further training into custom agents, which suggests the Pixel 7 result is best read as a deployment demonstration for a specialized task. That is an inference from Google’s positioning and the video’s benchmark framing. (youtube.com)

### What does the 46% to 90% range actually describe?

The 46% to 90% range describes task performance under different conditions, not a single headline benchmark. The video says the model hit 46% accuracy out of the box on a fixed set of app intents, then exceeded 90% on eight of ten functions after fine-tuning on synthetically generated data.

That spread is consistent with how Google describes the model. (ai.google.dev) The developer page says FunctionGemma is intended as a starting point for specialization, and Google’s Gemma cookbook includes a notebook for fine-tuning the 270M model for mobile actions. The notebook says an end-to-end run on a Colab A100 can take about 60 minutes.

### Is this about chat, or about controlling apps?

FunctionGemma is about controlling tools and apps. (youtube.com) Google says the model is tuned for function calling, and a Google Developers blog post on AI Edge Gallery describes on-device function calling as a way for a model to move from describing tasks to predicting tool use.

The practical distinction is that developers are not asking the model to be broadly knowledgeable. (ai.google.dev) They are asking it to reliably choose and format the right action, argument or API call for a narrow domain. That is also why prompt design, evaluation method and dataset choice can move reported accuracy materially.

### Where did the UCLA course reference come from?

The UCLA reference appears to point to broader teaching on building LLM systems from scratch rather than to FunctionGemma itself. (ai.google.dev) UCLA Extension lists a Large Language Models course covering transformer architecture, prompt engineering, fine-tuning and agent systems. Separately, an open-sourced course by Hamza Farooq says it has been taught at UCLA among other institutions. (youtube.com)

Google’s next-step materials are already public. The developer overview, Hugging Face model page and Gemma cookbook notebook give developers the main places to test the model, fine-tune it for mobile actions and compare their own on-device results against the Pixel 7 demonstration. (ai.google.dev) (uclaextension.edu)
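
For developers comparing their own results against figures such as "90% on eight of ten functions," the following is a minimal sketch of how per-function accuracy could be tallied. It is an illustration only, not the evaluation method used in the video or by Google, which the sources here do not describe. The scoring rule (exact match on function name and arguments), the function names `set_alarm` and `open_map`, and the helper names are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical scoring rule: a predicted call counts as correct only if the
# function name and every argument match the expected call exactly.
def call_matches(predicted: dict, expected: dict) -> bool:
    return (
        predicted.get("name") == expected.get("name")
        and predicted.get("arguments", {}) == expected.get("arguments", {})
    )

def per_function_accuracy(examples):
    """Return {function_name: accuracy}, grouped by the expected function."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for predicted, expected in examples:
        name = expected["name"]
        totals[name] += 1
        if call_matches(predicted, expected):
            correct[name] += 1
    return {name: correct[name] / totals[name] for name in totals}

def count_above(accuracies: dict, threshold: float = 0.9) -> int:
    """Count how many functions exceed the accuracy threshold."""
    return sum(1 for acc in accuracies.values() if acc > threshold)

# Made-up (predicted, expected) pairs, for illustration only.
EXAMPLES = [
    ({"name": "set_alarm", "arguments": {"time": "07:00"}},
     {"name": "set_alarm", "arguments": {"time": "07:00"}}),
    ({"name": "set_alarm", "arguments": {"time": "08:00"}},  # wrong argument
     {"name": "set_alarm", "arguments": {"time": "07:00"}}),
    ({"name": "open_map", "arguments": {"query": "coffee"}},
     {"name": "open_map", "arguments": {"query": "coffee"}}),
]

if __name__ == "__main__":
    accuracies = per_function_accuracy(EXAMPLES)
    print(accuracies)
    print(count_above(accuracies), "of", len(accuracies), "functions above 90%")
```

A looser rule, such as matching only the function name, would produce higher numbers on the same predictions, which is one reason evaluation method can shift reported accuracy.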