A new tutorial shows how to run large language models and vision-language models locally on Arduino's UNO Q board. Edge Impulse's Marc Pous has outlined the steps, using the yzma tool to enable offline AI inference in the board's Linux environment. The approach allows for privacy-focused applications in edge computing.
The Arduino UNO Q, introduced in recent months, has sparked varied opinions among users: some appreciate its increased computational power and ability to run Linux, while others find the App Lab environment confusing and restrictive. Unlike previous Arduino boards, the UNO Q pairs a Linux-capable Qualcomm Dragonwing QRB2210 application processor with an STM32U585 microcontroller, making it suitable for complex projects well beyond basic tasks like LED blinking.
In a tutorial published on Hackster.io, Edge Impulse engineer Marc Pous demonstrates running large language models (LLMs) and vision-language models (VLMs) directly on the UNO Q. The guide leverages yzma, a Go wrapper for llama.cpp developed by Ron Evans, known for projects such as Gobot and TinyGo. yzma integrates AI inference into Go applications without complex CGo bindings and operates within the board's Debian-based Linux system.
Users follow steps to install Go on the UNO Q, configure yzma, and download compatible GGUF models from Hugging Face. For text-based tasks, Pous uses the SmolLM2-135M-Instruct model, which has roughly 135 million parameters; quantization and llama.cpp's efficiency let it run on the board's Arm-based hardware and support fully offline chat interactions.
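As a rough sketch of what such an offline chat loop looks like in Go, the snippet below wires stdin to a `generate` function. Note that `generate` is a placeholder standing in for the yzma/llama.cpp inference call, whose actual interface is covered in the tutorial and the yzma repository; only the loop structure is shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// generate is a stand-in for the real yzma inference call, which would
// load a GGUF model (e.g. SmolLM2-135M-Instruct) once at startup and
// stream generated tokens back. This placeholder just echoes the prompt
// so the chat loop's structure can be shown without the native library.
func generate(prompt string) string {
	return "[model reply to: " + prompt + "]"
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	fmt.Print("> ")
	for scanner.Scan() {
		prompt := strings.TrimSpace(scanner.Text())
		if prompt == "exit" {
			break
		}
		// Everything happens on-device: no request leaves the board.
		fmt.Println(generate(prompt))
		fmt.Print("> ")
	}
}
```

Because inference runs in-process, the same loop can be embedded in a larger Go program that also talks to the board's microcontroller side.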
The tutorial extends to multimodal capabilities with the SmolVLM2-500M-Video-Instruct model, featuring around 500 million parameters. This model processes images and short videos alongside text. In one example, the UNO Q analyzes a photo of markers on a desk and produces a detailed description without cloud connectivity.
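For the multimodal case, a VLM served through llama.cpp typically ships as two GGUF files: the language model itself and a multimodal projector (mmproj) that encodes images. A sketch of fetching such a pair from Hugging Face follows; the repository path and quantization suffix are illustrative, so check the model card for the exact filenames.

```shell
# Illustrative paths; consult the SmolVLM2 model card for the real ones.
BASE=https://huggingface.co/ggml-org/SmolVLM2-500M-Video-Instruct-GGUF/resolve/main
MODEL=SmolVLM2-500M-Video-Instruct-Q8_0.gguf
MMPROJ=mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf

echo "model:  $BASE/$MODEL"
echo "mmproj: $BASE/$MMPROJ"
# wget "$BASE/$MODEL" "$BASE/$MMPROJ"   # uncomment to download (several hundred MB)
```

Both files then live on the UNO Q's storage, so image description works with no cloud connectivity at all.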
Such local AI execution supports privacy-conscious edge systems, combining microcontroller control with AI for applications in robotics and smart homes. Developers can interpret images, handle voice commands, or process sensor data on-device, opening possibilities for innovative designs.