The new Cactus AI inference engine allows mobile devices to run local models using 10x less RAM through NPU optimization and ...
Want AI on your phone without cloud limits? Models like Llama 3.2, Qwen3, Gemma 3, and SmolLM2 run locally for private chats, coding, reasoning, and image tasks. Llama 3.2 is the best all-rounder, ...
Small brains with big thoughts.
The tech industry has spent years bragging about whose cloud-based AI model has the most trillions of parameters and who has poured the most billions of dollars into data centers. However, the open-source AI ...
With the launch of Google’s Gemma 4 family of AI models, AI enthusiasts now have access to a new class of small, fast, and omni-capable AI designed for efficient local deployment, and NVIDIA ...
Osaurus combines local and cloud AI models in a Mac app that keeps users’ memory, files, and tools on their own hardware.
OMLX is a specialized inference engine designed to harness the full capabilities of Apple Silicon for running local AI models. By using Apple’s MLX framework and advanced memory management techniques, ...
Google’s AI Edge Gallery app is now officially available on the Google Play Store and Apple App Store. The app allows users to run the brand-new Gemma 4 model entirely on-device, requiring no internet ...