Ollama has pushed out version 0.19, a preview update that leans on Apple’s MLX framework to make local AI models run faster on Macs with Apple silicon. The headline numbers are hard to ignore: prompts are processed about 1.6 times faster, and response generation is nearly twice as quick.

The biggest gains are on Macs with M5-series chips, where Apple’s new GPU Neural Accelerators do some of the heavy lifting. That puts Ollama in the same lane as other local AI tools trying to squeeze more out of Apple hardware instead of waiting for cloud servers to do the work. For anyone running chatbots or coding helpers locally, that usually means less spinning, fewer pauses, and fewer chances to stare at a progress bar like it owes you money.

What Ollama 0.19 changes

Beyond the raw speed bump, Ollama says memory handling is smarter in this release, which should help longer sessions stay responsive. That matters for AI-assisted coding and multi-turn chats, where models can bog down as conversation history grows. The preview is being pitched as a practical upgrade for users of assistants such as OpenClaw, Claude Code, OpenCode, and Codex.
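
To make that concrete, here is a minimal sketch of a multi-turn chat against Ollama’s local API, assuming Ollama is serving on its default port (11434) and that "qwen3.5" is the model tag (an assumption; check `ollama list` for the exact name). Because the full history is resent every turn, the context the model has to process grows with each exchange, which is exactly where smarter memory handling pays off.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "qwen3.5"  # assumed tag; the exact name may differ

history = []  # the full conversation is resent every turn, so context keeps growing

def ask(prompt: str) -> str:
    history.append({"role": "user", "content": prompt})
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "messages": history, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    reply = resp.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("Summarize what MLX is in one sentence."))
print(ask("Now explain why that helps on Apple silicon."))
```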

  • Prompt prefill speed: about 1.6 times faster
  • Response generation: nearly 2 times faster
  • Best results: Macs with M5-series chips
  • Memory requirement: more than 32GB of unified memory (a quick check is sketched below)
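
For that last line, macOS reports total physical memory through sysctl, so a few lines of Python can tell you where a given Mac lands. This is a minimal sketch, assuming a macOS machine; the 32GB figure comes straight from the list above.

```python
import subprocess

# macOS reports physical memory in bytes via sysctl's hw.memsize key.
mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).strip())
mem_gib = mem_bytes / (1024 ** 3)

print(f"Unified memory: {mem_gib:.0f} GiB")
print("Meets the >32GB guidance" if mem_gib > 32 else "Below the >32GB guidance")
```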

Which models work right now

For now, support is limited to Alibaba’s Qwen3.5, so this is not a universal turbo button for every model in Ollama’s catalog. Still, the company says more AI models are on the way, which suggests this release is more of a first step than the final shape of the feature.
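
If you want to see whether the supported model is already on your machine, Ollama’s local API lists installed models at /api/tags. A minimal sketch, assuming the default port and treating "qwen3.5" as the model tag (the exact name is an assumption):

```python
import requests

# Ollama's default local API lists installed models at /api/tags.
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()

models = [m["name"] for m in resp.json().get("models", [])]
print("Installed models:", models)

# "qwen3.5" is an assumed tag for the supported model; the exact name may differ.
if any(name.startswith("qwen3.5") for name in models):
    print("A Qwen3.5 build is installed and can use the MLX path.")
else:
    print("Pull it first, e.g. `ollama pull qwen3.5` (tag assumed).")
```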

Apple silicon gets a rare software win

Apple has spent years selling the idea that its hardware is especially good at machine learning, but software teams do not always line up behind that promise. Ollama’s move is a neat reminder that local AI is becoming a hardware race as much as a model race, and Macs with roomy memory configurations are now better positioned to benefit than slimmer setups. The catch, as usual, is that speedups in AI tend to show up first for people who already own the pricey machine with plenty of RAM.

The preview is available as Ollama 0.19, and the obvious question is how quickly the company broadens model support beyond Qwen3.5. If that happens soon, Apple silicon Macs could become one of the more appealing places to run local AI without handing every prompt to the cloud.
