Apple plans to make on-device AI a centerpiece of WWDC 2026

SiliconFeed Editorial, May 28, 2026



At a glance:

Apple will spotlight on-device AI at WWDC 2026, drawing on 15 years of custom silicon work for iPhones, Apple Watches, and Macs.
The company plans to use a distilled version of Google's Gemini model for local inference while relying on Nvidia's confidential compute in Google Cloud for heavier tasks.
Apple is reportedly considering acquisitions such as Liquid AI to advance its model-shrinking efforts. Its reliance on Google Cloud marks a shift from its original all-Private Cloud Compute strategy.

Apple is preparing to reframe its AI narrative at the 2026 Worldwide Developers Conference, scheduled for June 8-12, by placing on-device artificial intelligence at the center of its presentation. According to people familiar with the company's plans who spoke to The Information, Apple will emphasize how its custom silicon, developed over the past 15 years for iPhones, Apple Watches, and Macs, gives it a unique advantage in running AI models directly on user devices. The pitch positions local inference as both a privacy-preserving alternative to cloud processing and a cheaper path than the massive data center investments competitors are pursuing.

The company's approach relies on model distillation. Apple is set to use a large version of Google's Gemini model as the foundation for training, then distill it into a smaller version capable of running efficiently on Apple hardware. This lets Apple draw on Google's advanced model capabilities while adapting them for local deployment. The company is also actively scouting acquisitions to accelerate its model-shrinking work, with Liquid AI, a Massachusetts startup specializing in on-device AI, reportedly among the companies under consideration.
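For readers unfamiliar with distillation, the idea is usually to train a small "student" model to match the softened output distribution of a large "teacher". The report does not describe the actual training setup, so the sketch below is only a generic illustration of a common distillation loss (KL divergence on temperature-scaled softmax outputs). The function names, logits, and temperature value are all assumptions made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over three candidate tokens (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student loss: {loss_close:.4f}")
print(f"far student loss:   {loss_far:.4f}")
```

A student that ranks tokens the way the teacher does gets a lower loss than one that disagrees, which is the signal training would minimize. Real pipelines typically combine a term like this with a standard loss on ground-truth data.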

Despite the focus on local processing, Apple acknowledges that certain queries will still require cloud-based computation. The company is believed to have approved the use of Nvidia's confidential compute technology within Google Cloud to handle processing of the larger Gemini-based model. This security feature encrypts both data and AI models during processing, adding a modest performance overhead but providing stronger privacy protections for sensitive information. This arrangement represents a notable departure from Apple's original Apple Intelligence announcement, which promised that all cloud-bound queries would be handled exclusively by its own Private Cloud Compute infrastructure running on Apple silicon.

The shift in strategy becomes clearer when examining the technical limitations Apple faces. Google's full Gemini model operates with trillions of parameters, and The Information reports that Apple has struggled to run this model even within its Private Cloud Compute infrastructure, which relies on the same Apple silicon chips found in Mac computers. This technical constraint appears to have influenced Apple's decision to partner with Google Cloud and Nvidia for certain cloud-based processing tasks, rather than attempting to build out equivalent capacity within its own data centers.
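A rough back-of-the-envelope calculation helps show why parameter count matters here. The sketch below estimates only the memory needed to hold model weights at different numeric precisions. The 1-trillion and 3-billion parameter counts are assumed round figures for illustration, not reported sizes of any Gemini or Apple model, and the estimate ignores activations, caches, and other runtime overhead.

```python
def weight_footprint_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumed round numbers, for illustration only.
LARGE_MODEL_PARAMS = 10**12      # "trillions of parameters" scale
SMALL_MODEL_PARAMS = 3 * 10**9   # hypothetical distilled on-device model

for bits in (16, 8, 4):
    large = weight_footprint_gb(LARGE_MODEL_PARAMS, bits)
    small = weight_footprint_gb(SMALL_MODEL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: large = {large:,.0f} GB, small = {small:.2f} GB")

# 16-bit weights: large = 2,000 GB, small = 6.00 GB
#  8-bit weights: large = 1,000 GB, small = 3.00 GB
#  4-bit weights: large = 500 GB, small = 1.50 GB
```

Under these assumptions, every trillion parameters needs hundreds of gigabytes to terabytes for weights alone, while a model a few hundred times smaller fits in a few gigabytes. That gap is the general motivation for distilling a large model before running it locally.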

Apple's AI rollout has encountered significant challenges since Apple Intelligence was first announced at WWDC 2024. Initial features received a tepid response from users, and the more personalized version of Siri has faced protracted delays. With WWDC 2026 less than two weeks away, Apple is positioning the event as an opportunity to reframe the narrative, reintroduce delayed features, and debut new capabilities that better align with user expectations and technical realities.

The company's approach reflects broader industry tensions between privacy, performance, and cost. By combining on-device processing for everyday tasks with selective cloud usage for complex queries, Apple aims to balance these competing priorities. Nvidia's confidential compute technology adds a layer of security, though it comes with performance trade-offs that Apple will need to manage carefully. As the company prepares for its developer conference keynote, attention will focus on how successfully it can communicate this hybrid approach to consumers and developers alike.

Editorial note: SiliconFeed is an automated feed. Facts are checked against sources; copy is normalized and lightly edited for readers.

FAQ

How is Apple planning to run AI models on devices?

Apple will draw on 15 years of custom silicon development for iPhones, Apple Watches, and Macs to run AI models directly on user devices. The company plans to showcase how these chips give it an edge in processing AI queries locally, which it presents as a privacy-preserving and cost-saving alternative to cloud-based processing.

What role does Google's Gemini play in Apple's AI strategy?

Apple is set to use a large version of Google's Gemini model to train a smaller, distilled version that can run locally on Apple hardware. For more complex queries, the company has reportedly approved the use of Nvidia's confidential compute technology within Google Cloud to run the larger Gemini-based model. That arrangement is a shift from Apple's original plan to handle all cloud-bound queries on its own Private Cloud Compute infrastructure.

Why is Apple using cloud processing at all if it's promoting on-device AI?

Some queries are too complex for on-device processing. Google's full Gemini model runs into the trillions of parameters, and Apple has reportedly struggled to run it on its Private Cloud Compute infrastructure. For those queries, Apple has reportedly approved Nvidia's confidential compute in Google Cloud, which encrypts data and models during processing at the cost of some performance overhead.
