
AMD Ryzen AI Max 400 Gorgon Halo pushes unified memory to 192GB with Zen 5 and RDNA 3.5

At a glance:

  • AMD's Ryzen AI Max 400 'Gorgon Halo' refresh adds support for up to 192GB of unified memory; AMD claims this makes it the first x86 client SoC capable of running 300B+ parameter LLMs.
  • The lineup includes three Pro SKUs — Max+ Pro 495, Max Pro 490, and Max Pro 485 — all built on Zen 5, RDNA 3.5, and XDNA 2 NPU architectures.
  • The Ryzen AI Halo box starts at $3,999 with a Ryzen AI Max+ 395, 128GB unified memory, and 2TB storage; pre-orders open in June, with partner systems arriving in Q3 2026.

A minor refresh with a major memory bump

AMD is rolling out a modest but strategically important refresh of its large SoC lineup, giving the Ryzen AI Max 300 'Strix Halo' chips a new codename: Gorgon Halo. The Ryzen AI Max 400 series reuses the same core building blocks — Zen 5 CPU cores, RDNA 3.5 GPU cores, and an XDNA 2 NPU — but adds a single headline feature: support for up to 192GB of unified memory. That's a meaningful jump over the Strix Halo generation, and AMD is positioning the update around AI workloads that simply can't fit inside smaller memory pools.

The three chips in the Pro lineup are:

  • Ryzen AI Max+ Pro 495: 16 cores / 32 threads, boost to 5.2 GHz, 80 MB cache, 55 NPU TOPS, Radeon 8065S (40 CUs), up to 192 GB unified memory (160 GB usable as VRAM)
  • Ryzen AI Max Pro 490: 12 cores / 24 threads, boost to 5 GHz, 76 MB cache, 50 NPU TOPS, Radeon 8050S (32 CUs), up to 192 GB unified memory (160 GB usable as VRAM)
  • Ryzen AI Max Pro 485: 8 cores / 16 threads, boost to 5 GHz, 40 MB cache, 50 NPU TOPS, Radeon 8050S (32 CUs), up to 192 GB unified memory (160 GB usable as VRAM)

The flagship 495 gets a 100 MHz clock bump over the outgoing 395, pushing its boost frequency to 5.2 GHz. Otherwise, the spec sheets are remarkably similar if you simply swap the "4" for a "3" in the model number.

Why 192GB unified memory matters

AMD says that up to 160GB of the unified memory pool can function as VRAM, with 32GB reserved for the system. That configuration makes the Ryzen AI Max 400 chips, the company claims, the first x86 client processors capable of running a 300B+ parameter large language model. It's a category of one for now — Intel doesn't produce a large SoC in this class, and Apple's offerings rely on ARM ISA rather than x86.
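As a rough back-of-the-envelope check on that claim, the sketch below estimates the weight-only memory footprint of a dense 300B-parameter model at a few quantization widths and compares it with the 160GB VRAM figure. The function name, the chosen bit widths, and the decision to ignore KV cache, activations, and runtime overhead are illustrative assumptions, not AMD figures.

```python
VRAM_BUDGET_GB = 160  # portion of the 192GB pool AMD says is usable as VRAM


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes).

    Assumption: a dense model where every parameter is stored at the same
    bit width; KV cache and runtime overhead are deliberately ignored.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight


# Hypothetical quantization levels chosen for illustration.
for bits in (16, 8, 4):
    size_gb = weight_footprint_gb(300, bits)
    verdict = "fits" if size_gb <= VRAM_BUDGET_GB else "does not fit"
    print(f"300B parameters at {bits}-bit: {size_gb:.0f} GB ({verdict} in {VRAM_BUDGET_GB} GB)")
```

Under these assumptions, only the 4-bit case (about 150 GB of weights) lands under the 160GB budget, which suggests the 300B+ claim depends on aggressive quantization and leaves limited headroom for context.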

The memory advantage isn't just about raw capacity. Unified memory means the CPU, GPU, and NPU all share the same address space, eliminating costly data copies between discrete pools. For inference workloads running massive models locally, that architecture saves both latency and power. It also aligns the Ryzen AI Max 400 more closely with the memory-hungry demands of agentic AI frameworks, where token throughput and context window size directly drive hardware requirements.

The Ryzen AI Halo box and its positioning

The only confirmed system with the new chips so far is the Ryzen AI Halo box, which can be configured with the Ryzen AI Max+ Pro 495. Pre-orders open in June, with a starting price of $3,999. That entry configuration, however, uses a Ryzen AI Max+ 395 (the existing Strix Halo chip) with 128GB of unified memory and 2TB of storage. AMD says additional configurations will be detailed closer to launch.

The Ryzen AI Halo measures just 5.9 x 5.9 x 1.7 inches, and its specs read like a mini workstation:

  • Wi-Fi 7, Bluetooth 5.4, 10Gbps Ethernet
  • HDMI 2.1b display output
  • Three USB-C ports plus a fourth USB-C for power delivery
  • Rated TDP up to 120W

AMD is comparing the Halo box to Nvidia's DGX Spark, which retails for $4,700 with 128GB unified memory, Nvidia's GB10 chip, and 4TB of storage. On Linux, AMD claims the Ryzen AI Halo delivers up to 14% higher tokens per second than the DGX Spark when running the GLM 4.7 Flash 30B model, and up to 4% higher tokens per second on Qwen 3.6 35B. The Halo also supports Windows, whereas the DGX Spark is Linux-only. AMD also benchmarks against the Mac Mini M4 Pro, claiming roughly 4x the performance in AI workloads, though it concedes that a Mac Studio is a more appropriate point of comparison for the Halo's class of compute.

The 'token economy' pitch and cost of cloud vs. on-prem

AMD is leaning hard into the "token economy" narrative, arguing that running inference on-premises can save significant money compared to cloud API calls. The company estimates that a single Ryzen AI Halo box can save up to $750 per month over cloud compute, breaking even on cost after six months at a usage rate of six million tokens per day. Real-world anecdotes show how quickly API bills can grow at the heavy end of usage: OpenClaw developer Peter Steinberger recently reported racking up $1.3 million in OpenAI API usage over 30 days across a three-person team working on an agentic AI framework.

Whether that cost-saving narrative holds depends on workload consistency. Teams running sporadic or highly variable inference jobs may not see the same break-even timeline, but for organizations with sustained, high-volume token consumption, a local box with 128–192GB of unified memory could be a compelling alternative to perpetual cloud spend.
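The break-even figures above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in this article ($3,999 box price, up to $750 per month in savings, six million tokens per day). The function names, the 30-day month, and the assumption that savings scale linearly with token volume are illustrative, not part of AMD's estimate.

```python
BOX_PRICE_USD = 3_999          # starting price quoted by AMD
MAX_MONTHLY_SAVINGS_USD = 750  # AMD's "up to" savings estimate
TOKENS_PER_DAY = 6_000_000     # usage rate behind AMD's estimate


def break_even_months(price_usd: float, monthly_savings_usd: float) -> float:
    """Months until cumulative savings cover the purchase price."""
    return price_usd / monthly_savings_usd


def savings_per_million_tokens(monthly_savings_usd: float,
                               tokens_per_day: int,
                               days_per_month: int = 30) -> float:
    """Savings implied per million tokens (assumes a 30-day month)."""
    millions_per_month = tokens_per_day * days_per_month / 1_000_000
    return monthly_savings_usd / millions_per_month


print(f"Break-even: {break_even_months(BOX_PRICE_USD, MAX_MONTHLY_SAVINGS_USD):.1f} months")
print(f"Implied savings: ${savings_per_million_tokens(MAX_MONTHLY_SAVINGS_USD, TOKENS_PER_DAY):.2f} per million tokens")

# Assumption: savings shrink in proportion to how much of the
# six-million-tokens-per-day rate a team actually sustains.
for utilization in (1.0, 0.5, 0.25):
    months = break_even_months(BOX_PRICE_USD, MAX_MONTHLY_SAVINGS_USD * utilization)
    print(f"At {utilization:.0%} of that usage: {months:.1f} months to break even")
```

With AMD's figures this gives roughly 5.3 months to break even, consistent with the six-month claim, and an implied saving of about $4.17 per million tokens. Under the linear-scaling assumption, halving sustained usage doubles the payback period.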

What's still unclear

Several details remain pending. AMD hasn't shared an exact pre-order date for June, nor has it named OEM partners for Gorgon Halo systems. An AMD spokesperson said that "several OEM partners have expressed excitement" and that "systems will be announced from our partners starting in Q3 2026," but no specific manufacturers or models were disclosed.

The consumer fate of Gorgon Halo is also uncertain. All three announced SKUs carry the "Pro" tag, which AMD says indicates enterprise-grade security, manageability, and reliability features. When asked about consumer variants, AMD didn't commit either way. The Strix Halo generation was a niche product, available in only a handful of machines — the Framework Desktop, ROG Flow Z13, and GMKtec EVO-X2 — so a similarly conservative rollout for Gorgon Halo is possible.

Notably, AMD kept the GPU core count at 32 CUs for the 490 and 485, down from the 40 CUs used on the refreshed Strix Halo 385 and 390. It's possible a further refresh could bring 40 CUs to the lower-end Gorgon Halo SKUs, but AMD hasn't indicated that yet.

What to watch next

With pre-orders opening in June and partner systems expected in Q3 2026, the next few months should clarify pricing for the full Halo box lineup and reveal which OEMs plan to ship Gorgon Halo machines. Benchmarks from independent reviewers will also be critical — AMD's own token-per-second claims, while promising, need real-world validation across diverse model sizes and inference patterns. If the 300B+ parameter LLM capability holds up, the Ryzen AI Max 400 could carve out a genuine niche for on-premises, x86-based AI inference at a price point below Nvidia's DGX Spark.


FAQ

What are the three Ryzen AI Max 400 Pro SKUs and their key specs?
The lineup consists of the Ryzen AI Max+ Pro 495 (16C/32T, 5.2 GHz boost, 80 MB cache, 55 NPU TOPS, 40 GPU CUs), the Ryzen AI Max Pro 490 (12C/24T, 5 GHz boost, 76 MB cache, 50 NPU TOPS, 32 GPU CUs), and the Ryzen AI Max Pro 485 (8C/16T, 5 GHz boost, 40 MB cache, 50 NPU TOPS, 32 GPU CUs). All three support up to 192GB of unified memory with 160GB usable as VRAM.

How does the Ryzen AI Halo box compare to Nvidia's DGX Spark?
The Ryzen AI Halo starts at $3,999 with a Ryzen AI Max+ 395, 128GB unified memory, and 2TB storage, while the DGX Spark costs $4,700 with 128GB unified memory, Nvidia's GB10 chip, and 4TB storage. On Linux, AMD claims the Halo delivers up to 14% higher tokens per second on GLM 4.7 Flash 30B and up to 4% higher on Qwen 3.6 35B. The Halo also supports Windows, whereas the DGX Spark is Linux-only.

When will the Ryzen AI Halo be available and what's the break-even timeline?
AMD is opening pre-orders in June for the Ryzen AI Halo box with the Ryzen AI Max+ 395, priced starting at $3,999. Partner systems based on the Ryzen AI Max Pro 400 series are expected starting in Q3 2026. AMD estimates the box saves up to $750 per month versus cloud compute, breaking even after six months at six million tokens per day.
