Baltra AI Server ASIC

- An analysis revealed Apple's Baltra AI server ASIC, a custom 3nm chiplet developed with Broadcom for cloud inference.
- The report says mass production is targeted for H2 2026 and aims to reduce Apple’s dependence on NVIDIA for inference workloads.
- If accurate, this signals Apple shifting some inference capacity to custom cloud silicon, which could reshape server-side ML choices (x.com).

Apple is reportedly building a custom server chip for artificial intelligence, a move that would put more of its cloud work on Apple-designed silicon. (9to5mac.com) The chip has been described in reports as “Baltra,” with Apple working with Broadcom on networking technology for the project and targeting mass production in the second half of 2026. (bloomberg.com)

Artificial intelligence inference is the step where a trained model answers a user’s prompt, like turning a Siri request into a summary or rewrite. Apple already sends some of those heavier requests to its cloud system, which it calls Private Cloud Compute. (apple.com) Apple said in June 2024 that Private Cloud Compute handles requests that need “larger, server-based models” and runs those models on Apple silicon servers rather than only on iPhones, iPads, or Macs. (apple.com)

That makes a server chip different from the A-series and M-series processors people know from Apple devices. Instead of fitting inside a phone or laptop, the chip would sit in data-center racks and handle many user requests at once. (apple.com) Reports in December 2024 said Apple’s existing Apple Intelligence servers were using M2 Ultra chips and were expected to move to M4-based systems before any fully custom server processor arrived. (macrumors.com)

Broadcom’s role also fits the kind of hardware these systems need. The company sells high-speed networking and optical parts for artificial intelligence clusters, including 3-nanometer transceiver technology built for data-center links. (broadcom.com) Broadcom also markets chiplet-based packaging for custom artificial intelligence accelerators, a design approach that breaks a big processor into smaller pieces linked inside one package. That is one way to push more performance and memory bandwidth into a server part. (broadcom.com)

Apple has not publicly announced Baltra or confirmed a production schedule. The company’s public statements so far say only that Private Cloud Compute runs on dedicated Apple silicon servers and is meant to keep user data inaccessible even to Apple. (apple.com) If Apple follows the reported 2026 timetable, the next marker will be whether it starts replacing general-purpose Mac-class servers with a chip built specifically for cloud inference. (bloomberg.com)
