GPUs get confidential computing

NVIDIA’s H100 now supports trusted execution environments that protect data in use, letting teams shield models while they run and reducing insider and hypervisor risks. (x.com) The coverage cites AES-256-GCM, which the H100 uses to encrypt data as it moves between the CPU and the GPU. (x.com) (cacm.acm.org) It also notes similar platforms from Intel (TDX) and AMD (SEV-SNP) and frames this as a practical step to shield high-value models (the note referenced $50M+ model valuations) from operational exposure. (x.com)

Most cloud security protects data in two places: on disk and on the wire. The weak spot is the moment a model is actually running, because its weights and prompts have to sit in working memory in plain form to be computed on. (developer.nvidia.com) A trusted execution environment is a locked room inside a chip. Code runs inside that room, and the system can produce an attestation report that lets a customer verify the room is genuine before sending in sensitive data. (developer.nvidia.com)

That idea started on central processors, which are the general-purpose chips that run the operating system. Intel’s Trust Domain Extensions isolate a virtual machine from the hypervisor, and AMD’s Secure Encrypted Virtualization-Secure Nested Paging encrypts memory and adds integrity checks so the host cannot quietly remap or replay it. (cc-enabling.trustedservices.intel.com) (ubuntu.com)

The problem is that modern artificial intelligence workloads usually run on a graphics processor, not the central processor. If the graphics processor is outside the locked room, the most valuable part of the workload is still exposed during the actual computation. (developer.nvidia.com)

NVIDIA’s Hopper H100 was the first graphics processor the company shipped with confidential computing support. NVIDIA says the chip has an on-die hardware root of trust, which is the built-in starting point used to verify firmware and establish the trusted execution environment. (developer.nvidia.com) When the H100 runs in confidential mode, data moving between the central processor and the graphics processor passes through an encrypted bounce buffer: a staging area in shared memory that both sides can reach, but where the data only ever sits in encrypted form. A Communications of the ACM article describing the design says the direct-memory-access engine uses Advanced Encryption Standard Galois/Counter Mode with 256-bit keys, usually shortened to AES-256-GCM, for those transfers. (developer.nvidia.com) (cacm.acm.org)

That closes a very specific cloud risk. A cloud operator, a malicious insider, or compromised hypervisor software can have enormous privileges over a normal virtual machine, but confidential computing is designed so those layers cannot simply inspect the model or the prompts while the job is running. (developer.nvidia.com) (cc-enabling.trustedservices.intel.com)

It also changes who can use shared infrastructure. Google Cloud now offers confidential virtual machines on A3 systems with NVIDIA H100 graphics processors, so customers can combine a central-processor confidential machine with a protected graphics processor instead of choosing between security and accelerator performance. (cloud.google.com) (docs.cloud.google.com)

There is still a tradeoff. A 2024 benchmark study found that most of the slowdown comes from extra central-processor-to-graphics-processor transfer work over Peripheral Component Interconnect Express, not from the math inside the graphics processor, and it reported overhead below 5% for many large language model inference queries. (arxiv.org)

So the shift here is not that chips suddenly became private. The shift is that the expensive part of artificial intelligence, the model weights on the graphics processor, can now sit inside the same kind of hardware-verified boundary that central processors have been building for years. (developer.nvidia.com 1) (developer.nvidia.com 2)
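
The attestation step described above works, in broad strokes, as a challenge and response: the customer sends a fresh random nonce, the trusted environment answers with a report of what firmware and software it loaded, and the customer checks that report before sending anything sensitive. The Python sketch below is a simplified model, not NVIDIA's protocol. An HMAC with a shared key stands in for the device signature and certificate chain a real chip uses, and `EXPECTED_MEASUREMENTS`, the measurement names, and every function name are hypothetical. It shows the three checks a verifier makes: the report is authentic, it answers this particular nonce, and its measurements match expected values.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Hypothetical reference values the verifier expects. A real verifier would
# obtain signed reference measurements from the vendor instead.
EXPECTED_MEASUREMENTS = {
    "firmware": hashlib.sha384(b"example-firmware-image").hexdigest(),
    "driver": hashlib.sha384(b"example-driver-image").hexdigest(),
}


@dataclass
class AttestationReport:
    nonce: bytes        # the challenge this report answers
    measurements: dict  # component name -> hash of what was loaded
    tag: bytes          # HMAC standing in for a real device signature


def _report_bytes(nonce: bytes, measurements: dict) -> bytes:
    # Serialize deterministically so producer and verifier MAC identical bytes.
    body = "|".join(f"{name}={measurements[name]}" for name in sorted(measurements))
    return nonce + body.encode()


def device_produce_report(device_key: bytes, nonce: bytes, measurements: dict) -> AttestationReport:
    """Simulates the trusted environment answering a verifier's challenge."""
    tag = hmac.new(device_key, _report_bytes(nonce, measurements), hashlib.sha384).digest()
    return AttestationReport(nonce, dict(measurements), tag)


def verify_report(device_key: bytes, expected_nonce: bytes, report: AttestationReport) -> bool:
    """Verifier-side gate: send secrets only if every check passes."""
    # 1. Authenticity: the report was produced by a holder of the device key.
    expected_tag = hmac.new(
        device_key, _report_bytes(report.nonce, report.measurements), hashlib.sha384
    ).digest()
    if not hmac.compare_digest(expected_tag, report.tag):
        return False
    # 2. Freshness: the report answers this challenge, not an old, replayed one.
    if not hmac.compare_digest(report.nonce, expected_nonce):
        return False
    # 3. Expected software: every measurement matches its reference value.
    return report.measurements == EXPECTED_MEASUREMENTS


if __name__ == "__main__":
    device_key = secrets.token_bytes(32)  # hypothetical key shared for the demo
    nonce = secrets.token_bytes(32)       # fresh challenge chosen by the verifier
    good = device_produce_report(device_key, nonce, EXPECTED_MEASUREMENTS)
    print(verify_report(device_key, nonce, good))  # True
    tampered = dict(EXPECTED_MEASUREMENTS, firmware="unexpected")
    bad = device_produce_report(device_key, nonce, tampered)
    print(verify_report(device_key, nonce, bad))   # False
```

A real verifier checks an asymmetric signature so it never holds the device's secret; the shared HMAC key here only keeps the sketch inside the Python standard library.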
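
The benchmark finding that the slowdown comes from transfers rather than on-chip math can be made concrete with a toy latency model. The Python sketch below assumes confidential mode only lowers the effective host-to-graphics-processor bandwidth (because of bounce-buffer encryption) and leaves compute time unchanged. Every number in it is a hypothetical placeholder, not a measurement from the study, and the function names are illustrative.

```python
def inference_time_s(compute_s: float, bytes_moved: float, link_gb_per_s: float) -> float:
    """Toy model: on-GPU compute time plus host<->GPU transfer time."""
    return compute_s + bytes_moved / (link_gb_per_s * 1e9)


def confidential_overhead(compute_s: float, bytes_moved: float,
                          plain_gb_per_s: float, encrypted_gb_per_s: float) -> float:
    """Relative slowdown when only the effective link bandwidth drops."""
    base = inference_time_s(compute_s, bytes_moved, plain_gb_per_s)
    confidential = inference_time_s(compute_s, bytes_moved, encrypted_gb_per_s)
    return (confidential - base) / base


if __name__ == "__main__":
    # Hypothetical inputs: 100 MB moved per query, 25 GB/s plain vs 5 GB/s encrypted.
    compute_heavy = confidential_overhead(2.0, 100e6, 25.0, 5.0)
    transfer_heavy = confidential_overhead(0.01, 100e6, 25.0, 5.0)
    print(f"{compute_heavy:.1%}")   # 0.8%
    print(f"{transfer_heavy:.1%}")  # 114.3%
```

The point is structural rather than numerical: because the extra cost scales with bytes moved, not with compute, the relative slowdown shrinks as a query spends more of its time doing math on the graphics processor.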

Shared from Scout - Be the smartest in the room.