GPU Platforms Promoted for In-Orbit AI/ML Workloads
Colossus Computing is promoting its GPU-based platforms for running CUDA AI/ML workloads in space. The technology aims to enable real-time, onboard data processing and autonomous decision-making without delays from ground communication. This reflects a growing demand for high-throughput, flexible compute at the edge for complex tasks like sensor fusion and computer vision in orbital environments.
- Colossus’s Falcon platform is based on the NVIDIA Jetson AGX Orin module, offering up to 248 TOPS, and integrates a semi-customizable FPGA for interface flexibility and additional hardware computation. The more entry-level Kestrel platform uses the NVIDIA Jetson TX2i module.
- To mitigate the effects of radiation, Colossus platforms employ resettable eFuse circuits on each subsystem, use error-correcting code (ECC) protection for RAM, and feature multiple boot sectors for redundancy. The hardware is designed for 5-year low Earth orbit missions with an operating temperature range of -40°C to +60°C.
- Loft Orbital selected Colossus GPUs for its YAM-6 mission, which launched aboard a SpaceX Transporter-10 rideshare, to power its "Virtual Mission" service, which allows customers to access in-space data.
- In a separate effort, a startup named Starcloud is launching the NVIDIA H100 GPU, stated to be 100 times more powerful than any processor previously flown in space, on its Starcloud-1 satellite. This mission will also test Google's open-source Gemma AI model in orbit.
- While FPGAs offer lower latency and power consumption, which suits specific, deterministic tasks at the edge, GPUs excel at high-throughput, parallel processing suitable for training complex AI models. Combining the two, as some space-bound platforms do, provides a hybrid approach that handles diverse workloads, from sensor data processing to running neural networks.
- The European Space Agency's Φ-sat-1, launched in 2020, was the first European Earth observation mission to carry an AI chip onboard. The chip is used to filter out cloudy images and reduce the amount of data that needs to be downlinked.
- Applying aviation software safety standards like DO-178C to AI/ML systems presents challenges, as the standard's verification and traceability objectives don't translate well to neural networks and training data. Current strategies for certifying lower-criticality (DAL-D) functions involve treating neural network weights as controlled parameters and testing extensively to ensure deterministic outputs.
- Radiation-tolerant alternatives to commercial off-the-shelf (COTS) GPUs exist, such as AMD's Versal AI Core XQRVC1902 adaptive SoC and Moog's single-board computer, which pairs an AMD G-Series SoC GPU with a Xilinx UltraScale FPGA. However, COTS systems like the NVIDIA Tegra K1 have demonstrated adequate radiation tolerance for some low Earth orbit missions.
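
Two of the ideas above, treating neural network weights as controlled parameters and keeping redundant copies of critical data, can be illustrated with a short sketch. It is not Colossus's implementation or a DO-178C compliance procedure. It only assumes that an approved SHA-256 digest of the weight file is kept under configuration control and that several copies of the file are stored onboard. The names `sha256_of` and `select_valid_copy` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def select_valid_copy(copies: Iterable[Path], approved_digest: str) -> Optional[Path]:
    """Return the first stored copy whose digest matches the approved baseline.

    Copies that are missing or corrupted (for example by a bit flip in
    storage) are skipped. None means no copy can be trusted, so the caller
    should refuse to load the model rather than run unverified weights.
    """
    for candidate in copies:
        if candidate.is_file() and sha256_of(candidate) == approved_digest:
            return candidate
    return None
```

A caller would pass the primary and backup locations in priority order and load weights only from the returned path. A digest check detects corruption but does not correct it, so it complements ECC memory rather than replacing it.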