Amazon SageMaker Adds Custom Inference for Nova Models
Amazon SageMaker now supports custom inference for Amazon Nova models. The update gives engineers more control over instance types, scaling, and model customization, bringing the proprietary model family to parity with open-weight LLM deployment workflows on the platform.
- This general availability release covers custom-trained versions of Nova Micro, Nova Lite, and Nova 2 Lite models with reasoning capabilities.
- The update supports deploying models that have undergone full-rank customization, such as continued pre-training and supervised fine-tuning, not just adapter-based methods like LoRA.
- To reduce cost, custom endpoints can run on more cost-efficient Amazon EC2 G5 and G6 GPU instances and auto-scale based on 5-minute usage patterns to absorb variable workloads.
- Custom endpoints are billed per hour for the compute instances used, with no minimum commitments or upfront fees.
- The Amazon Nova model family was introduced at AWS re:Invent 2024 and includes fast, low-cost text models (Nova Micro) and multimodal models that handle images and video (Nova Lite, Pro, and Premier).
- As an example of what customization can deliver, early customer Nexthink reported an 80% reduction in token usage and a 30% increase in accuracy on domain-specific queries after fine-tuning a Nova model.
- At launch, the new inference capability is available in the US East (N. Virginia) and US West (Oregon) AWS Regions.
- The update closes a significant gap in the MLOps workflow: enterprises could previously customize Nova models in SageMaker but had no managed service for deploying the fine-tuned versions to production.
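The scaling and billing points above combine into a simple cost picture: hourly charges accrue only for the instances running in each window, so scaling down during quiet periods lowers the bill. The Python sketch below illustrates that arithmetic under stated assumptions. It is not SageMaker's auto-scaling algorithm, and it makes no AWS calls. `ScalingPolicy`, `desired_instances`, `simulate`, the per-instance load target, the 1-to-4 instance bounds, and the $1.50 hourly rate are all hypothetical placeholders, not published Nova pricing or API parameters. Each window represents 5 minutes, and the instance count for a window is the request count divided by a per-instance target, rounded up and clamped to the bounds.

```python
import math
from dataclasses import dataclass

WINDOW_MINUTES = 5  # length of each usage window, matching the 5-minute scaling cadence


@dataclass
class ScalingPolicy:
    """Hypothetical scaling bounds; not actual SageMaker parameters."""
    target_per_instance: float  # requests one instance should handle per window
    min_instances: int = 1
    max_instances: int = 4


def desired_instances(requests_in_window: int, policy: ScalingPolicy) -> int:
    """Smallest instance count that keeps per-instance load at or below target."""
    needed = math.ceil(requests_in_window / policy.target_per_instance)
    return max(policy.min_instances, min(policy.max_instances, needed))


def simulate(windows: list[int], policy: ScalingPolicy, hourly_rate: float) -> tuple[list[int], float]:
    """Return per-window instance counts and the estimated compute cost.

    Assumes cost accrues pro rata: one instance running for one 5-minute
    window is billed as 5/60 of an instance-hour at `hourly_rate`.
    """
    counts = [desired_instances(r, policy) for r in windows]
    instance_hours = sum(counts) * WINDOW_MINUTES / 60
    return counts, instance_hours * hourly_rate


if __name__ == "__main__":
    # One hour of hypothetical traffic: twelve 5-minute windows with a short spike.
    traffic = [100, 120, 900, 1500, 1400, 600, 200, 80, 50, 40, 30, 20]
    policy = ScalingPolicy(target_per_instance=500)
    rate = 1.50  # hypothetical USD per instance-hour

    counts, cost = simulate(traffic, policy, rate)
    print(counts)         # [1, 1, 2, 3, 3, 2, 1, 1, 1, 1, 1, 1]
    print(f"{cost:.2f}")  # 2.25

    # Same hour with capacity pinned at the peak of 3 instances.
    static_cost = max(counts) * len(traffic) * WINDOW_MINUTES / 60 * rate
    print(f"{static_cost:.2f}")  # 4.50
```

With this made-up traffic, the scaled endpoint accrues 1.5 instance-hours ($2.25) versus 3 instance-hours ($4.50) if it stayed at its 3-instance peak for the full hour. In practice, SageMaker endpoints scale through Application Auto Scaling policies driven by CloudWatch metrics; the sketch only models the cost arithmetic, not that service's behavior.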