SageMaker AI adds observability for inference endpoints
Amazon SageMaker AI now offers an observability capability for generative AI inference workloads, providing real-time visibility into token performance and infrastructure health. The feature automates the collection and display of key metrics, which AWS says reduces manual troubleshooting time from hours to minutes. It targets engineers and architects who run production AI inference, and it includes a new CloudWatch dashboard and a Grafana integration, available in multiple AWS Regions.
- New observability for AI inference endpoints
- Pre-built CloudWatch dashboard for real-time insights
- Grafana integration for existing observability stacks
- Broad availability across AWS regions
Features (2)
- New observability for AI inference endpoints
SageMaker AI now provides observability for generative AI inference workloads, covering token performance, GPU health, and autoscaling behavior. It automatically collects real-time inference performance and infrastructure health metrics and surfaces them in a single view, so issues can be diagnosed and resolved faster.
- Pre-built CloudWatch dashboard for real-time insights
A new SageMaker AI Insights dashboard in Amazon CloudWatch offers a consolidated view of token latency, GPU utilization, scaling events, and cold start breakdowns, built on OpenTelemetry-native metrics. No additional instrumentation is required, so teams can quickly identify performance degradations and tune autoscaling policies.
Enhancements (1)
- Grafana integration for existing observability stacks
Customers can connect SageMaker AI inference observability data directly to Grafana by pointing it at a regional PromQL endpoint and importing a pre-configured dashboard template. The same metrics can then be viewed alongside the tools and workflows a team already uses.
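To illustrate what querying a PromQL endpoint involves, here is a minimal Python sketch. It assumes the regional endpoint follows the standard Prometheus HTTP API (`GET /api/v1/query`) and its JSON response shape, which the announcement does not confirm. The base URL, the metric name `gpu_utilization`, and the label `endpoint_name` are hypothetical placeholders, and authentication (for example AWS request signing), which a real endpoint would likely require, is omitted. The sketch only builds the request URL and parses a response; it does not make a network call.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder: the real regional PromQL endpoint URL
# must be taken from the AWS documentation.
PROMQL_ENDPOINT = "https://promql.example.invalid"


def build_instant_query_url(base_url: str, promql: str,
                            time_s: Optional[float] = None) -> str:
    """Build an instant-query URL in the standard Prometheus HTTP API form."""
    params = {"query": promql}
    if time_s is not None:
        # Optional evaluation timestamp, in Unix seconds.
        params["time"] = f"{time_s:.3f}"
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def extract_values(response: dict) -> list:
    """Return (labels, value) pairs from a Prometheus instant-vector response."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    pairs = []
    for series in response["data"]["result"]:
        # Each sample is [unix_timestamp, "value-as-string"].
        _timestamp, raw_value = series["value"]
        pairs.append((series["metric"], float(raw_value)))
    return pairs


# Hypothetical metric and label names, for illustration only.
query = 'avg by (endpoint_name) (gpu_utilization{endpoint_name="my-endpoint"})'
url = build_instant_query_url(PROMQL_ENDPOINT, query)
print(url)
```

In Grafana itself none of this code is needed: the endpoint would be configured as a Prometheus-compatible data source and the dashboard template imported through the UI. The sketch is only meant to show the kind of request such a data source issues.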
Notes (1)
- Broad availability across AWS regions
The new SageMaker AI inference observability capability is available in AWS Regions across North America, South America, Europe, and Asia Pacific. Detailed documentation is available for further information.
https://aws.amazon.com/about-aws/whats-new/2026/06/amazon-sagemaker-ai-inference/