Amazon SageMaker AI adds multi-turn reinforcement learning
Amazon SageMaker AI now supports multi-turn reinforcement learning (RL) for customizing foundation models on multi-step, agentic tasks. This serverless capability trains AI agents by rewarding full decision sequences, which makes it practical to specialize smaller models for specific workloads. It is available today through SageMaker Studio and the SageMaker Python SDK. Supported models vary by region, and agents can run on various AWS services or on custom infrastructure.
- New multi-turn reinforcement learning for AI agent customization
- Simplified agent training and management
- Integrated tracking and evaluation tools
- Serverless operation and cost efficiency
- Availability and supported models
Features (1)
- New multi-turn reinforcement learning for AI agent customization
SageMaker AI introduces multi-turn reinforcement learning (RL), a serverless technique for fine-tuning models on multi-step agent tasks. It trains a model against a custom agent environment and rewards the complete sequence of decisions the agent makes, so that a smaller model can be specialized to match a larger one on the target workload.
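To make "rewarding the complete sequence of decisions" concrete, here is a minimal Python sketch of a trajectory-level reward. The `Turn` and `Trajectory` types, the `trajectory_reward` function, and the 0.05 step penalty are all hypothetical and chosen for illustration. The announcement does not describe the reward interface, so none of this should be read as the SageMaker API.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One agent step: what the agent observed and what it did (hypothetical type)."""
    observation: str
    action: str


@dataclass
class Trajectory:
    """A full episode: every turn plus the agent's final answer (hypothetical type)."""
    turns: list[Turn] = field(default_factory=list)
    final_answer: str = ""


def trajectory_reward(traj: Trajectory, expected_answer: str,
                      step_penalty: float = 0.05) -> float:
    """Score the whole episode with a single number.

    Assumed scoring rule, for illustration only:
    - 0.0 if the final answer is wrong, however good the individual turns looked;
    - otherwise 1.0 minus a small penalty for each turn beyond the first,
      floored at 0.0, so shorter successful episodes score higher.
    """
    if traj.final_answer != expected_answer:
        return 0.0
    extra_turns = max(0, len(traj.turns) - 1)
    return max(0.0, 1.0 - step_penalty * extra_turns)


if __name__ == "__main__":
    episode = Trajectory(
        turns=[
            Turn("user asks for order status", "call lookup_order"),
            Turn("order record returned", "call get_shipping"),
            Turn("shipping record returned", "reply to user"),
        ],
        final_answer="shipped",
    )
    print(trajectory_reward(episode, expected_answer="shipped"))
```

The point of the sketch is that the score depends on the outcome of the whole episode, not on any single turn in isolation.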
Enhancements (3)
- Simplified agent training and management
Multi-turn RL in SageMaker AI handles the full training loop, including rollout orchestration, trajectory collection, and checkpoint management, so users do not need to build their own training infrastructure. Users can connect agents that run on various AWS services or on their own infrastructure.
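For readers new to the terminology, the following self-contained Python sketch shows what a rollout and trajectory collection look like in a toy setting. `GuessEnv`, `bisect_policy`, `rollout`, and `collect` are hypothetical names invented for this example; the managed service performs this loop internally, and its actual interfaces are not described in the announcement.

```python
from typing import Callable

History = list[tuple[int, str]]  # (guess, feedback) pairs seen so far


class GuessEnv:
    """Toy multi-step environment: find a hidden integer in [0, 15]."""

    def __init__(self, target: int, max_turns: int = 6) -> None:
        self.target = target
        self.max_turns = max_turns
        self.turns = 0

    def reset(self) -> None:
        self.turns = 0

    def step(self, guess: int) -> tuple[str, bool]:
        """Return (feedback, done). The episode ends on success or at max_turns."""
        self.turns += 1
        if guess == self.target:
            return "correct", True
        feedback = "higher" if guess < self.target else "lower"
        return feedback, self.turns >= self.max_turns


def bisect_policy(history: History) -> int:
    """Stand-in for the model: narrow the range using all earlier feedback."""
    lo, hi = 0, 15
    for guess, feedback in history:
        if feedback == "higher":
            lo = max(lo, guess + 1)
        elif feedback == "lower":
            hi = min(hi, guess - 1)
    return (lo + hi) // 2


def rollout(env: GuessEnv, policy: Callable[[History], int]) -> dict:
    """Run one episode and record every step plus a single episode-level reward."""
    env.reset()
    history: History = []
    done = False
    while not done:
        action = policy(history)
        feedback, done = env.step(action)
        history.append((action, feedback))
    reward = 1.0 if history[-1][1] == "correct" else 0.0
    return {"steps": history, "reward": reward}


def collect(targets, policy: Callable[[History], int]) -> list[dict]:
    """Trajectory collection: one rollout per task in the batch."""
    return [rollout(GuessEnv(t), policy) for t in targets]


if __name__ == "__main__":
    for traj in collect([0, 11], bisect_policy):
        print(traj)
```

In a real training run the policy would be the model being fine-tuned and the collected trajectories would feed the RL update. Both are outside the scope of this sketch.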
- Integrated tracking and evaluation tools
Built-in MLflow tracking lets users inspect agent trajectories, rewards, and traces. Evaluation jobs report reward, pass@k, and trajectory metrics for benchmarking before deployment.
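As background on pass@k, the sketch below implements the commonly used unbiased estimator: given n sampled attempts per task, of which c succeed, pass@k = 1 - C(n-c, k) / C(n, k), the probability that at least one of k attempts drawn from the n succeeds. The announcement does not say how SageMaker evaluation jobs compute the metric, so treat this as an assumed definition; `pass_at_k` is an illustrative helper, not a SageMaker function.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: total attempts sampled for the task
    c: number of those attempts that succeeded
    k: attempt budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up entirely of failures;
    # it is 0 when fewer than k attempts failed, giving pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, where each entry is (n, c) for one task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three tasks, 10 attempts each, with 3, 0, and 10 successes respectively.
    print(mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=1))
```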
- Serverless operation and cost efficiency
Multi-turn RL is fully serverless: users pay only for the tokens processed and do not provision or manage infrastructure, so cost tracks actual usage.
Notes (1)
- Availability and supported models
Multi-turn RL is available now in SageMaker Studio and through the SageMaker Python SDK. Model support varies by region; models such as Qwen 3.6 27B, Nova Lite 2.0, GPT-OSS-20B, and Gemma 31B are listed for us-west-2 and us-east-1.
https://aws.amazon.com/about-aws/whats-new/2026/06/multi-turn-reinforcement-learning-on-sagemaker-ai/
