{"id":79614,"date":"2026-03-28T14:44:24","date_gmt":"2026-03-28T14:44:24","guid":{"rendered":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/"},"modified":"2026-03-28T14:44:24","modified_gmt":"2026-03-28T14:44:24","slug":"nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale","status":"publish","type":"post","link":"https:\/\/youzum.net\/it\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/","title":{"rendered":"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale"},"content":{"rendered":"<p>NVIDIA researchers introduced <strong>ProRL AGENT<\/strong>, a scalable infrastructure designed for reinforcement learning (RL) training of multi-turn LLM agents. By adopting a \u2018Rollout-as-a-Service\u2019 philosophy, the system decouples agentic rollout orchestration from the training loop. This architectural shift addresses the inherent resource conflicts between I\/O-intensive environment interactions and GPU-intensive policy updates that currently bottleneck agent development.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Core Problem: Tight Coupling<\/strong><\/h3>\n<p>Multi-turn agent tasks involve interacting with external environments, such as code repositories or operating systems, via iterative tool use. 
Many existing frameworks\u2014including <strong>SkyRL<\/strong>, <strong>VeRL-Tool<\/strong>, <strong>Agent Lightning<\/strong>, <strong>rLLM<\/strong>, and <strong>GEM<\/strong>\u2014embed rollout control directly within the training process.<\/p>\n<p><strong>This tight coupling leads to two primary limitations:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Conflicting System Requirements<\/strong>: Rollouts are I\/O-bound, requiring sandbox creation, long-lived tool sessions, and asynchronous coordination. Training is GPU-intensive, centered on forward\/backward passes and gradient synchronization. Running both in one process causes interference and reduces hardware efficiency.<\/li>\n<li><strong>Maintenance Barriers<\/strong>: Embedding rollout logic in the trainer makes it difficult to migrate to different training backends or support new runtime environments without re-implementing the execution pipeline.<\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1746\" height=\"924\" data-attachment-id=\"78660\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/03\/27\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/screenshot-2026-03-27-at-10-37-36-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1.png\" data-orig-size=\"1746,924\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-03-27 at 10.37.36\u202fPM\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-300x159.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-1024x542.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1.png\" alt=\"\" class=\"wp-image-78660\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2603.18815<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>System Design: Rollout-as-a-Service<\/strong><\/h3>\n<p><strong>ProRL AGENT<\/strong> operates as a standalone HTTP service that manages the full rollout lifecycle. The RL trainer interacts with the server solely through an API, remaining agnostic to the underlying rollout infrastructure.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Three-Stage Asynchronous Pipeline<\/strong><\/h4>\n<p><strong>To maximize throughput, the server orchestrates rollouts through an asynchronous three-stage \u2018assembly line\u2019:<\/strong><\/p>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>INIT<\/strong>: Initialization workers spin up sandbox containers and configure tools.<\/li>\n<li><strong>RUN<\/strong>: Rollout workers drive the multi-turn agent loop and collect trajectories.<\/li>\n<li><strong>EVAL<\/strong>: Evaluation workers score results against ground truth to produce reward signals.<\/li>\n<\/ol>\n<p>By assigning each stage to an independent worker pool, <strong>ProRL AGENT<\/strong> allows phases to overlap across different jobs, preventing slow evaluations (such as full test suite executions) from stalling the rollout process.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"1690\" height=\"1146\" data-attachment-id=\"78662\" 
data-permalink=\"https:\/\/www.marktechpost.com\/2026\/03\/27\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/screenshot-2026-03-27-at-10-38-17-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.38.17-PM-1.png\" data-orig-size=\"1690,1146\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-03-27 at 10.38.17\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.38.17-PM-1-300x203.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.38.17-PM-1-1024x694.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.38.17-PM-1.png\" alt=\"\" class=\"wp-image-78662\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2603.18815<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>HPC-Compatible Sandboxing and Optimized Tools<\/strong><\/h3>\n<p><strong>ProRL AGENT<\/strong> utilizes <strong>Singularity<\/strong> for its sandbox infrastructure. 
Unlike Docker-based platforms, Singularity allows rootless execution, which is required for deployment on shared HPC clusters managed by Slurm.<\/p>\n<p><strong>The system includes several optimizations to reduce tool execution latency, which often dominates total rollout time:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Efficient Bash<\/strong>: Replaces tmux-based terminal multiplexing with a <strong>ptyprocess<\/strong>-based direct pseudo-terminal, reducing shell command latency from 0.78s to 0.42s.<\/li>\n<li><strong>Direct IPython API<\/strong>: Connects to persistent kernels via an in-process API instead of network gateways, removing networking overhead.<\/li>\n<li><strong>Unix Domain Sockets (UDS)<\/strong>: Replaces TCP loopback for communication between the agent and the execution server inside the container, further reducing latency.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Advanced Features for Scalable RL<\/strong><\/h3>\n<p><strong>The infrastructure introduces mechanisms to improve training stability and hardware utilization:<\/strong><\/p>\n<h4 class=\"wp-block-heading\"><strong>Load Balancing and Prefix Cache Reuse<\/strong><\/h4>\n<p>The server manages a pool of LLM inference backends (e.g., vLLM) using a min-heap keyed by assignment counts, so each new task goes to the backend with the fewest assignments. Once a task is assigned, all subsequent calls within that task are routed to the same backend. This strategy maximizes <strong>prefix cache reuse<\/strong>, reducing inference time across multiple agent turns.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Token-in\/Token-out Communication<\/strong><\/h4>\n<p>To eliminate <strong>re-tokenization drift<\/strong>, where the token sequence generated during rollout differs from the one used during training, <strong>ProRL AGENT<\/strong> uses token IDs as the canonical representation throughout the entire process. 
Log-probabilities and IDs are propagated unchanged from the inference backend to the trainer.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Optimized DAPO Implementation<\/strong><\/h4>\n<p>The system supports <strong>DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)<\/strong>, whose dynamic sampling step filters out \u2018non-informative\u2019 prompts for which every sampled rollout receives the same reward and therefore provides no learning signal. <strong>ProRL AGENT<\/strong> uses an asynchronous replenishment mechanism to maintain maximum throughput, terminating redundant active jobs early once the target number of informative prompts is reached.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Experimental Results on SWE-Bench Verified<\/strong><\/h3>\n<p>The system was validated using Qwen3 models across multiple scales. <strong>ProRL AGENT<\/strong> consistently improved performance compared to reproduced baselines.<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<td><strong>Model Scale<\/strong><\/td>\n<td><strong>Reproduced Baseline (%)<\/strong><\/td>\n<td><strong>ProRL Agent (RL) (%)<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Qwen3-4B<\/strong><\/td>\n<td>14.8<\/td>\n<td><strong>21.2<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Qwen3-8B<\/strong><\/td>\n<td>9.6<\/td>\n<td><strong>18.0<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Qwen3-14B<\/strong><\/td>\n<td>15.4<\/td>\n<td><strong>23.6<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p><strong>Note: The reported prior result for SkyRL-Agent-14B-v0 was 21.6.<\/strong><\/p>\n<p>In addition to software engineering, the system demonstrated generality in <strong>STEM<\/strong>, <strong>Math<\/strong>, and <strong>Code<\/strong> domains, showing steady reward growth during RL training. 
Scalability tests confirmed that rollout throughput increases near-linearly as compute nodes are added.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Architectural Decoupling<\/strong>: ProRL Agent treats the full agentic rollout lifecycle (environment initialization, tool execution, and reward scoring) as an independent HTTP service, separating I\/O-intensive tasks from GPU-intensive policy training.<\/li>\n<li><strong>Significant Performance Gains<\/strong>: This infrastructure enabled the Qwen3-8B model to nearly double its performance on the SWE-Bench Verified benchmark (from 9.6% to 18.0%), while the Qwen3-14B model improved from 15.4% to 23.6%.<\/li>\n<li><strong>System Latency Reductions<\/strong>: Targeted optimizations, such as replacing tmux with ptyprocess for shell execution, reduced shell command latency from 0.78s to 0.42s, and rollout throughput scaled near-linearly as compute nodes were added.<\/li>\n<li><strong>Elimination of Tokenization Drift<\/strong>: The framework utilizes a token-in\/token-out communication pipeline, ensuring that the exact token IDs generated during rollout are passed to the trainer without the risk of lossy re-tokenization.<\/li>\n<li><strong>HPC-Native Deployment<\/strong>: By using Singularity instead of Docker, ProRL Agent supports rootless execution and native Slurm integration, allowing large-scale agent training on shared high-performance computing clusters.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the <strong><a href=\"https:\/\/arxiv.org\/pdf\/2603.18815\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a><\/strong> and <strong><a href=\"https:\/\/github.com\/NVIDIA-NeMo\/ProRL-Agent-Server\" target=\"_blank\" rel=\"noreferrer noopener\">Repo<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/27\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\">NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>NVIDIA researchers introduced ProRL AGENT, a scalable infrastructure designed for reinforcement learning (RL) training of multi-turn LLM agents. By adopting a \u2018Rollout-as-a-Service\u2019 philosophy, the system decouples agentic rollout orchestration from the training loop. This architectural shift addresses the inherent resource conflicts between I\/O-intensive environment interactions and GPU-intensive policy updates that currently bottleneck agent development. The Core Problem: Tight Coupling Multi-turn agent tasks involve interacting with external environments, such as code repositories or operating systems, via iterative tool use. Many existing frameworks, including SkyRL, VeRL-Tool, Agent Lightning, rLLM, and GEM, embed rollout control directly within the training process. 
The post NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":79615,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-79614","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" 
content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/it\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\" \/>\n<meta property=\"og:locale\" content=\"it_IT\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/it\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-28T14:44:24+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Scritto da\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tempo di lettura stimato\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minuti\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale\",\"datePublished\":\"2026-03-28T14:44:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\"},\"wordCount\":890,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\",\"url\"
:\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\",\"name\":\"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png\",\"datePublished\":\"2026-03-28T14:44:24+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#breadcrumb\"},\"inLanguage\":\"it-IT\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/
03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png\",\"width\":1746,\"height\":924},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"it-IT\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin 
NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"it-IT\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/it\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/it\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/","og_locale":"it_IT","og_type":"article","og_title":"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/it\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-03-28T14:44:24+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Scritto da":"admin NU","Tempo di lettura stimato":"4 
minuti"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale","datePublished":"2026-03-28T14:44:24+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/"},"wordCount":890,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"it-IT","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/","url":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinfor
cement-learning-of-multi-turn-llm-agents-at-scale\/","name":"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png","datePublished":"2026-03-28T14:44:24+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#breadcrumb"},"inLanguage":"it-IT","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/"]}]},{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-as-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png","width":1746,"height":924},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/nvidia-ai-unveils-prorl-agent-a-decoupled-rollout-a
s-a-service-infrastructure-for-reinforcement-learning-of-multi-turn-llm-agents-at-scale\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"NVIDIA AI Unveils ProRL Agent: A Decoupled Rollout-as-a-Service Infrastructure for Reinforcement Learning of Multi-Turn LLM Agents at Scale"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"it-IT"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"it-IT","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/it\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png",1746,924,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png",1746,924,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png",1746,924,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-300x159.png",300,159,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-1024x542.png",1024,542,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-1536x813.png",1536,813,true],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR.png",1746,924,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-18x10.png",18,10,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-600x318.png",600,318,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-27-at-10.37.36-PM-1-FSccHR-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/it\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/it\/category\/ai-club\/\" 
rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/it\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"NVIDIA researchers introduced ProRL AGENT, a scalable infrastructure designed for reinforcement learning (RL) training of multi-turn LLM agents. By adopting a \u2018Rollout-as-a-Service\u2019 philosophy, the system decouples agentic rollout orchestration from the training loop. This architectural shift addresses the inherent resource conflicts between I\/O-intensive environment interactions and GPU-intensive policy updates that currently bottleneck agent development. The&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts\/79614","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/comments?post=79614"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/posts\/79614\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/media\/79615"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/media?parent=79614"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/categories?post=79614"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/it\/wp-json\/wp\/v2\/tags?post=79614"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}