{"id":87594,"date":"2026-05-02T15:53:46","date_gmt":"2026-05-02T15:53:46","guid":{"rendered":"https:\/\/youzum.net\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/"},"modified":"2026-05-02T15:53:46","modified_gmt":"2026-05-02T15:53:46","slug":"a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b","status":"publish","type":"post","link":"https:\/\/youzum.net\/fr\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/","title":{"rendered":"A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8\u00d7 Rollout Generation Speedup at 8B and Projects 2.5\u00d7 End-to-End Speedup at 235B"},"content":{"rendered":"<p>If you have been running reinforcement learning (RL) post-training on a language model for math reasoning, code generation, or any verifiable task, you have almost certainly stared at a progress bar while your GPU cluster burns through rollout generation. 
<a href=\"https:\/\/arxiv.org\/abs\/2604.26779?linkId=100000420267663\" target=\"_blank\" rel=\"noreferrer noopener\">A team of researchers from NVIDIA proposes a precise fix<\/a> by integrating speculative decoding into the RL training loop itself, and does so in a way that preserves the target model\u2019s exact output distribution.<\/p>\n<p>The research team integrated speculative decoding directly into <strong>NeMo RL v0.6.0<\/strong> with a vLLM backend, delivering lossless rollout acceleration at both 8B and projected 235B model scales. The latest NeMo RL v0.6.0 release officially ships speculative decoding as a supported feature alongside the SGLang backend, the Muon optimizer, and YaRN long-context training.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1500\" height=\"734\" data-attachment-id=\"79449\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/05\/01\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/screenshot-2026-05-01-at-8-46-43-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-01-at-8.46.43-PM-1.png\" data-orig-size=\"1500,734\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-05-01 at 8.46.43\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-01-at-8.46.43-PM-1-1024x501.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-01-at-8.46.43-PM-1.png\" alt=\"\" class=\"wp-image-79449\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2604.26779<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Why Rollout Generation is the Bottleneck<\/strong><\/h3>\n<p>To understand the problem, it helps to know how a synchronous RL training step breaks down. In NeMo RL, each step consists of <strong>five stages<\/strong>: data loading, weight synchronization and backend preparation (prepare), rollout generation (gen), log-probability recomputation (logprob), and policy optimization (train).<\/p>\n<p>The research team measured this breakdown on Qwen3-8B under <strong>two workloads<\/strong> \u2014 <strong>RL-Think<\/strong>, which continues training a reasoning-capable model, and <strong>RL-Zero<\/strong>, which starts from a base model and learns reasoning from scratch. In both cases, rollout generation accounts for 65\u201372% of total step time. Log-probability recomputation and training together take only about 27\u201333%. This makes generation the only stage worth targeting for acceleration, and the one that determines the ceiling for any rollout-side optimization.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"1476\" height=\"570\" data-attachment-id=\"79451\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/05\/01\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/screenshot-2026-05-01-at-8-47-09-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-01-at-8.47.09-PM-1.png\" data-orig-size=\"1476,570\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 
2026-05-01 at 8.47.09\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-01-at-8.47.09-PM-1-1024x395.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-01-at-8.47.09-PM-1.png\" alt=\"\" class=\"wp-image-79451\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2604.26779<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>What Speculative Decoding Actually Does<\/strong><\/h3>\n<p>Speculative decoding is a technique where a smaller, faster <em>draft model<\/em> proposes several tokens at once, and the larger <em>target model<\/em> (the one you are actually training) verifies them using a rejection sampling procedure. The key property, and why it matters for RL, is that the rejection procedure is mathematically guaranteed to produce the same output distribution as if the target model had generated those tokens autoregressively. No distribution mismatch, no off-policy corrections needed, no change to the training signal.<\/p>\n<p>This is important because in RL post-training, the training reward depends on the policy\u2019s own samples. Methods like asynchronous execution, off-policy replay, or low-precision rollouts all trade some amount of training fidelity for throughput. Speculative decoding trades nothing: the rollouts are identical in distribution to what the target model would have generated on its own, just produced faster.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The System Integration Challenge<\/strong><\/h3>\n<p>Adding a draft model to a serving backend is straightforward. Adding one to an RL training loop is not. Every time the policy updates, the rollout engine must receive new weights. The draft model must remain aligned with the evolving policy. 
Log-probabilities, KL penalties, and the GRPO policy loss must all be computed against the target (verifier) policy, not the draft; otherwise the optimization target is silently corrupted.<\/p>\n<p>The NVIDIA research team handles this in NeMo RL with a <strong>two-path architecture<\/strong>. The general path uses EAGLE-3, a drafting framework that works with any pretrained model without requiring native multi-token prediction (MTP) support. A native path is also available for models that ship with built-in MTP heads. When online draft adaptation is enabled, the hidden states and log-probabilities from the Megatron-LM verifier forward pass are cached and reused to supervise the draft head via a gradient-detached pathway, so draft training never interferes with the policy gradient signal.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Measured Results at 8B Scale<\/strong><\/h3>\n<p>On 32 GB200 GPUs (8 GB200 NVL72 nodes, 4 GPUs per node), EAGLE-3 reduces generation latency from 100 seconds to 56.6 seconds on RL-Zero \u2014 a 1.8\u00d7 generation speedup. On RL-Think, it drops from 133.6 seconds to 87.0 seconds, a 1.54\u00d7 speedup. Because log-probability re-computation and training are unchanged, these generation-side gains translate to overall step speedups of 1.41\u00d7 on RL-Zero and 1.35\u00d7 on RL-Think. Validation accuracy on AIME-2024 evolves identically under autoregressive and speculative decoding throughout training, confirming that the lossless guarantee holds in practice.<\/p>\n<p>The research team also tests n-gram drafting as a model-free speculative baseline. Despite achieving acceptance lengths of 2.47 on RL-Zero and 2.05 on RL-Think, n-gram drafting is slower than the autoregressive baseline in both settings \u2014 0.7\u00d7 and 0.5\u00d7 respectively. This is a critical finding for practitioners: a positive acceptance length is necessary but not sufficient. 
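<\/p>
<p>The accept\/reject rule behind the lossless guarantee is compact enough to sketch. The following toy example (an illustration with made-up token probabilities, not code from the paper or NeMo RL) draws one token speculatively and shows that the procedure reproduces the target distribution even from a badly miscalibrated draft:<\/p>

```python
import random

# Toy 3-token vocabulary with a target distribution P (the model being
# trained) and a deliberately mismatched draft distribution Q.
P = {'a': 0.6, 'b': 0.3, 'c': 0.1}   # target (verifier) probabilities
Q = {'a': 0.3, 'b': 0.3, 'c': 0.4}   # draft probabilities

def residual(p, q):
    # On rejection, resample from the normalized positive part of (p - q).
    r = {t: max(0.0, p[t] - q[t]) for t in p}
    z = sum(r.values())
    return {t: v / z for t, v in r.items()}

def spec_sample(rng):
    # Draft proposes a token; the target accepts it with prob min(1, p/q).
    tok = rng.choices(list(Q), weights=list(Q.values()))[0]
    if rng.random() < min(1.0, P[tok] / Q[tok]):
        return tok                                    # draft token accepted
    r = residual(P, Q)
    return rng.choices(list(r), weights=list(r.values()))[0]  # corrected resample

rng = random.Random(0)
n = 200_000
counts = {t: 0 for t in P}
for _ in range(n):
    counts[spec_sample(rng)] += 1
freq = {t: c / n for t, c in counts.items()}
# freq matches P = {'a': 0.6, 'b': 0.3, 'c': 0.1} to within sampling noise.
```

<p>The same accept\/reject step, applied position by position to a block of drafted tokens, is what verification amounts to; the speedup comes entirely from checking several drafted tokens in one target forward pass.<\/p>
<p>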
If the verification overhead is high enough, speculation makes things worse.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Three Configuration Decisions That Determine Realized Speedup<\/strong><\/h3>\n<p>The research team isolates <strong>three operational choices<\/strong> that practitioners must get right.<\/p>\n<p><strong>Draft initialization<\/strong> matters more than generic drafting ability. An EAGLE-3 draft initialized on the DAPO post-training dataset achieves a 1.77\u00d7 generation speedup on RL-Zero, while a draft initialized on the general-purpose UltraChat and Magpie datasets achieves only 1.51\u00d7 at the same draft length. The draft must be aligned with the actual rollout distribution encountered during RL, not just a broad chat distribution.<\/p>\n<p><strong>Draft length<\/strong> has a non-obvious optimum. At draft length k=3, RL-Zero achieves 1.77\u00d7 speedup and RL-Think achieves 1.53\u00d7. Increasing to k=5 raises the acceptance length but drops speedup to 1.44\u00d7 on RL-Zero and 0.84\u00d7 on RL-Think \u2014 the latter already slower than autoregressive. At k=7, RL-Zero drops further to 1.21\u00d7 and RL-Think to 0.71\u00d7. The contrast matters: RL-Zero\u2019s rollouts are generated from a base model starting with short outputs, making them easier for the draft to predict even at high k. RL-Think\u2019s fully developed reasoning traces are harder to speculate over, so the overhead of longer drafts erases the benefit sooner. More speculative work per step can erase the benefit of higher acceptance entirely, especially in harder generation regimes.<\/p>\n<p><strong>Online draft adaptation<\/strong> \u2014 updating the draft during RL using rollouts generated by the current policy \u2014 helps most when the draft is weakly initialized. For a DAPO-initialized draft, offline and online configurations perform nearly identically (1.77\u00d7 vs. 1.78\u00d7 on RL-Zero). 
For an UltraChat-initialized draft, online updating improves speedup from 1.51\u00d7 to 1.63\u00d7 on RL-Zero.<\/p>\n<p><strong>Interaction with asynchronous execution<\/strong> was also tested directly at 8B scale, not just in simulation. The research team ran RL-Think at policy lag 1 in a 16-node non-colocated configuration, with 12 nodes dedicated to generation and 4 to training. In asynchronous mode, most of rollout generation is already hidden behind log-probability re-computation and policy updates, so the relevant quantity is the exposed generation time that remains on the critical path. Speculative decoding reduces that exposed generation time from 10.4 seconds to 0.6 seconds per step and lowers effective step time from 75.0 seconds to 60.5 seconds (1.24\u00d7). The gain is smaller than in synchronous RL \u2014 expected, since asynchronous overlap already hides much of the rollout cost \u2014 but it confirms that the two mechanisms are genuinely complementary rather than redundant.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Projected Gains at 235B Scale<\/strong><\/h3>\n<p>Using a proprietary GPU performance simulator calibrated to device-level compute, memory, and interconnect characteristics, the research team projected speculative decoding gains at larger scales. For Qwen3-235B-A22B running synchronous RL on 512 GB200 GPUs, draft length k=3 with an acceptance length of 3 tokens yields a 2.72\u00d7 rollout speedup and a 1.70\u00d7 end-to-end speedup.<\/p>\n<p>At the most favorable simulated operating point \u2014 Qwen3-235B-A22B on 2048 GB200 GPUs with asynchronous RL at policy lag 2 \u2014 rollout speedup reaches approximately 3.5\u00d7, translating to a projected 2.5\u00d7 end-to-end training speedup. 
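<\/p>
<p>The relationship between rollout speedup and end-to-end speedup in these figures is essentially Amdahl\u2019s law: only the generation stage is accelerated, so the overall gain is capped by generation\u2019s share of step time. A back-of-the-envelope check (independent arithmetic, not the paper\u2019s simulator; the 0.67 generation fraction is an assumed value inside the reported 65\u201372% range):<\/p>

```python
def end_to_end_speedup(gen_fraction: float, gen_speedup: float) -> float:
    # Amdahl's law: a fraction `gen_fraction` of step time is sped up by
    # `gen_speedup`; the logprob and train stages are left unchanged.
    return 1.0 / ((1.0 - gen_fraction) + gen_fraction / gen_speedup)

# RL-Zero at 8B: ~67% of step time in generation, 1.8x generation speedup
print(round(end_to_end_speedup(0.67, 1.8), 2))  # -> 1.42, near the reported 1.41x
```

<p>The same cap explains why the asynchronous setting sees smaller gains: overlap shrinks the effective generation fraction before speculation is even applied.<\/p>
<p>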
Speculative decoding and asynchronous execution are described as complementary: speculation reduces the cost of each individual rollout, while asynchronous overlap hides the remaining generation time behind training and log-probability computation.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Rollout generation is the dominant bottleneck in RL post-training<\/strong>, accounting for 65\u201372% of total step time in synchronous RL workloads \u2014 making it the only stage where acceleration has meaningful impact on end-to-end training speed.<\/li>\n<li><strong>Speculative decoding via EAGLE-3 delivers lossless rollout acceleration<\/strong>, achieving 1.8\u00d7 generation speedup at 8B scale (1.41\u00d7 overall step speedup) without changing the target model\u2019s output distribution \u2014 unlike asynchronous execution, off-policy replay, or low-precision rollouts, which all trade training fidelity for throughput.<\/li>\n<li><strong>Draft initialization quality matters more than draft length<\/strong>, with in-domain (DAPO-trained) drafts outperforming general chat-domain drafts by a meaningful margin; longer draft lengths (k\u22655) consistently backfire in harder reasoning workloads, making k=3 the reliable default.<\/li>\n<li><strong>Simulator projections show gains scale up significantly<\/strong>, reaching ~3.5\u00d7 rollout speedup and a projected ~2.5\u00d7 end-to-end training speedup at 235B scale on 2048 GB200 GPUs \u2014 and the technique is already available in NeMo RL v0.6.0 under Apache 2.0.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/arxiv.org\/pdf\/2604.26779\" target=\"_blank\" rel=\"noreferrer noopener\">Full Paper<\/a><\/strong> and <strong><a href=\"https:\/\/github.com\/NVIDIA-NeMo\/RL\/\" target=\"_blank\" rel=\"noreferrer noopener\">Nemo RL 
Repo<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/01\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/\">A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8\u00d7 Rollout Generation Speedup at 8B and Projects 2.5\u00d7 End-to-End Speedup at 235B<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>If you have been running reinforcement learning (RL) post-training on a language model for math reasoning, code generation, or any verifiable task, you have almost certainly stared at a progress bar while your GPU cluster burns through rollout generation. 
A team of researchers from NVIDIA proposes a fix by integrating speculative decoding into the RL training loop itself, preserving the target model\u2019s exact output distribution while accelerating rollout 
generation<\/p>","protected":false},"author":2,"featured_media":87595,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-87594","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8\u00d7 Rollout Generation Speedup at 8B and Projects 2.5\u00d7 End-to-End Speedup at 235B - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" 
\/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/fr\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8\u00d7 Rollout Generation Speedup at 8B and Projects 2.5\u00d7 End-to-End Speedup at 235B - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/fr\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-02T15:53:46+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8\u00d7 Rollout Generation Speedup at 8B and Projects 2.5\u00d7 End-to-End Speedup at 235B\",\"datePublished\":\"2026-05-02T15:53:46+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/\"},\"wordCount\":1445,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-01-at-8.46.43-PM-1-21IkEG.webp\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-achieves-1-8x-rollout-generation-speedup-at-8b-and-projects-2-5x-end-to-end-speedup-at-235b\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/a-new-nvidia-research-shows-speculative-decoding-in-nemo-rl-ac