{"id":44765,"date":"2025-10-16T07:02:47","date_gmt":"2025-10-16T07:02:47","guid":{"rendered":"https:\/\/youzum.net\/qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration\/"},"modified":"2025-10-16T07:02:47","modified_gmt":"2025-10-16T07:02:47","slug":"qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration","status":"publish","type":"post","link":"https:\/\/youzum.net\/es\/qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration\/","title":{"rendered":"QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100\u2014While Improving Exploration"},"content":{"rendered":"<p><strong>What would you build if you could run Reinforcement Learning (RL) post-training on a 32B LLM in 4-bit NVFP4\u2014on a single H100\u2014with BF16-level accuracy and 1.2\u20131.5\u00d7 step speedups?<\/strong> NVIDIA researchers (with collaborators from MIT, HKU, and Tsinghua) have open-sourced <strong>QeRL (Quantization-enhanced Reinforcement Learning)<\/strong>, a training framework that pushes <strong>Reinforcement Learning<\/strong> (RL) post-training into <strong>4-bit FP4 (NVFP4)<\/strong> while keeping gradient math in higher precision via LoRA. The research team reports <strong>&gt;1.5\u00d7 speedups in the rollout phase<\/strong>, <strong>~1.8\u00d7 end-to-end vs QLoRA<\/strong> in one setting, and the <strong>first demonstration of RL training for a 32B policy on a single H100-80GB GPU<\/strong>. 
<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"348\" data-attachment-id=\"75414\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/15\/qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration\/screenshot-2025-10-15-at-9-18-13-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM.png\" data-orig-size=\"1614,548\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-10-15 at 9.18.13\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-300x102.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348.png\" alt=\"\" class=\"wp-image-75414\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2510.11696<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>What does QeRL change in the Reinforcement Learning (RL) loop?<\/strong><\/h3>\n<p>Most RLHF\/GRPO\/DAPO pipelines spend the bulk of wall-clock time in <strong>rollouts<\/strong> (token generation). QeRL shifts the policy\u2019s <strong>weight path to NVFP4 (FP4) with dual-level scaling<\/strong> and keeps <strong>logits\/gradients in higher precision via LoRA<\/strong>, so backprop remains stable while the sampling path hits hardware-efficient FP4\u00d7BF16 kernels (Marlin). 
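<\/p>\n<p>The dual-level scaling idea can be sketched numerically. The following NumPy sketch is illustrative only\u2014it is not the QeRL or Marlin implementation: it rounds weights to the E2M1 (FP4) value grid using a per-16-element block scale plus a tensor-level scale, mirroring the two-level scheme NVFP4 uses. All names and constants here are the author of this sketch's assumptions except the E2M1 grid and the E4M3 maximum.<\/p>

```python
import numpy as np

# E2M1 (FP4) magnitudes; NVFP4 stores a sign bit plus one of these values.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E4M3_MAX = 448.0  # largest value representable by the FP8 block-scaler format

def fp4_quantize_dequantize(w, block=16):
    'Illustrative two-level-scaled FP4 round-trip; assumes w.size % block == 0.'
    w = w.reshape(-1, block)
    # Level 1: one FP32 scale shared by the whole tensor.
    tensor_scale = max(np.abs(w).max() / (FP4_GRID[-1] * E4M3_MAX), 1e-12)
    # Level 2: one scale per 16-value block (stored as FP8 E4M3 in NVFP4).
    block_scale = np.abs(w).max(axis=1, keepdims=True) / (FP4_GRID[-1] * tensor_scale)
    block_scale = np.maximum(block_scale, 1e-12)
    scaled = w / (tensor_scale * block_scale)
    # Round each scaled magnitude to the nearest representable FP4 value.
    idx = np.argmin(np.abs(np.abs(scaled)[..., None] - FP4_GRID), axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return (q * block_scale * tensor_scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)
w_hat = fp4_quantize_dequantize(w)
rel_err = np.abs(w_hat - w).max() / np.abs(w).max()
```

<p>In the real format the block scalers are stored in FP8 (E4M3) and the kernels consume the packed 4-bit codes directly; the round-trip above only illustrates the numerics of two-level scaling.<\/p>\n<p>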
The result is faster prefill\/decoding during rollouts without maintaining a separate full-precision policy. <\/p>\n<p>Mechanically, the research team integrates <strong>Marlin-based FP4 kernels<\/strong> in both rollout and prefill, while LoRA limits trainable parameters. This directly targets the stage that dominates RL cost and latency for long reasoning traces.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"598\" data-attachment-id=\"75416\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/15\/qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration\/screenshot-2025-10-15-at-9-18-53-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.53-PM-1.png\" data-orig-size=\"1582,924\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-10-15 at 9.18.53\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.53-PM-1-300x175.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.53-PM-1-1024x598.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.53-PM-1-1024x598.png\" alt=\"\" class=\"wp-image-75416\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2510.11696<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Quantization as exploration, made schedulable<\/strong><\/h3>\n<p>A core empirical finding: <strong>deterministic FP4 quantization 
raises policy entropy<\/strong>, flattening token distributions early in training and <strong>improving exploration<\/strong> versus 16-bit LoRA and NF4-based QLoRA baselines. To control that effect over time, QeRL introduces <strong>Adaptive Quantization Noise (AQN)<\/strong>\u2014<strong>channel-wise Gaussian perturbations<\/strong> mapped into <strong>LayerNorm scale parameters<\/strong> and annealed with an <strong>exponential schedule<\/strong>. This keeps kernel fusion intact (no extra weight tensors) while transitioning from exploration to exploitation.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"313\" data-attachment-id=\"75418\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/15\/qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration\/screenshot-2025-10-15-at-9-19-27-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.19.27-PM-1.png\" data-orig-size=\"1536,470\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-10-15 at 9.19.27\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.19.27-PM-1-300x92.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.19.27-PM-1-1024x313.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.19.27-PM-1-1024x313.png\" alt=\"\" class=\"wp-image-75418\" \/><\/figure>\n<\/div>\n<p>In ablations, QeRL shows <strong>faster reward 
growth<\/strong> and <strong>higher final scores<\/strong> on math-reasoning tasks under both <strong>GRPO<\/strong> and <strong>DAPO<\/strong>, aligning with the hypothesis that structured noise in parameter space can be a useful exploration driver in RL, even though such noise is typically detrimental in supervised fine-tuning.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Reported results<\/strong><\/h3>\n<p>On the <strong>Qwen2.5<\/strong> backbone, the research team shows that <strong>NVFP4+LoRA<\/strong> outperforms <strong>vanilla LoRA<\/strong> and <strong>QLoRA<\/strong> in rollout throughput and overall training time, with <strong>&gt;2\u00d7 rollout throughput on 14B\/32B<\/strong> models against QLoRA and <strong>~1.8\u00d7 end-to-end vs QLoRA<\/strong> in a representative setup. They also demonstrate <strong>training a 32B policy with GRPO on a single H100-80GB<\/strong>, enabled by the lower memory footprint of weight-only FP4.<\/p>\n<p>Accuracy is competitive with higher-precision baselines. For a <strong>7B<\/strong> model, the research team reports <strong>GSM8K = 90.8%<\/strong> and <strong>MATH500 = 77.4%<\/strong>, <strong>surpassing 16-bit LoRA and QLoRA<\/strong> under their setup and <strong>matching full-parameter fine-tuning<\/strong>. Across broader math benchmarks (e.g., BigMath), QeRL maintains parity or advantage, while converging faster due to improved exploration. 
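<\/p>\n<p>The exploration claim reduces to a simple distributional fact: flatter token logits carry higher Shannon entropy. The toy NumPy illustration below is purely schematic; the 0.5 scaling factor stands in for quantization-induced flattening and is not taken from the paper.<\/p>

```python
import numpy as np

def entropy(logits):
    'Shannon entropy (nats) of the softmax policy over one token position.'
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

logits = np.array([4.0, 2.0, 1.0, 0.5, 0.1])  # a peaked token distribution
flattened = 0.5 * logits  # stand-in for quantization-induced flattening
h0, h1 = entropy(logits), entropy(flattened)
# Flatter logits give higher entropy, i.e. broader sampling during rollouts.
```

<p>Higher entropy at a fixed sampling temperature means broader rollouts, which is the mechanism to which the paper attributes QeRL's faster reward growth.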
<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"712\" data-attachment-id=\"75419\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/15\/qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration\/screenshot-2025-10-15-at-9-20-16-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.20.16-PM.png\" data-orig-size=\"1594,1108\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-10-15 at 9.20.16\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.20.16-PM-300x209.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.20.16-PM-1024x712.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.20.16-PM-1024x712.png\" alt=\"\" class=\"wp-image-75419\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2510.11696<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>What QeRL is\u2014and isn\u2019t<\/strong><\/h3>\n<p>QeRL is <strong>weight-only FP4<\/strong> with <strong>LoRA updates<\/strong>; it does <strong>not<\/strong> claim FP4 precision for logits\/gradients. The benefits concentrate in <strong>rollout\/prefill throughput<\/strong> and <strong>memory footprint<\/strong>, with empirical evidence that <strong>quantization-induced entropy<\/strong> aids RL exploration when <strong>AQN<\/strong> modulates it over training. 
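<\/p>\n<p>The AQN mechanism as described can be sketched as follows. Only the structure follows the paper (channel-wise Gaussian noise folded into a LayerNorm scale vector, with an exponentially annealed standard deviation); the schedule constants, function names, and shapes below are hypothetical.<\/p>

```python
import numpy as np

def aqn_sigma(step, sigma0=1e-2, sigma_min=1e-4, decay=0.999):
    'Exponentially annealed noise std: exploration early, exploitation late.'
    return max(sigma0 * decay ** step, sigma_min)

def apply_aqn(ln_scale, step, rng):
    'Fold channel-wise Gaussian noise into the LayerNorm scale parameters.'
    noise = rng.normal(0.0, aqn_sigma(step), size=ln_scale.shape)
    # Multiplicative merge: the perturbed scale is still a single per-channel
    # vector, so the fused quantized kernels consume it unchanged.
    return ln_scale * (1.0 + noise)

rng = np.random.default_rng(0)
ln_scale = np.ones(8)                              # toy 8-channel LayerNorm
early = apply_aqn(ln_scale, step=0, rng=rng)       # large perturbation
late = apply_aqn(ln_scale, step=20000, rng=rng)    # annealed to the floor
```

<p>Because the perturbation is merged into an existing per-channel parameter rather than materialized as an extra weight tensor, kernel fusion is preserved, matching the paper's stated design constraint.<\/p>\n<p>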
Generalization to modalities beyond math-reasoning tasks or to safety\/tool-use RL depends on reward design and sequence lengths.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li>QeRL combines NVFP4 4-bit weight quantization with LoRA to accelerate the rollout phase and cut memory, enabling RL for a 32B LLM on a single H100-80GB.<\/li>\n<li>Quantization acts as exploration: FP4 increases policy entropy, while Adaptive Quantization Noise (AQN) schedules channel-wise noise via LayerNorm scales.<\/li>\n<li>Reported efficiency: &gt;1.5\u00d7 rollout speedups vs 16-bit LoRA and ~1.8\u00d7 end-to-end vs QLoRA; &gt;2\u00d7 rollout throughput vs QLoRA on 14B\/32B setups.<\/li>\n<li>Accuracy holds: Qwen2.5-7B reaches 90.8% on GSM8K and 77.4% on MATH500, matching full-parameter fine-tuning under the paper\u2019s setup. <\/li>\n<li>NVFP4 is a hardware-optimized 4-bit floating format with two-level scaling (FP8 E4M3 block scalers + FP32 tensor scale), enabling efficient Marlin-based kernels. <\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Editorial Comments<\/strong><\/h3>\n<p>QeRL speeds up the RL rollout stage. It quantizes weights to NVFP4 and keeps updates and logits in higher precision using LoRA. It reports &gt;1.5\u00d7 rollout speedups and can train a 32B policy on a single H100-80GB GPU. It adds Adaptive Quantization Noise to make exploration a controlled signal during training. Results are shown mainly on math-reasoning tasks using GRPO and DAPO. 
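<\/p>\n<p>For context, GRPO's group-relative advantage (the standard formulation, not code from QeRL) z-scores each sampled completion's reward within its prompt group, removing the need for a learned value model:<\/p>

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    'Group-relative advantages: z-score each reward within its prompt group.'
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, four sampled completions scored by a verifier (1 = correct).
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get positive advantage, incorrect ones negative.
```

<p>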
The gains rely on NVFP4 kernel support such as Marlin.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/NVlabs\/QeRL\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a>\u00a0<\/strong>and<strong>\u00a0<\/strong><a href=\"https:\/\/arxiv.org\/abs\/2510.11696\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Paper<\/strong>.<\/a> Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
are you on telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">now you can join us on telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/10\/15\/qerl-nvfp4-quantized-reinforcement-learning-rl-brings-32b-llm-training-to-a-single-h100-while-improving-exploration\/\">QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100\u2014While Improving Exploration<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"author":2,"featured_media":44766,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","categories":[52,5,7,1],"tags":[],"acf":[]}
Single H100\u2014While Improving Exploration"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/es\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc.webp",1024,348,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc.webp",1024,348,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc.webp",1024,348,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc-150x150.webp",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc-300x102.webp",300,102,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc.webp",1024,348,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc.webp",1024,348,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc.webp",1024,348,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc-18x6.webp",18,6,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc-300x300.webp",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc-600x204.webp",600,204,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-15-at-9.18.13-PM-1024x348-MgRyHc-100x100.webp",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/es\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a 
href=\"https:\/\/youzum.net\/es\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"What would you build if you could run Reinforcement Learning (RL) post-training on a 32B LLM in 4-bit NVFP4\u2014on a single H100\u2014with BF16-level accuracy and 1.2\u20131.5\u00d7 step speedups? NVIDIA researchers (with collaborators from MIT, HKU, and Tsinghua) have open-sourced QeRL (Quantization-enhanced Reinforcement Learning), a training framework that pushes Reinforcement Learning (RL) post-training into 4-bit FP4&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/44765","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/comments?post=44765"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/44765\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media\/44766"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media?parent=44765"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/categories?post=44765"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/tags?post=44765"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}