{"id":35034,"date":"2025-08-30T06:53:28","date_gmt":"2025-08-30T06:53:28","guid":{"rendered":"https:\/\/youzum.net\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/"},"modified":"2025-08-30T06:53:28","modified_gmt":"2025-08-30T06:53:28","slug":"microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/","title":{"rendered":"Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance"},"content":{"rendered":"<div class=\"wp-block-yoast-seo-table-of-contents yoast-table-of-contents\">\n<h3><em><strong>Table of contents<\/strong><\/em><\/h3>\n<ul>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#the-problem-with-thinking-longer\" data-level=\"3\">The Problem with \u201cThinking Longer\u201d<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#the-agentic-approach\" data-level=\"3\">The Agentic Approach<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#infrastructure-challenges-and-solutions\" data-level=\"3\">Infrastructure Challenges and Solutions<\/a><\/li>\n<li><a 
href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#grpo-roc-learning-from-high-quality-examples\" data-level=\"3\">GRPO-RoC: Learning from High-Quality Examples<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#training-strategy-from-simple-to-complex\" data-level=\"3\">Training Strategy: From Simple to Complex<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#breakthrough-results\" data-level=\"3\">Breakthrough Results<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#understanding-the-mechanisms\" data-level=\"3\">Understanding the Mechanisms<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#implications-for-ai-development\" data-level=\"3\">Summary<\/a><\/li>\n<\/ul>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Problem with \u201cThinking Longer\u201d<\/strong><\/h3>\n<p>Large language models have made impressive strides in mathematical reasoning by extending their Chain-of-Thought (CoT) processes\u2014essentially \u201cthinking longer\u201d through more detailed reasoning steps. However, this approach has fundamental limitations. 
When models encounter subtle errors in their reasoning chains, they often compound these mistakes rather than detecting and correcting them. Internal self-reflection frequently fails, especially when the initial reasoning approach is fundamentally flawed.<\/p>\n<p>Microsoft\u2019s new research report introduces rStar2-Agent, which takes a different approach: instead of just thinking longer, it teaches models to think smarter by actively using coding tools to verify, explore, and refine their reasoning process.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"609\" data-attachment-id=\"74145\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/screenshot-2025-08-29-at-11-39-21-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.39.21-PM-1.png\" data-orig-size=\"1894,1126\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-08-29 at 11.39.21\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.39.21-PM-1-300x178.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.39.21-PM-1-1024x609.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.39.21-PM-1-1024x609.png\" alt=\"\" class=\"wp-image-74145\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/arxiv.org\/abs\/2508.20722<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Agentic Approach<\/strong><\/h3>\n<p>rStar2-Agent represents a shift toward agentic reinforcement learning, where a 14B parameter model interacts with a Python execution environment throughout its reasoning process. Rather than relying solely on internal reflection, the model can write code, execute it, analyze the results, and adjust its approach based on concrete feedback.<\/p>\n<p>This creates a dynamic problem-solving process. When the model encounters a complex mathematical problem, it might generate initial reasoning, write Python code to test hypotheses, analyze execution results, and iterate toward a solution. The approach mirrors how human mathematicians often work\u2014using computational tools to verify intuitions and explore different solution paths.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Infrastructure Challenges and Solutions<\/strong><\/h3>\n<p>Scaling agentic RL presents significant technical hurdles. During training, a single batch can generate tens of thousands of concurrent code execution requests, creating bottlenecks that can stall GPU utilization. The researchers addressed this with two key infrastructure innovations.<\/p>\n<p>First, they built a distributed code execution service capable of handling 45,000 concurrent tool calls with sub-second latency. The system isolates code execution from the main training process while maintaining high throughput through careful load balancing across CPU workers.<\/p>\n<p>Second, they developed a dynamic rollout scheduler that allocates computational work based on real-time GPU cache availability rather than static assignment. 
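The cache-aware allocation idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's actual scheduler: the function name `schedule_rollouts`, the per-rollout cost estimates, and the greedy least-loaded policy are all assumptions introduced here to make the contrast with static assignment concrete.

```python
import heapq

def schedule_rollouts(rollouts, gpus):
    """Greedily assign each rollout (with an estimated KV-cache cost)
    to whichever GPU currently has the lightest load, rather than
    using a fixed, static partition of the batch."""
    # Min-heap keyed by accumulated cost, so the least-loaded GPU pops first.
    heap = [(0.0, gpu) for gpu in gpus]
    heapq.heapify(heap)
    assignment = {}
    for rollout_id, est_cost in rollouts:
        used, gpu = heapq.heappop(heap)
        assignment[rollout_id] = gpu
        heapq.heappush(heap, (used + est_cost, gpu))
    return assignment

# One long trace and three short ones: the long rollout lands alone on
# one GPU while the short ones share the other, instead of a static
# split leaving one GPU idle while the other finishes the long trace.
demo = schedule_rollouts(
    [("r0", 9.0), ("r1", 1.0), ("r2", 1.0), ("r3", 1.0)],
    ["gpu0", "gpu1"],
)
```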
This prevents GPU idle time caused by uneven workload distribution\u2014a common problem when some reasoning traces require significantly more computation than others.<\/p>\n<p>These infrastructure improvements enabled the entire training process to complete in just one week using 64 AMD MI300X GPUs, demonstrating that frontier-level reasoning capabilities don\u2019t require massive computational resources when efficiently orchestrated.<\/p>\n<h3 class=\"wp-block-heading\"><strong>GRPO-RoC: Learning from High-Quality Examples<\/strong><\/h3>\n<p>The core algorithmic innovation is Group Relative Policy Optimization with Resampling on Correct (GRPO-RoC). Traditional reinforcement learning in this context faces a quality problem: models receive positive rewards for correct final answers even when their reasoning process includes multiple code errors or inefficient tool usage.<\/p>\n<p><strong>GRPO-RoC addresses this by implementing an asymmetric sampling strategy. During training, the algorithm:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Oversamples<\/strong> initial rollouts to create a larger pool of reasoning traces<\/li>\n<li><strong>Preserves diversity<\/strong> in failed attempts to maintain learning from various error modes<\/li>\n<li><strong>Filters positive examples<\/strong> to emphasize traces with minimal tool errors and cleaner formatting<\/li>\n<\/ul>\n<p>This approach ensures the model learns from high-quality successful reasoning while still being exposed to diverse failure patterns. 
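As a rough illustration, the asymmetric downsampling step described above might look like the following. This is a hypothetical sketch, not the authors' implementation: the function name `grpo_roc_filter`, the rollout data layout, the ranking of successes purely by tool-error count, and the half-and-half positive/negative split are all assumptions made for clarity.

```python
import random

def grpo_roc_filter(rollouts, group_size, seed=0):
    """Downsample an oversampled pool of rollouts asymmetrically:
    failures are sampled uniformly (preserving diverse error modes),
    while successes are filtered to the cleanest traces (fewest tool
    errors). Each rollout is a dict with "correct" and "tool_errors"."""
    rng = random.Random(seed)
    positives = [r for r in rollouts if r["correct"]]
    negatives = [r for r in rollouts if not r["correct"]]
    # Rank successful traces so those with minimal tool errors come first.
    positives.sort(key=lambda r: r["tool_errors"])
    n_pos = min(len(positives), group_size // 2)  # split ratio is assumed
    kept = positives[:n_pos]
    # Keep failures at random to retain a variety of failure patterns.
    kept += rng.sample(negatives, min(len(negatives), group_size - n_pos))
    return kept
```

The key asymmetry is that only the positive side is quality-filtered: a correct answer reached through many code errors is dropped, while failed traces stay untouched so the policy still sees what going wrong looks like.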
The result is more efficient tool usage and shorter, more focused reasoning traces.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"637\" data-attachment-id=\"74141\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/screenshot-2025-08-29-at-11-38-02-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.02-PM-1.png\" data-orig-size=\"2080,1294\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-08-29 at 11.38.02\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.02-PM-1-300x187.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.02-PM-1-1024x637.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.02-PM-1-1024x637.png\" alt=\"\" class=\"wp-image-74141\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/abs\/2508.20722<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Training Strategy: From Simple to Complex<\/strong><\/h3>\n<p>The training process unfolds in three carefully designed stages, starting with non-reasoning supervised fine-tuning that focuses purely on instruction following and tool formatting\u2014deliberately avoiding complex reasoning examples that might create early biases.<\/p>\n<p><strong>Stage 1<\/strong> constrains 
responses to 8,000 tokens, forcing the model to develop concise reasoning strategies. Despite this limitation, performance jumps dramatically\u2014from near-zero to over 70% on challenging benchmarks.<\/p>\n<p><strong>Stage 2<\/strong> extends the token limit to 12,000, allowing for more complex reasoning while maintaining the efficiency gains from the first stage.<\/p>\n<p><strong>Stage 3<\/strong> shifts focus to the most difficult problems by filtering out those the model has already mastered, ensuring continued learning from challenging cases.<\/p>\n<p>This progression from concise to extended reasoning, combined with increasing problem difficulty, maximizes learning efficiency while minimizing computational overhead.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Breakthrough Results<\/strong><\/h3>\n<p>The results are striking. rStar2-Agent-14B achieves 80.6% accuracy on AIME24 and 69.8% on AIME25, surpassing much larger models including the 671B parameter DeepSeek-R1. Perhaps more importantly, it accomplishes this with significantly shorter reasoning traces\u2014averaging around 10,000 tokens compared to over 17,000 for comparable models.<\/p>\n<p>The efficiency gains extend beyond mathematics. 
Despite training exclusively on math problems, the model demonstrates strong transfer learning, outperforming specialized models on scientific reasoning benchmarks and maintaining competitive performance on general alignment tasks.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"460\" data-attachment-id=\"74143\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/screenshot-2025-08-29-at-11-38-56-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.56-PM-1.png\" data-orig-size=\"2072,930\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-08-29 at 11.38.56\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.56-PM-1-300x135.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.56-PM-1-1024x460.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.38.56-PM-1-1024x460.png\" alt=\"\" class=\"wp-image-74143\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/abs\/2508.20722<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Understanding the Mechanisms<\/strong><\/h3>\n<p>Analysis of the trained model reveals fascinating behavioral patterns. 
High-entropy tokens in reasoning traces fall into two categories: traditional \u201cforking tokens\u201d that trigger self-reflection and exploration, and a new category of \u201creflection tokens\u201d that emerge specifically in response to tool feedback.<\/p>\n<p>These reflection tokens represent a form of environment-driven reasoning where the model carefully analyzes code execution results, diagnoses errors, and adjusts its approach accordingly. This creates more sophisticated problem-solving behavior than pure CoT reasoning can achieve.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Summary<\/strong><\/h3>\n<p>rStar2-Agent demonstrates that moderate-sized models can achieve frontier-level reasoning through sophisticated training rather than brute-force scaling. The approach suggests a more sustainable path toward advanced AI capabilities\u2014one that emphasizes efficiency, tool integration, and smart training strategies over raw computational power.<\/p>\n<p>The success of this agentic approach also points toward future AI systems that can seamlessly integrate multiple tools and environments, moving beyond static text generation toward dynamic, interactive problem-solving capabilities.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/arxiv.org\/abs\/2508.20722\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>\u00a0and\u00a0<a href=\"https:\/\/github.com\/microsoft\/rStar\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a>.<\/strong>\u00a0Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer 
noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/08\/29\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/\">Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Table of contents The Problem with \u201cThinking Longer\u201d The Agentic Approach Infrastructure Challenges and Solutions GRPO-RoC: Learning from High-Quality Examples Training Strategy: From Simple to Complex Breakthrough Results Understanding the Mechanisms Summary The Problem with \u201cThinking Longer\u201d Large language models have made impressive strides in mathematical reasoning by extending their Chain-of-Thought (CoT) processes\u2014essentially \u201cthinking longer\u201d through more detailed reasoning steps. However, this approach has fundamental limitations. When models encounter subtle errors in their reasoning chains, they often compound these mistakes rather than detecting and correcting them. Internal self-reflection frequently fails, especially when the initial reasoning approach is fundamentally flawed. Microsoft\u2019s new research report introduces rStar2-Agent, which takes a different approach: instead of just thinking longer, it teaches models to think smarter by actively using coding tools to verify, explore, and refine their reasoning process. 
Check out the\u00a0Paper\u00a0and\u00a0GitHub Page.\u00a0Feel free to check out our\u00a0GitHub Page for Tutorials, Codes and Notebooks.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. The post Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":35035,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-35034","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Microsoft AI Introduces rStar2-Agent: A 14B 
Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-30T06:53:28+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance\",\"datePublished\":\"2025-08-30T06:53:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/\"},\"wordCount\":984,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-29-at-11.39.21-PM-1-1024x609-Q8CSpk.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/microsoft-ai-introduces-rstar2-agent-a-14b-math-reasoning-model-trained-with-agentic-reinforcement-learning-to-achieve-frontier-level-performance\/#respond\"]}]},{\"@type\":\"WebPage\",\