{"id":23300,"date":"2025-07-05T05:11:27","date_gmt":"2025-07-05T05:11:27","guid":{"rendered":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/"},"modified":"2025-07-05T05:11:27","modified_gmt":"2025-07-05T05:11:27","slug":"can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/","title":{"rendered":"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains"},"content":{"rendered":"<p>Improving the reasoning capabilities of large language models (LLMs) without architectural changes is a core challenge in advancing AI alignment and usability. Researchers at Meta AI and the University of Washington have introduced <strong>ASTRO<\/strong>\u2014<strong>Autoregressive Search-Taught Reasoner<\/strong>\u2014a novel post-training framework designed to enhance reasoning in <strong>Llama-3.1-70B-Instruct<\/strong>. ASTRO is unique in teaching models to perform <em>in-context search<\/em>, <em>self-reflection<\/em>, and <em>backtracking<\/em>, mechanisms often associated with human problem-solving and traditional symbolic search algorithms. 
Through this approach, ASTRO boosts Llama 3\u2019s math performance on several competitive benchmarks with significant improvements:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>MATH 500<\/strong>: 65.8% \u279d <strong>81.8%<\/strong><\/li>\n<li><strong>AMC 2023<\/strong>: 37.5% \u279d <strong>64.4%<\/strong><\/li>\n<li><strong>AIME 2024<\/strong>: 10.0% \u279d <strong>30.0%<\/strong><\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"407\" data-attachment-id=\"72428\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/07\/04\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/screenshot-2025-07-04-at-10-17-45-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45\u202fAM-1.png\" data-orig-size=\"1996,794\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-07-04 at 10.17.45\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45\u202fAM-1-300x119.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45\u202fAM-1-1024x407.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45%E2%80%AFAM-1-1024x407.png\" alt=\"\" class=\"wp-image-72428\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Search-Guided Chain-of-Thought Generation<\/strong><\/h3>\n<p>ASTRO\u2019s methodology begins with a <strong>Monte Carlo Tree Search (MCTS)<\/strong> over 
mathematical problem-solving trajectories. This search explores both correct and incorrect reasoning paths. The key innovation is <strong>procedure cloning<\/strong>: entire search trees are linearized into <strong>long chains of thought (CoT)<\/strong> that naturally encode both failures and recoveries via <em>self-reflection<\/em> and <em>backtracking<\/em>. These linearized traces are rewritten in natural language and used as the basis for supervised fine-tuning (SFT).<\/p>\n<p>The result is a model that doesn\u2019t just solve problems step by step but reevaluates its trajectory, often backtracking after self-assessment to correct intermediate reasoning mistakes. For instance, the model may interject with phrases like \u201cLet\u2019s go back to where we set up the equation\u201d when its internal confidence drops.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Supervised Fine-Tuning: Injecting Search Priors<\/strong><\/h3>\n<p>ASTRO fine-tunes Llama-3.1-70B-Instruct on 36.1K curated CoT solutions from MATH, AMC\/AIME, and AoPS-style datasets. The model trained with ASTRO-SFT achieves:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>MATH 500<\/strong>: 69.6%<\/li>\n<li><strong>AMC 2023<\/strong>: 51.9%<\/li>\n<li><strong>AIME 2024<\/strong>: 16.3%<\/li>\n<\/ul>\n<p>These scores are competitive with or exceed those of the baseline and of SPOC\/Step-KTO variants trained without explicit search priors. 
Importantly, even SFT alone\u2014without reinforcement learning\u2014yields performance boosts by exposing the model to search-structured reasoning data.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"467\" data-attachment-id=\"72430\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/07\/04\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/screenshot-2025-07-04-at-10-18-35-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.18.35\u202fAM-1.png\" data-orig-size=\"1562,712\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-07-04 at 10.18.35\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.18.35\u202fAM-1-300x137.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.18.35\u202fAM-1-1024x467.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.18.35%E2%80%AFAM-1-1024x467.png\" alt=\"\" class=\"wp-image-72430\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Reinforcement Learning with Search-Aware Initialization<\/strong><\/h3>\n<p>ASTRO proceeds to reinforcement learning (RL) by initializing with the SFT checkpoint and running an RL loop using a modified <strong>Group Relative Policy Optimization (GRPO)<\/strong>. 
Unlike standard preference-based RL, ASTRO employs <strong>verifiable reward signals<\/strong> (+1 for correct, -1 for incorrect) on 8.7K moderately difficult prompts. During training, the model\u2019s CoT generation grows longer, from ~1.8K to ~6K tokens, suggesting deeper internal exploration.<\/p>\n<p>The resulting <strong>ASTRO-RL<\/strong> model achieves:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>MATH 500<\/strong>: <strong>81.8%<\/strong><\/li>\n<li><strong>AMC 2023<\/strong>: <strong>64.4%<\/strong><\/li>\n<li><strong>AIME 2024<\/strong>: <strong>30.0%<\/strong><\/li>\n<\/ul>\n<p>These results rival or exceed those of models with larger parameter counts and support the importance of ASTRO\u2019s search-aware initialization.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Backtracking Behavior Correlates with Reasoning Success<\/strong><\/h3>\n<p>A striking empirical observation is the <strong>positive correlation<\/strong> between backtracking frequency and performance. As training progresses, ASTRO-RL exhibits more self-corrective actions and deeper exploration. Pearson correlation coefficients across benchmarks exceed 0.8, suggesting that self-reflection and backtracking are not merely cosmetic behaviors but are closely tied to better accuracy.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Comparative Insights and Broader Impact<\/strong><\/h3>\n<p>Control experiments comparing ASTRO with models trained on direct CoT solutions (no search priors) reveal that even when those models are trained on the <em>same<\/em> problem sets and search trees, ASTRO consistently outperforms them. 
For instance, ASTRO-RL beats Direct-RL by:<\/p>\n<ul class=\"wp-block-list\">\n<li>+2% on MATH 500<\/li>\n<li>+3.9% on AMC 2023<\/li>\n<li>+2.9% on AIME 2024<\/li>\n<\/ul>\n<p>Moreover, ASTRO\u2019s outputs can be visualized as <strong>directed graphs<\/strong>, with nodes as reasoning steps and edges capturing transitions, reflections, and corrections\u2014facilitating better interpretability.<\/p>\n<h3 class=\"wp-block-heading\"><strong>ASTRO Key Takeaways Table<\/strong><\/h3>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"389\" data-attachment-id=\"72424\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/07\/04\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/image-72\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/image.png\" data-orig-size=\"2536,964\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/image-300x114.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/image-1024x389.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/07\/image-1024x389.png\" alt=\"\" class=\"wp-image-72424\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n<p>ASTRO demonstrates that LLMs like Llama 3 can learn to reason more effectively\u2014not through larger models or longer pretraining, but via principled post-training techniques. 
By mimicking search algorithms in natural language, ASTRO enables models to <em>think before answering<\/em>, <em>doubt their own steps<\/em>, and <em>correct themselves mid-reasoning<\/em>. This framework sets a new benchmark for fine-tuning open LLMs to approach human-like reasoning through search-inspired behaviors.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the<strong>\u00a0<em><a href=\"https:\/\/arxiv.org\/abs\/2507.00417\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>.<\/em><\/strong>\u00a0All credit for this research goes to the researchers of this project.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/07\/04\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\">Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Improving the reasoning capabilities of large language models (LLMs) without architectural changes is a core challenge in advancing AI alignment and usability. Researchers at Meta AI and the University of Washington have introduced ASTRO\u2014Autoregressive Search-Taught Reasoner\u2014a novel post-training framework designed to enhance reasoning in Llama-3.1-70B-Instruct. 
ASTRO is unique in teaching models to perform in-context search, self-reflection, and backtracking, mechanisms often associated with human problem-solving and traditional symbolic search algorithms. Through this approach, ASTRO boosts Llama 3\u2019s math performance on several competitive benchmarks with significant improvements: MATH 500: 65.8% \u279d 81.8% AMC 2023: 37.5% \u279d 64.4% AIME 2024: 10.0% \u279d 30.0% Search-Guided Chain-of-Thought Generation ASTRO\u2019s methodology begins with a Monte Carlo Tree Search (MCTS) over mathematical problem-solving trajectories. This search explores both correct and incorrect reasoning paths. The key innovation is procedure cloning: entire search trees are linearized into long chain-of-thoughts (CoT) that naturally encode both failures and recoveries via self-reflection and backtracking. These linearized traces are rewritten in natural language and used as the basis for supervised fine-tuning (SFT). This results in a model that doesn\u2019t just solve problems step-by-step but reevaluates its trajectory\u2014often backtracking after self-assessment to correct intermediate reasoning mistakes. For instance, the model may interject with phrases like \u201cLet\u2019s go back to where we set up the equation\u201d when its internal confidence drops. Supervised Fine-Tuning: Injecting Search Priors ASTRO fine-tunes Llama-3.1-70B-Instruct on 36.1K curated CoT solutions from MATH, AMC\/AIME, and AoPS-style datasets. The model trained with ASTRO-SFT achieves: MATH 500: 69.6% AMC 2023: 51.9% AIME 2024: 16.3% These scores are competitive with or exceed those of baseline and SPOC\/Step-KTO variants trained without explicit search priors. Importantly, even SFT alone\u2014without reinforcement learning\u2014yields performance boosts by exposing the model to search-structured reasoning data. 
Reinforcement Learning with Search-Aware Initialization ASTRO proceeds to reinforcement learning (RL) by initializing with the SFT checkpoint and running an RL loop using a modified Group Relative Policy Optimization (GRPO). Unlike standard preference-based RL, ASTRO employs verifiable reward signals (+1 for correct, -1 for incorrect) on 8.7K moderately difficult prompts. During training, the model\u2019s CoT generation grows longer\u2014from ~1.8K to ~6K tokens\u2014demonstrating deeper internal exploration. The resulting ASTRO-RL model achieves: MATH 500: 81.8% AMC 2023: 64.4% AIME 2024: 30.0% These results rival or exceed models with larger parameter counts and confirm the importance of ASTRO\u2019s search-aware initialization. Backtracking Behavior Correlates with Reasoning Success A striking empirical observation is the positive correlation between backtracking frequency and performance. As training progresses, ASTRO-RL exhibits more self-corrective actions and deeper exploration. Pearson correlation coefficients across benchmarks exceed 0.8, indicating that self-reflection and backtracking are not merely cosmetic behaviors but functionally tied to better accuracy. Comparative Insights and Broader Impact Control experiments comparing ASTRO with models trained on direct CoT solutions (no search priors) reveal that even when trained on the same problem sets and search trees, ASTRO consistently outperforms. For instance, ASTRO-RL beats Direct-RL by: +2% on MATH 500 +3.9% on AMC 2023 +2.9% on AIME 2024 Moreover, ASTRO\u2019s outputs can be visualized as directed graphs, with nodes as reasoning steps and edges capturing transitions, reflections, and corrections\u2014facilitating better interpretability. ASTRO Key Takeaways Table Conclusion ASTRO demonstrates that LLMs like Llama 3 can learn to reason more effectively\u2014not through larger models or longer pretraining, but via principled post-training techniques. 
By mimicking search algorithms in natural language, ASTRO enables models to think before answering, doubt their own steps, and correct themselves mid-reasoning. This framework sets a new benchmark for fine-tuning open LLMs to approach human-like reasoning through search-inspired behaviors. Check out the\u00a0Paper.\u00a0All credit for this research goes to the researchers of this project. Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. The post Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":23301,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-23300","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? 
ASTRO Shows +16% to +20% Benchmark Gains - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-05T05:11:27+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? 
ASTRO Shows +16% to +20% Benchmark Gains\",\"datePublished\":\"2025-07-05T05:11:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\"},\"wordCount\":639,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\",\"url\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\",\"name\":\"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? 
ASTRO Shows +16% to +20% Benchmark Gains - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png\",\"datePublished\":\"2025-07-05T05:11:27+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png\",\"width\":1024,\"height\":407},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Can We 
Improve Llama 3\u2019s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? 
ASTRO Shows +16% to +20% Benchmark Gains - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/","og_locale":"th_TH","og_type":"article","og_title":"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-07-05T05:11:27+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. reading time":"3 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? 
ASTRO Shows +16% to +20% Benchmark Gains","datePublished":"2025-07-05T05:11:27+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/"},"wordCount":639,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/","url":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/","name":"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? 
ASTRO Shows +16% to +20% Benchmark Gains - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png","datePublished":"2025-07-05T05:11:27+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/"]}]},{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png","width":1024,"height":407},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/can-we-improve-llama-3s-reasoning-through-post-training-alone-astro-shows-16-to-20-benchmark-gains\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Can We Improve Llama 3\u2019s Reasoning Through Post-Training Alone? 
ASTRO Shows +16% to +20% Benchmark Gains"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png",1024,407,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png",1024,407,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png",1024,407,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9-300x119.png",300,119,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png",1024,407,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png",1024,407,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9.png",1024,407,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9-18x7.png",18,7,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9-600x238.png",600,238,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/07\/Screenshot-2025-07-04-at-10.17.45E280AFAM-1-1024x407-rG2rc9-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin 
NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Improving the reasoning capabilities of large language models (LLMs) without architectural changes is a core challenge in advancing AI alignment and usability. Researchers at Meta AI and the University of Washington have introduced ASTRO\u2014Autoregressive Search-Taught Reasoner\u2014a novel post-training framework designed to enhance reasoning in Llama-3.1-70B-Instruct. ASTRO is unique in teaching models to perform in-context search,&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/23300","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=23300"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/23300\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media\/23301"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=23300"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=23300"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=23300"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org
\/{rel}","templated":true}]}}