{"id":15623,"date":"2025-05-30T03:51:13","date_gmt":"2025-05-30T03:51:13","guid":{"rendered":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/"},"modified":"2025-05-30T03:51:13","modified_gmt":"2025-05-30T03:51:13","slug":"apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/","title":{"rendered":"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy"},"content":{"rendered":"<p>Long CoT reasoning improves large language models\u2019 performance on complex tasks but comes with drawbacks. The typical \u201cthink-then-answer\u201d method slows down response times, disrupting real-time interactions like those in chatbots. It also risks inaccuracies, as errors in earlier reasoning steps can lead to a misleading final answer. Unlike humans, who often share partial thoughts or conclusions during conversations, LLMs delay responses until all reasoning is complete. While RL is commonly used to train reasoning models, it mainly rewards final answers, overlooking useful intermediate insights. There is growing interest in teaching models that alternate between thinking and answering, but this remains a challenge.\u00a0<\/p>\n<p>RL has become a popular method to enhance reasoning in LLMs, building on its success in aligning models with human preferences. Two common reward types guide RL: outcome-based rewards (ORM), which focus on the final answer, and process-based rewards (PRM), which provide feedback on intermediate reasoning steps. 
While PRMs offer more detailed supervision, they often rely on human annotation and additional models, which makes them complex and prone to issues like reward hacking. Separately, efforts to improve LLM reasoning have explored prompting strategies, structured reasoning, tool integration, and methods to reduce latency and improve efficiency.<\/p>\n<p>Researchers from Apple and Duke University introduce Interleaved Reasoning, a new RL approach that enables language models to alternate between thinking and answering when solving complex, multi-step questions. Instead of waiting until the end to respond, the model provides informative intermediate answers, which give users earlier feedback and help steer the model\u2019s subsequent reasoning. Using a straightforward rule-based reward, the model is trained to produce helpful intermediate steps, cutting response delays by over 80% and improving accuracy by up to 19.3%. Trained only on QA and logic datasets, the method generalizes well to more challenging benchmarks such as MATH, GPQA, and MMLU.<\/p>\n<p>The study proposes a reinforcement learning framework to train LLMs for Interleaved Reasoning, where models alternate between internal thinking and user-facing intermediate answers. Each intermediate answer, or \u201csub-answer,\u201d is shared once the model reaches a meaningful milestone in its reasoning. Training uses a template with &lt;think&gt; and &lt;answer&gt; tags. Learning is guided by three rule-based rewards: format, final accuracy, and conditional intermediate accuracy. Notably, intermediate rewards are applied only when specific criteria are met, so that the model prioritizes overall correctness. 
They also test different reward schemes, such as all-or-none, partial credit, and time-discounted rewards, to optimize the quality of reasoning.\u00a0<\/p>\n<p>The interleaved reasoning approach was evaluated on both familiar and unfamiliar datasets using Qwen2.5 models (1.5B and 7B). Unlike traditional methods that separate thinking and answering, the interleaved method provides answers incrementally, improving both speed and usefulness. When combined with intermediate rewards, it significantly enhances model performance while reducing response delays by over 80%. Even without exposure to new domains during training, the model adapts well, showing strong generalization. These results highlight the value of interleaved reasoning in making AI systems more responsive and effective in real-world, multi-step reasoning tasks.\u00a0<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"556\" data-attachment-id=\"71670\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/05\/29\/apple-and-duke-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/screenshot-2025-05-29-at-7-57-11-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11\u202fPM.png\" data-orig-size=\"1742,946\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-05-29 at 7.57.11\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11\u202fPM-300x163.png\" 
data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11\u202fPM-1024x556.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11%E2%80%AFPM-1024x556.png\" alt=\"\" class=\"wp-image-71670\" \/><\/figure>\n<\/div>\n<p>In conclusion, the study shows that interleaved reasoning, in which models alternate between reasoning and generating intermediate answers, can significantly improve performance and responsiveness. Using the Qwen2.5-1.5B model, the authors show that providing timely intermediate feedback during training boosts accuracy and speeds up responses. Several RL strategies were tested: PPO gave stable results, and conditional, time-discounted rewards proved the most effective. The method scales well to complex tasks and outperforms traditional think-then-answer baselines. Unlike token-level reward models, this approach applies simple rule-based rewards after complete reasoning steps, which reduces the risk of reward hacking. Ultimately, interleaved reasoning improves reasoning quality and efficiency without relying on external tools.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p><strong>Check out the <a href=\"https:\/\/arxiv.org\/abs\/2505.19640\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<em>.<\/em><\/a><\/strong>\u00a0All credit for this research goes to the researchers of this project. 
<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/05\/29\/apple-and-duke-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\">Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Long CoT reasoning improves large language models\u2019 performance on complex tasks but comes with drawbacks. The typical \u201cthink-then-answer\u201d method slows down response times, disrupting real-time interactions like those in chatbots. It also risks inaccuracies, as errors in earlier reasoning steps can lead to a misleading final answer. Unlike humans, who often share partial thoughts or conclusions during conversations, LLMs delay responses until all reasoning is complete. While RL is commonly used to train reasoning models, it mainly rewards final answers, overlooking useful intermediate insights. There is growing interest in teaching models that alternate between thinking and answering, but this remains a challenge.\u00a0 RL has become a popular method to enhance reasoning in LLMs, building on its success in aligning models with human preferences. 
Two common reward types guide RL: outcome-based rewards (ORM), which focus on the final answer, and process-based rewards (PRM), which provide feedback on intermediate reasoning steps. While PRMs offer more detailed supervision, they often rely on human annotation and additional models, which makes them complex and prone to issues like reward hacking. Separately, efforts to improve LLM reasoning have explored prompting strategies, structured reasoning, tool integration, and methods to reduce latency and improve efficiency.\u00a0 Researchers from Apple and Duke University introduce Interleaved Reasoning, a new RL approach that enables language models to alternate between thinking and answering when solving complex, multi-step questions. Instead of waiting until the end to respond, the model provides informative intermediate answers, which give users earlier feedback and help steer the model\u2019s subsequent reasoning. Using a straightforward rule-based reward, the model is trained to produce helpful intermediate steps, cutting response delays by over 80% and improving accuracy by up to 19.3%. Trained only on QA and logic datasets, the method generalizes well to more challenging benchmarks such as MATH, GPQA, and MMLU.\u00a0 The study proposes a reinforcement learning framework to train LLMs for Interleaved Reasoning, where models alternate between internal thinking and user-facing intermediate answers. Each intermediate answer, or \u201csub-answer,\u201d is shared once the model reaches a meaningful milestone in its reasoning. Training uses a template with &lt;think&gt; and &lt;answer&gt; tags. Learning is guided by three rule-based rewards: format, final accuracy, and conditional intermediate accuracy. Notably, intermediate rewards are applied only when specific criteria are met, so that the model prioritizes overall correctness. 
They also test different reward schemes, such as all-or-none, partial credit, and time-discounted rewards, to optimize the quality of reasoning.\u00a0 The interleaved reasoning approach was evaluated on both familiar and unfamiliar datasets using Qwen2.5 models (1.5B and 7B). Unlike traditional methods that separate thinking and answering, the interleaved method provides answers incrementally, improving both speed and usefulness. When combined with intermediate rewards, it significantly enhances model performance while reducing response delays by over 80%. Even without exposure to new domains during training, the model adapts well, showing strong generalization. These results highlight the value of interleaved reasoning in making AI systems more responsive and effective in real-world, multi-step reasoning tasks.\u00a0 In conclusion, the study shows that interleaved reasoning, in which models alternate between reasoning and generating intermediate answers, can significantly improve performance and responsiveness. Using the Qwen2.5-1.5B model, the authors show that providing timely intermediate feedback during training boosts accuracy and speeds up responses. Several RL strategies were tested: PPO gave stable results, and conditional, time-discounted rewards proved the most effective. The method scales well to complex tasks and outperforms traditional think-then-answer baselines. Unlike token-level reward models, this approach applies simple rule-based rewards after complete reasoning steps, which reduces the risk of reward hacking. Ultimately, interleaved reasoning improves reasoning quality and efficiency without relying on external tools.\u00a0 Check out the Paper.\u00a0All credit for this research goes to the researchers of this project. 
The post Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":15624,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-15623","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta 
name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-05-30T03:51:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"556\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy\",\"datePublished\":\"2025-05-30T03:51:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\"},\"wordCount\":682,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#respond\"]}]},{\"@type\"
:\"WebPage\",\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\",\"url\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\",\"name\":\"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png\",\"datePublished\":\"2025-05-30T03:51:13+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-
enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png\",\"width\":1024,\"height\":556},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association 
Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/","og_locale":"th_TH","og_type":"article","og_title":"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy - 
YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-05-30T03:51:13+00:00","og_image":[{"width":1024,"height":556,"url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png","type":"image\/png"}],"author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. reading time":"3 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and 
Accuracy","datePublished":"2025-05-30T03:51:13+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/"},"wordCount":682,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/","url":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/","name":"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png","datePublished":"2025-05-30T03:51:13+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/"]}]},{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png","width":1024,"height":556},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/apple-and-duke-researchers-present-a-reinforcement-learning-approach-that-enables-llms-to-provide-intermediate-answers-enhancing-speed-and-accuracy\/#breadcrumb","itemListElement":[{
"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png",1024,556,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png",1024,556,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png",1024,556,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U-300x163.png",300,163,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png",1024,556,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png",1024,556,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U.png",1024,556,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U-18x10.png",18,10,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U-600x326.png",600,326,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/05\/Screenshot-2025-05-29-at-7.57.11E280AFPM-1024x556-SEJs0U-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin 
NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Long CoT reasoning improves large language models\u2019 performance on complex tasks but comes with drawbacks. The typical \u201cthink-then-answer\u201d method slows down response times, disrupting real-time interactions like those in chatbots. It also risks inaccuracies, as errors in earlier reasoning steps can lead to a misleading final answer. Unlike humans, who often share partial thoughts or&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/15623","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=15623"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/15623\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media\/15624"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=15623"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=15623"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=15623"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}