{"id":16046,"date":"2025-06-02T03:46:47","date_gmt":"2025-06-02T03:46:47","guid":{"rendered":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/"},"modified":"2025-06-02T03:46:47","modified_gmt":"2025-06-02T03:46:47","slug":"enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/","title":{"rendered":"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning"},"content":{"rendered":"<p>Large Reasoning Models (LRMs), trained from LLMs using reinforcement learning (RL), have demonstrated strong performance on complex reasoning tasks, including mathematics, STEM, and coding. However, existing LRMs still struggle with many puzzle tasks that require purely logical reasoning, even though such puzzles are easy and obvious for humans. Prior work on puzzles focuses only on designing evaluation benchmarks and lacks the training methods and resources that modern LLMs need to tackle this challenge. Current puzzle datasets also lack diversity and scalability: they cover few puzzle types and offer little control over generation or difficulty. Moreover, given the success of the \u201cLLM+RLVR\u201d paradigm, obtaining large, diverse, and challenging sets of verifiable puzzle prompts has become crucial for training agents.<\/p>\n<p>Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key method for improving models\u2019 reasoning capabilities: it removes the need for a reward model by assigning rewards directly from objectively verifiable answers. Puzzles are particularly well suited to RLVR because their solutions can be checked automatically with rules. 
However, most prior RLVR research has overlooked the potential of puzzles to deliver effective reward signals. For LLM puzzle reasoning, existing benchmarks evaluate different types of reasoning, including abstract, deductive, and compositional reasoning. A few benchmarks support scalable generation and difficulty control, but they lack puzzle diversity. Meanwhile, efforts to improve LLMs\u2019 puzzle-solving abilities fall mainly into two categories: tool integration and RLVR.<\/p>\n<p>Researchers from ByteDance Seed, Fudan University, Tsinghua University, Nanjing University, and Shanghai Jiao Tong University have proposed Enigmata, the first comprehensive toolkit designed to improve the puzzle reasoning skills of LLMs. It contains 36 tasks across seven categories; each task has a generator that produces unlimited examples with controllable difficulty and a rule-based verifier for automatic evaluation. The researchers also developed Enigmata-Eval, a rigorous benchmark, and designed optimized multi-task RLVR strategies. When used to train larger models such as Seed1.5-Thinking, puzzle data from Enigmata improves state-of-the-art performance on advanced math and STEM reasoning benchmarks such as AIME, BeyondAIME, and GPQA, demonstrating Enigmata\u2019s generalization benefits.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw?key=2D-M9Yfruh7KDe4DF-eCyg\" alt=\"\" \/><\/figure>\n<\/div>\n<p>Enigmata-Data comprises 36 puzzle tasks organized into seven primary categories: Crypto, Arithmetic, Logic, Grid, Graph, Search, and Sequential Puzzle. According to the authors, this makes it the only dataset that combines multiple task categories with scalability, automatic verification, and public availability. 
The data construction follows a three-phase pipeline: Tasks Collection and Design, Auto-Generator and Verifier Development, and Sliding Difficulty Control. Enigmata-Eval is then built by systematically sampling from the broader dataset, targeting 50 instances per difficulty level for each task. The final evaluation set contains 4,758 puzzle instances rather than the theoretical maximum of 5,400 (36 tasks \u00d7 3 difficulty levels \u00d7 50 instances), because inherent constraints prevent some tasks from generating that many instances at every difficulty level.<\/p>\n<p>With 32B parameters, the proposed model, Qwen2.5-32B-Enigmata, outperforms most public models on Enigmata-Eval, showing the effectiveness of the dataset and training recipe. It stands out on the challenging ARC-AGI benchmark, surpassing strong reasoning models such as Gemini 2.5 Pro, o3-mini, and o1. The model performs especially well in structured reasoning categories, excelling at Crypto, Arithmetic, and Logic tasks, which suggests it has effectively developed rule-based reasoning capabilities. It is also competitive on Search tasks, which require strategic exploration and planning. Overall, Crypto and Arithmetic tasks tend to yield the highest accuracy, while spatial and sequential tasks remain more difficult.<\/p>\n<p>In this paper, the researchers introduced Enigmata, a comprehensive suite for equipping LLMs with advanced puzzle reasoning that integrates seamlessly with RL through verifiable, rule-based rewards. Trained with RLVR, the resulting Enigmata-Model shows superior performance and robust generalization. Experiments show that when the synthetic puzzle data is applied to larger models such as Seed1.5-Thinking (20B activated \/ 200B total parameters), it brings additional gains over state-of-the-art models in other domains, including mathematics and STEM reasoning. 
Enigmata provides a solid foundation for the research community to advance reasoning model development, offering a unified framework that bridges logical puzzle solving and broader reasoning capabilities in LLMs.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p><strong>Check out the\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2505.19914\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>,\u00a0<a href=\"https:\/\/github.com\/BytedTsinghua-SIA\/Enigmata\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a>\u00a0and\u00a0<a href=\"https:\/\/seed-enigmata.github.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Project Page<\/a><em>.<\/em><\/strong>\u00a0All credit for this research goes to the researchers of this project.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/06\/01\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/\">Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Large Reasoning Models (LRMs), trained from LLMs using reinforcement learning (RL), have demonstrated strong performance on complex reasoning tasks, including mathematics, STEM, 
and coding. However, existing LRMs still struggle with many puzzle tasks that require purely logical reasoning, even though such puzzles are easy and obvious for humans. Prior work on puzzles focuses only on designing evaluation benchmarks and lacks the training methods and resources that modern LLMs need to tackle this challenge. Current puzzle datasets also lack diversity and scalability: they cover few puzzle types and offer little control over generation or difficulty. Moreover, given the success of the \u201cLLM+RLVR\u201d paradigm, obtaining large, diverse, and challenging sets of verifiable puzzle prompts has become crucial for training agents. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key method for improving models\u2019 reasoning capabilities: it removes the need for a reward model by assigning rewards directly from objectively verifiable answers. Puzzles are particularly well suited to RLVR because their solutions can be checked automatically with rules. However, most prior RLVR research has overlooked the potential of puzzles to deliver effective reward signals. For LLM puzzle reasoning, existing benchmarks evaluate different types of reasoning, including abstract, deductive, and compositional reasoning. A few benchmarks support scalable generation and difficulty control, but they lack puzzle diversity. Meanwhile, efforts to improve LLMs\u2019 puzzle-solving abilities fall mainly into two categories: tool integration and RLVR. Researchers from ByteDance Seed, Fudan University, Tsinghua University, Nanjing University, and Shanghai Jiao Tong University have proposed Enigmata, the first comprehensive toolkit designed to improve the puzzle reasoning skills of LLMs. It contains 36 tasks across seven categories; each task has a generator that produces unlimited examples with controllable difficulty and a rule-based verifier for automatic evaluation. The researchers also developed Enigmata-Eval, a rigorous benchmark, and designed optimized multi-task RLVR strategies. 
When used to train larger models such as Seed1.5-Thinking, puzzle data from Enigmata improves state-of-the-art performance on advanced math and STEM reasoning benchmarks such as AIME, BeyondAIME, and GPQA, demonstrating Enigmata\u2019s generalization benefits. Enigmata-Data comprises 36 puzzle tasks organized into seven primary categories: Crypto, Arithmetic, Logic, Grid, Graph, Search, and Sequential Puzzle. According to the authors, this makes it the only dataset that combines multiple task categories with scalability, automatic verification, and public availability. The data construction follows a three-phase pipeline: Tasks Collection and Design, Auto-Generator and Verifier Development, and Sliding Difficulty Control. Enigmata-Eval is then built by systematically sampling from the broader dataset, targeting 50 instances per difficulty level for each task. The final evaluation set contains 4,758 puzzle instances rather than the theoretical maximum of 5,400 (36 tasks \u00d7 3 difficulty levels \u00d7 50 instances), because inherent constraints prevent some tasks from generating that many instances at every difficulty level. With 32B parameters, the proposed model, Qwen2.5-32B-Enigmata, outperforms most public models on Enigmata-Eval, showing the effectiveness of the dataset and training recipe. It stands out on the challenging ARC-AGI benchmark, surpassing strong reasoning models such as Gemini 2.5 Pro, o3-mini, and o1. The model performs especially well in structured reasoning categories, excelling at Crypto, Arithmetic, and Logic tasks, which suggests it has effectively developed rule-based reasoning capabilities. It is also competitive on Search tasks, which require strategic exploration and planning. Overall, Crypto and Arithmetic tasks tend to yield the highest accuracy, while spatial and sequential tasks remain more difficult. In this paper, the researchers introduced Enigmata, a comprehensive suite for equipping LLMs with advanced puzzle reasoning that integrates seamlessly with RL through verifiable, rule-based rewards. 
Trained with RLVR, the resulting Enigmata-Model shows superior performance and robust generalization. Experiments show that when the synthetic puzzle data is applied to larger models such as Seed1.5-Thinking (20B activated \/ 200B total parameters), it brings additional gains over state-of-the-art models in other domains, including mathematics and STEM reasoning. Enigmata provides a solid foundation for the research community to advance reasoning model development, offering a unified framework that bridges logical puzzle solving and broader reasoning capabilities in LLMs. Check out the\u00a0Paper,\u00a0GitHub Page\u00a0and\u00a0Project Page.\u00a0All credit for this research goes to the researchers of this project. The post Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning appeared first on 
MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":16047,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-16046","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" 
content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-06-02T03:46:47+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1312\" \/>\n\t<meta property=\"og:image:height\" content=\"776\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\u00a0Minuten\" \/>\n<script 
type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning\",\"datePublished\":\"2025-06-02T03:46:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/\"},\"wordCount\":700,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performan
ce-in-llm-puzzle-reasoning\/\",\"url\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/\",\"name\":\"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png\",\"datePublished\":\"2025-06-02T03:46:47+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP
3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png\",\"width\":1312,\"height\":776},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association 
Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/de\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/de\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/","og_locale":"de_DE","og_type":"article","og_title":"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning - 
YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/de\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-06-02T03:46:47+00:00","og_image":[{"width":1312,"height":776,"url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png","type":"image\/png"}],"author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"admin NU","Gesch\u00e4tzte Lesezeit":"3\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle 
Reasoning","datePublished":"2025-06-02T03:46:47+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/"},"wordCount":700,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/","url":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/","name":"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png","datePublished":"2025-06-02T03:46:47+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/youzum.net\/enigmatas-multi-stage-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png","width":1312,"height":776},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/enigmatas-multi-st
age-and-mix-training-reinforcement-learning-recipe-drives-breakthrough-performance-in-llm-puzzle-reasoning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Enigmata\u2019s Multi-Stage and Mix-Training Reinforcement Learning Recipe Drives Breakthrough Performance in LLM Puzzle Reasoning"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/de\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png",1312,776,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png",1312,776,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png",1312,776,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO-300x177.png",300,177,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO-1024x606.png",1024,606,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png",1312,776,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO.png",1312,776,false
],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO-18x12.png",18,12,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO-600x355.png",600,355,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/06\/AD_4nXeSTsYhhWP0wq_0fC37T-lusUpc7ulKYTG_E77XU_jjdgEhq7MRgZWR5Mn2dpgvctiDBDP3fLs6cnqZOgRZbbfK2jd0BaPAddsr-WZ9wW5Oi2864mxLqHMq5dpp-bf_ghLYOLeXuw-LZtatO-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/de\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/de\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Large Reasoning Models (LRMs), trained from LLMs using reinforcement learning (RL), demonstrated great performance in complex reasoning tasks, including mathematics, STEM, and coding. However, existing LRMs face challenges in completing various puzzle tasks that require purely logical reasoning skills, which are easy and obvious for humans. 
Current methods targeting puzzles focus only on designing benchmarks&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/16046","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/comments?post=16046"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/16046\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media\/16047"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media?parent=16046"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/categories?post=16046"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/tags?post=16046"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}