{"id":28286,"date":"2025-07-30T05:48:34","date_gmt":"2025-07-30T05:48:34","guid":{"rendered":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/"},"modified":"2025-07-30T05:48:34","modified_gmt":"2025-07-30T05:48:34","slug":"miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/","title":{"rendered":"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning"},"content":{"rendered":"<p>Large language models (LLMs) have recently demonstrated remarkable progress in multi-step reasoning, establishing mathematical problem-solving as a rigorous benchmark for assessing advanced capabilities. While proprietary models like GPT-4o and Claude Sonnet 4 lead performance, their closed-source nature impedes transparency and reproducibility. Addressing these gaps, <strong>MiroMind AI released the MiroMind-M1 series, a fully open-source pipeline<\/strong>\u2014spanning datasets, models, training code, and evaluation scripts\u2014that sets new standards for openness and state-of-the-art mathematical reasoning within the Qwen-2.5 model ecosystem.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Architectural Foundation and Motivation<\/strong><\/h3>\n<p><strong>MiroMind-M1<\/strong> is built on the robust Qwen-2.5 backbone, with enhancements geared explicitly for mathematical reasoning. 
The team adopts a two-stage training protocol:<\/p>\n<ol class=\"wp-block-list\">\n<li><strong>Supervised Fine-Tuning (SFT):<\/strong> The model is fine-tuned on 719K carefully curated and verified mathematical problems, equipping it with strong step-by-step reasoning abilities.<\/li>\n<li><strong>Reinforcement Learning with Verifiable Rewards (RLVR):<\/strong> Next, the model undergoes RL on 62K challenging and rigorously verifiable math problems, leveraging reward signals from a robust external verifier.<\/li>\n<\/ol>\n<p>This approach is motivated both by the need for strong mathematical logic and by the lessons learned from leading RLMs: imitating chain-of-thought exemplars improves general reasoning, while reinforcement learning, guided by precise rewards, further refines accuracy and efficiency.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Data Transparency and Quality<\/strong><\/h3>\n<p>A hallmark of the MiroMind-M1 project is the full openness and cleanliness of its training data:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>SFT corpus composition:<\/strong> Draws from OpenR1, OpenThoughts, Light-R1, and Synthetic-1, ensuring problems have verified solutions and rich, multi-step reasoning traces.<\/li>\n<li><strong>Stringent deduplication and decontamination:<\/strong> Employs N-gram overlap filtering to eliminate duplication and data leakage from evaluation sets (e.g., AIME24, AIME25, MATH500).<\/li>\n<li><strong>Preference for long trajectories:<\/strong> Experiments show that training on samples with longer reasoning traces consistently yields higher benchmark scores, highlighting the importance of deep semantic content in the reasoning signal.<\/li>\n<\/ul>\n<p>The resulting dataset provides 719K verified training traces\u2014significantly advancing open reproducible research over prior efforts.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Supervised Fine-Tuning: Empirical Excellence<\/strong><\/h3>\n<p>For SFT, MiroMind-SFT-7B is initialized from 
Qwen2.5-Math-7B and trained with a large context window (max 32,768 tokens) and a no-packing strategy to avoid cross-sample attention contamination. Its performance on key math benchmarks outpaces peer open models:<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Model<\/th>\n<th>AIME24<\/th>\n<th>AIME25<\/th>\n<th>MATH500<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DeepSeek-R1-Distill<\/td>\n<td>55.5<\/td>\n<td>40.4<\/td>\n<td>92.8<\/td>\n<\/tr>\n<tr>\n<td>MiMo-7B-SFT<\/td>\n<td>58.7<\/td>\n<td>44.3<\/td>\n<td>93.0<\/td>\n<\/tr>\n<tr>\n<td><strong>MiroMind-SFT-7B<\/strong><\/td>\n<td>60.4<\/td>\n<td>45.0<\/td>\n<td>94.6<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>These results validate the efficacy of the data curation and training design: richer, deeper samples and no-packing lead to consistently superior performance.<\/p>\n<h3 class=\"wp-block-heading\"><strong>CAMPO: Context-Aware Multi-Stage Policy Optimization<\/strong><\/h3>\n<p>A key innovation in MiroMind-M1\u2019s RLVR phase is the <strong>CAMPO<\/strong> algorithm. 
CAMPO addresses two critical RL challenges\u2014training instability and token inefficiency\u2014by:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Multi-stage training with expanding context limits:<\/strong> Training starts with constrained output lengths (e.g., 16K tokens), then gradually increases to allow deeper reasoning, balancing efficiency and thoroughness.<\/li>\n<li><strong>Dynamic repetition penalty:<\/strong> A dedicated repetition critic penalizes outputs exhibiting early or excessive repetition, preventing utility collapse and enforcing output diversity.<\/li>\n<li><strong>Accurate external verifier:<\/strong> The reward feedback system is substantially improved to robustly score math answers (including tricky cases with units, \u03c0, and percentages), ensuring training signals are tightly aligned with true correctness.<\/li>\n<\/ul>\n<p>CAMPO not only stabilizes RL dynamics but also results in models that solve problems with fewer, more relevant tokens\u2014accelerating inference and reducing costs without sacrificing accuracy.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Benchmark Performance: State-of-the-Art Efficiency<\/strong><\/h3>\n<p>MiroMind\u2019s open models achieve highly competitive or state-of-the-art results for open Qwen-2.5-based math models (7B\/32B parameters):<\/p>\n<figure class=\"wp-block-table\">\n<table 
class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Model<\/th>\n<th>AIME24<\/th>\n<th>AIME25<\/th>\n<th>MATH500<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DeepSeek-R1-7B<\/td>\n<td>55.5<\/td>\n<td>39.2<\/td>\n<td>\u2013<\/td>\n<\/tr>\n<tr>\n<td>MiMo-7B-RL<\/td>\n<td>68.2<\/td>\n<td>55.4<\/td>\n<td>95.8<\/td>\n<\/tr>\n<tr>\n<td>Skywork-OR1-7B<\/td>\n<td>72.2<\/td>\n<td>54.6<\/td>\n<td>\u2013<\/td>\n<\/tr>\n<tr>\n<td><strong>MiroMind-RL-7B<\/strong><\/td>\n<td>73.4<\/td>\n<td>57.8<\/td>\n<td>96.7<\/td>\n<\/tr>\n<tr>\n<td>Skywork-OR1-32B<\/td>\n<td>77.1<\/td>\n<td>68.2<\/td>\n<td>97.5<\/td>\n<\/tr>\n<tr>\n<td><strong>MiroMind-RL-32B<\/strong><\/td>\n<td>77.5<\/td>\n<td>65.6<\/td>\n<td>96.4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>Notably, MiroMind-M1-RL models not only match or exceed peer accuracy, but do so with greater token efficiency\u2014the 32B model produces shorter, more concise solutions without loss of correctness, thanks to CAMPO\u2019s training.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Full Stack and Reproducibility<\/strong><\/h3>\n<p>Every component of the MiroMind-M1 stack is openly released:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Model weights<\/strong> (SFT and RL checkpoints for both 7B and 32B scales)<\/li>\n<li><strong>Datasets<\/strong> (full 719K SFT, 62K RLVR)<\/li>\n<li><strong>Training scripts<\/strong> (supporting multi-node distributed training on Ray)<\/li>\n<li><strong>Evaluation code<\/strong> (standardized scripts and benchmark configs)<\/li>\n<\/ul>\n<p>Researchers can replicate, audit, and extend MiroMind-M1 from raw data to trained models, advancing reproducibility and accelerating new open research.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n<p>MiroMind-M1 demonstrates that with careful data curation, innovative RL algorithms (CAMPO), and radical transparency, open-source language models can rival proprietary systems in advanced mathematical reasoning. 
This project sets a new bar for reproducibility and collaborative advancement in reasoning LLMs, providing both a high-quality resource and a robust platform for future innovation.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/arxiv.org\/abs\/2507.14683\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>, <a href=\"https:\/\/github.com\/MiroMindAsia\/MiroMind-M1\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page<\/a> and <a href=\"https:\/\/huggingface.co\/miromind-ai\/MiroMind-M1-RL-7B\" target=\"_blank\" rel=\"noreferrer noopener\">Model on Hugging Face<\/a><em>.<\/em><\/strong>\u00a0All credit for this research goes to the researchers of this project. Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/07\/29\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\">MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Large language models (LLMs) have recently demonstrated remarkable progress in multi-step reasoning, establishing mathematical problem-solving as a rigorous benchmark for assessing advanced capabilities. 
While proprietary models like GPT-4o and Claude Sonnet 4 lead performance, their closed-source nature impedes transparency and reproducibility. Addressing these gaps, MiroMind AI released the MiroMind-M1 series, a fully open-source pipeline\u2014spanning datasets, models, training code, and evaluation scripts\u2014that sets new standards for openness and state-of-the-art mathematical reasoning within the Qwen-2.5 model ecosystem. Architectural Foundation and Motivation MiroMind-M1 is built on the robust Qwen-2.5 backbone, with enhancements geared explicitly for mathematical reasoning. The team adopts a two-stage training protocol: Supervised Fine-Tuning (SFT): The model is fine-tuned on 719K carefully curated and verified mathematical problems, equipping it with strong step-by-step reasoning abilities. Reinforcement Learning with Verifiable Rewards (RLVR): Next, the model undergoes RL on 62K challenging and rigorously verifiable math problems, leveraging reward signals from a robust external verifier. This approach is motivated both by the need for strong mathematical logic and by the lessons learned from leading RLMs: imitating chain-of-thought exemplars improves general reasoning, while reinforcement learning, guided by precise rewards, further refines accuracy and efficiency. Data Transparency and Quality A hallmark of the MiroMind-M1 project is the full openness and cleanliness of its training data: SFT corpus composition: Draws from OpenR1, OpenThoughts, Light-R1, and Synthetic-1, ensuring problems have verified solutions and rich, multi-step reasoning traces. Stringent deduplication and decontamination: Employs N-gram overlap filtering to eliminate duplication and data leakage from evaluation sets (e.g., AIME24, AIME25, MATH500). Preference for long trajectories: Experiments show that training on samples with longer reasoning traces consistently yields higher benchmark scores, highlighting the importance of deep semantic content in the reasoning signal. 
The resulting dataset provides 719K verified training traces\u2014significantly advancing open reproducible research over prior efforts. Supervised Fine-Tuning: Empirical Excellence For SFT, MiroMind-SFT-7B is initialized from Qwen2.5-Math-7B and trained with a large context window (max 32,768 tokens) and a no-packing strategy to avoid cross-sample attention contamination. Its performance on key math benchmarks outpaces peer open models: Model AIME24 AIME25 MATH500 DeepSeek-R1-Distill 55.5 40.4 92.8 MiMo-7B-SFT 58.7 44.3 93.0 MiroMind-SFT-7B 60.4 45.0 94.6 These results validate the efficacy of the data curation and training design: richer, deeper samples and no-packing lead to consistently superior performance. CAMPO: Context-Aware Multi-Stage Policy Optimization A key innovation in MiroMind-M1\u2019s RLVR phase is the CAMPO algorithm. CAMPO addresses two critical RL challenges\u2014training instability and token inefficiency\u2014by: Multi-stage training with expanding context limits: Training starts with constrained output lengths (e.g., 16K tokens), then gradually increases to allow deeper reasoning, balancing efficiency and thoroughness. Dynamic repetition penalty: A dedicated repetition critic penalizes outputs exhibiting early or excessive repetition, preventing utility collapse and enforcing output diversity. Accurate external verifier: The reward feedback system is substantially improved to robustly score math answers (including tricky cases with units, \u03c0, and percentages), ensuring training signals are tightly aligned with true correctness. CAMPO not only stabilizes RL dynamics but also results in models that solve problems with fewer, more relevant tokens\u2014accelerating inference and reducing costs without sacrificing accuracy. 
Benchmark Performance: State-of-the-Art Efficiency MiroMind\u2019s open models achieve highly competitive or state-of-the-art results for open Qwen-2.5-based math models (7B\/32B parameters): Model AIME24 AIME25 MATH500 DeepSeek-R1-7B 55.5 39.2 \u2013 MiMo-7B-RL 68.2 55.4 95.8 Skywork-OR1-7B 72.2 54.6 \u2013 MiroMind-RL-7B 73.4 57.8 96.7 Skywork-OR1-32B 77.1 68.2 97.5 MiroMind-RL-32B 77.5 65.6 96.4 Notably, MiroMind-M1-RL models not only match or exceed peer accuracy, but do so with greater token efficiency\u2014the 32B model produces shorter, more concise solutions without loss of correctness, thanks to CAMPO\u2019s training. Full Stack and Reproducibility Every component of the MiroMind-M1 stack is openly released: Model weights (SFT and RL checkpoints for both 7B and 32B scales) Datasets (full 719K SFT, 62K RLVR) Training scripts (supporting multi-node distributed training on Ray) Evaluation code (standardized scripts and benchmark configs) Researchers can replicate, audit, and extend MiroMind-M1 from raw data to trained models, advancing reproducibility and accelerating new open research. Conclusion MiroMind-M1 demonstrates that with careful data curation, innovative RL algorithms (CAMPO), and radical transparency, open-source language models can rival proprietary systems in advanced mathematical reasoning. This project sets a new bar for reproducibility and collaborative advancement in reasoning LLMs, providing both a high-quality resource and a robust platform for future innovation. Check out the\u00a0Paper, GitHub Page and Model on Hugging Face.\u00a0All credit for this research goes to the researchers of this project. Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. 
The post MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-28286","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-30T05:48:34+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\"},\"author\":{\"name\":\"admin 
NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning\",\"datePublished\":\"2025-07-30T05:48:34+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\"},\"wordCount\":774,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\",\"url\":\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\",\"name\":\"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2025-07-30T05:48:34+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/de\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/de\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/","og_locale":"de_DE","og_type":"article","og_title":"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/de\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-07-30T05:48:34+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"admin NU","Gesch\u00e4tzte Lesezeit":"4\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement 
Learning","datePublished":"2025-07-30T05:48:34+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/"},"wordCount":774,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/","url":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/","name":"MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2025-07-30T05:48:34+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/miromind-m1-advancing-open-source-mathematical-reasoning-via-context-aware-multi-stage-reinforcement-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"MiroMind-M1: 
Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/de\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/de\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/de\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a 
href=\"https:\/\/youzum.net\/de\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Large language models (LLMs) have recently demonstrated remarkable progress in multi-step reasoning, establishing mathematical problem-solving as a rigorous benchmark for assessing advanced capabilities. While proprietary models like GPT-4o and Claude Sonnet 4 lead performance, their closed-source nature impedes transparency and reproducibility. Addressing these gaps, MiroMind AI released the MiroMind-M1 series, a fully open-source pipeline\u2014spanning datasets,&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/28286","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/comments?post=28286"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/28286\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media?parent=28286"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/categories?post=28286"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/tags?post=28286"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}