{"id":30482,"date":"2025-08-09T05:58:36","date_gmt":"2025-08-09T05:58:36","guid":{"rendered":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/"},"modified":"2025-08-09T05:58:36","modified_gmt":"2025-08-09T05:58:36","slug":"vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/","title":{"rendered":"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning"},"content":{"rendered":"<p>Multimodal reasoning, where models integrate and interpret information from multiple sources such as text, images, and diagrams, is a frontier challenge in AI. VL-Cogito is a state-of-the-art Multimodal <a href=\"https:\/\/www.marktechpost.com\/2025\/01\/11\/what-are-large-language-model-llms\/\" target=\"_blank\">Large Language Model<\/a> (MLLM) proposed by DAMO Academy (Alibaba Group) and partners, introducing a robust reinforcement learning pipeline that fundamentally upgrades the reasoning skills of large models across mathematics, science, logic, charts, and general understanding.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Core Innovations<\/strong><\/h3>\n<p>VL-Cogito\u2019s unique approach centers around the <strong>Progressive Curriculum Reinforcement Learning (PCuRL)<\/strong> framework, engineered to systematically overcome the instability and domain gaps endemic to multimodal reasoning. <strong>The framework includes two breakthrough innovations:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Online Difficulty Soft Weighting (ODSW):<\/strong> This mechanism assigns dynamic weights to training samples according to their difficulty and the model\u2019s evolving capabilities. 
Rather than rigidly filtering out \u201ceasy\u201d or \u201chard\u201d samples, ODSW ensures each prompt contributes appropriately to gradient updates, enabling the model to progress from clear cases to intricate, challenging ones through a continuous curriculum. Three variants shift the focus toward easy, medium, or hard samples using a piecewise function of rollout accuracy, guided by learnability theory and the empirical distribution of task difficulty.<\/li>\n<li><strong>Dynamic Length Reward (DyLR):<\/strong> Traditional length rewards in RL-based reasoning models set a static target, which ignores task complexity and encourages unnecessary verbosity. DyLR instead computes a target length per prompt, estimated as the average length of the correct rollout samples for that question. Easy tasks are steered toward short, rapid reasoning, while complex tasks are given room for deeper, multi-step exploration, balancing efficiency and correctness.<\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"465\" data-attachment-id=\"73404\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/08\/08\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/screenshot-2025-08-08-at-10-13-08-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1.png\" data-orig-size=\"1474,670\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-08-08 at 10.13.08\u202fPM\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-300x136.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465.png\" alt=\"\" class=\"wp-image-73404\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Training Pipeline<\/strong><\/h3>\n<p>VL-Cogito\u2019s RL post-training starts directly from the Qwen2.5-VL-7B-Instruct backbone, with <strong>no initial supervised fine-tuning (SFT) cold start<\/strong> required. <strong>The PCuRL process is explicitly divided into three sequential RL stages: easy, medium, and hard. In each stage:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>The same dataset is reshuffled, exposing the model to various generalization challenges.<\/li>\n<li><strong>ODSW\u2019s weighting function<\/strong> for that stage biases gradient updates towards the target difficulty.<\/li>\n<li>In the hard stage, DyLR is enabled so that reasoning chains can lengthen adaptively.<\/li>\n<\/ul>\n<p><strong>Technical setup details:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>AdamW optimizer, LR=1e-6, DeepSpeed-ZeRO3.<\/li>\n<li>Rollout batch size: 512; global batch size: 128; sequence length: 4,096; KL divergence loss: 1e-3; 16 response samples per prompt; temperature: 1.0.<\/li>\n<li>Reward hyperparameters: \u03b1=1, \u03b2=0.5, \u03b3=1, w=0.25 (penalty for zero-accuracy prompts).<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Dataset Curation and RL Data Sampling<\/strong><\/h3>\n<p>The curated training set covers 23 open-source multimodal datasets across six task categories: <strong>Mathematical Reasoning, Logical Reasoning, Counting, Science Reasoning, Chart Understanding, and General Image Understanding.<\/strong><\/p>\n<ul 
 class=\"wp-block-list\">\n<li>All samples are reformulated as open-ended QA so that the model cannot exploit superficial multiple-choice cues.<\/li>\n<li>Difficulty sampling: each sample is attempted 8 times by Qwen2.5-VL-7B-Instruct, and any sample it answers with \u226550% accuracy is dropped, so that only samples the backbone finds challenging remain.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Evaluation and Benchmark Results<\/strong><\/h3>\n<h4 class=\"wp-block-heading\"><strong>Performance Across Benchmarks<\/strong><\/h4>\n<p>VL-Cogito is benchmarked against both general-purpose and reasoning-oriented MLLMs on a ten-benchmark panel: Geometry3K, MathVerse, MathVista, MathVision, LogicVista, ChartQA, ScienceQA, MMMU, EMMA, and MMStar.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Absolute accuracy gains<\/strong> over the backbone: +7.6% on Geometry3K, +5.5% on MathVista, +4.9% on LogicVista, +2.2% on ScienceQA, +4.5% on EMMA, +3.8% on MMStar.<\/li>\n<li><strong>State-of-the-art results on 6\/10 benchmarks<\/strong>: VL-Cogito either leads or matches top results, especially on rigorous math and scientific tasks. 
Models \u201ccold-started\u201d with SFT or forced rethinking strategies do not surpass its robust, curriculum-based RL.<\/li>\n<\/ul>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Model<\/th>\n<th>Geo3K<\/th>\n<th>MathVerse<\/th>\n<th>MathVista<\/th>\n<th>MathVision<\/th>\n<th>LogicVista<\/th>\n<th>ChartQA<\/th>\n<th>SciQA<\/th>\n<th>MMMU<\/th>\n<th>EMMA<\/th>\n<th>MMStar<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>VL-Cogito (7B)<\/td>\n<td>68.7<\/td>\n<td>53.3<\/td>\n<td>74.8<\/td>\n<td>30.7<\/td>\n<td>48.9<\/td>\n<td>83.4<\/td>\n<td>87.6<\/td>\n<td>52.6<\/td>\n<td>29.1<\/td>\n<td>66.3<\/td>\n<\/tr>\n<tr>\n<td>VL-Rethinker (7B)<\/td>\n<td>67.7<\/td>\n<td><strong>54.6<\/strong><\/td>\n<td>73.7<\/td>\n<td><strong>30.1<\/strong><\/td>\n<td>45.7<\/td>\n<td><strong>83.5<\/strong><\/td>\n<td>86.7<\/td>\n<td>52.9<\/td>\n<td>28.6<\/td>\n<td>64.2<\/td>\n<\/tr>\n<tr>\n<td>MM-Eureka (8B)<\/td>\n<td>67.2<\/td>\n<td>52.3<\/td>\n<td>73.4<\/td>\n<td>29.4<\/td>\n<td>47.1<\/td>\n<td>82.7<\/td>\n<td>86.4<\/td>\n<td>52.3<\/td>\n<td>27.4<\/td>\n<td>64.7<\/td>\n<\/tr>\n<tr>\n<td>Qwen2.5-VL (7B)<\/td>\n<td>61.6<\/td>\n<td>50.4<\/td>\n<td>69.3<\/td>\n<td>28.7<\/td>\n<td>44.0<\/td>\n<td>82.4<\/td>\n<td>85.4<\/td>\n<td>50.9<\/td>\n<td>24.6<\/td>\n<td>62.5<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h4 class=\"wp-block-heading\"><strong>Component-wise Ablation<\/strong><\/h4>\n<ul class=\"wp-block-list\">\n<li><strong>Curriculum RL<\/strong> alone lifts average scores by +0.8% over vanilla GRPO.<\/li>\n<li><strong>Dynamic length reward<\/strong> further boosts performance, especially in hard math domains.<\/li>\n<li><strong>ODSW<\/strong> consistently outperforms binary hard sample filtering, especially when training data is imbalanced or skewed.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Reasoning Efficiency and Training Dynamics<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Dynamic rewards<\/strong> yield higher 
average accuracy and better token efficiency than fixed-length cosine rewards. The learned response length comes out longer for math and logic tasks and shorter for science and general understanding, as intended.<\/li>\n<li>PCuRL\u2019s hard stage induces a spike in both reasoning length and validation accuracy, whereas vanilla GRPO\u2019s accuracy plateaus while its output length stays static.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Case Studies<\/strong><\/h3>\n<p>VL-Cogito exhibits detailed, self-reflective, stepwise reasoning. For math, the model decomposes solutions into granular chains and actively corrects missteps, a behavior instilled by RL verification and advantage estimation [1, Figure 5]. On classification-style problems (e.g., identifying decomposers or skyscrapers in images), it methodically considers each option before boxing the answer, demonstrating strong multimodal comprehension and process reliability.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"880\" data-attachment-id=\"73402\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/08\/08\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/screenshot-2025-08-08-at-10-12-20-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.12.20-PM-1.png\" data-orig-size=\"1518,1304\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-08-08 at 10.12.20\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.12.20-PM-1-300x258.png\" 
data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.12.20-PM-1-1024x880.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.12.20-PM-1-1024x880.png\" alt=\"\" class=\"wp-image-73402\" \/><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Insights and Impact<\/strong><\/h3>\n<p><strong>VL-Cogito\u2019s systematic PCuRL pipeline validates several key insights:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Learnability matters:<\/strong> Prompts of intermediate difficulty drive the most learning progress.<\/li>\n<li><strong>Exposure to challenge catalyzes deep reasoning:<\/strong> Over-emphasis on easy samples degrades performance; progressive emphasis on harder samples builds durable analytic depth.<\/li>\n<li><strong>Reward granularity is crucial:<\/strong> Combining correctness, format, and length rewards facilitates nuanced, context-sensitive reasoning outputs.<\/li>\n<li><strong>RL without an SFT cold start is feasible and effective:<\/strong> With PCuRL, models need not rely on expensive SFT warm-up.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n<p>VL-Cogito\u2019s training innovations set a new standard for multimodal reasoning across diverse benchmarks. The design and empirical validation of progressive curriculum RL with dynamic length rewards point toward a general roadmap for robust reasoning in multimodal models. 
<\/p>\n\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<p>Check out the\u00a0<strong><a href=\"https:\/\/arxiv.org\/abs\/2507.22607\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>.<\/strong><\/p>\n<\/div>\n<\/div>\n<p>The post <a 
href=\"https:\/\/www.marktechpost.com\/2025\/08\/08\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\">VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Multimodal reasoning, where models integrate and interpret information from multiple sources such as text, images, and diagrams, is a frontier challenge in AI. VL-Cogito is a state-of-the-art Multimodal Large Language Model (MLLM) proposed by DAMO Academy (Alibaba Group) and partners, introducing a robust reinforcement learning pipeline that fundamentally upgrades the reasoning skills of large models across mathematics, science, logic, charts, and general understanding. Core Innovations VL-Cogito\u2019s unique approach centers around the Progressive Curriculum Reinforcement Learning (PCuRL) framework, engineered to systematically overcome the instability and domain gaps endemic to multimodal reasoning. 
The framework includes two breakthrough innovations: Online Difficulty Soft Weighting (ODSW): This mechanism assigns dynamic weights to training samples according to their difficulty and the model\u2019s evolving capabilities. Rather than rigidly filtering out \u201ceasy\u201d or \u201chard\u201d samples, ODSW ensures each prompt contributes appropriately to gradient updates, enabling the model to progress from clear cases to intricate, challenging ones through a continuous curriculum. Three variants shift the focus toward easy, medium, or hard samples using a piecewise function of rollout accuracy, guided by learnability theory and the empirical distribution of task difficulty. Dynamic Length Reward (DyLR): Traditional length rewards in RL-based reasoning models set a static target, which ignores task complexity and encourages unnecessary verbosity. DyLR instead computes a target length per prompt, estimated as the average length of the correct rollout samples for that question. Easy tasks are steered toward short, rapid reasoning, while complex tasks are given room for deeper, multi-step exploration, balancing efficiency and correctness. Training Pipeline VL-Cogito\u2019s RL post-training starts directly from the Qwen2.5-VL-7B-Instruct backbone, with no initial supervised fine-tuning (SFT) cold start required. The PCuRL process is explicitly divided into three sequential RL stages: easy, medium, and hard. In each stage: The same dataset is reshuffled, exposing the model to various generalization challenges. ODSW\u2019s weighting function for that stage biases gradient updates towards the target difficulty. In the hard stage, DyLR is enabled so that reasoning chains can lengthen adaptively. Technical setup details: AdamW optimizer, LR=1e-6, DeepSpeed-ZeRO3. Rollout batch size: 512; global batch size: 128; sequence length: 4,096; KL divergence loss: 1e-3; 16 response samples per prompt; temperature: 1.0. 
Reward hyperparameters: \u03b1=1, \u03b2=0.5, \u03b3=1, w=0.25 (penalty for zero-accuracy prompts). Dataset Curation and RL Data Sampling The curated training set covers 23 open-source multimodal datasets across six task categories: Mathematical Reasoning, Logical Reasoning, Counting, Science Reasoning, Chart Understanding, and General Image Understanding. All samples are reformulated as open-ended QA so that the model cannot exploit superficial multiple-choice cues. Difficulty sampling: each sample is attempted 8 times by Qwen2.5-VL-7B-Instruct, and any sample it answers with \u226550% accuracy is dropped, so that only samples the backbone finds challenging remain. Evaluation and Benchmark Results Performance Across Benchmarks VL-Cogito is benchmarked against both general-purpose and reasoning-oriented MLLMs on a ten-benchmark panel: Geometry3K, MathVerse, MathVista, MathVision, LogicVista, ChartQA, ScienceQA, MMMU, EMMA, and MMStar. Absolute accuracy gains over the backbone: +7.6% on Geometry3K, +5.5% on MathVista, +4.9% on LogicVista, +2.2% on ScienceQA, +4.5% on EMMA, +3.8% on MMStar. State-of-the-art results on 6\/10 benchmarks: VL-Cogito either leads or matches top results, especially on rigorous math and scientific tasks. Models \u201ccold-started\u201d with SFT or forced rethinking strategies do not surpass its robust, curriculum-based RL. Model Geo3K MathVerse MathVista MathVision LogicVista ChartQA SciQA MMMU EMMA MMStar VL-Cogito (7B) 68.7 53.3 74.8 30.7 48.9 83.4 87.6 52.6 29.1 66.3 VL-Rethinker (7B) 67.7 54.6 73.7 30.1 45.7 83.5 86.7 52.9 28.6 64.2 MM-Eureka (8B) 67.2 52.3 73.4 29.4 47.1 82.7 86.4 52.3 27.4 64.7 Qwen2.5-VL (7B) 61.6 50.4 69.3 28.7 44.0 82.4 85.4 50.9 24.6 62.5 Component-wise Ablation Curriculum RL alone lifts average scores by +0.8% over vanilla GRPO. Dynamic length reward further boosts performance, especially in hard math domains. ODSW consistently outperforms binary hard sample filtering, especially when training data is imbalanced or skewed. 
Reasoning Efficiency and Training Dynamics Dynamic rewards yield higher average accuracy and better token efficiency than fixed-length cosine rewards. The learned response length comes out longer for math and logic tasks and shorter for science and general understanding, as intended. PCuRL\u2019s hard stage induces a spike in both reasoning length and validation accuracy, whereas vanilla GRPO\u2019s accuracy plateaus while its output length stays static. Case Studies VL-Cogito exhibits detailed, self-reflective, stepwise reasoning. For math, the model decomposes solutions into granular chains and actively corrects missteps, a behavior instilled by RL verification and advantage estimation [1, Figure 5]. On classification-style problems (e.g., identifying decomposers or skyscrapers in images), it methodically considers each option before boxing the answer, demonstrating strong multimodal comprehension and process reliability. Insights and Impact VL-Cogito\u2019s systematic PCuRL pipeline validates several key insights: Learnability matters: Prompts of intermediate difficulty drive the most learning progress. Exposure to challenge catalyzes deep reasoning: Over-emphasis on easy samples degrades performance; progressive emphasis on harder samples builds durable analytic depth. Reward granularity is crucial: Combining correctness, format, and length rewards facilitates nuanced, context-sensitive reasoning outputs. RL without an SFT cold start is feasible and effective: With PCuRL, models need not rely on expensive SFT warm-up. Conclusion VL-Cogito\u2019s training innovations set a new standard for multimodal reasoning across diverse benchmarks. The design and empirical validation of progressive curriculum RL with dynamic length rewards point toward a general roadmap for robust reasoning in multimodal models. 
Discuss on Hacker News Join our ML Subreddit Sponsor us Check out the\u00a0Paper.\u00a0Feel free to check out our\u00a0GitHub Page for Tutorials, Codes and Notebooks.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. The post VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":30483,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-30482","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>VL-Cogito: Advancing Multimodal Reasoning with 
Progressive Curriculum Reinforcement Learning - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-09T05:58:36+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning\",\"datePublished\":\"2025-08-09T05:58:36+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\"},\"wordCount\":887,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\",\"url\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\",\"name\":\"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png\",\"datePublished\":\"2025-08-09T05:58:36+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png\",\"width\":1024,\"height\":465},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement 
Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/de\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/de\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/","og_locale":"de_DE","og_type":"article","og_title":"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/de\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-08-09T05:58:36+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"admin NU","Gesch\u00e4tzte Lesezeit":"4\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement 
Learning","datePublished":"2025-08-09T05:58:36+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/"},"wordCount":887,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/","url":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/","name":"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png","datePublished":"2025-08-09T05:58:36+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png","width":1024,"height":465},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/vl-cogito-advancing-multimodal-reasoning-with-progressive-curriculum-reinforcement-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement 
Learning"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/de\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png",1024,465,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png",1024,465,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png",1024,465,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X-300x136.png",300,136,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png",1024,465,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png",1024,465,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X.png",1024,465,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X-18x8.png",18,8,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X-600x272.png",600,272,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/08\/Screenshot-2025-08-08-at-10.13.08-PM-1-1024x465-Xwl56X-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin 
NU","author_link":"https:\/\/youzum.net\/de\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/de\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Multimodal reasoning, where models integrate and interpret information from multiple sources such as text, images, and diagrams, is a frontier challenge in AI. VL-Cogito is a state-of-the-art Multimodal Large Language Model (MLLM) proposed by DAMO Academy (Alibaba Group) and partners, introducing a robust reinforcement learning pipeline that fundamentally upgrades the reasoning skills of large models&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/30482","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/comments?post=30482"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/30482\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media\/30483"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media?parent=30482"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/categories?post=30482"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/tags?post=30482"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}