{"id":50950,"date":"2025-11-12T08:02:27","date_gmt":"2025-11-12T08:02:27","guid":{"rendered":"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/"},"modified":"2025-11-12T08:02:27","modified_gmt":"2025-11-12T08:02:27","slug":"baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family","status":"publish","type":"post","link":"https:\/\/youzum.net\/fr\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/","title":{"rendered":"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family"},"content":{"rendered":"<p>How can we get large model level multimodal reasoning for documents, charts and videos while running only a 3B class model in production? Baidu has added a new model to the ERNIE-4.5 open source family. 
ERNIE-4.5-VL-28B-A3B-Thinking is a vision language model that focuses on document, chart and video understanding with a small active parameter budget.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"2560\" height=\"1116\" data-attachment-id=\"76183\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/11\/11\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/image-215\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/11\/image-20-scaled.png\" data-orig-size=\"2560,1116\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/11\/image-20-300x131.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/11\/image-20-1024x447.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/11\/image-20-scaled.png\" alt=\"\" class=\"wp-image-76183\" \/><figcaption class=\"wp-element-caption\">https:\/\/huggingface.co\/baidu\/ERNIE-4.5-VL-28B-A3B-Thinking<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Architecture and training setup<\/strong><\/h3>\n<p>ERNIE-4.5-VL-28B-A3B-Thinking is built on the ERNIE-4.5-VL-28B-A3B Mixture of Experts architecture. The family uses a heterogeneous multimodal MoE design with shared parameters across text and vision plus modality specific experts. At the model level, it has 30B total parameters, while the architecture is in the 28B-VL branch, and only 3B parameters are activated per token through an A3B routing scheme. 
This gives the compute and memory profile of a 3B class model while keeping a larger capacity pool for reasoning.<\/p>\n<p>The model goes through an additional mid training stage on a large visual language reasoning corpus. This stage is designed to improve representation power and semantic alignment between visual and language modalities, which matters for dense text in documents and fine structures in charts. On top of that, ERNIE-4.5-VL-28B-A3B-Thinking uses multimodal reinforcement learning on verifiable tasks, with GSPO and IcePop strategies and dynamic difficulty sampling to stabilize MoE training and push the model toward hard examples.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key capabilities<\/strong><\/h3>\n<p>Baidu researchers position this model as a lightweight multimodal reasoning engine that can activate only 3B parameters while approaching the behavior of larger flagship systems on internal benchmarks. Officially listed capabilities include visual reasoning, STEM reasoning, visual grounding, Thinking with Images, tool utilization and video understanding.<\/p>\n<p>Thinking with Images is at the core. The model can zoom into regions, reason on cropped views and then integrate those local observations into a final answer. Tool utilization extends this with calls to tools such as image search when internal knowledge is not enough. Both features are exposed as part of the reasoning parser and tool call parser path in deployment.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Performance and positioning<\/strong><\/h3>\n<p>The lightweight vision language model ERNIE-4.5-VL-28B-A3B achieves competitive or superior performance compared to Qwen-2.5-VL-7B and Qwen-2.5-VL-32B on many benchmarks, while using fewer activation parameters. 
ERNIE-4.5-VL models also support both thinking and non-thinking modes, with the thinking mode improving reasoning-centered tasks while keeping strong perception quality.<\/p>\n<p>For the specific Thinking variant, Baidu researchers describe ERNIE-4.5-VL-28B-A3B-Thinking as closely matching the performance of industry flagship models across internal multimodal benchmarks.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ol class=\"wp-block-list\">\n<li>ERNIE-4.5-VL-28B-A3B-Thinking uses a Mixture of Experts architecture with about 30B total parameters and only 3B active parameters per token to deliver efficient multimodal reasoning.<\/li>\n<li>The model is optimized for document, chart and video understanding through an additional visual language reasoning mid training stage and multimodal reinforcement learning using GSPO, IcePop and dynamic difficulty sampling.<\/li>\n<li>Thinking with Images lets the model iteratively zoom into image regions and reason over crops, while tool utilization enables calls to external tools such as image search for long-tail recognition.<\/li>\n<li>It demonstrates strong performance on analytics-style charts, STEM circuit problems, visual grounding with JSON bounding boxes and video segment localization with timestamped answers.<\/li>\n<li>The model is released under Apache License 2.0, supports deployment via transformers, vLLM and FastDeploy, and can be fine-tuned with ERNIEKit using SFT, LoRA and DPO for commercial multimodal applications.<\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\"><strong>Comparison Table<\/strong><\/h3>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Model<\/th>\n<th>Training stage<\/th>\n<th>Total \/ active parameters<\/th>\n<th>Modalities<\/th>\n<th>Context length (tokens)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>ERNIE-4.5-VL-28B-A3B-Base<\/strong><\/td>\n<td>Pretraining<\/td>\n<td>28B total, 3B active per 
token<\/td>\n<td>Text, Vision<\/td>\n<td>131,072<\/td>\n<\/tr>\n<tr>\n<td><strong>ERNIE-4.5-VL-28B-A3B (PT)<\/strong><\/td>\n<td>Posttraining chat model<\/td>\n<td>28B total, 3B active per token<\/td>\n<td>Text, Vision<\/td>\n<td>131,072<\/td>\n<\/tr>\n<tr>\n<td><strong>ERNIE-4.5-VL-28B-A3B-Thinking<\/strong><\/td>\n<td>Reasoning oriented mid training on ERNIE-4.5-VL-28B-A3B<\/td>\n<td>28B architecture, 3B active per token, HF model size 30B params<\/td>\n<td>Text, Vision<\/td>\n<td>131,072 (FastDeploy example uses 131,072 max model length)<\/td>\n<\/tr>\n<tr>\n<td><strong>Qwen2.5-VL-7B-Instruct<\/strong><\/td>\n<td>Posttraining vision language model<\/td>\n<td>\u22488B total (7B class)<\/td>\n<td>Text, Image, Video<\/td>\n<td>32,768 text positions in config (max_position_embeddings)<\/td>\n<\/tr>\n<tr>\n<td><strong>Qwen2.5-VL-32B-Instruct<\/strong><\/td>\n<td>Posttraining plus reinforcement tuned large VL model<\/td>\n<td>33B total<\/td>\n<td>Text, Image, Video<\/td>\n<td>32,768 text positions (same Qwen2.5-VLTextConfig family)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h3 class=\"wp-block-heading\"><strong>Editorial Comments<\/strong><\/h3>\n<p>ERNIE-4.5-VL-28B-A3B-Thinking is a practical release for teams that want multimodal reasoning on documents, charts and videos with only 3B activated parameters, while still using a Mixture-of-Experts architecture with about 30B total parameters and Apache License 2.0. It connects Thinking with Images, tool utilization and multimodal reinforcement learning into a deployable stack that directly targets real world analytics and understanding workloads. 
<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/PaddlePaddle\/ERNIE\" target=\"_blank\" rel=\"noreferrer noopener\">Repo<\/a>, <a href=\"https:\/\/huggingface.co\/baidu\/ERNIE-4.5-VL-28B-A3B-Thinking\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a> <\/strong>and<strong> <a href=\"https:\/\/yiyan.baidu.com\/blog\/posts\/ernie-4.5-vl-28b-a3b-thinking\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>.\u00a0Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/11\/11\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\">Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>How can we get large model level multimodal reasoning for documents, charts and videos while running only a 3B class model in production? Baidu has added a new model to the ERNIE-4.5 open source family. ERNIE-4.5-VL-28B-A3B-Thinking is a vision language model that focuses on document, chart and video understanding with a small active parameter budget. https:\/\/huggingface.co\/baidu\/ERNIE-4.5-VL-28B-A3B-Thinking Architecture and training setup ERNIE-4.5-VL-28B-A3B-Thinking is built on the ERNIE-4.5-VL-28B-A3B Mixture of Experts architecture. The family uses a heterogeneous multimodal MoE design with shared parameters across text and vision plus modality specific experts. At the model level, it has 30B total parameters, while the architecture is in the 28B-VL branch, and only 3B parameters are activated per token through an A3B routing scheme. This gives the compute and memory profile of a 3B class model while keeping a larger capacity pool for reasoning. The model goes through an additional mid training stage on a large visual language reasoning corpus. This stage is designed to improve representation power and semantic alignment between visual and language modalities, which matters for dense text in documents and fine structures in charts. 
On top of that, ERNIE-4.5-VL-28B-A3B-Thinking uses multimodal reinforcement learning on verifiable tasks, with GSPO and IcePop strategies and dynamic difficulty sampling to stabilize MoE training and push the model toward hard examples. Key capabilities Baidu researchers position this model as a lightweight multimodal reasoning engine that can activate only 3B parameters while approaching the behavior of larger flagship systems on internal benchmarks. Officially listed capabilities include visual reasoning, STEM reasoning, visual grounding, Thinking with Images, tool utilization and video understanding. Thinking with Images is at the core. The model can zoom into regions, reason on cropped views and then integrate those local observations into a final answer. Tool utilization extends this with calls to tools such as image search when internal knowledge is not enough. Both features are exposed as part of the reasoning parser and tool call parser path in deployment. Performance and positioning The lightweight vision language model ERNIE-4.5-VL-28B-A3B achieves competitive or superior performance compared to Qwen-2.5-VL-7B and Qwen-2.5-VL-32B on many benchmarks, while using fewer activation parameters. ERNIE-4.5-VL models also support both thinking and non thinking modes, with the thinking mode improving reasoning centered tasks while keeping strong perception quality. For the specific Thinking variant, Baidu researchers describe ERNIE-4.5-VL-28B-A3B-Thinking as closely matching the performance of industry flagship models across internal multimodal benchmarks. Key Takeaways ERNIE-4.5-VL-28B-A3B-Thinking uses a Mixture of Experts architecture with about 30B total parameters and only 3B active parameters per token to deliver efficient multimodal reasoning. 
The model is optimized for document, chart and video understanding through an additional visual language reasoning mid training stage and multimodal reinforcement learning using GSPO, IcePop and dynamic difficulty sampling. Thinking with Images lets the model iteratively zoom into image regions and reason over crops, while tool utilization enables calls to external tools such as image search for long-tail recognition. It demonstrates strong performance on analytics-style charts, STEM circuit problems, visual grounding with JSON bounding boxes and video segment localization with timestamped answers. The model is released under Apache License 2.0, supports deployment via transformers, vLLM and FastDeploy, and can be fine-tuned with ERNIEKit using SFT, LoRA and DPO for commercial multimodal applications. Comparison Table Model Training stage Total \/ active parameters Modalities Context length (tokens) ERNIE-4.5-VL-28B-A3B-Base Pretraining 28B total, 3B active per token Text, Vision 131,072 ERNIE-4.5-VL-28B-A3B (PT) Posttraining chat model 28B total, 3B active per token Text, Vision 131,072 ERNIE-4.5-VL-28B-A3B-Thinking Reasoning oriented mid training on ERNIE-4.5-VL-28B-A3B 28B architecture, 3B active per token, HF model size 30B params Text, Vision 131,072 (FastDeploy example uses 131,072 max model length) Qwen2.5-VL-7B-Instruct Posttraining vision language model \u22488B total (7B class) Text, Image, Video 32,768 text positions in config (max_position_embeddings) Qwen2.5-VL-32B-Instruct Posttraining plus reinforcement tuned large VL model 33B total Text, Image, Video 32,768 text positions (same Qwen2.5-VLTextConfig family) Editorial Comments ERNIE-4.5-VL-28B-A3B-Thinking is a practical release for teams that want multimodal reasoning on documents, charts and videos with only 3B activated parameters, while still using a Mixture-of-Experts architecture with about 30B total parameters and Apache License 2.0. 
It connects Thinking with Images, tool utilization and multimodal reinforcement learning into a deployable stack that directly targets real world analytics and understanding workloads. Check out the\u00a0Repo, Model Weights and Technical details.\u00a0Feel free to check out our\u00a0GitHub Page for Tutorials, Codes and Notebooks.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. Wait! are you on telegram?\u00a0now you can join us on telegram as well. The post Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":50951,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-50950","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/fr\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/fr\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-11-12T08:02:27+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 
Family\",\"datePublished\":\"2025-11-12T08:02:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\"},\"wordCount\":850,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/11\/image-20-scaled-ZZ14vA.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\",\"url\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\",\"name\":\"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/11\/image-20-scaled-ZZ14vA.png\",\"datePublished\":\"2025-11-12T08:02:27+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/11\/image-20-scaled-ZZ14vA.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/11\/image-20-scaled-ZZ14vA.png\",\"width\":2560,\"height\":1116},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\
"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/fr\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/fr\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/","og_locale":"fr_FR","og_type":"article","og_title":"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/fr\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-11-12T08:02:27+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"\u00c9crit par":"admin NU","Dur\u00e9e de lecture estim\u00e9e":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Baidu Releases 
ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family","datePublished":"2025-11-12T08:02:27+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/"},"wordCount":850,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/11\/image-20-scaled-ZZ14vA.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"fr-FR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/","url":"https:\/\/youzum.net\/baidu-releases-ernie-4-5-vl-28b-a3b-thinking-an-open-source-and-compact-multimodal-reasoning-model-under-the-ernie-4-5-family\/","name":"Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family - 