{"id":46839,"date":"2025-10-25T07:40:28","date_gmt":"2025-10-25T07:40:28","guid":{"rendered":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/"},"modified":"2025-10-25T07:40:28","modified_gmt":"2025-10-25T07:40:28","slug":"liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/","title":{"rendered":"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices"},"content":{"rendered":"<p>Liquid AI released LFM2-VL-3B, a 3B parameter vision language model for image text to text tasks. It extends the LFM2-VL family beyond the 450M and 1.6B variants. The model targets higher accuracy while preserving the speed profile of the LFM2 architecture. It is available on LEAP and Hugging Face under the LFM Open License v1.0. <\/p>\n<h3 class=\"wp-block-heading\"><strong>Model overview and interface<\/strong><\/h3>\n<p>LFM2-VL-3B accepts interleaved image and text inputs and produces text outputs. The model exposes a ChatML like template. The processor inserts an <code>&lt;image&gt;<\/code> sentinel that is replaced with encoded image tokens at run time. The default text context length is 32,768 tokens. These details help devs reproduce evaluations and integrate the model with existing multimodal pipelines.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1562\" height=\"1118\" data-attachment-id=\"75691\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/24\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/screenshot-2025-10-24-at-1-45-41-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1.png\" data-orig-size=\"1562,1118\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-10-24 at 1.45.41\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-300x215.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-1024x733.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1.png\" alt=\"\" class=\"wp-image-75691\" \/><figcaption class=\"wp-element-caption\">https:\/\/www.liquid.ai\/blog\/lfm2-vl-3b-a-new-efficient-vision-language-for-the-edge<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Architecture<\/strong><\/h3>\n<p>The stack pairs a language tower with a shape aware vision tower and a projector. The language tower is LFM2-2.6B, a hybrid convolution plus attention backbone. The vision tower is SigLIP2 NaFlex at 400M parameters, it preserves native aspect ratios and avoids distortion. The connector is a 2 layer MLP with pixel unshuffle, it compresses image tokens before fusion with the language space. This design lets users cap vision token budgets without retraining the model.<\/p>\n<p>The encoder processes native resolutions up to 512\u00d7512. Larger inputs are split into non overlapping 512\u00d7512 patches. A thumbnail pathway provides global context during tiling. The efficient token mapping is documented with concrete examples, a 256\u00d7384 image maps to 96 tokens, a 1000\u00d73000 image maps to 1,020 tokens. The model card exposes user controls for minimum and maximum image tokens and the tiling switch. These controls tune speed and quality at inference time.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Inference settings<\/strong><\/h3>\n<p>The Hugging Face model card provides recommended parameters. Text generation uses temperature 0.1, min p 0.15, and a repetition penalty of 1.05. Vision settings use min image tokens 64, max image tokens 256, and image splitting enabled. The processor applies the chat template and the image sentinel automatically. The example uses <code>AutoModelForImageTextToText<\/code> and <code>AutoProcessor<\/code> with <code>bfloat16<\/code> precision. <\/p>\n<h3 class=\"wp-block-heading\"><strong>How is it trained?<\/strong><\/h3>\n<p>Liquid AI describes a staged approach. The team performs joint mid training that adjusts the text to image ratio over time. The model then undergoes supervised fine tuning focused on image understanding. The data sources are large scale open datasets plus in house synthetic vision data for task coverage. <\/p>\n<h3 class=\"wp-block-heading\"><strong>Benchmarks<\/strong><\/h3>\n<p>The research team reports competitive results among lightweight open VLMs. On MM-IFEval the model reaches 51.83. On RealWorldQA it reaches 71.37. On MMBench dev en it reaches 79.81. The POPE score is 89.01. The table notes that scores for other systems were computed with VLMEvalKit. The table excludes Qwen3-VL-2B because that system was released one day earlier. <\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"1666\" height=\"762\" data-attachment-id=\"75693\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/24\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/screenshot-2025-10-24-at-1-49-55-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.49.55-PM-1.png\" data-orig-size=\"1666,762\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-10-24 at 1.49.55\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.49.55-PM-1-300x137.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.49.55-PM-1-1024x468.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.49.55-PM-1.png\" alt=\"\" class=\"wp-image-75693\" \/><figcaption class=\"wp-element-caption\">https:\/\/www.liquid.ai\/blog\/lfm2-vl-3b-a-new-efficient-vision-language-for-the-edge<\/figcaption><\/figure>\n<\/div>\n<p>The language capability remains close to the LFM2-2.6B backbone. The research team cites 30 percent on GPQA and 63 percent on MMLU. This matters when perception tasks include knowledge queries. The team also states expanded multilingual visual understanding across English, Japanese, French, Spanish, German, Italian, Portuguese, Arabic, Chinese, and Korean. <\/p>\n<h3 class=\"wp-block-heading\"><strong>Why edge users should care?<\/strong><\/h3>\n<p>The architecture keeps compute and memory within small device budgets. Image tokens are compressible and user constrained, so throughput is predictable. SigLIP2 400M NaFlex encoder preserves aspect ratios, which helps fine grained perception. The projector reduces tokens at the connector, which improves tokens per second. The research team also published a GGUF build for on device runtimes. These properties are useful for robotics, mobile, and industrial clients that need local processing and strict data boundaries.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ol class=\"wp-block-list\">\n<li><strong>Compact multimodal stack<\/strong>: 3B parameter LFM2-VL-3B pairs an LFM2-2.6B language tower with a 400M SigLIP2 NaFlex vision encoder and a 2-layer MLP projector for image-token fusion. NaFlex preserves native aspect ratios. <\/li>\n<li><strong>Resolution handling and token budgets<\/strong>: Images run natively up to 512\u00d7512, larger inputs tile into non overlapping 512\u00d7512 patches with a thumbnail pathway for global context. Documented token mappings include 256\u00d7384 \u2192 96 tokens and 1000\u00d73000 \u2192 1,020 tokens. <\/li>\n<li><strong>Inference interface<\/strong>: ChatML-like prompting with an <code>&lt;image&gt;<\/code> sentinel, default text context 32,768 tokens, recommended decoding settings, and processor-level controls for image splitting enable reproducible evaluation and easy integration in multimodal pipelines. <\/li>\n<li><strong>Measured performance<\/strong>: Reported results include MM-IFEval 51.83, RealWorldQA 71.37, MMBench-dev-en 79.81, and POPE 89.01. Language-only signals from the backbone are about 30% GPQA and 63% MMLU, useful for mixed perception plus knowledge workloads. <\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\"><strong>Editorial Comments<\/strong><\/h3>\n<p>LFM2-VL-3B is a practical step for edge multimodal workloads, the 3B stack pairs LFM2-2.6B with a 400M SigLIP2 NaFlex encoder and an efficient projector, which lowers image token counts for predictable latency. Native resolution processing with 512 by 512 tiling and token caps gives deterministic budgets. Reported scores on MM-IFEval, RealWorldQA, MMBench, and POPE are competitive for this size. Open weights, a GGUF build, and LEAP access reduce integration friction. Overall, this is an edge ready VLM release with clear controls and transparent benchmarks.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/huggingface.co\/LiquidAI\/LFM2-VL-3B\" target=\"_blank\" rel=\"noreferrer noopener\">Model on HF<\/a> <\/strong>and<strong> <a href=\"https:\/\/www.liquid.ai\/blog\/lfm2-vl-3b-a-new-efficient-vision-language-for-the-edge\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>. Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! are you on telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">now you can join us on telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/10\/24\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\">Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Liquid AI released LFM2-VL-3B, a 3B parameter vision language model for image text to text tasks. It extends the LFM2-VL family beyond the 450M and 1.6B variants. The model targets higher accuracy while preserving the speed profile of the LFM2 architecture. It is available on LEAP and Hugging Face under the LFM Open License v1.0. Model overview and interface LFM2-VL-3B accepts interleaved image and text inputs and produces text outputs. The model exposes a ChatML like template. The processor inserts an &lt;image&gt; sentinel that is replaced with encoded image tokens at run time. The default text context length is 32,768 tokens. These details help devs reproduce evaluations and integrate the model with existing multimodal pipelines. https:\/\/www.liquid.ai\/blog\/lfm2-vl-3b-a-new-efficient-vision-language-for-the-edge Architecture The stack pairs a language tower with a shape aware vision tower and a projector. The language tower is LFM2-2.6B, a hybrid convolution plus attention backbone. The vision tower is SigLIP2 NaFlex at 400M parameters, it preserves native aspect ratios and avoids distortion. The connector is a 2 layer MLP with pixel unshuffle, it compresses image tokens before fusion with the language space. This design lets users cap vision token budgets without retraining the model. The encoder processes native resolutions up to 512\u00d7512. Larger inputs are split into non overlapping 512\u00d7512 patches. A thumbnail pathway provides global context during tiling. The efficient token mapping is documented with concrete examples, a 256\u00d7384 image maps to 96 tokens, a 1000\u00d73000 image maps to 1,020 tokens. The model card exposes user controls for minimum and maximum image tokens and the tiling switch. These controls tune speed and quality at inference time. Inference settings The Hugging Face model card provides recommended parameters. Text generation uses temperature 0.1, min p 0.15, and a repetition penalty of 1.05. Vision settings use min image tokens 64, max image tokens 256, and image splitting enabled. The processor applies the chat template and the image sentinel automatically. The example uses AutoModelForImageTextToText and AutoProcessor with bfloat16 precision. How is it trained? Liquid AI describes a staged approach. The team performs joint mid training that adjusts the text to image ratio over time. The model then undergoes supervised fine tuning focused on image understanding. The data sources are large scale open datasets plus in house synthetic vision data for task coverage. Benchmarks The research team reports competitive results among lightweight open VLMs. On MM-IFEval the model reaches 51.83. On RealWorldQA it reaches 71.37. On MMBench dev en it reaches 79.81. The POPE score is 89.01. The table notes that scores for other systems were computed with VLMEvalKit. The table excludes Qwen3-VL-2B because that system was released one day earlier. https:\/\/www.liquid.ai\/blog\/lfm2-vl-3b-a-new-efficient-vision-language-for-the-edge The language capability remains close to the LFM2-2.6B backbone. The research team cites 30 percent on GPQA and 63 percent on MMLU. This matters when perception tasks include knowledge queries. The team also states expanded multilingual visual understanding across English, Japanese, French, Spanish, German, Italian, Portuguese, Arabic, Chinese, and Korean. Why edge users should care? The architecture keeps compute and memory within small device budgets. Image tokens are compressible and user constrained, so throughput is predictable. SigLIP2 400M NaFlex encoder preserves aspect ratios, which helps fine grained perception. The projector reduces tokens at the connector, which improves tokens per second. The research team also published a GGUF build for on device runtimes. These properties are useful for robotics, mobile, and industrial clients that need local processing and strict data boundaries. Key Takeaways Compact multimodal stack: 3B parameter LFM2-VL-3B pairs an LFM2-2.6B language tower with a 400M SigLIP2 NaFlex vision encoder and a 2-layer MLP projector for image-token fusion. NaFlex preserves native aspect ratios. Resolution handling and token budgets: Images run natively up to 512\u00d7512, larger inputs tile into non overlapping 512\u00d7512 patches with a thumbnail pathway for global context. Documented token mappings include 256\u00d7384 \u2192 96 tokens and 1000\u00d73000 \u2192 1,020 tokens. Inference interface: ChatML-like prompting with an &lt;image&gt; sentinel, default text context 32,768 tokens, recommended decoding settings, and processor-level controls for image splitting enable reproducible evaluation and easy integration in multimodal pipelines. Measured performance: Reported results include MM-IFEval 51.83, RealWorldQA 71.37, MMBench-dev-en 79.81, and POPE 89.01. Language-only signals from the backbone are about 30% GPQA and 63% MMLU, useful for mixed perception plus knowledge workloads. Editorial Comments LFM2-VL-3B is a practical step for edge multimodal workloads, the 3B stack pairs LFM2-2.6B with a 400M SigLIP2 NaFlex encoder and an efficient projector, which lowers image token counts for predictable latency. Native resolution processing with 512 by 512 tiling and token caps gives deterministic budgets. Reported scores on MM-IFEval, RealWorldQA, MMBench, and POPE are competitive for this size. Open weights, a GGUF build, and LEAP access reduce integration friction. Overall, this is an edge ready VLM release with clear controls and transparent benchmarks. Check out the\u00a0Model on HF and Technical details. Feel free to check out our\u00a0GitHub Page for Tutorials, Codes and Notebooks.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. Wait! are you on telegram?\u00a0now you can join us on telegram as well. The post Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":46840,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-46839","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-25T07:40:28+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices\",\"datePublished\":\"2025-10-25T07:40:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\"},\"wordCount\":879,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\",\"url\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\",\"name\":\"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp\",\"datePublished\":\"2025-10-25T07:40:28+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp\",\"width\":1562,\"height\":1118},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/","og_locale":"th_TH","og_type":"article","og_title":"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-10-25T07:40:28+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. reading time":"4 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices","datePublished":"2025-10-25T07:40:28+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/"},"wordCount":879,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/","url":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/","name":"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp","datePublished":"2025-10-25T07:40:28+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/"]}]},{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp","width":1562,"height":1118},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/liquid-ais-lfm2-vl-3b-brings-a-3b-parameter-vision-language-model-vlm-to-edge-class-devices\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Liquid AI\u2019s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp",1562,1118,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp",1562,1118,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp",1562,1118,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-150x150.webp",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-300x215.webp",300,215,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-1024x733.webp",1024,733,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-1536x1099.webp",1536,1099,true],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49.webp",1562,1118,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-18x12.webp",18,12,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-300x300.webp",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-600x429.webp",600,429,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-24-at-1.45.41-PM-1-GEBd49-100x100.webp",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Liquid AI released LFM2-VL-3B, a 3B parameter vision language model for image text to text tasks. It extends the LFM2-VL family beyond the 450M and 1.6B variants. The model targets higher accuracy while preserving the speed profile of the LFM2 architecture. It is available on LEAP and Hugging Face under the LFM Open License v1.0.&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/46839","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=46839"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/46839\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media\/46840"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=46839"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=46839"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=46839"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}