{"id":100518,"date":"2026-06-28T18:37:16","date_gmt":"2026-06-28T18:37:16","guid":{"rendered":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/"},"modified":"2026-06-28T18:37:16","modified_gmt":"2026-06-28T18:37:16","slug":"liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference","status":"publish","type":"post","link":"https:\/\/youzum.net\/es\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/","title":{"rendered":"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Liquid AI has shipped <strong><a href=\"https:\/\/huggingface.co\/LiquidAI\/LFM2.5-230M\" target=\"_blank\" rel=\"noreferrer noopener\">LFM2.5-230M<\/a><\/strong>, the company\u2019s smallest model to date. The release targets a specific job: running agentic tasks on phones, robots, and automation devices. Both the base and instruction-tuned checkpoints are open-weight on Hugging Face.<\/p>\n<p class=\"wp-block-paragraph\">The pitch is narrow on purpose. This is not a general reasoning model. 
It is built for data extraction and tool use on edge hardware.<\/p>\n<h2 class=\"wp-block-heading\"><strong>TL;DR<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>Liquid AI\u2019s LFM2.5-230M is its smallest model yet: 230M params, open-weight, built on LFM2.<\/li>\n<li>Runs on-device at 213 tok\/s on a Galaxy S25 Ultra and 42 tok\/s on a Raspberry Pi 5.<\/li>\n<li>Beats larger models (Qwen3.5-0.8B, Gemma 3 1B) on instruction following and data extraction.<\/li>\n<li>Tuned for tool use and extraction; not for math, code generation, or creative writing.<\/li>\n<li>Day-one support across llama.cpp, MLX, vLLM, SGLang, and ONNX, with a 293\u2013375 MB footprint.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>What is LFM2.5-230M?<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">LFM2.5-230M is a 230-million-parameter, text-only model. It is built on the LFM2 architecture. The model has 14 layers in total. Eight are double-gated LIV convolution blocks. The remaining six are grouped-query attention (GQA) blocks. The hybrid layout targets fast CPU inference.<\/p>\n<p class=\"wp-block-paragraph\">The context length is 32,768 tokens. The vocabulary size is 65,536. The knowledge cutoff is mid-2024. It supports ten languages, including English, Chinese, Arabic, and Japanese.<\/p>\n<p class=\"wp-block-paragraph\">The Liquid AI team ships two checkpoints. LFM2.5-230M-Base is the pre-trained model for fine-tuning. LFM2.5-230M is the general-purpose instruction-tuned version. The license is lfm1.0.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Training and Post-Training<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The model was pre-trained on 19 trillion tokens. That total includes a 32K context extension phase. The post-training recipe then runs in three stages.<\/p>\n<p class=\"wp-block-paragraph\">First comes supervised fine-tuning with distillation from the larger LFM2.5-350M. Second is direct preference optimization (DPO). Third is multi-domain reinforcement learning. 
This three-stage recipe preserves flexibility for downstream specialization.<\/p>\n<p class=\"wp-block-paragraph\">The distillation step helps keep a 230M model competitive with larger checkpoints: it inherits behavior from the bigger LFM2.5-350M on targeted tasks.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Benchmarks<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The Liquid AI team evaluated LFM2.5-230M across ten benchmarks spanning knowledge, instruction following, data extraction, and tool use.<\/p>\n<p class=\"wp-block-paragraph\">Instruction following is where the model stands out. On IFEval, LFM2.5-230M scores 71.71, beating Qwen3.5-0.8B (59.94) and Gemma 3 1B IT (63.49). On IFBench it scores 38.40, ahead of both. On CaseReportBench, a clinical data-extraction test, it scores 22.51, versus 13.83 for Qwen3.5-0.8B and 2.28 for Gemma 3 1B IT.<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Model<\/th>\n<th>Params<\/th>\n<th>IFEval<\/th>\n<th>IFBench<\/th>\n<th>CaseReportBench<\/th>\n<th>BFCLv4<\/th>\n<th>MMLU-Pro<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>LFM2.5-230M<\/strong><\/td>\n<td>230M<\/td>\n<td>71.71<\/td>\n<td>38.40<\/td>\n<td>22.51<\/td>\n<td>21.03<\/td>\n<td>20.25<\/td>\n<\/tr>\n<tr>\n<td>LFM2.5-350M<\/td>\n<td>350M<\/td>\n<td>76.96<\/td>\n<td>40.69<\/td>\n<td>32.45<\/td>\n<td>21.86<\/td>\n<td>20.01<\/td>\n<\/tr>\n<tr>\n<td>Granite 4.0-H-350M<\/td>\n<td>350M<\/td>\n<td>61.27<\/td>\n<td>17.22<\/td>\n<td>12.44<\/td>\n<td>13.28<\/td>\n<td>13.14<\/td>\n<\/tr>\n<tr>\n<td>Qwen3.5-0.8B (Instruct)<\/td>\n<td>800M<\/td>\n<td>59.94<\/td>\n<td>22.87<\/td>\n<td>13.83<\/td>\n<td>18.70<\/td>\n<td>37.42<\/td>\n<\/tr>\n<tr>\n<td>Gemma 3 1B IT<\/td>\n<td>1B<\/td>\n<td>63.49<\/td>\n<td>20.33<\/td>\n<td>2.28<\/td>\n<td>7.17<\/td>\n<td>14.04<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">Among the non-Liquid models in the table, LFM2.5-230M leads on instruction following, data extraction, and BFCLv4. It trails on broad knowledge: its MMLU-Pro score of 20.25 is well behind Qwen3.5-0.8B\u2019s 37.42. 
It is also weak on some agentic tool-use benchmarks: on \u03c4\u00b2-Bench Telecom it scores just 5.26.<\/p>\n<p class=\"wp-block-paragraph\">Liquid AI is direct about the limits. It does not recommend the model for reasoning-heavy workloads such as advanced math and code generation, or for creative writing.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Use Cases With Examples<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The model fits two jobs well.<\/p>\n<ul class=\"wp-block-list\">\n<li>The first is large-scale data extraction pipelines. Picture a pipeline parsing 100,000 clinical reports into structured fields. A 4-bit build with a 293\u2013375 MB memory footprint can run that on commodity CPUs, so extraction happens locally with no per-token API bill.<\/li>\n<li>The second is lightweight on-device agentic workloads. Think of a home automation hub that turns speech into tool calls, or a phone assistant that routes a request to the right function.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">As an early signal, Liquid AI deployed the model on a Unitree G1 humanoid robot, where it ran entirely on the robot\u2019s onboard NVIDIA Jetson Orin. There the model acted as a skill-selection layer: it turned one natural-language instruction into a sequence of tool calls that invoked low-level skills from NVIDIA\u2019s SONIC framework.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Tool Use: How It Works<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">LFM2.5 supports function calling in four steps:<\/p>\n<ol class=\"wp-block-list\">\n<li>You define tools as JSON in the system prompt.<\/li>\n<li>The model writes a Pythonic function call between special tokens.<\/li>\n<li>You execute the call and return the result.<\/li>\n<li>The model then writes a plain-text answer.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">By default the call is written as a Python list, placed between the <code>&lt;|tool_call_start|&gt;<\/code> and <code>&lt;|tool_call_end|&gt;<\/code> tokens. 
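<\/p>\n<p class=\"wp-block-paragraph\">Because the call is a Python list literal rather than JSON, an application can read it with Python\u2019s standard <code>ast<\/code> module instead of evaluating it. The sketch below is a minimal illustration, not Liquid AI code: <code>parse_tool_calls<\/code> is a hypothetical helper, and it assumes the default Pythonic format with literal keyword arguments.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">import ast\n\n# Hypothetical helper, not a Liquid AI API. Assumes the default Pythonic\n# tool-call format with literal keyword arguments (strings, numbers, lists).\nSTART, END = '&lt;|tool_call_start|&gt;', '&lt;|tool_call_end|&gt;'\n\n\ndef parse_tool_calls(text):\n    \"\"\"Return (function_name, kwargs) pairs found between the tool-call tokens.\"\"\"\n    calls = []\n    start = text.find(START)\n    while start != -1:\n        end = text.find(END, start)\n        if end == -1:\n            break  # unterminated call: ignore the rest of the reply\n        payload = text[start + len(START):end].strip()\n        # Parse the list literal as syntax only; nothing in it is executed\n        body = ast.parse(payload, mode='eval').body\n        nodes = body.elts if isinstance(body, ast.List) else [body]\n        for node in nodes:\n            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):\n                # literal_eval rejects anything that is not a plain literal\n                kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}\n                calls.append((node.func.id, kwargs))\n        start = text.find(START, end)\n    return calls\n\n\nreply = ('&lt;|tool_call_start|&gt;[get_candidate_status(candidate_id=\"12345\")]'\n         '&lt;|tool_call_end|&gt;Checking the current status of candidate ID 12345.')\nprint(parse_tool_calls(reply))\n# prints [('get_candidate_status', {'candidate_id': '12345'})]<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">Because only <code>ast.parse<\/code> and <code>ast.literal_eval<\/code> touch the model output, a malformed call raises an exception and a non-literal argument is rejected rather than executed. Mapping the parsed name to a real function and sending its result back to the model is left to the application.<\/p>\n<p class=\"wp-block-paragraph\">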
Here is the documented pattern, with the tool JSON abbreviated:<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-plaintext\">&lt;|im_start|&gt;system\nList of tools: [{\"name\": \"get_candidate_status\",\n  \"parameters\": {\"candidate_id\": {\"type\": \"string\"}}}]&lt;|im_end|&gt;\n&lt;|im_start|&gt;user\nWhat is the current status of candidate ID 12345?&lt;|im_end|&gt;\n&lt;|im_start|&gt;assistant\n&lt;|tool_call_start|&gt;[get_candidate_status(candidate_id=\"12345\")]&lt;|tool_call_end|&gt;Checking the current status of candidate ID 12345.&lt;|im_end|&gt;<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">You can also force JSON-formatted calls through the system prompt.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Running It: A Minimal Example<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The model works with Transformers 5.0.0 and up. The recommended generation settings are temperature 0.1, top_k 50, and repetition_penalty 1.05. 
Note the <code>do_sample=True<\/code> flag: in Transformers, <code>temperature<\/code> and <code>top_k<\/code> only take effect when sampling is enabled.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">from transformers import AutoModelForCausalLM, AutoTokenizer\n\n# Load the instruction-tuned checkpoint and its tokenizer from Hugging Face\nmodel_id = \"LiquidAI\/LFM2.5-230M\"\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_id,\n    device_map=\"auto\",  # let Accelerate place weights on the available device(s)\n    dtype=\"bfloat16\",\n)\ntokenizer = AutoTokenizer.from_pretrained(model_id)\n\n# Render the chat with the model's template and tokenize it\ninputs = tokenizer.apply_chat_template(\n    [{\"role\": \"user\", \"content\": \"What is C. elegans?\"}],\n    add_generation_prompt=True,\n    tokenize=True,\n    return_dict=True,\n    return_tensors=\"pt\",\n).to(model.device)\n\n# Sample with the recommended generation settings\noutput = model.generate(\n    **inputs,\n    do_sample=True,\n    temperature=0.1,\n    top_k=50,\n    repetition_penalty=1.05,\n    max_new_tokens=512,\n)\n# Decode only the newly generated tokens, skipping the echoed prompt\nprint(tokenizer.decode(output[0][inputs[\"input_ids\"].shape[-1]:], skip_special_tokens=True))<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">Liquid AI also publishes fine-tuning recipes. They cover SFT, DPO, and GRPO with LoRA, via Unsloth and TRL. 
Each ships as a Colab notebook.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p class=\"wp-block-paragraph\">Check out the <strong><a href=\"https:\/\/huggingface.co\/LiquidAI\/LFM2.5-230M\" target=\"_blank\" rel=\"noreferrer noopener\">model weights on HF<\/a><\/strong>, <strong><a href=\"https:\/\/www.liquid.ai\/blog\/lfm2-5-230m\" target=\"_blank\" rel=\"noreferrer noopener\">technical details<\/a><\/strong>, and <strong><a href=\"https:\/\/docs.liquid.ai\/lfm\/models\/complete-library\" target=\"_blank\" rel=\"noreferrer noopener\">docs<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/06\/27\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\">Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Liquid AI shipped LFM2.5-230M, it\u2019s the company\u2019s smallest model to date. The release targets a specific job: running agentic tasks on phones, robots, and automation devices. Both the base and instruction-tuned checkpoints are open-weight on Hugging Face. The pitch is narrow on purpose. This is not a general reasoning model. It is built for data extraction and tool use on edge hardware. TL;DR Liquid AI\u2019s LFM2.5-230M is its smallest model yet: 230M params, open-weight, built on LFM2. Runs on-device at 213 tok\/s on a Galaxy S25 Ultra and 42 on a Raspberry Pi 5. Beats larger models (Qwen3.5-0.8B, Gemma 3 1B) on instruction following and data extraction. Tuned for tool use and extraction; not for math, code generation, or creative writing. Day-one support across llama.cpp, MLX, vLLM, SGLang, and ONNX, with a 293\u2013375 MB footprint. What is LFM2.5-230M? LFM2.5-230M is a 230-million-parameter, text-only model. It is built on the LFM2 architecture. The model has 14 layers total. Eight are double-gated LIV convolution blocks. 
The remaining six are grouped-query attention (GQA) blocks. The hybrid layout targets fast CPU inference. The context length is 32,768 tokens. The vocabulary size is 65,536. The knowledge cutoff is mid-2024. It supports ten languages, including English, Chinese, Arabic, and Japanese. Liquid AI team ships two checkpoints. LFM2.5-230M-Base is the pre-trained model for fine-tuning. LFM2.5-230M is the general-purpose instruction-tuned version. The license is lfm1.0. Training and Post-Training The model was pre-trained on 19 trillion tokens. That total includes a 32K context extension phase. The post-training recipe then runs in three stages. First comes supervised fine-tuning with distillation from the larger LFM2.5-350M. Second is direct preference optimization (DPO). Third is multi-domain reinforcement learning. This preserves flexibility for downstream specialization. The distillation step is what keeps a 230M model competitive with larger checkpoints. It inherits behavior from the bigger LFM2.5-350M on targeted tasks. Benchmark Liquid AI team evaluated LFM2.5-230M across ten benchmarks. They span knowledge, instruction following, data extraction, and tool use. The instruction-following results support that. On IFEval, LFM2.5-230M scores 71.71. That beats Qwen3.5-0.8B (59.94) and Gemma 3 1B IT (63.49). On IFBench it scores 38.40, ahead of both. On CaseReportBench, a clinical data-extraction test, it scores 22.51. Model Params IFEval IFBench CaseReportBench BFCLv4 MMLU-Pro LFM2.5-230M 230M 71.71 38.40 22.51 21.03 20.25 LFM2.5-350M 350M 76.96 40.69 32.45 21.86 20.01 Granite 4.0-H-350M 350M 61.27 17.22 12.44 13.28 13.14 Qwen3.5-0.8B (Instruct) 800M 59.94 22.87 13.83 18.70 37.42 Gemma 3 1B IT 1B 63.49 20.33 2.28 7.17 14.04 LFM2.5-230M leads on instruction following and data extraction. It trails on broad knowledge: MMLU-Pro is 20.25, behind Qwen3.5-0.8B\u2019s 37.42. It is also weak on some agentic tool use. On \u03c4\u00b2-Bench Telecom it scores just 5.26. 
Liquid AI is direct about the limits. It does not recommend the model for reasoning-heavy workloads. That means advanced math, code generation, and creative writing. Use Cases With Examples The model fits two jobs well. The first is large-scale data extraction pipelines. Picture a pipeline parsing 100,000 clinical reports into structured fields. A 4-bit build with a 293\u2013375 MB memory footprint runs that on commodity CPUs. You extract locally, with no per-token API bill. The second job is lightweight on-device agentic workloads. Think a home automation hub that turns speech into tool calls. Or a phone assistant that routes a request to the right function. As an early signal, Liquid AI deployed the model on a Unitree G1 humanoid robot. It ran entirely on the robot\u2019s onboard NVIDIA Jetson Orin. There the model acted as a skill-selection layer. It turned one natural-language instruction into a sequence of tool calls. Those calls invoked low-level skills from NVIDIA\u2019s SONIC framework. Tool Use: How It Works LFM2.5 supports function calling in four steps. You define tools as JSON in the system prompt. The model writes a Pythonic function call between special tokens. You execute the call and return the result. The model then writes a plain-text answer. By default the call is a Python list. It sits between the &lt;|tool_call_start|&gt; and &lt;|tool_call_end|&gt; tokens. 
Here is the documented pattern, with the tool JSON abbreviated: &lt;|im_start|&gt;system List of tools: [{&#8220;name&#8221;: &#8220;get_candidate_status&#8221;, &#8220;parameters&#8221;: {&#8220;candidate_id&#8221;: {&#8220;type&#8221;: &#8220;string&#8221;}}}]&lt;|im_end|&gt; &lt;|im_start|&gt;user What is the current status of candidate ID 12345?&lt;|im_end|&gt; &lt;|im_start|&gt;assistant &lt;|tool_call_start|&gt;[get_candidate_status(candidate_id=&#8221;12345&#8243;)]&lt;|tool_call_end|&gt;Checking the current status of candidate ID 12345.&lt;|im_end|&gt; You can also force JSON-formatted calls through the system prompt. Running It: A Minimal Example The model works with Transformers 5.0.0 and up. The recommended generation settings are temperature 0.1, top_k 50, and repetition_penalty 1.05. Note the do_sample=True flag, which is required for those sampling settings to apply. from transformers import AutoModelForCausalLM, AutoTokenizer model_id = &#8220;LiquidAI\/LFM2.5-230M&#8221; model = AutoModelForCausalLM.from_pretrained( model_id, device_map=&#8221;auto&#8221;, dtype=&#8221;bfloat16&#8243;, ) tokenizer = AutoTokenizer.from_pretrained(model_id) inputs = tokenizer.apply_chat_template( [{&#8220;role&#8221;: &#8220;user&#8221;, &#8220;content&#8221;: &#8220;What is C. elegans?&#8221;}], add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors=&#8221;pt&#8221;, ).to(model.device) output = model.generate( **inputs, do_sample=True, temperature=0.1, top_k=50, repetition_penalty=1.05, max_new_tokens=512, ) print(tokenizer.decode(output[0][inputs[&#8220;input_ids&#8221;].shape[-1]:], skip_special_tokens=True)) Liquid AI also publishes fine-tuning recipes. They cover SFT, DPO, and GRPO with LoRA, via Unsloth and TRL. Each ships as a Colab notebook. 
Interactive Explainer Check out the\u00a0Model weight on HF, Technical details and Docs.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0150k+ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. Wait! are you on telegram?\u00a0now you can join us on telegram as well. Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.?\u00a0Connect with us The post Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-100518","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and 
ONNX Support for On-Device Inference - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/es\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\" \/>\n<meta property=\"og:locale\" content=\"es_ES\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/es\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-28T18:37:16+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Escrito por\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Tiempo de lectura\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutos\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference\",\"datePublished\":\"2026-06-28T18:37:16+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\"},\"wordCount\":866,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\",\"url\":\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\",\"name\":\"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-06-28T18:37:16+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#breadcrumb\"},\"inLanguage\":\"es\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"es\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"es\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/es\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/es\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/","og_locale":"es_ES","og_type":"article","og_title":"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/es\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-06-28T18:37:16+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Escrito por":"admin NU","Tiempo de lectura":"5 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device 
Inference","datePublished":"2026-06-28T18:37:16+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/"},"wordCount":866,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"es","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/","url":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/","name":"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-06-28T18:37:16+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#breadcrumb"},"inLanguage":"es","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device 
Inference"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"es"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"es","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/es\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/es\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/es\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a 
href=\"https:\/\/youzum.net\/es\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/es\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Liquid AI shipped LFM2.5-230M, the company\u2019s smallest model to date. The release targets a specific job: running agentic tasks on phones, robots, and automation devices. Both the base and instruction-tuned checkpoints are open-weight on Hugging Face. The pitch is narrow on purpose. This is not a general reasoning model. It is built for data&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/100518","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/comments?post=100518"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/posts\/100518\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/media?parent=100518"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/categories?post=100518"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/es\/wp-json\/wp\/v2\/tags?post=100518"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}