Sentient AI Releases ROMA: An Open-Source and AGI Focused Meta-Agent Framework for Building AI Agents with Hierarchical Task Execution

Sentient AI has released ROMA (Recursive Open Meta-Agent), an open-source meta-agent framework for building high-performance multi-agent systems. ROMA structures agentic workflows as a hierarchical, recursive task tree: parent nodes break a complex goal into subtasks, pass them down to child nodes as context, and later aggregate their solutions as results flow back up, making the context flow transparent and fully traceable across node transitions.

Architecture: Atomize → Plan → Execute → Aggregate

ROMA defines a minimal, recursive control loop. A node first atomizes a request, deciding whether it is atomic. If it is not, a planner decomposes it into subtasks; otherwise, an executor runs the task via an LLM, a tool/API, or even a nested agent. An aggregator then merges child outputs into the parent's answer. This decision loop repeats for each subtask, producing a dependency-aware tree that executes independent branches in parallel and enforces left-to-right ordering when a subtask depends on a previous sibling. (Source: https://blog.sentient.xyz/posts/recursive-open-meta-agent)

Information moves top-down as tasks are broken down and bottom-up as results are aggregated. ROMA also permits human checkpoints at any node (e.g., to confirm a plan or fact-check a critical hop) and surfaces stage tracing (inputs and outputs per node) so developers can debug and refine prompts, tools, and routing policies with visibility into every transition. This addresses the common observability gap in agent frameworks.

Developer Surface and Stack

ROMA provides a setup.sh quick start with a Docker setup (recommended) or a native setup, plus flags for E2B sandbox integration (--e2b, --test-e2b). The stack lists a Python 3.12+ backend with FastAPI/Flask, a React + TypeScript frontend with real-time WebSocket updates, LLM support for any provider via LiteLLM, and code execution in E2B sandboxes. Data paths support enterprise S3 mounting with goofys FUSE, path-injection checks, and secure AWS credential handling, keeping leaf skills swappable while the meta-architecture manages the task graph and dependencies. In development, you can wire ROMA to closed or open LLMs, local models, deterministic tools, or other agents without touching the meta-layer; inputs and outputs are defined with Pydantic for structured, auditable I/O during runs and tracing.

Why the Recursion Matters

ROMA structures work as a hierarchical, recursive task tree: parent nodes break a complex goal into subtasks, pass them down as context, and later aggregate child solutions as results flow back up. This recursive breakdown confines context to what each node requires, curbing prompt sprawl, while stage-level tracing (with structured Pydantic I/O) makes the flow transparent and fully traceable, so failures are diagnosable rather than black-box. Independent siblings can run in parallel and dependency edges impose sequencing, turning model, prompt, and tool choices into controlled, observable components within the plan-execute-aggregate loop.
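To make the control loop concrete, here is a minimal Python sketch of the Atomize → Plan → Execute → Aggregate recursion described above. It is not ROMA's actual API: the Node class, the solve() method, and the is_atomic/plan/execute/aggregate helpers are illustrative stand-ins for LLM, tool, and nested-agent calls.

from dataclasses import dataclass, field
from typing import List

# Toy stand-ins for the atomizer, planner, executor, and aggregator.
# In a real system each would call an LLM, a tool/API, or a nested agent.
def is_atomic(task: str) -> bool:
    return len(task.split()) <= 4            # toy atomizer heuristic

def plan(task: str) -> List[str]:
    return [f"gather inputs for: {task}", f"synthesize answer for: {task}"]

def execute(task: str) -> str:
    return f"result<{task}>"

def aggregate(task: str, child_results: List[str]) -> str:
    return f"answer<{task}> = " + " + ".join(child_results)

@dataclass
class Node:
    task: str
    children: List["Node"] = field(default_factory=list)

    def solve(self, depth: int = 0, max_depth: int = 2) -> str:
        # Atomize: decide whether to execute directly or decompose further.
        if depth >= max_depth or is_atomic(self.task):
            out = execute(self.task)
        else:
            # Plan: decompose into subtasks; the parent task is passed down as context.
            self.children = [Node(sub) for sub in plan(self.task)]
            # Execute children. Independent siblings could run in parallel;
            # dependency edges would enforce left-to-right ordering instead.
            child_outputs = [c.solve(depth + 1, max_depth) for c in self.children]
            # Aggregate: merge child outputs into the parent's answer.
            out = aggregate(self.task, child_outputs)
        # Stage tracing: record inputs/outputs per node for observability.
        print("  " * depth + f"[trace] {self.task} -> {out}")
        return out

if __name__ == "__main__":
    Node("compare two open-source meta-agent frameworks").solve()

Running the sketch prints a per-node trace as results bubble back up the tree, which is the observability property the framework emphasizes.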
Benchmarks: ROMA Search

To validate the architecture, Sentient built ROMA Search, an internet search agent implemented on the ROMA scaffold (no domain-specific "deep research" heuristics claimed). On SEALQA (Seal-0), a subset designed to stress multi-source reasoning, ROMA Search is reported at 45.6% accuracy, exceeding Kimi Researcher (36%) and Gemini 2.5 Pro (19.8%). The ROMA team also reports state-of-the-art results on FRAMES (multi-step reasoning) and near-SOTA results on SimpleQA (factual retrieval). As with all vendor-published results, treat these as directional until independently reproduced, but they suggest the architecture is competitive across reasoning-heavy and fact-centric tasks.

For additional context on SEALQA, the benchmark targets search-augmented reasoning where web results can be conflicting or noisy. Seal-0 focuses on questions that challenge current systems, aligning with ROMA's emphasis on robust decomposition and verification steps.

Where ROMA Fits

ROMA positions itself as the backbone for open-source meta-agents: it provides a hierarchical, recursive task tree in which parent nodes decompose goals into subtasks, pass context down to child nodes (agents/tools), and later aggregate results as they flow back up. The design emphasizes transparency via stage tracing and supports human-in-the-loop checkpoints, while its modular nodes let builders plug in any model, tool, or agent and exploit parallelization for independent branches. This makes multi-step workloads, ranging from financial analysis to creative generation, easier to engineer with explicit context flow and observable execution.

Editorial Comments

ROMA is not just another "agent wrapper"; it looks like a disciplined recursive scaffold: Atomizer → Planner → Executor → Aggregator, traced at every hop, parallel where safe, sequential where required. The early ROMA Search results are promising and align with the framework's goals, but the more important outcome is developer control (clear task graphs, typed interfaces, and transparent context flow) so teams can iterate quickly and verify each stage. With Apache-2.0 licensing and an implementation that already includes FastAPI/React tooling, LiteLLM integration, and sandboxed execution paths, ROMA is a practical base for building long-horizon agent systems with measurable, inspectable behavior.

Check out the code and technical details. The post Sentient AI Releases ROMA: An Open-Source and AGI Focused Meta-Agent Framework for Building AI Agents with Hierarchical Task Execution appeared first on MarkTechPost.


Agentic Context Engineering (ACE): Self-Improving LLMs via Evolving Contexts, Not Fine-Tuning

TL;DR: A team of researchers from Stanford University, SambaNova Systems, and UC Berkeley introduces the ACE framework, which improves LLM performance by editing and growing the input context instead of updating model weights. Context is treated as a living "playbook" maintained by three roles (Generator, Reflector, Curator), with small delta items merged incrementally to avoid brevity bias and context collapse. Reported gains: +10.6% on AppWorld agent tasks, +8.6% on finance reasoning, and ~86.9% average latency reduction versus strong context-adaptation baselines. On the AppWorld leaderboard snapshot (Sept 20, 2025), ReAct+ACE (59.4%) is roughly on par with IBM CUGA (60.3%, GPT-4.1) while using DeepSeek-V3.1. (Paper: https://arxiv.org/pdf/2510.04618)

What ACE Changes

ACE positions "context engineering" as a first-class alternative to parameter updates. Instead of compressing instructions into short prompts, ACE accumulates and organizes domain-specific tactics over time, arguing that higher context density improves agentic tasks where tools, multi-turn state, and failure modes matter.

Method: Generator → Reflector → Curator

The Generator executes tasks and produces trajectories (reasoning and tool calls), exposing helpful versus harmful moves. The Reflector distills concrete lessons from those traces. The Curator converts lessons into typed delta items (with helpful/harmful counters) and merges them deterministically, with de-duplication and pruning to keep the playbook targeted. Two design choices, incremental delta updates and grow-and-refine, preserve useful history and prevent "context collapse" from monolithic rewrites. To isolate context effects, the research team fixes the same base LLM (non-thinking DeepSeek-V3.1) across all three roles.

Benchmarks

AppWorld (agents): Built on the official ReAct baseline, ReAct+ACE outperforms strong baselines (ICL, GEPA, Dynamic Cheatsheet), with +10.6% average over selected baselines and roughly +7.6% over Dynamic Cheatsheet in online adaptation. On the Sept 20, 2025 leaderboard, ReAct+ACE scores 59.4% versus IBM CUGA's 60.3% (GPT-4.1); ACE surpasses CUGA on the harder test-challenge split while using a smaller open-source base model.

Finance (XBRL): On FiNER token tagging and XBRL Formula numerical reasoning, ACE reports +8.6% average over baselines with ground-truth labels for offline adaptation; it also works with execution-only feedback, though the quality of the signals matters.

Cost and Latency

ACE's non-LLM merges plus localized updates reduce adaptation overhead substantially: offline (AppWorld), −82.3% latency and −75.1% rollouts versus GEPA; online (FiNER), −91.5% latency and −83.6% token cost versus Dynamic Cheatsheet.

Key Takeaways

ACE = context-first adaptation: it improves LLMs by incrementally editing an evolving "playbook" of delta items curated by the Generator → Reflector → Curator loop, using the same base LLM (non-thinking DeepSeek-V3.1) to isolate context effects and avoid collapse from monolithic rewrites.

Measured gains: ReAct+ACE reports +10.6% over strong baselines on AppWorld and achieves 59.4% versus IBM CUGA's 60.3% (GPT-4.1) on the Sept 20, 2025 leaderboard snapshot; finance benchmarks (FiNER + XBRL Formula) show +8.6% average over baselines.

Lower overhead than reflective-rewrite baselines: ACE reduces adaptation latency by roughly 82–92% and rollouts/token cost by roughly 75–84%, in contrast to Dynamic Cheatsheet's persistent memory and GEPA's Pareto prompt evolution approaches.
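To make the Curator's delta-merge concrete, here is a minimal, hypothetical Python sketch of an incrementally updated playbook: delta items carry helpful/harmful counters, merges are deterministic and non-LLM, duplicates only accumulate counters, and items whose harmful evidence dominates are pruned. The DeltaItem/Playbook classes and the pruning threshold are illustrative assumptions, not the authors' released code.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DeltaItem:
    """One playbook entry distilled by the Reflector (illustrative schema)."""
    key: str            # normalized lesson text, used for de-duplication
    text: str           # the tactic/lesson as shown to the Generator
    helpful: int = 0
    harmful: int = 0

@dataclass
class Playbook:
    items: Dict[str, DeltaItem] = field(default_factory=dict)

    def merge(self, delta: DeltaItem) -> None:
        """Deterministic, non-LLM merge of a single delta item."""
        if delta.key in self.items:
            # Duplicate lesson: accumulate counters instead of rewriting text.
            existing = self.items[delta.key]
            existing.helpful += delta.helpful
            existing.harmful += delta.harmful
        else:
            self.items[delta.key] = delta

    def prune(self, min_margin: int = 1) -> None:
        """Drop items whose harmful evidence outweighs helpful evidence."""
        self.items = {
            k: v for k, v in self.items.items()
            if v.helpful - v.harmful >= min_margin or v.helpful + v.harmful == 0
        }

    def render(self) -> str:
        """Serialize the playbook into the context handed to the Generator."""
        return "\n".join(f"- {v.text} (+{v.helpful}/-{v.harmful})"
                         for v in self.items.values())

# Usage: the Reflector emits deltas from a trajectory; the Curator merges them.
pb = Playbook()
pb.merge(DeltaItem("retry-on-401", "Re-authenticate before retrying a 401 response.", helpful=2))
pb.merge(DeltaItem("retry-on-401", "Re-authenticate before retrying a 401 response.", helpful=1))
pb.merge(DeltaItem("guess-ids", "Guessing numeric IDs usually fails.", harmful=3))
pb.prune()
print(pb.render())

Because the merge is a dictionary update rather than an LLM rewrite, the playbook grows without the monolithic rewrites that cause context collapse, which is the design point the latency numbers above reflect.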
Conclusion

ACE positions context engineering as a first-class alternative to weight updates: maintain a persistent, curated playbook that accumulates task-specific tactics, yielding measurable gains on AppWorld and finance reasoning while cutting adaptation latency and rollout/token costs versus reflective-rewrite baselines. The approach is practical (deterministic merges, delta items, and long-context-aware serving), and its limits are clear: outcomes track feedback quality and task complexity. If adopted, agent stacks may "self-tune" primarily through evolving context rather than new checkpoints.

Check out the paper for technical details. The post Agentic Context Engineering (ACE): Self-Improving LLMs via Evolving Contexts, Not Fine-Tuning appeared first on MarkTechPost.


The Download: our bodies’ memories, and Traton’s electric trucks

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

How do our bodies remember?

"Like riding a bike" is shorthand for the remarkable way that our bodies remember how to move. Most of the time when we talk about muscle memory, we're not talking about the muscles themselves but about the memory of a coordinated movement pattern that lives in the motor neurons, which control our muscles. Yet in recent years, scientists have discovered that our muscles themselves have a memory for movement and exercise. And the more we move, as with riding a bike or other kinds of exercise, the more those cells begin to make a memory of that exercise. Read the full story.

—Bonnie Tsui

This piece is part of MIT Technology Review Explains: our series untangling the complex, messy world of technology to help you understand what's coming next. You can read more from the series here. This story is also from our forthcoming print issue, which is all about the body. If you haven't already, subscribe now to receive future issues once they land. Plus, you'll also receive a free digital report on nuclear power.

2025 climate tech companies to watch: Traton and its electric trucks

Every day, trucks carry many millions of tons of cargo down roads and highways around the world. Nearly all run on diesel and make up one of the largest commercial sources of carbon emissions. Traton, a subsidiary of Volkswagen, is producing zero-emission trucks that could help clean up this sector, while also investing in a Europe-wide advanced charging network so other manufacturers can more easily follow suit. Read the full story.

—Amy Nordrum

Traton is one of our 10 climate tech companies to watch, our annual list of some of the most promising climate tech firms on the planet. Check out the rest of the list here.

This test could reveal the health of your immune system

We know surprisingly little about our immune health. The vast array of cells, proteins, and biomolecules that works to defend us from disease is mind-bogglingly complicated. Immunologists are still getting to grips with how it all works. Now, a new test is being developed to measure immune health, one that even gives you a score. But that's a difficult thing to do, for several reasons. Read the full story.

—Jessica Hamzelou

This article first appeared in The Checkup, MIT Technology Review's weekly biotech newsletter. To receive it in your inbox every Thursday, sign up here.

The must-reads

I've combed the internet to find you today's most fun/important/scary/fascinating stories about technology.

1 China is cracking down on imports of Nvidia's AI chips
Customs officers are combing shipments looking for the company's China-specific chips. (FT $)
+ US officials are investigating a firm that's suspected of helping China sidestep export restrictions. (NYT $)

2 Tesla's 'full self-driving' feature is under investigation
After multiple reports of vehicles using it ran red lights. (WP $)
+ The company is slashing its prices to compete with Chinese giant BYD. (Rest of World)
+ Elon Musk will still receive billions, even if he fails to achieve his ambitious goals. (Reuters)

3 A data hoarder has created a searchable database of Epstein files
Making it simple to find mentions of specific people and locations. (404 Media)

4 OpenAI says GPT-5 is its least-biased model yet
Even when proceeding with "challenging, emotionally charged prompts." (Axios)

5 The developers behind ICE-tracking apps aren't giving up
They're fighting Apple's decision to remove their creations from its app store. (Wired $)
+ Another effort to track ICE raids was just taken offline. (MIT Technology Review)

6 The world's biodiversity crisis is worsening
More than half of all bird species are in decline. (The Guardian)
+ The short, strange history of gene de-extinction. (MIT Technology Review)

7 YouTube is extending an olive branch to banned creators
It's overturned a lifetime ban policy to give the people behind previously banned channels a second chance. (CNBC)
+ But users kicked off for copyright infringement or extremism aren't eligible. (Bloomberg $)

8 This startup wants to bring self-flying planes to our skies
Starting with military cargo flights. (WSJ $)

9 Your plumber might be using ChatGPT
They're increasingly using the chatbot to troubleshoot on the ground. (CNN)

10 Do robots really need hands?
Maybe not, but that's not standing in the way of researchers trying to recreate them. (Fast Company $)
+ Will we ever trust robots? (MIT Technology Review)

Quote of the day

"Social media is a complete dumpster."

—Hany Farid, a professor of computer science at the University of California, Berkeley, describes the proliferation of AI slop videos infiltrating digital platforms to the New York Times.

One more thing

Who gets to decide who receives experimental medical treatments?

There has been a trend toward lowering the bar for new medicines, and it is becoming easier for people to access treatments that might not help them, and could even harm them. Anecdotes appear to be overpowering evidence in decisions on drug approval. As a result, we're ending up with some drugs that don't work. We urgently need to question how these decisions are made. Who should have access to experimental therapies? And who should get to decide? Such questions are especially pressing considering how quickly biotechnology is advancing. We're not just improving on existing classes of treatments, we're creating entirely new ones. Read the full story.

—Jessica Hamzelou

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet 'em at me.)

+ I love this crowd-sourced compendium of every known Wilhelm scream in all sorts of media.
+ Happy birthday to pocket rocket Bruno Mars, who turned 40 this week.
+ Here's how to visit an interstellar interloper.
+ Bumi the penguin is having the absolute time of their life with this bubble machine.


Together AI’s ATLAS adaptive speculator delivers 400% inference speedup by learning from workloads in real-time

Enterprises expanding AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can't keep up with shifting workloads.

Speculators are smaller AI models that work alongside large language models during inference. They draft multiple tokens ahead, which the main model then verifies in parallel. This technique, called speculative decoding, has become essential for enterprises trying to reduce inference costs and latency. Instead of generating tokens one at a time, the system can accept multiple tokens at once, dramatically improving throughput.

Together AI today announced research and a new system called ATLAS (AdapTive-LeArning Speculator System) that aims to help enterprises overcome the challenge of static speculators. The technique provides a self-learning inference optimization capability that can deliver up to 400% faster inference performance than a baseline level available in existing inference technologies such as vLLM. The system addresses a critical problem: as AI workloads evolve, inference speeds degrade, even with specialized speculators in place.

The company, which got its start in 2023, has been focused on optimizing inference on its enterprise AI platform. Earlier this year the company raised $305 million as customer adoption and demand grew.

"Companies we work with generally, as they scale up, they see shifting workloads, and then they don't see as much speedup from speculative execution as before," Tri Dao, chief scientist at Together AI, told VentureBeat in an exclusive interview. "These speculators generally don't work well when their workload domain starts to shift."

The workload drift problem no one talks about

Most speculators in production today are "static" models. They're trained once on a fixed dataset representing expected workloads, then deployed without any ability to adapt. Companies like Meta and Mistral ship pre-trained speculators alongside their main models. Inference platforms like vLLM use these static speculators to boost throughput without changing output quality.

But there's a catch. When an enterprise's AI usage evolves, the static speculator's accuracy plummets. "If you're a company producing coding agents, and most of your developers have been writing in Python, all of a sudden some of them switch to writing Rust or C, then you see the speed starts to go down," Dao explained. "The speculator has a mismatch between what it was trained on versus what the actual workload is."

This workload drift represents a hidden tax on scaling AI. Enterprises either accept degraded performance or invest in retraining custom speculators. That process captures only a snapshot in time and quickly becomes outdated.

How adaptive speculators work: A dual-model approach

ATLAS uses a dual-speculator architecture that combines stability with adaptation (a simplified sketch of the control flow follows this list):

The static speculator: a heavyweight model trained on broad data provides consistent baseline performance. It serves as a "speed floor."

The adaptive speculator: a lightweight model learns continuously from live traffic. It specializes on the fly to emerging domains and usage patterns.

The confidence-aware controller: an orchestration layer dynamically chooses which speculator to use. It adjusts the speculation "lookahead" based on confidence scores.
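The Python sketch below illustrates that control flow in toy form: a static speculator as the speed floor, an adaptive speculator that learns from accepted drafts, and a confidence-aware controller that picks between them and stretches the lookahead. The classes, the acceptance model, and the lookahead schedule are invented placeholders, not Together AI's implementation.

import random
from typing import List

class StaticSpeculator:
    """Heavyweight speculator trained offline; provides the 'speed floor'."""
    def draft(self, prefix: List[str], k: int) -> List[str]:
        return [f"tok{i}" for i in range(k)]
    def confidence(self) -> float:
        return 0.6                      # fixed; it never adapts

class AdaptiveSpeculator:
    """Lightweight speculator that keeps learning from accepted drafts."""
    def __init__(self):
        self.hits, self.tries = 0, 0
    def draft(self, prefix: List[str], k: int) -> List[str]:
        return [f"tok{i}" for i in range(k)]
    def confidence(self) -> float:
        return self.hits / self.tries if self.tries else 0.0
    def update(self, accepted: int, proposed: int) -> None:
        self.hits += accepted
        self.tries += proposed

def verify(draft: List[str]) -> int:
    """Stand-in for the target model verifying drafted tokens in parallel.
    Returns how many leading draft tokens it accepts (toy: random acceptance)."""
    accepted = 0
    for _ in draft:
        if random.random() < 0.8:
            accepted += 1
        else:
            break
    return accepted

def generate(prompt: List[str], steps: int = 20) -> List[str]:
    static, adaptive = StaticSpeculator(), AdaptiveSpeculator()
    out = list(prompt)
    for _ in range(steps):
        # Confidence-aware controller: prefer the adaptive speculator once it is
        # more confident than the static one, and stretch lookahead as confidence grows.
        spec = adaptive if adaptive.confidence() > static.confidence() else static
        lookahead = 2 + int(4 * spec.confidence())
        drafted = spec.draft(out, lookahead)
        accepted = verify(drafted)
        out.extend(drafted[:accepted] if accepted else ["tok_fallback"])  # always make progress
        adaptive.update(accepted, lookahead)      # online learning signal from live traffic
    return out

print(f"generated {len(generate(['<prompt>'])) - 1} tokens")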
"Before the adaptive speculator learns anything, we still have the static speculator to help provide the speed boost in the beginning," Ben Athiwaratkun, staff AI scientist at Together AI, explained to VentureBeat. "Once the adaptive speculator becomes more confident, then the speed grows over time."

The technical innovation lies in balancing acceptance rate (how often the target model agrees with drafted tokens) and draft latency. As the adaptive model learns from traffic patterns, the controller relies more on the lightweight speculator and extends lookahead. This compounds performance gains. Users don't need to tune any parameters. "On the user side, users don't have to turn any knobs," Dao said. "On our side, we have turned these knobs for users to adjust in a configuration that gets good speedup."

Performance that rivals custom silicon

Together AI's testing shows ATLAS reaching 500 tokens per second on DeepSeek-V3.1 when fully adapted. More impressively, those numbers on Nvidia B200 GPUs match or exceed specialized inference chips like Groq's custom hardware. "The software and algorithmic improvement is able to close the gap with really specialized hardware," Dao said. "We were seeing 500 tokens per second on these huge models that are even faster than some of the customized chips."

The 400% speedup that the company claims for inference represents the cumulative effect of Together's Turbo optimization suite. FP4 quantization delivers 80% speedup over the FP8 baseline. The static Turbo Speculator adds another 80-100% gain. The adaptive system layers on top. Each optimization compounds the benefits of the others. Compared to standard inference engines like vLLM or Nvidia's TensorRT-LLM, the improvement is substantial. Together AI benchmarks against the stronger baseline of the two for each workload before applying speculative optimizations.

The memory-compute tradeoff explained

The performance gains stem from exploiting a fundamental inefficiency in modern inference: wasted compute capacity. Dao explained that typically during inference, much of the compute power is not fully utilized. "During inference, which is actually the dominant workload nowadays, you're mostly using the memory subsystem," he said.

Speculative decoding trades idle compute for reduced memory access. When a model generates one token at a time, it's memory-bound. The GPU sits idle while waiting for memory. But when the speculator proposes five tokens and the target model verifies them simultaneously, compute utilization spikes while memory access remains roughly constant. "The total amount of compute to generate five tokens is the same, but you only had to access memory once, instead of five times," Dao said.

Think of it as intelligent caching for AI

For infrastructure teams familiar with traditional database optimization, adaptive speculators function like an intelligent caching layer, but with a crucial difference. Traditional caching systems like Redis or memcached require exact matches: you store the exact same query result and retrieve it when that specific query runs again. Adaptive speculators work differently. "You can view it as an intelligent way of caching, not storing exactly, but figuring out some patterns that you see," Dao explained. "Broadly, we're observing that you're working with similar code, or working with similar…"


Liquid AI Releases LFM2-8B-A1B: An On-Device Mixture-of-Experts with 8.3B Params and a 1.5B Active Params per Token

How much capability can a sparse 8.3B-parameter MoE with a ~1.5B active path deliver on your phone without blowing latency or memory? Liquid AI has released LFM2-8B-A1B, a small-scale Mixture-of-Experts (MoE) model built for on-device execution under tight memory, latency, and energy budgets. Unlike most MoE work optimized for cloud batch serving, LFM2-8B-A1B targets phones, laptops, and embedded systems. It has 8.3B total parameters but activates only ~1.5B parameters per token, using sparse expert routing to preserve a small compute path while increasing representational capacity. The model is released under the LFM Open License v1.0 (lfm1.0).

Understanding the Architecture

LFM2-8B-A1B retains the LFM2 "fast backbone" and inserts sparse-MoE feed-forward blocks to lift capacity without materially increasing the active compute. The backbone uses 18 gated short-convolution blocks and 6 grouped-query attention (GQA) blocks. All layers except the first two include an MoE block; the first two remain dense for stability. Each MoE block defines 32 experts; the router selects the top-4 experts per token with a normalized-sigmoid gate and adaptive routing bias to balance load and stabilize training. Context length is 32,768 tokens; vocabulary size is 65,536; the reported pre-training budget is ~12T tokens. This approach keeps per-token FLOPs and cache growth bounded by the active path (attention plus four expert MLPs), while total capacity allows specialization across domains such as multilingual knowledge, math, and code, use cases that often regress on very small dense models. (Source: https://www.liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts)

Performance Signals

Liquid AI reports that LFM2-8B-A1B runs significantly faster than Qwen3-1.7B under CPU tests using an internal XNNPACK-based stack and a custom CPU MoE kernel. The public plots cover int4 quantization with int8 dynamic activations on AMD Ryzen AI 9 HX370 and Samsung Galaxy S24 Ultra. The Liquid AI team positions quality as comparable to 3–4B dense models, while keeping the active compute near 1.5B parameters. No cross-vendor "×-faster" headline multipliers are published; the claims are framed as per-device comparisons versus similarly active models.

On accuracy, the model card lists results across 16 benchmarks, including MMLU/MMLU-Pro/GPQA (knowledge), IFEval/IFBench/Multi-IF (instruction following), GSM8K/GSMPlus/MATH500/MATH-Lvl-5 (math), and MGSM/MMMLU (multilingual). The numbers indicate competitive instruction-following and math performance within the small-model band, and improved knowledge capacity relative to LFM2-2.6B, consistent with the larger total parameter budget.

Deployment and Tooling

LFM2-8B-A1B ships with Transformers/vLLM support for GPU inference and GGUF builds for llama.cpp; the official GGUF repo lists common quants from Q4_0 (≈4.7 GB) up to F16 (≈16.7 GB) for local runs, while llama.cpp requires a recent build with lfm2moe support (b6709+) to avoid "unknown model architecture" errors. Liquid's CPU validation uses Q4_0 with int8 dynamic activations on AMD Ryzen AI 9 HX370 and Samsung Galaxy S24 Ultra, where LFM2-8B-A1B shows higher decode throughput than Qwen3-1.7B at a similar active-parameter class; ExecuTorch is referenced for mobile/embedded CPU deployment.
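For intuition on how the sparse path bounds per-token compute, here is a hedged NumPy sketch of top-4-of-32 routing with a sigmoid gate and a routing bias, plus the active-versus-total parameter arithmetic for a single toy layer. The dimensions, gating details, and expert MLPs are illustrative assumptions rather than Liquid AI's exact formulation.

import numpy as np

NUM_EXPERTS, TOP_K, D_MODEL, D_FF = 32, 4, 1024, 4096   # illustrative sizes

rng = np.random.default_rng(0)
router_w = rng.normal(scale=0.02, size=(D_MODEL, NUM_EXPERTS))
routing_bias = np.zeros(NUM_EXPERTS)      # adaptive bias used for load balancing
experts = [
    (rng.normal(scale=0.02, size=(D_MODEL, D_FF)),
     rng.normal(scale=0.02, size=(D_FF, D_MODEL)))
    for _ in range(NUM_EXPERTS)
]

def moe_ffn(x: np.ndarray) -> np.ndarray:
    """Sparse MoE feed-forward for one token vector x of shape (D_MODEL,)."""
    logits = x @ router_w + routing_bias
    gates = 1.0 / (1.0 + np.exp(-logits))            # sigmoid gate per expert
    top = np.argsort(gates)[-TOP_K:]                 # select the top-4 experts
    weights = gates[top] / gates[top].sum()          # normalize over the selected experts
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)   # tiny ReLU expert MLP
    return out

token = rng.normal(size=D_MODEL)
y = moe_ffn(token)

# Active vs. total expert parameters for this toy layer: only TOP_K experts run per token.
per_expert = 2 * D_MODEL * D_FF
print("total expert params:", NUM_EXPERTS * per_expert)
print("active expert params per token:", TOP_K * per_expert)

The printed counts show the same ratio the model exploits: total expert capacity scales with 32 experts, while the per-token cost scales only with the 4 that are routed.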
Key Takeaways

Architecture and routing: LFM2-8B-A1B pairs an LFM2 fast backbone (18 gated short-conv blocks + 6 GQA blocks) with per-layer sparse-MoE FFNs (all layers except the first two), using 32 experts with top-4 routing via normalized-sigmoid gating and adaptive biases; 8.3B total parameters, ~1.5B active per token.

On-device target: designed for phones, laptops, and embedded CPUs/GPUs; quantized variants "fit comfortably" on high-end consumer hardware for private, low-latency use.

Performance positioning: Liquid reports that LFM2-8B-A1B is significantly faster than Qwen3-1.7B in CPU tests and aims for 3–4B dense-class quality while keeping a ~1.5B active path.

Editorial Comments

LFM2-8B-A1B demonstrates that sparse MoE can be practical below the usual server-scale regime. The model combines an LFM2 conv-attention backbone with per-layer expert MLPs (except the first two layers) to keep token compute near 1.5B parameters while lifting quality toward the 3–4B dense class. With standard and GGUF weights, llama.cpp/ExecuTorch/vLLM paths, and a permissive on-device posture, LFM2-8B-A1B is a concrete option for building low-latency, private assistants and application-embedded copilots on consumer and edge hardware.

Check out the model on Hugging Face and the technical details. The post Liquid AI Releases LFM2-8B-A1B: An On-Device Mixture-of-Experts with 8.3B Params and a 1.5B Active Params per Token appeared first on MarkTechPost.


MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models

arXiv:2510.04363v2 Announce Type: replace-cross Abstract: We introduce MacroBench, a code-first benchmark that evaluates whether LLMs can synthesize reusable browser-automation programs (macros) from natural-language goals by reading HTML/DOM and emitting Selenium. MacroBench instantiates seven self-hosted sites covering 681 tasks across interaction complexity and targeting difficulty. Our end-to-end protocol validates generated code via static checks, sandboxed execution, and outcome verification (DOM assertions, database snapshots), and includes a safety suite for scraping, spam/abuse, and credential/privacy prompts. Across 2,636 model-task runs, we observe stratified success: GPT-4o-mini (96.8%), GPT-4o (95.3%), Gemini (89.0%), DeepSeek (83.4%). Models handle simple tasks reliably (91.7%) but fail on complex workflows (0.0%), and none meet production-quality coding practices despite functional completion. We release our complete benchmark pipeline, evaluation framework, and experimental results at https://github.com/hyunjun1121/MacroBench to enable reproducible assessment of macro synthesis for web automation.
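As a rough illustration of the kind of code-first validation the abstract describes (static checks followed by outcome verification), here is a hedged Python sketch; the banned-API list, the DOM/database assertions, and the sample macro are invented placeholders, and sandboxed execution is elided entirely.

import ast
from typing import List

def static_check(macro_source: str) -> List[str]:
    """Toy static checks in the spirit of MacroBench's protocol (not its actual code):
    the generated macro must parse and must not touch obviously unsafe APIs."""
    problems = []
    try:
        tree = ast.parse(macro_source)
    except SyntaxError as e:
        return [f"syntax error: {e}"]
    banned = {"eval", "exec", "os.system", "subprocess"}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in banned:
            problems.append(f"banned call: {node.id}")
        if isinstance(node, ast.Attribute):
            dotted = f"{getattr(node.value, 'id', '?')}.{node.attr}"
            if dotted in banned:
                problems.append(f"banned call: {dotted}")
    return problems

def verify_outcome(dom_after: dict, db_after: dict) -> bool:
    """Placeholder outcome verification: a DOM assertion plus a database snapshot check."""
    return dom_after.get("#status") == "posted" and db_after.get("posts", 0) == 1

macro = "driver.find_element('css selector', '#submit').click()\n"
print("static problems:", static_check(macro))
print("outcome ok:", verify_outcome({"#status": "posted"}, {"posts": 1}))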


Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

arXiv:2507.16746v2 Announce Type: replace-cross Abstract: Humans often use visual aids, for example diagrams or sketches, when solving complex problems. Training multimodal models to do the same, known as Visual Chain of Thought (Visual CoT), is challenging due to: (1) poor off-the-shelf visual CoT performance, which hinders reinforcement learning, and (2) the lack of high-quality visual CoT training data. We introduce Zebra-CoT, a diverse large-scale dataset with 182,384 samples, containing logically coherent interleaved text-image reasoning traces. We focus on four categories of tasks where sketching or visual reasoning is especially natural, spanning scientific questions such as geometry, physics, and algorithms; 2D visual reasoning tasks like visual search and jigsaw puzzles; 3D reasoning tasks including 3D multi-hop inference, embodied and robot planning; visual logic problems and strategic games like chess. Fine-tuning the Anole-7B model on the Zebra-CoT training corpus results in an improvement of +12% in our test-set accuracy and yields up to +13% performance gain on standard VLM benchmark evaluations. Fine-tuning Bagel-7B yields a model that generates high-quality interleaved visual reasoning chains, underscoring Zebra-CoT’s effectiveness for developing multimodal reasoning abilities. We open-source our dataset and models to support development and evaluation of visual CoT.


From Correction to Mastery: Reinforced Distillation of Large Language Model Agents

arXiv:2509.14257v2 Announce Type: replace Abstract: Large Language Model agents excel at solving complex tasks through iterative reasoning and tool use, but typically depend on ultra-large, costly backbones. Existing distillation approaches train smaller students to imitate full teacher trajectories, yet reasoning and knowledge gaps between the teacher and student can cause compounding errors. We propose SCoRe, a student-centered framework in which the student generates training trajectories and the teacher corrects only the earliest error, producing training data matched to the student’s ability and exposing specific weaknesses. The student is first fine-tuned on corrected trajectories. Subsequently, short-horizon reinforcement learning starts from the verified prefix preceding the earliest error, with target rewards assigned at that step. This design encourages autonomous problem-solving beyond imitation and enhances training stability. On 12 challenging benchmarks, a 7B-parameter student distilled with SCoRe matches the agentic performance of a 72B-parameter teacher.
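A minimal, hypothetical sketch of the earliest-error correction described in the abstract: the student's trajectory is scanned step by step, the teacher replaces only the first failing step, the corrected trajectory becomes fine-tuning data, and the verified prefix is the starting point for short-horizon RL. The verifier and teacher here are placeholder callables, not the paper's actual components.

from typing import Callable, List, Tuple

Step = str

def correct_earliest_error(
    student_traj: List[Step],
    verify_step: Callable[[List[Step], Step], bool],   # placeholder step verifier
    teacher_fix: Callable[[List[Step]], Step],          # placeholder teacher correction
) -> Tuple[List[Step], List[Step], int]:
    """Return (corrected trajectory for fine-tuning,
               verified prefix for short-horizon RL,
               index of the earliest error)."""
    for i, step in enumerate(student_traj):
        prefix = student_traj[:i]
        if not verify_step(prefix, step):
            fixed = teacher_fix(prefix)                 # teacher corrects only this step
            corrected = prefix + [fixed]                # training data matched to student ability
            return corrected, prefix, i
    return list(student_traj), list(student_traj), -1   # no error found

# Toy usage with stand-in checks.
traj = ["parse question", "call search('foo')", "misread result", "answer: 7"]
is_ok = lambda prefix, step: "misread" not in step
fix = lambda prefix: "re-read result and extract the number"
corrected, rl_prefix, err_idx = correct_earliest_error(traj, is_ok, fix)
print(err_idx, corrected, rl_prefix, sep="\n")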


STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models

arXiv:2510.07923v1 Announce Type: new Abstract: Answering complex real-world questions requires step-by-step retrieval and integration of relevant information to generate well-grounded responses. However, existing knowledge distillation methods overlook the need for different reasoning abilities at different steps, hindering transfer in multi-step retrieval-augmented frameworks. To address this, we propose Stepwise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models (StepER). StepER employs step-wise supervision to align with evolving information and reasoning demands across stages. Additionally, it incorporates difficulty-aware training to progressively optimize learning by prioritizing suitable steps. Our method is adaptable to various multi-step retrieval-augmented language models, including those that use retrieval queries for reasoning paths or decomposed questions. Extensive experiments show that StepER outperforms prior methods on multi-hop QA benchmarks, with an 8B model achieving performance comparable to a 70B teacher model.
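As a rough sketch of step-wise, difficulty-aware supervision under assumed interfaces (not the paper's released code): each retrieval/reasoning step gets its own distillation loss, and a simple curriculum admits harder steps as training progresses. The overlap-based loss and the weighting schedule are stand-in assumptions.

from typing import List

def step_losses(student_steps: List[str], teacher_steps: List[str]) -> List[float]:
    """Placeholder per-step distillation loss based on token overlap.
    A real implementation would use a per-step likelihood or KL loss."""
    losses = []
    for s, t in zip(student_steps, teacher_steps):
        s_tok, t_tok = set(s.split()), set(t.split())
        overlap = len(s_tok & t_tok) / max(len(t_tok), 1)
        losses.append(1.0 - overlap)
    return losses

def difficulty_aware_weights(losses: List[float], epoch: int, total_epochs: int) -> List[float]:
    """Early in training, down-weight hard (high-loss) steps; admit them later.
    This schedule is an illustrative assumption, not the paper's exact curriculum."""
    threshold = (epoch + 1) / total_epochs
    return [1.0 if l <= threshold else threshold / l for l in losses]

student = ["retrieve: capital France", "reason: Paris is capital", "answer: Pariss"]
teacher = ["retrieve: capital of France", "reason: Paris is the capital of France", "answer: Paris"]

losses = step_losses(student, teacher)
for epoch in range(3):
    w = difficulty_aware_weights(losses, epoch, total_epochs=3)
    weighted = sum(wi * li for wi, li in zip(w, losses))
    print(f"epoch {epoch}: step losses={['%.2f' % l for l in losses]}, weighted total={weighted:.2f}")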
