YouZum

News

AI, Committee, News, Uncategorized

Enterprise AI Without GPU Burn: Salesforce’s xGen-small Optimizes for Context, Cost, and Privacy

Language processing in enterprise environments faces critical challenges as business workflows increasingly depend on synthesising information from diverse sources, including internal documentation, code repositories, research reports, and real-time data streams. While recent advances in large language models have delivered impressive capabilities, this progress comes with significant downsides: skyrocketing per-request costs, constant hardware upgrade requirements, and increased data privacy risks.  Pursuing ever-larger model architectures has demonstrated diminishing returns, with the accelerating energy demands potentially constraining future AI development. Modern enterprises now require balanced solutions that deliver comprehensive long-context comprehension while maintaining efficient processing, predictable low-cost serving capabilities, and robust privacy guarantees—a combination that small language models are uniquely positioned to provide despite the complex, high-volume inference demands characteristic of today’s business applications. Traditional approaches to extending language model capabilities beyond their inherent context limitations have relied on several workaround methods. Retrieval-augmented generation (RAG) systems pull relevant information from external knowledge bases to supplement model inputs. External tool calls enable models to access specialised functions outside their parameters. Memory mechanisms artificially persist information across conversation turns. While functional, these techniques represent brittle “stitching” solutions that add complexity and potential failure points to processing pipelines.  Context window extensions in larger models attempted to address these limitations but introduced significant computational overhead. 
Each method fundamentally acknowledges the same critical need: genuine long-context processing capabilities that allow models to handle entire documents, sustained conversations, code repositories, and research reports in a single forward pass rather than through fragmented processing. These stopgap approaches highlight why native extended context is essential—it eliminates architectural complexity while maintaining information coherence throughout processing. Salesforce AI Research has developed xGen-small, an enterprise-ready compact language model for efficient long-context processing. This solution combines domain-focused data curation, scalable pre-training, length-extension techniques, instruction fine-tuning, and reinforcement learning to deliver high-performance enterprise AI capabilities with predictable low costs, addressing the critical balance businesses require between capability and operational efficiency. xGen-small’s architecture employs a “small but long” strategy that fundamentally inverts the traditional scale-up paradigm. Rather than increasing parameter counts, this approach deliberately shrinks model size while precisely refining data distributions toward enterprise-relevant domains and training protocols. This architectural philosophy demands comprehensive expertise across multiple development stages and components working in concert through a vertically integrated pipeline.  The framework begins with meticulous raw data curation followed by scalable pre-training optimised for efficient processing. Sophisticated length-extension mechanisms enable the compact model to handle extensive contexts while targeted post-training and reinforcement learning techniques enhance performance in enterprise-specific tasks. 
This architecture delivers strategic advantages for business applications by providing cost efficiency, robust privacy safeguards, and long-context understanding without the resource requirements of larger models, creating a sustainable pathway for deploying enterprise AI at scale with predictable operational characteristics. xGen-small’s development pipeline integrates multiple stages into a streamlined workflow. Starting with a multi-trillion-token corpus, the process applies rigorous filtering and quality controls before large-scale TPU pre-training with optimised learning schedules. Targeted length-extension techniques expand context capacity, while task-specific post-training and reward-based reinforcement learning refine model capabilities. Data curation for xGen-small began with harvesting a corpus substantially larger than the final eight trillion training tokens. The pipeline applied fast heuristic filters to remove spam, followed by a two-stage quality assessment using classifier ensembles. Exact hashing and fuzzy fingerprinting eliminated exact and near-duplicates, while careful balancing of general data with specialised content for code, mathematics, and natural language optimised performance. Extensive ablation studies refined this curation approach to maximise factual accuracy and overall usefulness. Pre-training of xGen-small uses TPU v5p pods with the Jaxformer v8 library, implementing FSDP, sequence-parallel attention, and splash kernels for maximum efficiency. A multi-phase learning rate schedule optimises training dynamics, while a carefully balanced data mixture combines code corpora, natural language examples, mathematical texts, and high-quality filtered content to capture both diversity and domain expertise. xGen-small demonstrates competitive performance against leading baselines in its size class. 
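To make the deduplication step above more concrete, here is a minimal Python sketch of exact hashing combined with a fuzzy fingerprint. It is an illustration only, not Salesforce's pipeline: the function names, the word-trigram fingerprint, and the 0.8 Jaccard threshold are all assumptions chosen for the example, and a production system would likely use MinHash with locality-sensitive hashing rather than the pairwise comparison shown here.

```python
import hashlib
import re


def exact_key(text: str) -> str:
    """SHA-256 of lower-cased, whitespace-normalised text (exact-duplicate check)."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def shingles(text: str, n: int = 3) -> set[str]:
    """Word n-grams used as a simple fuzzy fingerprint (hypothetical choice)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first copy of each document; drop exact and near-duplicates."""
    seen_hashes: set[str] = set()
    kept: list[str] = []
    kept_fingerprints: list[set[str]] = []
    for doc in docs:
        key = exact_key(doc)
        if key in seen_hashes:
            continue  # exact duplicate after normalisation
        fingerprint = shingles(doc)
        # Pairwise comparison is O(n^2); acceptable only for a small illustration.
        if any(jaccard(fingerprint, other) >= threshold for other in kept_fingerprints):
            continue  # near-duplicate of a document already kept
        seen_hashes.add(key)
        kept.append(doc)
        kept_fingerprints.append(fingerprint)
    return kept


corpus = [
    "The quick brown fox jumps over the lazy dog",
    "the quick  brown fox jumps over the lazy dog",       # exact duplicate once normalised
    "The quick brown fox jumps over the lazy dog today",  # near-duplicate (Jaccard 0.875)
    "A completely different sentence about TPUs",
]
unique_docs = deduplicate(corpus)
print(len(unique_docs))  # 2
```

The exact hash catches copies that differ only in case or spacing, while the shingle comparison catches documents that share most of their wording.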
The strategic blending of diverse data types—including low-entropy code, high-entropy natural language, mathematical content, and classifier-filtered high-quality subsets—delivers exceptional results across evaluation metrics while maintaining the model’s compact, efficient architecture. This approach successfully balances processing efficiency with robust performance capabilities required for enterprise applications. Performance evaluations demonstrate xGen-small’s exceptional long-context capabilities, with the 9B model achieving state-of-the-art results on the RULER benchmark and the 4B model securing second place in its class. Unlike competitors whose performance degrades significantly at extended context lengths, xGen maintains consistent performance from 4K to 128K tokens. This stability comes from a sophisticated length-extension strategy using two-stage extension (32K then 128K), over-length training to 256K, and sequence parallelism to manage memory constraints efficiently, delivering reliable performance across the entire context spectrum. Post-training transforms xGen-small base models into comprehensive instruction models through a two-stage process. First, supervised fine-tuning uses a diverse, high-quality instruction dataset spanning mathematics, coding, safety, and general-purpose domains to establish core behaviours and alignment. Subsequently, large-scale reinforcement learning refines the model’s policy, particularly enhancing reasoning capabilities. This approach delivers exceptional performance in complex reasoning domains like mathematics, coding, and STEM applications while maintaining consistent instruction-following abilities across general tasks. The development of xGen-small demonstrates that deliberately constraining model size while extending context capacity creates optimal solutions for enterprise AI applications. 
This “small but long” approach significantly reduces inference costs and hardware requirements while enabling seamless processing of extensive internal knowledge sources without external retrieval dependencies. Through an integrated pipeline of meticulous data curation, scalable pre-training, targeted length-extension, and reinforcement learning, these compact models match or exceed larger counterparts’ performance. This architecture provides businesses with a predictable, sustainable, cost-effective, and privacy-preserving framework for deploying AI at enterprise scale. Check out the Model on Hugging Face and Technical details. Also, don’t forget to follow us on Twitter. Here’s a brief overview of what we’re building at Marktechpost: ML News Community – r/machinelearningnews (92k+ members) Newsletter– airesearchinsights.com/(30k+ subscribers) miniCON AI Events – minicon.marktechpost.com AI Reports & Magazines – magazine.marktechpost.com AI Dev & Research News – marktechpost.com (1M+ monthly readers) Partner with us The post Enterprise AI Without GPU Burn: Salesforce’s xGen-small Optimizes for Context, Cost, and Privacy appeared first on MarkTechPost.


Microsoft Researchers Introduce ARTIST: A Reinforcement Learning Framework That Equips LLMs with Agentic Reasoning and Dynamic Tool Use

LLMs have made impressive gains in complex reasoning, primarily through innovations in architecture, scale, and training approaches like RL. RL enhances LLMs by using reward signals to guide the model towards more effective reasoning strategies, resulting in longer and more coherent thought processes that adapt dynamically to a task’s complexity. Despite this, most RL-enhanced LLMs rely heavily on static internal knowledge and text-only reasoning, making them ill-suited for tasks requiring real-time information, domain-specific expertise, or precise computations. This limitation is especially evident in knowledge-intensive or open-ended problems where the inability to access and interact with external tools leads to inaccuracies or hallucinations. To overcome these constraints, recent work has explored agentic reasoning, where LLMs dynamically engage with external tools and environments during the reasoning process. These tools include web search, APIs, and code execution platforms, while environments range from simulated browsers to operating systems. Agentic reasoning enables models to plan, adapt, and solve tasks interactively, beyond static inference. However, current methods for tool integration often depend on manually designed prompts or supervised fine-tuning, which hinder scalability and generalization. Emerging reinforcement learning techniques like Group Relative Policy Optimization (GRPO) provide more efficient and adaptive training for tool use without step-level supervision. Yet, the intersection of RL, tool use, and agentic decision-making remains underexplored, particularly in real-world tasks that demand multi-turn reasoning, dynamic planning, and robust external interaction.  Microsoft Research introduces ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a framework that combines agentic reasoning, reinforcement learning, and dynamic tool use to enhance LLMs. 
ARTIST enables models to autonomously decide when, how, and which tools to use during multi-step reasoning, learning robust strategies without step-level supervision. The model improves reasoning and interaction with external environments through integrated tool queries and outputs. Evaluated on challenging math and function-calling benchmarks, ARTIST outperforms top models like GPT-4o, achieving gains of up to 22%. It demonstrates emergent agentic behaviors, setting a new standard in generalizable and interpretable problem-solving. ARTIST is a flexible framework that enables LLMs to interact with external tools and environments using reinforcement learning. It alternates between reasoning and tool use, allowing the model to choose when and how to invoke tools like code interpreters or APIs. Training uses GRPO, which avoids value functions and uses outcome-based group rewards. ARTIST structures rollouts into reasoning, tool queries, tool outputs, and final answers, with a composite reward system encouraging correctness, proper format, and successful tool use, enabling adaptive, multi-step problem-solving. ARTIST outperforms various baselines, including GPT-4o and tool-augmented LLMs, on complex mathematical benchmarks like AMC, AIME, and Olympiad. It achieves higher Pass@1 accuracy, with notable gains of up to 22% over base models and over 35% compared to other tool-integrated methods. ARTIST’s advantage comes from its agentic reinforcement learning, which enables it to use external tools strategically and refine multi-step solutions. Compared to prompt-based tool usage, it shows superior tool invocation, response quality, and reasoning depth. While its benefits are most evident in complex tasks, ARTIST also significantly improves results on simpler datasets like MATH-500 through selective tool use. In conclusion, ARTIST is a framework that combines agentic reasoning, reinforcement learning, and dynamic tool use to enhance the capabilities of LLMs. 
Unlike traditional prompt-based approaches, ARTIST enables models to autonomously plan, adapt, and solve complex tasks by interacting with external tools and environments. It learns effective tool-use strategies without step-by-step supervision, improving accuracy and deeper reasoning. Evaluations on mathematical and function-calling benchmarks show significant performance gains. ARTIST also produces more interpretable reasoning paths and robust behaviors. This work highlights the potential of agentic RL as a promising direction for creating more adaptive and capable AI systems. Check out the Paper. The post Microsoft Researchers Introduce ARTIST: A Reinforcement Learning Framework That Equips LLMs with Agentic Reasoning and Dynamic Tool Use appeared first on MarkTechPost.
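The ARTIST summary above notes that GRPO avoids a value function and relies on outcome-based group rewards, combined with a composite reward for correctness, format, and tool use. The sketch below shows how such group-relative advantages are commonly computed: each rollout's reward is standardised against the other rollouts sampled for the same prompt. The reward weights (1.0, 0.25, 0.25) and the function names are hypothetical and are not taken from the ARTIST paper.

```python
from statistics import mean, pstdev


def composite_reward(correct: bool, well_formatted: bool, tool_calls_ok: bool) -> float:
    """Outcome reward combining answer correctness, format, and tool success.

    The weights are illustrative assumptions, not values from the ARTIST paper.
    """
    return 1.0 * correct + 0.25 * well_formatted + 0.25 * tool_calls_ok


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardise each rollout's reward within its group (no value network needed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; using the sample std is an equally valid choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts sampled for the same prompt.
group_rewards = [
    composite_reward(True, True, True),     # 1.5
    composite_reward(False, True, False),   # 0.25
    composite_reward(False, False, False),  # 0.0
    composite_reward(True, True, True),     # 1.5
]
advantages = group_relative_advantages(group_rewards)
```

Rollouts that beat the group average receive positive advantages and the rest receive negative ones, so the policy update needs only final outcomes rather than step-level labels.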


ZeroSearch from Alibaba Uses Reinforcement Learning and Simulated Documents to Teach LLMs Retrieval Without Real-Time Search

Large language models are now central to various applications, from coding to academic tutoring and automated assistants. However, a critical limitation persists in how these models are designed; they are trained on static datasets that become outdated over time. This creates a fundamental challenge because the language models cannot update their knowledge or validate responses against fresh, real-world data. As a result, while these models demonstrate strong performance on reasoning tasks or structured queries, their answers can still include fabricated or obsolete information, reducing their reliability in real-world usage. To maintain credibility, especially for applications requiring updated knowledge such as news, research, or product reviews, models must interact with external data sources in a timely and cost-efficient manner. The core problem lies in teaching these models to effectively retrieve and incorporate external information. While fine-tuned pretraining helps develop a strong baseline understanding, the capacity to conduct meaningful, dynamic searches is missing. Equipping language models with this ability introduces practical constraints. Search engines used for external information retrieval provide varying document quality that introduces inconsistency in model training. Moreover, integrating reinforcement learning to simulate real-world searching requires large-scale interactions with live APIs, running up hundreds of thousands of calls, which becomes prohibitively expensive. This results in a bottleneck for academic research and commercial deployment, where cost and training scalability are critical. Various methods have been developed to enhance language models’ search and retrieval capabilities. Some early techniques relied on prompt-based instructions that guided the model through processes like generating sub-queries or managing multi-step searches. 
These methods, however, heavily relied on manual tuning and often required extensive computational resources to ensure consistent outputs. Other approaches leaned on supervised fine-tuning for smaller models to perform more targeted retrieval, with models like Self-RAG and RetroLLM emerging in this space. There have also been experiments with techniques like Monte Carlo Tree Search to expand possible answer paths during inference dynamically. Reinforcement learning-based solutions like Search-R1 and DeepResearcher allowed models to interact directly with real search engines, offering a training experience closer to how users behave. However, these innovations still suffer from either complexity, high computational demand, or financial cost due to live interaction constraints. Researchers from Tongyi Lab at Alibaba Group introduced an innovative solution called ZeroSearch. This reinforcement learning framework removes the need for live API-based search entirely. Instead, it uses another language model to simulate the behavior of a search engine. The simulation model is fine-tuned through supervised training to generate documents that either help or mislead the policy model, depending on whether the content is designed to be relevant or noisy. This allows complete control over the document quality and cost while enabling a realistic retrieval training experience. A key innovation lies in using curriculum-based learning during training, which means gradually introducing harder retrieval tasks by adjusting how much noise is present in the generated documents. This progression helps the policy model develop resilience and better reasoning skills over time without ever making a real search query. The structure of ZeroSearch involves distinct phases in the reasoning process. The model first thinks internally using designated tags, then generates queries if it determines that additional information is needed. 
Finally, it outputs an answer only when sufficient context is acquired. This structured approach enforces clarity in decision-making and has been shown to improve transparency and answer quality. A small change to the prompt given to the simulated search engine controls whether the generated document appears helpful or misleading. The simulation LLM is fine-tuned using interaction data where each retrieval trajectory is labeled based on the correctness of the final answer. By systematically varying document quality, the policy model is taught to handle both straightforward and complex search conditions. A performance scaling function determines how much noise is introduced at each training stage, increasing the model’s ability to navigate uncertainty over time. A 3-billion parameter model was able to simulate the retrieval process effectively for training purposes. The results became particularly notable with larger models. A 7B retrieval module performed at a level comparable to Google Search in response quality. A 14B model even surpassed Google Search on benchmarks. ZeroSearch also showed flexibility, functioning effectively across base and instruction-tuned LLMs of different sizes. It integrates well with a range of reinforcement learning algorithms, including PPO, GRPO, and Reinforce++, and it uses a reward design based on the F1 score rather than exact match to discourage the model from generating excessively long answers just to increase keyword overlap. Furthermore, ZeroSearch uses a masking mechanism during backpropagation to ensure that gradients are only computed on the policy model’s outputs, stabilizing training without sacrificing performance. The research demonstrates a clear and efficient alternative to real-time search engine reliance. Using simulation-driven document generation removes the need for high-cost APIs, and the quality of training input is controlled with precision. 
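To illustrate why an F1-based reward discourages padded answers, here is a small Python sketch of token-level F1 between a predicted answer and a reference. It is a generic formulation, not ZeroSearch's code: the whitespace tokenisation and the name `f1_reward` are assumptions made for the example.

```python
from collections import Counter


def f1_reward(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted answer and the reference answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # falls as the answer is padded with extra words
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


concise = f1_reward("Paris", "Paris")                                # 1.0
padded = f1_reward("Paris is the capital city of France", "Paris")   # 0.25
```

Both answers contain the reference token, but the padded one scores lower because its precision drops to 1/7, so length alone cannot raise the reward.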
The method also boosts model reasoning capability by introducing progressive noise and uncertainty, effectively mimicking how real-world data retrieval might fail or mislead. The policy model is trained to extract the most useful information. These traits make ZeroSearch a scalable and practical solution for commercial-grade applications. This approach successfully identifies and addresses the twin challenges of document quality variability and economic cost that have limited real-time search integration in language model training. It combines document simulation, structured interaction, and reinforcement learning to ensure effectiveness and scalability. By relying solely on simulated data generation, the researchers achieved superior or comparable results to existing methods while removing all dependency on costly APIs. Several Key Takeaways from the Research include the following: A 3B model simulated realistic document retrieval effectively with zero API cost. A 7B retrieval module matched Google Search performance in benchmark tests. The 14B model exceeded real search engine performance. Reinforcement learning was performed with a curriculum-based rollout that gradually introduced noise. A simulation LLM generated both relevant and noisy documents via lightweight supervised fine-tuning. Structured interaction phases (<think>, <search>, <answer>) improved model clarity and accuracy. F1-based rewards discouraged reward hacking by penalizing irrelevant answer length. Compatible with major


Huawei Introduces Pangu Ultra MoE: A 718B-Parameter Sparse Language Model Trained Efficiently on Ascend NPUs Using Simulation-Driven Architecture and System-Level Optimization

Sparse large language models (LLMs) based on the Mixture of Experts (MoE) framework have gained traction for their ability to scale efficiently by activating only a subset of parameters per token. This dynamic sparsity allows MoE models to retain high representational capacity while limiting computation per token. However, with their increasing complexity and model size approaching trillions of parameters, training them efficiently requires algorithmic innovation and a tightly integrated hardware-software optimization. These challenges are especially relevant when deploying models on non-standard AI accelerators like Ascend NPUs, which require specific architectural alignment to deliver optimal performance. A major technical challenge lies in the inefficient utilization of hardware resources while training sparse LLMs. Since only a portion of parameters are active for each token, workloads across devices become unbalanced, leading to synchronization delays and underused processing power. This imbalance also affects memory utilization as different experts process different numbers of tokens, sometimes exceeding capacity. These inefficiencies are compounded at a large scale, such as across thousands of AI chips, where communication and memory management bottlenecks significantly hinder throughput. The inability to fully harness the computational promise of sparsity in practice restricts the deployment of such models on hardware systems like Ascend NPUs. Several strategies have been proposed to tackle these challenges. These include auxiliary losses to balance token distribution across experts and drop-and-pad strategies that limit expert overload by discarding tokens exceeding capacity. However, these techniques either reduce model performance or introduce inefficiencies in memory and computation. 
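The trade-off behind the drop-and-pad strategy mentioned above can be seen in a toy Python sketch. It assumes each token has already been routed to a single expert, and the names `drop_and_pad`, `assignments`, and `capacity` are hypothetical; real MoE implementations operate on tensors rather than Python lists.

```python
from typing import Optional


def drop_and_pad(
    assignments: list[int], num_experts: int, capacity: int
) -> tuple[list[list[Optional[int]]], list[int]]:
    """Give every expert exactly `capacity` slots.

    Tokens routed to a full expert are dropped; unused slots are padded with None.
    """
    slots: list[list[Optional[int]]] = [[] for _ in range(num_experts)]
    dropped: list[int] = []
    for token_idx, expert in enumerate(assignments):
        if len(slots[expert]) < capacity:
            slots[expert].append(token_idx)
        else:
            dropped.append(token_idx)  # overflow token is discarded
    for expert_slots in slots:
        expert_slots.extend([None] * (capacity - len(expert_slots)))  # padding
    return slots, dropped


# Six tokens, three experts, two slots each; expert 0 is oversubscribed.
slots, dropped = drop_and_pad([0, 0, 0, 1, 0, 2], num_experts=3, capacity=2)
print(slots)    # [[0, 1], [3, None], [5, None]]
print(dropped)  # [2, 4]
```

The dropped tokens correspond to the quality loss, and the `None` padding corresponds to the wasted memory and compute that the article describes.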
Other efforts include heuristic expert placement and traditional communication patterns like All-to-All dispatching, but these often fail to scale well or maintain high throughput. Moreover, standard memory-saving techniques like recomputation are usually coarse-grained, targeting whole layers instead of specific operations, leading to increased runtime without proportional memory savings. Researchers from the Pangu team at Huawei Cloud introduced a highly structured and optimized training approach for large MoE models tailored to Ascend NPUs. They developed Pangu Ultra MoE, a sparse LLM with 718 billion parameters, focusing on aligning model architecture and system design with the capabilities of the Ascend hardware. Their approach begins with a simulation-based model configuration process that evaluates thousands of architecture variants using metrics grounded in actual hardware behavior. These simulations inform design decisions before any physical training is undertaken, thus saving substantial computational resources and enabling informed tuning of model hyperparameters. The simulation method analyzes combinations of parameters such as the number of layers, hidden size, and expert count using a five-dimensional parallelism strategy that includes Pipeline Parallelism, Tensor Parallelism, Expert Parallelism, Data Parallelism, and Context Parallelism. The final model configuration adopted by Huawei included 256 experts, a hidden size of 7680, and 61 transformer layers. To further optimize performance, researchers integrated an Adaptive Pipe Overlap mechanism to mask communication costs and used hierarchical All-to-All communication to reduce inter-node data transfer. They employed fine-grained recomputation, such as recomputing only key-value vectors in attention modules, and introduced tensor swapping to offload activation memory to host devices dynamically. 
Pangu Ultra MoE achieved a Model Flops Utilization (MFU) of 30.0% and processed tokens at a rate of 1.46 million per second using 6,000 Ascend NPUs. The baseline MFU was 18.9% with 0.61 million tokens per second on 4,000 NPUs. The researchers also introduced dynamic expert placement strategies, improving device-level load balance and achieving a relative 10% MFU improvement. The model performed competitively on benchmark evaluations, attaining 81.3% on AIME2024, 97.4% on MATH500, 94.8% on CLUEWSC, and 91.5% on MMLU. In the healthcare domain, it outperformed DeepSeek R1 by scoring 87.1% on MedQA and 80.8% on MedMCQA, confirming its strength in domain-specific applications. This study illustrates how the Pangu team at Huawei effectively tackled the core difficulties of training massive MoE models on specialized hardware. Their systematic architecture search, efficient communication techniques, and tailored memory optimizations represent a strong framework for scalable AI training. The work demonstrates practical ways to unlock the performance potential of sparse models and sets a direction for future system-aware AI design. Check out Paper here. All credit for this research goes to the researchers of this project. The post Huawei Introduces Pangu Ultra MoE: A 718B-Parameter Sparse Language Model Trained Efficiently on Ascend NPUs Using Simulation-Driven Architecture and System-Level Optimization appeared first on MarkTechPost.
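Because the two Pangu Ultra MoE throughput figures above were measured on different cluster sizes, they are easier to compare per device. The short calculation below uses only the numbers quoted in the article; the helper name `per_device_throughput` is made up for the example.

```python
def per_device_throughput(tokens_per_second: float, num_devices: int) -> float:
    """Average tokens processed per second by a single NPU."""
    return tokens_per_second / num_devices


optimised = per_device_throughput(1.46e6, 6000)  # optimised run on 6,000 NPUs
baseline = per_device_throughput(0.61e6, 4000)   # baseline run on 4,000 NPUs

throughput_gain = optimised / baseline
mfu_gain = 30.0 / 18.9  # ratio of the reported MFU percentages

print(f"{optimised:.1f} vs {baseline:.1f} tokens/s per NPU")  # 243.3 vs 152.5
print(f"throughput gain {throughput_gain:.2f}x, MFU gain {mfu_gain:.2f}x")  # 1.60x, 1.59x
```

The per-device speedup of about 1.6x lines up with the reported rise in MFU from 18.9% to 30.0%, as expected when model and hardware are unchanged and only utilisation improves.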


ServiceNow AI Released Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency

AI models today are expected to handle complex tasks such as solving mathematical problems, interpreting logical statements, and assisting with enterprise decision-making. Building such models demands the integration of mathematical reasoning, scientific understanding, and advanced pattern recognition. As the demand for intelligent agents in real-time applications, like coding assistants and business automation tools, continues to grow, there is a pressing need for models that combine strong performance with efficient memory and token usage, making them viable for deployment in practical hardware environments. A central challenge in AI development is the resource intensity of large-scale reasoning models. Despite their strong capabilities, these models often require significant memory and computational resources, limiting their real-world applicability. This creates a gap between what advanced models can achieve and what users can realistically deploy. Even well-resourced enterprises may find running models demanding dozens of gigabytes of memory or high inference costs unsustainable. The issue is not just about building smarter models, but ensuring they are efficient and deployable in real-world platforms. High-performing models such as QWQ‑32b, o1‑mini, and EXAONE‑Deep‑32b excel at tasks involving mathematical reasoning and academic benchmarks. However, their dependence on high-end GPUs and high token consumption limits their use in production settings. These models highlight the ongoing trade-off in AI deployment: achieving high accuracy at the cost of scalability and efficiency. Addressing this gap, researchers at ServiceNow introduced Apriel-Nemotron-15b-Thinker. This model consists of 15 billion parameters, a relatively modest size compared to its high-performing counterparts, yet it demonstrates performance on par with models almost twice its size. The primary advantage lies in its memory footprint and token efficiency. 
While delivering competitive results, it requires nearly half the memory of QWQ‑32b and EXAONE‑Deep‑32b. This directly contributes to improved operational efficiency in enterprise environments, making it feasible to integrate high-performance reasoning models into real-world applications without large-scale infrastructure upgrades. The development of Apriel-Nemotron-15b-Thinker followed a structured three-stage training approach, each designed to enhance a specific aspect of the model’s reasoning capabilities. In the initial phase, termed Continual Pre-training (CPT), the model was exposed to over 100 billion tokens. These tokens were not generic text but carefully selected examples from domains requiring deep reasoning, mathematical logic, programming challenges, scientific literature, and logical deduction tasks. This exposure provided the foundational reasoning capabilities that distinguish the model from others. The second stage involved Supervised Fine-Tuning (SFT) using 200,000 high-quality demonstrations. These examples further calibrated the model’s responses to reasoning challenges, enhancing performance on tasks that require accuracy and attention to detail. The final tuning stage, GRPO (Group Relative Policy Optimization), refined the model’s outputs by optimizing alignment with expected results across key tasks. This pipeline ensures the model is intelligent, precise, structured, and scalable. In enterprise-specific tasks such as MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge, the model delivered competitive or superior performance compared to larger models. Regarding production efficiency, it consumed 40% fewer tokens than QWQ‑32b, significantly lowering inference costs. From a memory standpoint, it achieves all this with approximately 50% of the memory needed by QWQ‑32b and EXAONE-Deep‑32b, indicating a substantial improvement in deployment feasibility. 
Even in academic benchmarks, such as AIME-24, AIME-25, AMC-23, MATH-500, and GPQA, the model held its own, often equaling or surpassing the performance of larger models while being significantly lighter in computational demand. Several key takeaways from the research on Apriel-Nemotron-15b-Thinker: The model has 15 billion parameters, significantly fewer than QWQ-32b or EXAONE-Deep-32b, but performs competitively. Training follows three phases: 100B+ tokens in CPT, 200K fine-tuning demonstrations in SFT, and a final GRPO refinement. It consumes around 50% less memory than QWQ-32b, allowing for easier deployment on enterprise hardware. It uses 40% fewer tokens in production tasks than QWQ-32b, reducing inference cost and increasing speed. It outperforms or equals larger models on MBPP, BFCL, Enterprise RAG, and academic tasks like GPQA and MATH-500. It is optimized for agentic and enterprise tasks, suggesting utility in corporate automation, coding agents, and logical assistants. It is designed specifically for real-world use, avoiding over-reliance on lab-scale compute environments. Check out the Model on Hugging Face. The post ServiceNow AI Released Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency appeared first on MarkTechPost.
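The "roughly half the memory" claim for Apriel-Nemotron-15b-Thinker is consistent with a back-of-the-envelope estimate from parameter counts alone. The sketch below assumes 16-bit weights (2 bytes per parameter) and counts only the weights, ignoring KV cache, activations, and runtime overhead. Both are simplifying assumptions, not figures from ServiceNow.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights in GB (assumes 16-bit weights by default)."""
    return num_params * bytes_per_param / 1e9


apriel_gb = weight_memory_gb(15e9)  # 15B-parameter model
larger_gb = weight_memory_gb(32e9)  # 32B-parameter model such as QWQ-32b
ratio = apriel_gb / larger_gb

print(f"{apriel_gb:.0f} GB vs {larger_gb:.0f} GB (ratio {ratio:.2f})")  # 30 GB vs 64 GB (ratio 0.47)
```

Under these assumptions the smaller model needs about 47% of the weight memory, in line with the article's "approximately 50%"; actual serving memory also depends on context length and batch size.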

ServiceNow AI Released Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency Read post »

AI, Committee, News, Uncategorized

Google Redefines Computer Science R&D: A Hybrid Research Model that Merges Innovation with Scalable Engineering

Computer science research has evolved into a multidisciplinary effort involving logic, engineering, and data-driven experimentation. With computing systems now deeply embedded in everyday life, research increasingly focuses on large-scale, real-time systems capable of adapting to diverse user needs. These systems often learn from massive datasets and must handle unpredictable interactions. As the scope of computer science broadens, so does the methodology, requiring tools and approaches that accommodate scalability, responsiveness, and empirical validation over purely theoretical models.
The difficulty arises when connecting innovative ideas to practical applications without losing the depth and risk inherent in true research. Rapid development cycles, product deadlines, and user expectations often clash with the uncertain timelines and exploratory nature of research. The challenge is enabling meaningful innovation while maintaining relevance and practical outcomes. Finding a structure where exploration and implementation coexist is essential to making real progress in this demanding and high-impact field.
Traditionally, the division between research and engineering has led to inefficiencies. Research teams create conceptual models or prototypes, which are later handed over to engineering teams for scaling and integration. This separation often results in delays, failures in technology transfer, and difficulty adapting ideas to real-world use. Even when research has academic value, the lack of immediate relevance or scalable deployment options limits its broader impact. Conventional dissemination methods, such as peer-reviewed papers, don’t always align with the fast-moving demands of technology development.
Google introduced a hybrid research model that integrates researchers directly into product and engineering teams. This approach was designed to reduce delays between ideation and implementation, enabling faster and more relevant outcomes.
Researchers at Google, a company that operates at the intersection of massive computing infrastructure and billions of users, work within small teams that remain involved from concept to deployment. By embedding research within development, the risk of failure is offset by iterative learning and empirical data gathered from actual user interactions. This model promotes cross-functional innovation where knowledge flows seamlessly between domains.
The methodology adopted by Google supports research through robust infrastructure and real-time experimentation. Teams write production-ready code early and rely on continuous feedback from deployed services. Elaborate prototypes are avoided, as they slow the path to real user impact. Google’s services model allows even small teams to access powerful computing resources and integrate complex features quickly. Projects are modularized, breaking long-term goals into smaller, achievable components. This structure keeps motivation high and provides frequent opportunities for measurable progress. Research is not isolated from engineering but supported by it, ensuring that practical constraints and user behavior shape every line of code and every experiment.
The results of this model are substantial. Google published 279 research papers in 2011, a steep rise from 13 in 2003, showing an increased emphasis on sharing its scientific advancements. High-impact systems such as MapReduce, BigTable, and the Google File System originated within this hybrid structure and have become foundational to modern computing. Over 1,000 open-source projects and hundreds of public APIs have emerged from this integrated approach. Google Translate and Voice Search are examples of ideas that small research teams turned into large-scale products. Contributions extend to global standards, with team members shaping specifications like HTML5.
By deeply connecting research with product development, Google has built a model that fosters innovation and delivers it at scale. Its hybrid research system empowers teams to work on difficult problems without being detached from practical realities. Projects are designed with user impact and academic relevance in mind, allowing teams to adjust direction quickly when goals are unmet. For example, Google Health was re-evaluated when it did not yield the expected outcomes, showing the model’s flexibility and pragmatism. Combining experimentation, real-world data, and scalable engineering, Google has built a framework that makes research outcomes more tangible and impactful. The paper shows how a unified approach to research and engineering can bridge the gap between innovation and usability, offering a potential blueprint for other technology-driven organizations.
Check out the Paper. The post Google Redefines Computer Science R&D: A Hybrid Research Model that Merges Innovation with Scalable Engineering appeared first on MarkTechPost.

Google Redefines Computer Science R&D: A Hybrid Research Model that Merges Innovation with Scalable Engineering Read post »

AI, Committee, News, Uncategorized

AI That Teaches Itself: Tsinghua University’s ‘Absolute Zero’ Trains LLMs With Zero External Data

LLMs have shown advancements in reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR), which relies on outcome-based feedback rather than imitating intermediate reasoning steps. Current RLVR approaches face critical scalability challenges because they depend heavily on manually curated collections of questions and answers for training. As reasoning models advance, constructing large-scale, high-quality datasets becomes increasingly unsustainable, similar to the bottlenecks identified in LLM pretraining. Moreover, exclusive dependency on human-designed tasks may constrain AI systems’ capacity for autonomous learning and development, especially as they evolve beyond human intellectual capabilities.
Researchers have explored various approaches to enhance LLM reasoning capabilities. STaR pioneered self-bootstrapping using expert iteration and rejection sampling of outcome-verified responses to improve CoT reasoning. The o1 model deployed this concept at scale, achieving state-of-the-art results, and R1 later became the first open-weight model to match or surpass o1’s performance by introducing the “zero” setting, where RL is applied directly to the base LLM. Further, self-play paradigms have evolved from Schmidhuber’s early two-agent setups to more complex implementations like AlphaGo and AlphaZero. Recent methods such as SPIN, Self-Rewarding Language Models, SPC, and SPAG have applied self-play to language models for alignment and reasoning.
Researchers from Tsinghua University, Beijing Institute for General Artificial Intelligence, and Pennsylvania State University have proposed an RLVR paradigm called Absolute Zero, which enables a single model to autonomously generate and solve tasks that maximize its own learning progress without relying on any external data.
Under this method, researchers have introduced the Absolute Zero Reasoner (AZR), which self-evolves its training curriculum and reasoning ability through a code executor that validates proposed code reasoning tasks and verifies answers, providing a unified source of verifiable reward to guide open-ended yet grounded learning. AZR can be implemented across different model scales and remains compatible with various model classes, suggesting broad applicability.
LLMs provide an ideal framework for implementing AZR in multitask learning contexts. In each online rollout iteration, AZR proposes new reasoning tasks conditioned on the task type and past self-generated examples, with explicit prompting to generate diverse tasks. It then attempts to solve them, receiving grounded feedback on its responses. AZR uses a code executor as both a flexible interface and a verifiable environment, enabling automatic construction, execution, and validation of code reasoning tasks. Lastly, the AZR algorithm includes buffer initialization, task proposal inputs and buffer management, valid task construction, solution validation, and advantage estimation through Task-Relative REINFORCE++.
The Absolute Zero Reasoner-Coder-7B has achieved state-of-the-art performance in the 7B overall average and coding average categories, surpassing previous best models by 1.8 absolute percentage points despite being entirely out-of-distribution for both math and code reasoning benchmarks. It outperforms models trained with expert-curated human data in coding by 0.3 absolute percentage points while never accessing such data itself. Scaling analysis reveals that AZR delivers greater gains on larger models, with the 7B and 14B models continuing to improve beyond 200 training steps while the 3B model plateaus. Out-of-distribution performance gains increase with model size: +5.7, +10.2, and +13.2 for 3B, 7B, and 14B, respectively.
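The role of the code executor described above, validating a proposed task and then verifying a solver’s answer, can be illustrated with a small sketch. This is not the authors’ implementation: it covers only a single task form (predict a program’s output for a given input), the names `run_program`, `validate_task`, and `solve_reward` are hypothetical, the run-twice determinism check is an assumption of this sketch, and plain `exec` stands in for what would need to be a sandboxed executor.

```python
from typing import Any, Optional, Tuple

Task = Tuple[str, Any, Any]  # (program source, input, executor-computed output)


def run_program(source: str, arg: Any) -> Any:
    """Execute `source`, which must define f(x), and return f(arg).

    NOTE: plain exec is not a sandbox; a real system would isolate this step.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace["f"](arg)


def validate_task(source: str, arg: Any) -> Optional[Task]:
    """Accept a proposed (program, input) pair only if it runs without error
    and gives the same result twice; the executor supplies the gold output."""
    try:
        first = run_program(source, arg)
        second = run_program(source, arg)
    except Exception:
        return None  # invalid proposal: syntax error, missing f, runtime error
    if first != second:
        return None  # non-deterministic programs cannot be verified
    return (source, arg, first)


def solve_reward(task: Task, predicted_output: Any) -> float:
    """Binary verifiable reward: 1.0 if the prediction matches the gold output."""
    _, _, gold = task
    return 1.0 if predicted_output == gold else 0.0


# A hypothetical self-proposed task: predict the output of f on [3, 1, 2].
proposal = "def f(x):\n    return sorted(x)[::-1]"
task = validate_task(proposal, [3, 1, 2])

print(task is not None)                 # the proposal is valid
print(solve_reward(task, [3, 2, 1]))    # correct prediction
print(solve_reward(task, [1, 2, 3]))    # incorrect prediction
```

Because the gold output comes from execution rather than from a human-written answer key, the same executor serves as both task filter and reward source, which is what allows the loop to run without external data.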
In conclusion, the researchers introduced the Absolute Zero paradigm to address data limitations in existing RLVR frameworks. Under this paradigm, they present AZR, which trains models to propose and solve code-related reasoning tasks grounded by a code executor. However, there is a limitation regarding safety management in self-improving systems. The team observed several instances of safety-concerning CoT reasoning from the Llama-3.1-8B model, termed “uh-oh moments.” The findings indicate that while the Absolute Zero paradigm reduces the need for human intervention in task curation, ongoing oversight remains necessary to address lingering safety concerns, highlighting a critical direction for future research.
Check out the Paper, Model on Hugging Face and GitHub Page. The post AI That Teaches Itself: Tsinghua University’s ‘Absolute Zero’ Trains LLMs With Zero External Data appeared first on MarkTechPost.

AI That Teaches Itself: Tsinghua University’s ‘Absolute Zero’ Trains LLMs With Zero External Data Read post »
