YouZum


Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance

The Problem with "Thinking Longer"

Large language models have made impressive strides in mathematical reasoning by extending their Chain-of-Thought (CoT) processes, essentially "thinking longer" through more detailed reasoning steps. However, this approach has fundamental limitations. When models encounter subtle errors in their reasoning chains, they often compound these mistakes rather than detecting and correcting them. Internal self-reflection frequently fails, especially when the initial reasoning approach is fundamentally flawed.

Microsoft's new research report introduces rStar2-Agent, which takes a different approach: instead of just thinking longer, it teaches models to think smarter by actively using coding tools to verify, explore, and refine their reasoning process (paper: https://arxiv.org/abs/2508.20722).

The Agentic Approach

rStar2-Agent represents a shift toward agentic reinforcement learning, where a 14B-parameter model interacts with a Python execution environment throughout its reasoning process. Rather than relying solely on internal reflection, the model can write code, execute it, analyze the results, and adjust its approach based on concrete feedback.

This creates a dynamic problem-solving process. When the model encounters a complex mathematical problem, it might generate initial reasoning, write Python code to test hypotheses, analyze execution results, and iterate toward a solution. The approach mirrors how human mathematicians often work: using computational tools to verify intuitions and explore different solution paths.

Infrastructure Challenges and Solutions

Scaling agentic RL presents significant technical hurdles. During training, a single batch can generate tens of thousands of concurrent code execution requests, creating bottlenecks that can stall GPU utilization. The researchers addressed this with two key infrastructure innovations.

First, they built a distributed code execution service capable of handling 45,000 concurrent tool calls with sub-second latency. The system isolates code execution from the main training process while maintaining high throughput through careful load balancing across CPU workers.

Second, they developed a dynamic rollout scheduler that allocates computational work based on real-time GPU cache availability rather than static assignment. This prevents GPU idle time caused by uneven workload distribution, a common problem when some reasoning traces require significantly more computation than others.

These infrastructure improvements enabled the entire training process to complete in just one week using 64 AMD MI300X GPUs, demonstrating that frontier-level reasoning capabilities don't require massive computational resources when efficiently orchestrated.

GRPO-RoC: Learning from High-Quality Examples

The core algorithmic innovation is Group Relative Policy Optimization with Resampling on Correct (GRPO-RoC). Traditional reinforcement learning in this context faces a quality problem: models receive positive rewards for correct final answers even when their reasoning process includes multiple code errors or inefficient tool usage. GRPO-RoC addresses this with an asymmetric sampling strategy. During training, the algorithm:

- Oversamples initial rollouts to create a larger pool of reasoning traces
- Preserves diversity in failed attempts to maintain learning from various error modes
- Filters positive examples to emphasize traces with minimal tool errors and cleaner formatting

This ensures the model learns from high-quality successful reasoning while still being exposed to diverse failure patterns. The result is more efficient tool usage and shorter, more focused reasoning traces. A minimal sketch of this resampling idea follows below.
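To make the asymmetric sampling concrete, here is a minimal, hypothetical sketch of the resampling-on-correct idea: oversample rollouts, keep failures diverse, and keep only the cleanest successes. The `Rollout` fields and the quality heuristic are illustrative assumptions, not the paper's implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Rollout:
    trace: str             # full reasoning + tool-call transcript
    is_correct: bool       # final answer matched the reference
    tool_errors: int       # count of failed code executions (assumed field)
    format_penalty: float  # heuristic messiness score (assumed field)

def grpo_roc_sample(rollouts: list[Rollout], group_size: int) -> list[Rollout]:
    """Resampling-on-Correct, sketched: keep only the cleanest successes,
    preserve failure diversity, then form a training group of group_size."""
    successes = sorted((r for r in rollouts if r.is_correct),
                       key=lambda r: (r.tool_errors, r.format_penalty))
    failures = [r for r in rollouts if not r.is_correct]

    # Fill up to half the group with the highest-quality successes.
    n_succ = min(len(successes), group_size // 2)
    group = successes[:n_succ]

    # Fill the remainder with uniformly sampled failures to preserve
    # diverse error modes; if failures run short, take more successes.
    n_fail = min(len(failures), group_size - n_succ)
    group += random.sample(failures, n_fail)
    group += successes[n_succ : n_succ + (group_size - len(group))]

    random.shuffle(group)
    return group
```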
Training Strategy: From Simple to Complex

The training process unfolds in three carefully designed stages, starting with non-reasoning supervised fine-tuning that focuses purely on instruction following and tool formatting, deliberately avoiding complex reasoning examples that might create early biases.

Stage 1 constrains responses to 8,000 tokens, forcing the model to develop concise reasoning strategies. Despite this limitation, performance jumps dramatically, from near zero to over 70% on challenging benchmarks.

Stage 2 extends the token limit to 12,000, allowing for more complex reasoning while maintaining the efficiency gains from the first stage.

Stage 3 shifts focus to the most difficult problems by filtering out those the model has already mastered, ensuring continued learning from challenging cases.

This progression from concise to extended reasoning, combined with increasing problem difficulty, maximizes learning efficiency while minimizing computational overhead.

Breakthrough Results

The results are striking. rStar2-Agent-14B achieves 80.6% accuracy on AIME24 and 69.8% on AIME25, surpassing much larger models, including the 671B-parameter DeepSeek-R1. Perhaps more importantly, it accomplishes this with significantly shorter reasoning traces, averaging around 10,000 tokens compared to over 17,000 for comparable models.

The efficiency gains extend beyond mathematics. Despite training exclusively on math problems, the model demonstrates strong transfer learning, outperforming specialized models on scientific reasoning benchmarks and maintaining competitive performance on general alignment tasks.

Understanding the Mechanisms

Analysis of the trained model reveals fascinating behavioral patterns. High-entropy tokens in reasoning traces fall into two categories: traditional "forking tokens" that trigger self-reflection and exploration, and a new category of "reflection tokens" that emerge specifically in response to tool feedback. These reflection tokens represent a form of environment-driven reasoning in which the model carefully analyzes code execution results, diagnoses errors, and adjusts its approach accordingly. This creates more sophisticated problem-solving behavior than pure CoT reasoning can achieve. (A minimal sketch of flagging such high-entropy positions appears after the summary.)

Summary

rStar2-Agent demonstrates that moderate-sized models can achieve frontier-level reasoning through sophisticated training rather than brute-force scaling. The approach suggests a more sustainable path toward advanced AI capabilities, one that emphasizes efficiency, tool integration, and smart training strategies over raw computational power. The success of this agentic approach also points toward future AI systems that can seamlessly integrate multiple tools and environments, moving beyond static text generation toward dynamic, interactive problem solving.
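As a loose illustration of the entropy analysis described above (not the paper's code), the sketch below computes per-token predictive entropy from a causal LM's logits and flags high-entropy positions, the kind of positions where forking or reflection tokens would surface. The threshold is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def flag_high_entropy_tokens(logits: torch.Tensor, threshold: float = 3.0):
    """logits: [seq_len, vocab_size] from one forward pass of a causal LM.
    Returns a boolean mask of positions whose predictive entropy exceeds
    `threshold` nats (the threshold value is an assumption, not from the paper)."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Shannon entropy per position: H = -sum_v p(v) * log p(v)
    entropy = -(probs * log_probs).sum(dim=-1)  # [seq_len]
    return entropy > threshold

# Usage sketch: run a reasoning trace through any causal LM, then inspect
# which token positions the model was most uncertain about.
```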
Check out the paper (https://arxiv.org/abs/2508.20722) and the project's GitHub page. The post Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance appeared first on MarkTechPost.


Microsoft AI Lab Unveils MAI-Voice-1 and MAI-1-Preview: New In-House Models for Voice AI

Microsoft AI lab officially launched MAI-Voice-1 and MAI-1-preview, marking a new phase for the company's artificial intelligence research and development efforts. The announcement signals that Microsoft AI is now building models fully in house, without third-party involvement. MAI-Voice-1 and MAI-1-preview serve distinct but complementary roles: speech synthesis and general-purpose language understanding.

MAI-Voice-1: Technical Details and Capabilities

MAI-Voice-1 is a speech generation model that produces high-fidelity audio. It generates one minute of natural-sounding audio in under one second on a single GPU, supporting applications such as interactive assistants and podcast narration with low latency and modest hardware requirements.

The model uses a transformer-based architecture trained on a diverse multilingual speech dataset. It handles single-speaker and multi-speaker scenarios, providing expressive and context-appropriate voice output. MAI-Voice-1 is integrated into Microsoft products like Copilot Daily for voice updates and news summaries, and is available for testing in Copilot Labs, where users can create audio stories or guided narratives from text prompts.

Technically, the model focuses on quality, versatility, and speed. Its single-GPU operation differs from systems requiring multiple GPUs, enabling integration in consumer devices and cloud applications beyond research settings.

MAI-1-Preview: Foundation Model Architecture and Performance

MAI-1-preview is Microsoft's first end-to-end, in-house foundation language model. Unlike previous models that Microsoft integrated or licensed from outside, MAI-1-preview was trained entirely on Microsoft's own infrastructure, using a mixture-of-experts architecture and approximately 15,000 NVIDIA H100 GPUs. The Microsoft AI team has made MAI-1-preview available on the LMArena platform, placing it alongside several other models. MAI-1-preview is optimized for instruction following and everyday conversational tasks, making it suitable for consumer-focused applications rather than enterprise or highly specialized use cases. Microsoft has begun rolling out access to the model for select text-based scenarios within Copilot, with gradual expansion planned as feedback is collected and the system is refined.

Model Development and Training Infrastructure

The development of MAI-Voice-1 and MAI-1-preview was supported by Microsoft's next-generation GB200 GPU cluster, a custom-built infrastructure specifically optimized for training large generative models. In addition to hardware, Microsoft has invested heavily in talent, assembling a team with deep expertise in generative AI, speech synthesis, and large-scale systems engineering. The company's approach to model development emphasizes a balance between fundamental research and practical deployment, aiming to create systems that are not just theoretically impressive but also reliable and useful in everyday scenarios.

Applications

MAI-Voice-1 can be used for real-time voice assistance, audio content creation in media and education, or accessibility features. Its ability to simulate multiple speakers supports interactive scenarios such as storytelling, language learning, or simulated conversations. The model's efficiency also allows for deployment on consumer hardware.
MAI-1-preview focuses on general language understanding and generation, assisting with tasks like drafting emails, answering questions, summarizing text, or helping with schoolwork in a conversational format.

Conclusion

Microsoft's release of MAI-Voice-1 and MAI-1-preview shows the company can now develop core generative AI models internally, backed by substantial investment in training infrastructure and technical talent. Both models are intended for practical, real-world use and are being refined with user feedback. This development adds to the diversity of model architectures and training methods in the field, with a focus on systems that are efficient, reliable, and suitable for integration into everyday applications. Microsoft's approach, which combines large-scale resources, gradual deployment, and direct engagement with users, offers one example of how organizations can advance AI capabilities while emphasizing practical, incremental improvement.

The post Microsoft AI Lab Unveils MAI-Voice-1 and MAI-1-Preview: New In-House Models for Voice AI appeared first on MarkTechPost.


Top 20 Voice AI Blogs and News Websites 2025: The Ultimate Resource Guide

Voice AI technology has experienced unprecedented growth in 2025, with revolutionary breakthroughs in real-time conversational AI, emotional intelligence, and voice synthesis. As enterprises increasingly adopt voice agents and consumers embrace next-generation AI assistants, staying informed about the latest developments has become crucial for professionals across industries. The global Voice AI market reached $5.4 billion in 2024, a 25% increase from the previous year, with voice AI solutions attracting $2.1 billion in equity funding.

Top 20 Voice AI Blogs and Websites

1. OpenAI Blog – Voice AI Research & Development

OpenAI leads the voice AI revolution with groundbreaking models like the GPT-4o Realtime API and advanced text-to-speech systems. Their blog provides insider insights into cutting-edge research, model releases, and real-world applications. OpenAI's recent announcement of gpt-realtime and Realtime API updates for production voice agents represents a major breakthrough in conversational AI.

Key Focus Areas:
- Real-time speech-to-speech models
- Voice synthesis and emotional expression
- Safety and responsible AI deployment
- Developer tools and APIs

2. MarkTechPost – Voice AI News & Analysis

MarkTechPost has established itself as a go-to source for comprehensive AI news coverage, with exceptional depth in voice AI reporting. Their expert analysis of emerging technologies and market trends makes complex developments accessible to both technical and business audiences. Their recent coverage of Microsoft's MAI-Voice-1 launch and comprehensive analysis of the voice AI landscape demonstrate their commitment to timely, authoritative reporting.

Key Focus Areas:
- Voice AI market analysis and trends
- Technical breakthroughs in speech synthesis
- Enterprise voice agent implementations
- Industry funding and acquisitions

3. Google AI Blog – Multimodal & Speech Research

Google's research team consistently pushes the boundaries of conversational AI, with innovations like real-time voice agent architecture and advanced speech recognition systems. Their recent work on building real-time voice agents with Gemini demonstrates practical applications of their research.

Key Contributions:
- Multimodal AI integration
- Real-time voice agent architecture
- Speech understanding and generation
- Privacy-preserving voice technologies

4. Microsoft Azure AI Blog – Enterprise Voice Solutions

Microsoft's Azure AI Speech services power millions of enterprise applications. Their blog provides practical insights into implementing voice AI at scale, including personal voice creation, enterprise speech-to-text solutions, and multilingual voice support.

Focus Areas:
- Personal voice creation and customization
- Enterprise speech-to-text solutions
- Multilingual voice support
- Azure cognitive services integration

5. ElevenLabs Blog – Voice Synthesis Innovation

ElevenLabs has revolutionized voice cloning and synthesis, setting new standards for natural-sounding AI voices. The company secured $180 million in Series C funding in January 2025, reaching a valuation of $3.3 billion, demonstrating strong investor confidence in their technology.

Specializations:
- Voice cloning technology
- Multilingual speech synthesis
- Creative applications in media
- API development for voice integration

6. Deepgram Blog – Speech Recognition Excellence

Deepgram's State of Voice AI 2025 report provides authoritative market analysis, identifying 2025 as "the year of human-like voice AI agents". Their technical content explores the latest in speech recognition and real-time transcription.

Key Insights:
- Voice AI market trends and predictions
- Technical deep-dives into speech recognition
- Developer tutorials and best practices
- Industry adoption case studies

7. Anthropic Research – Conversational AI Ethics & Voice Mode

Anthropic's work on Claude focuses on safe, beneficial AI development with emphasis on alignment and responsible deployment. In May 2025, Anthropic launched voice mode for Claude, powered by Claude Sonnet 4, enabling complete spoken conversations with five distinct voice options.

Focus Areas:
- AI safety in conversational systems
- Ethical voice AI development
- Human-AI interaction research
- Voice mode implementation using ElevenLabs technology

8. Stanford HAI Blog – Academic Voice AI Research

Stanford's Human-Centered AI Institute produces cutting-edge research on voice interaction and turn-taking in conversations. Their recent work on teaching voice assistants when to speak represents breakthrough research in conversational AI, moving beyond simple silence detection to analyze voice intonation patterns.

Research Highlights:
- Conversational AI turn-taking and interruption handling
- World Wide Voice Web (WWvW) development
- Silent speech recognition advances
- Open-source virtual assistant development

9. Hume AI Blog – Emotionally Intelligent Voice

Hume AI specializes in emotionally intelligent voice interactions, combining speech technology with empathic understanding. Their Empathic Voice Interface (EVI 3) represents a breakthrough in conversational AI, capable of understanding and responding with natural, emotionally intelligent voice interactions.

Innovations:
- Emotional intelligence in voice AI
- Empathic voice interfaces
- Voice control and customization
- Human wellbeing optimization through AI

10. MIT Technology Review – Voice AI Analysis

MIT Technology Review provides in-depth analysis of voice AI trends, societal implications, and breakthrough research with rigorous journalistic standards. Their coverage includes voice AI diversity initiatives, synthetic voice technology implications, and ethical considerations in voice technology deployment.

Coverage Areas:
- Voice AI diversity and inclusion
- Audio deepfake detection and prevention
- Industry analysis and market trends
- Ethical considerations in voice tech

11. Resemble AI Blog – Voice Cloning & Security

Resemble AI leads in voice cloning technology while addressing security concerns like deepfake detection. They specialize in advanced voice cloning techniques, enterprise voice solutions, and voice security authentication.

Expertise:
- Advanced voice cloning techniques
- Deepfake detection and prevention
- Enterprise voice solutions
- Voice security and authentication

12. TechCrunch – Voice AI Industry News

TechCrunch provides comprehensive coverage of voice AI startups, funding rounds, and industry developments. They extensively covered Anthropic's voice mode launch and provide regular updates on industry partnerships and product launches.

Coverage Focus:
- Startup funding and acquisitions
- Industry partnerships and deals
- Product launches and demos
- Market analysis and predictions

13. VentureBeat AI – Voice Technology Trends

VentureBeat offers detailed coverage of voice AI business applications and enterprise adoption trends. They specialize in enterprise AI adoption analysis, voice technology market research, and developer tools coverage.

Specializations:
- Enterprise AI adoption
- Voice technology market analysis
- Product reviews and comparisons
- Developer tools and platforms

14. Towards Data Science – Technical Voice AI Content

This Medium publication features hands-on tutorials, technical deep-dives, and practical implementations of voice AI technologies. Content includes privacy-preserving voice AI implementations, voice assistant tuning, and AI-powered language learning applications.

Content Types:
- Technical tutorials and guides
- Voice AI implementation case studies
- Python and machine learning applications
- Data science approaches to speech

15. Amazon Alexa Blog – Voice Assistant Innovation

Amazon's Alexa team shares


Multi-Lingual Implicit Discourse Relation Recognition with Multi-Label Hierarchical Learning

arXiv:2508.20712v1 Announce Type: new Abstract: This paper introduces the first multi-lingual and multi-label classification model for implicit discourse relation recognition (IDRR). Our model, HArch, is evaluated on the recently released DiscoGeM 2.0 corpus and leverages hierarchical dependencies between discourse senses to predict probability distributions across all three sense levels in the PDTB 3.0 framework. We compare several pre-trained encoder backbones and find that RoBERTa-HArch achieves the best performance in English, while XLM-RoBERTa-HArch performs best in the multi-lingual setting. In addition, we compare our fine-tuned models against GPT-4o and Llama-4-Maverick using few-shot prompting across all language configurations. Our results show that our fine-tuned models consistently outperform these LLMs, highlighting the advantages of task-specific fine-tuning over prompting in IDRR. Finally, we report SOTA results on the DiscoGeM 1.0 corpus, further validating the effectiveness of our hierarchical approach.
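The hierarchical multi-label setup the abstract describes can be illustrated with a small sketch: one shared encoder with three classification heads, where each deeper sense level is conditioned on the predicted distribution of the level above. This is an illustrative reconstruction under assumptions (class counts and conditioning scheme), not the authors' released code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class HierarchicalIDRRHead(nn.Module):
    """Illustrative 3-level hierarchical classifier in the spirit of HArch.
    Level sizes loosely follow the PDTB 3.0 sense hierarchy (assumed counts)."""
    def __init__(self, encoder_name="xlm-roberta-base",
                 n_l1=4, n_l2=17, n_l3=28):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        h = self.encoder.config.hidden_size
        self.l1 = nn.Linear(h, n_l1)
        # Deeper levels also see the softmax distribution of the level above,
        # one simple way to model hierarchical dependencies between senses.
        self.l2 = nn.Linear(h + n_l1, n_l2)
        self.l3 = nn.Linear(h + n_l2, n_l3)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS]-style pooled representation
        p1 = self.l1(cls).softmax(-1)
        p2 = self.l2(torch.cat([cls, p1], -1)).softmax(-1)
        p3 = self.l3(torch.cat([cls, p2], -1)).softmax(-1)
        return p1, p2, p3  # probability distributions over all three levels
```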


Token Buncher: Shielding LLMs from Harmful Reinforcement Learning Fine-Tuning

arXiv:2508.20697v1 Announce Type: cross Abstract: As large language models (LLMs) continue to grow in capability, so do the risks of harmful misuse through fine-tuning. While most prior studies assume that attackers rely on supervised fine-tuning (SFT) for such misuse, we systematically demonstrate that reinforcement learning (RL) enables adversaries to more effectively break safety alignment and facilitate advanced harmful task assistance, under matched computational budgets. To counter this emerging threat, we propose TokenBuncher, the first effective defense specifically targeting RL-based harmful fine-tuning. TokenBuncher suppresses the foundation on which RL relies: model response uncertainty. By constraining uncertainty, RL-based fine-tuning can no longer exploit distinct reward signals to drive the model toward harmful behaviors. We realize this defense through entropy-as-reward RL and a Token Noiser mechanism designed to prevent the escalation of expert-domain harmful capabilities. Extensive experiments across multiple models and RL algorithms show that TokenBuncher robustly mitigates harmful RL fine-tuning while preserving benign task utility and finetunability. Our results highlight that RL-based harmful fine-tuning poses a greater systemic risk than SFT, and that TokenBuncher provides an effective and general defense.
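To give a flavor of the "entropy-as-reward" idea, here is a minimal sketch under one plausible reading of the abstract: during defensive fine-tuning, the model is rewarded for low-entropy (confident) responses, removing the response-uncertainty signal that harmful RL fine-tuning would later exploit. The reward direction and averaging are assumptions, not the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def entropy_reward(logits: torch.Tensor, response_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical entropy-as-reward signal: mean token entropy over the
    response span, negated so lower uncertainty earns higher reward.
    logits: [batch, seq, vocab]; response_mask: [batch, seq], 1 on response tokens."""
    logp = F.log_softmax(logits, dim=-1)
    token_entropy = -(logp.exp() * logp).sum(-1)  # [batch, seq]
    mean_entropy = (token_entropy * response_mask).sum(-1) / \
                   response_mask.sum(-1).clamp(min=1)
    return -mean_entropy  # reward is higher when the model is more certain
```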


Measuring Reasoning Utility in LLMs via Conditional Entropy Reduction

arXiv:2508.20395v1 Announce Type: new Abstract: Recent advancements in large language models (LLMs) often rely on generating intermediate reasoning steps to enhance accuracy. However, little work has examined how reasoning utility contributes to the final answer’s correctness. Due to the stochastic nature of autoregressive generation, generating more context does not guarantee increased confidence in the answer. If we could predict, during generation, whether a reasoning step will be useful, we could stop early or prune ineffective steps, avoiding distractions in the final decision. We present an oracle study on the MATH dataset, using Qwen2.5-32B and GPT-4o to generate reasoning chains, and then employing a separate model (Qwen3-8B) to quantify the utility of these chains for final accuracy. Specifically, we measure the model’s uncertainty on the answer span Y at each reasoning step using conditional entropy (expected negative log-likelihood over the vocabulary) with the context expanding step by step. Our results show a clear pattern: conditional entropy that decreases over steps is strongly associated with correct answers, whereas flat or increasing entropy often results in wrong answers. We also corroborate that incorrect reasoning paths tend to be longer than correct ones, suggesting that longer reasoning does not necessarily yield better outcomes. These findings serve as a foundation to inspire future work on designing efficient reasoning pipelines that detect and avoid unproductive reasoning early.
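The measurement the abstract describes maps to a short procedure: after each reasoning step, compute the model's vocabulary-level entropy at the answer-span positions, with the context extended step by step. Below is a minimal sketch using Hugging Face transformers; the prompt template and span averaging are assumptions, and Qwen3-8B is used as the judge model per the abstract.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
lm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto")

@torch.no_grad()
def answer_entropy(question: str, steps: list[str], answer: str) -> list[float]:
    """For each growing prefix of reasoning steps, return the mean entropy
    over the vocabulary at the answer-span positions (lower = more certain)."""
    results = []
    ans_ids = tok(answer, return_tensors="pt").input_ids
    for k in range(1, len(steps) + 1):
        ctx = question + "\n" + "\n".join(steps[:k]) + "\nAnswer: "
        ctx_ids = tok(ctx, return_tensors="pt").input_ids
        ids = torch.cat([ctx_ids, ans_ids], dim=1)
        # Logits at positions that predict each answer token.
        logits = lm(ids).logits[0, ctx_ids.shape[1] - 1 : -1]
        logp = F.log_softmax(logits, dim=-1)
        ent = -(logp.exp() * logp).sum(-1).mean().item()
        results.append(ent)
    return results

# A downward entropy trend across steps is the pattern the paper
# associates with correct final answers.
```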


CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors

arXiv:2508.02997v3 Announce Type: replace Abstract: The widespread use of Large Language Models (LLMs) in many applications marks a significant advance in research and practice. However, their complexity and hard-to-understand nature make them vulnerable to attacks, especially jailbreaks designed to produce harmful responses. To counter these threats, developing strong detection methods is essential for the safe and reliable use of LLMs. This paper studies this detection problem using the Contextual Co-occurrence Matrix, a structure recognized for its efficacy in data-scarce environments. We propose a novel method leveraging the latent space characteristics of Contextual Co-occurrence Matrices and Tensors for the effective identification of adversarial and jailbreak prompts. Our evaluations show that this approach achieves a notable F1 score of 0.83 using only 0.5% of labeled prompts, which is a 96.6% improvement over baselines. This result highlights the strength of our learned patterns, especially when labeled data is scarce. Our method is also significantly faster, with speedups ranging from 2.3 to 128.4 times over the baseline models.
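As a rough illustration of the pipeline the abstract suggests (co-occurrence statistics plus latent-space features), the sketch below builds windowed co-occurrence vectors from prompts over a small vocabulary, extracts low-rank features with truncated SVD, and feeds them to a lightweight detector. The windowing, rank, and classifier are assumptions; the paper's tensor construction is more involved.

```python
from collections import Counter

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

def cooc_features(prompts, vocab, window=5, rank=32):
    """Per-prompt windowed co-occurrence counts over a small illustrative
    vocabulary, compressed into latent features via truncated SVD."""
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    rows = []
    for p in prompts:
        toks = [t for t in p.lower().split() if t in idx]
        counts = Counter()
        for i in range(len(toks)):
            for j in range(i + 1, min(i + window, len(toks))):
                counts[(idx[toks[i]], idx[toks[j]])] += 1
        v = np.zeros(n * n)
        for (a, b), c in counts.items():
            v[a * n + b] = c
        rows.append(v)
    X = np.vstack(rows)
    k = min(rank, X.shape[0] - 1, X.shape[1] - 1)
    return TruncatedSVD(n_components=k).fit_transform(X)

# Usage sketch, mirroring the paper's data-scarce setting:
# X = cooc_features(train_prompts, vocab)
# clf = LogisticRegression().fit(X, labels)
```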
