
Chai Discovery Team Releases Chai-2: AI Model Achieves 16% Hit Rate in De Novo Antibody Design

TLDR: The Chai Discovery Team introduces Chai-2, a multimodal AI model that enables zero-shot de novo antibody design. Achieving a 16% hit rate across 52 novel targets using ≤20 candidates per target, Chai-2 outperforms prior methods by over 100x and delivers validated binders in under two weeks, eliminating the need for large-scale screening.

In a significant advancement for computational drug discovery, the Chai Discovery Team has introduced Chai-2, a multimodal generative AI platform capable of zero-shot antibody and protein binder design. Unlike previous approaches that rely on extensive high-throughput screening, Chai-2 reliably designs functional binders in a single 24-well plate setup, achieving a more than 100-fold improvement over existing state-of-the-art (SOTA) methods.

Chai-2 was tested on 52 novel targets, none of which had known antibody or nanobody binders in the Protein Data Bank (PDB). Despite this challenge, the system achieved a 16% experimental hit rate, discovering binders for 50% of the tested targets within a two-week cycle from computational design to wet-lab validation. This performance marks a shift from probabilistic screening to deterministic generation in molecular engineering.

AI-Powered De Novo Design at Experimental Scale

Chai-2 integrates an all-atom generative design module and a folding model that predicts antibody-antigen complex structures with double the accuracy of its predecessor, Chai-1. The system operates in a zero-shot setting, generating sequences for antibody modalities such as scFvs and VHHs without requiring prior binders.

Key features of Chai-2 include:

- No target-specific tuning required
- The ability to prompt designs with epitope-level constraints
- Generation of therapeutically relevant formats (miniproteins, scFvs, VHHs)
- Support for cross-reactivity design between species (e.g., human and cyno)

This approach allows researchers to design ≤20 antibodies or nanobodies per target and bypass high-throughput screening altogether.

Benchmarking Across Diverse Protein Targets

In rigorous lab validations, Chai-2 was applied to targets with no sequence or structure similarity to known antibodies. Designs were synthesized and tested for binding using bio-layer interferometry (BLI). The results show:

- A 15.5% average hit rate across all formats
- 20.0% for VHHs and 13.7% for scFvs
- Successful binders for 26 of the 52 targets

Notably, Chai-2 produced hits for hard targets such as TNFα, which has historically been intractable for in silico design. Many binders showed picomolar to low-nanomolar dissociation constants (KDs), indicating high-affinity interactions.

Novelty, Diversity, and Specificity

Chai-2's outputs are structurally and sequentially distinct from known antibodies. Structural analysis showed:

- No generated design was within 2Å RMSD of any known structure
- All CDR sequences had an edit distance of more than 10 from the closest known antibody
- Binders fell into multiple structural clusters per target, suggesting conformational diversity

Additional evaluations confirmed low off-target binding and polyreactivity profiles comparable to those of clinical antibodies such as Trastuzumab and Ixekizumab.
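To make the novelty criterion concrete, here is a minimal Python sketch of the kind of filter the ">10 edit distance" claim implies: it computes the Levenshtein distance between a candidate CDR sequence and a set of known antibody CDRs and keeps only candidates whose nearest known neighbor is more than 10 edits away. The sequences and the `is_novel` helper are illustrative assumptions, not part of the Chai-2 pipeline.

```python
# Illustrative novelty filter for the ">10 edit distance" criterion.
# This is a sketch, not Chai Discovery's actual pipeline.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def is_novel(candidate_cdr: str, known_cdrs: list[str], min_dist: int = 11) -> bool:
    """Keep a design only if every known CDR is more than 10 edits away."""
    return all(edit_distance(candidate_cdr, k) >= min_dist for k in known_cdrs)

# Toy example with made-up CDR-H3 sequences:
known = ["ARDRGYSSGWYFDY", "ARGGTFDY"]
print(is_novel("AKEGGLLWFGELNY", known))
```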
Design Flexibility and Customization

Beyond general-purpose binder generation, Chai-2 demonstrates the ability to:

- Target multiple epitopes on a single protein
- Produce binders across different antibody formats (e.g., scFv, VHH)
- Generate cross-species reactive antibodies from a single prompt

In a cross-reactivity case study, a Chai-2-designed antibody achieved nanomolar KDs against both the human and cyno variants of a protein, demonstrating its utility for preclinical studies and therapeutic development.

Implications for Drug Discovery

Chai-2 effectively compresses the traditional biologics discovery timeline from months to weeks, delivering experimentally validated leads in a single round. Its combination of a high success rate, design novelty, and modular prompting marks a paradigm shift in therapeutic discovery workflows. The framework can be extended beyond antibodies to miniproteins, macrocycles, enzymes, and potentially small molecules, paving the way for computation-first design paradigms. Future directions include expanding into bispecifics and ADCs and exploring biophysical property optimization (e.g., viscosity, aggregation). As the field of AI in molecular design matures, Chai-2 sets a new bar for what generative models can achieve in real-world drug discovery settings.

Check out the Technical Report. All credit for this research goes to the researchers of this project. The post Chai Discovery Team Releases Chai-2: AI Model Achieves 16% Hit Rate in De Novo Antibody Design appeared first on MarkTechPost.


Kyutai Releases 2B Parameter Streaming Text-to-Speech TTS with 220ms Latency and 2.5M Hours of Training

Kyutai, an open AI research lab, has released a groundbreaking streaming Text-to-Speech (TTS) model with ~2 billion parameters. Designed for real-time responsiveness, the model delivers ultra-low-latency audio generation (220 milliseconds) while maintaining high fidelity. It is trained on an unprecedented 2.5 million hours of audio and is licensed under the permissive CC-BY-4.0, reinforcing Kyutai's commitment to openness and reproducibility. This advancement redefines the efficiency and accessibility of large-scale speech generation models, particularly for edge deployment and agentic AI.

Unpacking the Performance: Sub-350ms Latency for 32 Concurrent Users on a Single L40 GPU

The model's streaming capability is its most distinctive feature. On a single NVIDIA L40 GPU, the system can serve up to 32 concurrent users while keeping latency under 350ms. For individual use, the model maintains a generation latency as low as 220ms, enabling near-real-time applications such as conversational agents, voice assistants, and live narration systems. This performance is enabled by Kyutai's novel Delayed Streams Modeling approach, which allows the model to generate speech incrementally as text arrives.

Key technical metrics:

- Model size: ~2B parameters
- Training data: 2.5 million hours of speech
- Latency: 220ms single-user; <350ms with 32 users on one L40 GPU
- Language support: English and French
- License: CC-BY-4.0 (open source)

Delayed Streams Modeling: Architecting Real-Time Responsiveness

Kyutai's innovation is anchored in Delayed Streams Modeling, a technique that allows speech synthesis to begin before the full input text is available. The approach is designed to balance prediction quality with response speed, enabling high-throughput streaming TTS. Unlike conventional autoregressive models that suffer from response lag, this architecture maintains temporal coherence while achieving faster-than-real-time synthesis. The codebase and training recipe for the architecture are available in Kyutai's GitHub repository, supporting full reproducibility and community contributions.

Model Availability and Open Research Commitment

Kyutai has released the model weights and inference scripts on Hugging Face, making them accessible to researchers, developers, and commercial teams. The permissive CC-BY-4.0 license encourages unrestricted adaptation and integration into applications, provided proper attribution is maintained. The release supports both batch and streaming inference, making it a versatile foundation for voice cloning, real-time chatbots, accessibility tools, and more. With pretrained models in both English and French, Kyutai sets the stage for multilingual TTS pipelines.

Implications for Real-Time AI Applications

By reducing speech generation latency to the 200ms range, Kyutai's model narrows the human-perceptible delay between intent and speech, making it viable for:

- Conversational AI: human-like voice interfaces with low turnaround
- Assistive tech: faster screen readers and voice feedback systems
- Media production: voiceovers with rapid iteration cycles
- Edge devices: optimized inference for low-power or on-device environments

The ability to serve 32 users on a single L40 GPU without quality degradation also makes it attractive for scaling speech services efficiently in cloud environments.
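The streaming interaction pattern can be pictured with a short sketch. The `stream_tts` interface below is hypothetical (Kyutai's real API lives in its GitHub repository), but it shows the shape of delayed-streams usage: text is fed incrementally, and audio chunks are yielded before the full input is available, so time-to-first-audio, not total synthesis time, is what the 220ms figure measures.

```python
import time

# Hypothetical streaming interface sketch; Kyutai's actual API differs
# (see their GitHub repository for Delayed Streams Modeling).
def stream_tts(text_chunks, synthesize_chunk):
    """Feed text incrementally; yield audio chunks as soon as they are ready."""
    start = time.perf_counter()
    first_audio_ms = None
    for chunk in text_chunks:
        audio = synthesize_chunk(chunk)  # synthesis starts before the text ends
        if audio:
            if first_audio_ms is None:
                first_audio_ms = (time.perf_counter() - start) * 1000
            yield audio
    print(f"time to first audio: {first_audio_ms:.1f} ms")

# Toy stand-in for a model: returns fake PCM bytes for each text chunk.
fake_model = lambda text: b"\x00\x01" * len(text)
audio_out = b"".join(stream_tts(["Hello, ", "streaming ", "world."], fake_model))
```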
Conclusion: Open, Fast, and Ready for Deployment

Kyutai's streaming TTS release is a milestone in speech AI. With high-quality synthesis, real-time latency, and generous licensing, it addresses critical needs of both researchers and real-world product teams. The model's reproducibility, multilingual support, and scalable performance make it a standout alternative to proprietary solutions. For more details, you can explore the official model card on Hugging Face, the technical explanation on Kyutai's site, and implementation specifics on GitHub. The post Kyutai Releases 2B Parameter Streaming Text-to-Speech TTS with 220ms Latency and 2.5M Hours of Training appeared first on MarkTechPost.


AbstRaL: Teaching LLMs Abstract Reasoning via Reinforcement to Boost Robustness on GSM Benchmarks

Recent research indicates that LLMs, particularly smaller ones, frequently struggle with robust reasoning. They tend to perform well on familiar questions but falter when those same problems are slightly altered, for instance by changing names or numbers or adding irrelevant but related information. This weakness, known as poor out-of-distribution (OOD) generalization, results in notable accuracy drops, even on simple math tasks. One promising solution is to create synthetic variations of reasoning problems, helping models learn to focus on the underlying logic rather than surface details. Strengthening reasoning in this manner is crucial for developing more general and reliable AI systems.

Abstracting the Core Logic of LLM Reasoning Failures

LLMs have demonstrated impressive reasoning capabilities, yet they often falter when exposed to distribution shifts, such as changes in phrasing, numerical values, or the introduction of distractions. This vulnerability is evident across benchmarks in logic, mathematics, and commonsense reasoning. Prior solutions have relied on data augmentation to expose models to a broader variety of inputs, improving robustness but increasing computational demands. Researchers have also explored formats such as abstraction-of-thought and chain-of-abstraction to teach abstract reasoning, while planning techniques like chain-of-thought and tree-of-thought aid step-by-step problem solving. Reinforcement learning and preference-based methods provide additional support for developing reasoning skills beyond pattern memorization.

AbstRaL's Symbolic Learning Method to Improve Reasoning Consistency

Researchers from Apple and EPFL propose AbstRaL, a method that teaches LLMs to understand abstract reasoning patterns rather than memorize surface details. Instead of generating many varied training examples, which is computationally costly, AbstRaL helps LLMs learn the underlying structure of reasoning problems through reinforcement learning. The method connects these abstract patterns to symbolic tools, enabling more reliable problem solving. Tested on GSM benchmarks, AbstRaL significantly improves LLM performance, especially in the face of input changes or distracting information. It outperforms models trained only with supervised learning by promoting more consistent and context-independent reasoning.

Four Steps to Abstract Symbolic Reasoning via AbstRaL

AbstRaL is a four-step framework designed to teach LLMs to reason abstractly rather than rely on surface patterns. First, it identifies the key variables in a question and replaces them with symbolic placeholders. Then, using specially crafted data (GranulAR), the model learns to reason step by step with these abstract symbols. Next, it retrieves the general reasoning structure (the abstraction) from the symbolic answer. Finally, it uses this abstraction with the original values to compute the correct answer. Reinforcement learning with two rewards, one for correctness and another for symbolic similarity, further improves the model's ability to generate accurate, context-independent reasoning patterns; a sketch of this two-part reward follows below.
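The following is a hypothetical reconstruction of that two-part reward, based only on the description above, not the authors' code: one term checks the final answer against the gold answer, the other scores how closely the generated abstraction matches a reference symbolic pattern, and a weighting factor `alpha` (an assumption here) combines them.

```python
# Hypothetical sketch of AbstRaL's two-part RL reward
# (correctness + symbolic similarity); not the authors' implementation.
import difflib

def correctness_reward(predicted_answer: str, gold_answer: str) -> float:
    return 1.0 if predicted_answer.strip() == gold_answer.strip() else 0.0

def symbolic_similarity_reward(pred_abstraction: str, ref_abstraction: str) -> float:
    # Ratio in [0, 1]; rewards reasoning expressed over symbols like x1, x2.
    return difflib.SequenceMatcher(None, pred_abstraction, ref_abstraction).ratio()

def total_reward(pred_ans, gold_ans, pred_abs, ref_abs, alpha=0.5):
    return (correctness_reward(pred_ans, gold_ans)
            + alpha * symbolic_similarity_reward(pred_abs, ref_abs))

# Toy usage with symbolic placeholders standing in for problem-specific numbers:
print(total_reward("42", "42", "ans = x1 * x2 + x3", "ans = x1 * x2 + x3"))
```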
GSM8K Variations Reveal AbstRaL's Robustness Across LLM Sizes

The researchers evaluate AbstRaL on math reasoning tasks using models such as Llama-3 and Qwen2, training them on GranulAR, a dataset that rewrites math problems in an abstract symbolic form. This helps models focus on structure rather than surface details. They test robustness using altered versions of GSM8K problems that change numbers, names, and phrasing. Compared with baselines such as standard chain-of-thought prompting, AbstRaL shows stronger consistency and a smaller accuracy drop on these variations. Especially for smaller models, it improves reliability across reworded inputs. The results suggest that teaching models to reason abstractly makes them more adaptable and less reliant on memorized patterns.

Teaching LLMs Abstract Thinking through Reinforcement Yields Robust Reasoning

In conclusion, AbstRaL is a method designed to enhance abstract reasoning in LLMs, making them more resilient to superficial changes in problems. Unlike traditional fine-tuning or data augmentation, AbstRaL uses reinforcement learning to train models on GranulAR rationales that mix Socratic chain-of-thought with detailed abstraction. This approach helps models strip away surface-level distractions and better connect with symbolic tools. Tested on challenging GSM8K perturbation benchmarks, AbstRaL notably reduces performance drops under distribution shifts, particularly in smaller models. The study shows that learning to abstract improves reasoning robustness more effectively than relying solely on direct supervision.

Check out the Paper. All credit for this research goes to the researchers of this project. The post AbstRaL: Teaching LLMs Abstract Reasoning via Reinforcement to Boost Robustness on GSM Benchmarks appeared first on MarkTechPost.


Inside India’s scramble for AI independence

In Bengaluru, India, Adithya Kolavi felt a mix of excitement and validation as he watched DeepSeek unleash its disruptive language model on the world earlier this year. The Chinese technology rivaled the best of the West in terms of benchmarks, but it had been built with far less capital in far less time.

"I thought: 'This is how we disrupt with less,'" says Kolavi, the 20-year-old founder of the Indian AI startup CognitiveLab. "If DeepSeek could do it, why not us?"

But for Abhishek Upperwal, founder of Soket AI Labs and architect of one of India's earliest efforts to develop a foundation model, the moment felt more bittersweet.

Upperwal's model, called Pragna-1B, had struggled to stay afloat with tiny grants while he watched global peers raise millions. The multilingual model had a relatively modest 1.25 billion parameters and was designed to reduce the "language tax," the extra costs that arise because India—unlike the US or even China—has a multitude of languages to support. His team had trained it, but limited resources meant it couldn't scale. As a result, he says, the project became a proof of concept rather than a product.

"If we had been funded two years ago, there's a good chance we'd be the ones building what DeepSeek just released," he says.

Kolavi's enthusiasm and Upperwal's dismay reflect the spectrum of emotions among India's AI builders. Despite its status as a global tech hub, the country lags far behind the likes of the US and China when it comes to homegrown AI. That gap has opened largely because India has chronically underinvested in R&D, institutions, and invention. Meanwhile, since no one native language is spoken by the majority of the population, training language models is far more complicated than it is elsewhere.

Historically known as the global back office for the software industry, India has a tech ecosystem that evolved with a services-first mindset. Giants like Infosys and TCS built their success on efficient software delivery, but invention was neither prioritized nor rewarded. Meanwhile, India's R&D spending hovered at just 0.65% of GDP ($25.4 billion) in 2024, far behind China's 2.68% ($476.2 billion) and the US's 3.5% ($962.3 billion). The muscle to invent and commercialize deep tech, from algorithms to chips, was just never built.

Isolated pockets of world-class research do exist within government agencies like the DRDO (Defense Research & Development Organization) and ISRO (Indian Space Research Organization), but their breakthroughs rarely spill into civilian or commercial use. India lacks the bridges to connect risk-taking research to commercial pathways, the way DARPA does in the US. Meanwhile, much of India's top talent migrates abroad, drawn to ecosystems that better understand and, crucially, fund deep tech.

So when the open-source foundation model DeepSeek-R1 suddenly outperformed many global peers, it struck a nerve. This launch by a Chinese startup prompted Indian policymakers to confront just how far behind the country was in AI infrastructure, and how urgently it needed to respond.

India responds

In January 2025, 10 days after DeepSeek-R1's launch, the Ministry of Electronics and Information Technology (MeitY) solicited proposals for India's own foundation models, which are large AI models that can be adapted to a wide range of tasks. Its public tender invited private-sector cloud and data-center companies to reserve GPU compute capacity for government-led AI research.
Providers including Jio, Yotta, E2E Networks, Tata, AWS partners, and CDAC responded. Through this arrangement, MeitY suddenly had access to nearly 19,000 GPUs at subsidized rates, repurposed from private infrastructure and allocated specifically to foundational AI projects. This triggered a surge of proposals from companies wanting to build their own models.

Within two weeks, it had 67 proposals in hand. That number tripled by mid-March.

In April, the government announced plans to develop six large-scale models by the end of 2025, plus 18 additional AI applications targeting sectors like agriculture, education, and climate action. Most notably, it tapped Sarvam AI to build a 70-billion-parameter model optimized for Indian languages and needs.

For a nation long restricted by limited research infrastructure, things moved at record speed, marking a rare convergence of ambition, talent, and political will. "India could do a Mangalyaan in AI," said Gautam Shroff of IIIT-Delhi, referencing the country's cost-effective, and successful, Mars orbiter mission.

Jaspreet Bindra, cofounder of AI&Beyond, an organization focused on teaching AI literacy, captured the urgency: "DeepSeek is probably the best thing that happened to India. It gave us a kick in the backside to stop talking and start doing something."

The language problem

One of the most fundamental challenges in building foundational AI models for India is the country's sheer linguistic diversity. With 22 official languages, hundreds of dialects, and millions of people who are multilingual, India poses a problem that few existing LLMs are equipped to handle.

Whereas a massive amount of high-quality web data is available in English, Indian languages collectively make up less than 1% of online content. The lack of digitized, labeled, and cleaned data in languages like Bhojpuri and Kannada makes it difficult to train LLMs that understand how Indians actually speak or search.

Global tokenizers, which break text into units a model can process, also perform poorly on many Indian scripts, misinterpreting characters or skipping some altogether. As a result, even when Indian languages are included in multilingual models, they're often poorly understood and inaccurately generated.

And unlike OpenAI and DeepSeek, which achieved scale using structured English-language data, Indian teams often begin with fragmented and low-quality data sets encompassing dozens of Indian languages. This makes the early steps of training foundation models far more complex.

Nonetheless, a small but determined group of Indian builders is starting to shape the country's AI future. For example, Sarvam AI has created OpenHathi-Hi-v0.1, an open-source Hindi language model that shows the Indian AI field's growing ability to address the country's vast linguistic diversity. The model, built on Meta's Llama 2 architecture, was trained on 40 billion tokens of Hindi and related Indian-language content, making it one of the largest open-source Hindi


The Download: India’s AI independence, and predicting future epidemics

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

Inside India's scramble for AI independence

Despite its status as a global tech hub, India lags far behind the likes of the US and China when it comes to homegrown AI. That gap has opened largely because India has chronically underinvested in R&D, institutions, and invention. Meanwhile, since no one native language is spoken by the majority of the population, training language models is far more complicated than it is elsewhere.

So when the open-source foundation model DeepSeek-R1 suddenly outperformed many global peers, it struck a nerve. This launch by a Chinese startup prompted Indian policymakers to confront just how far behind the country was in AI infrastructure—and how urgently it needed to respond. Read the full story.

—Shadma Shaikh

Job titles of the future: Pandemic oracle

Officially, Conor Browne is a biorisk consultant. Based in Belfast, Northern Ireland, he has advanced degrees in security studies and medical and business ethics, along with United Nations certifications in counterterrorism and conflict resolution. Early in the emergence of SARS-CoV-2, international energy conglomerates seeking expert guidance on navigating the potential turmoil in markets and transportation became his main clients.

Having studied the 2002 SARS outbreak, he predicted the exponential spread of the new airborne virus. In fact, he forecast the epidemic's broadscale impact and its implications for business so accurately that he has come to be seen as a pandemic oracle. Read the full story.

—Britta Shoot

This story is from the most recent print edition of MIT Technology Review, which explores power—who has it, and who wants it. Subscribe here to receive future copies once they drop.

The must-reads

I've combed the internet to find you today's most fun/important/scary/fascinating stories about technology.

1 Donald Trump's 'big beautiful bill' has passed
Which is terrible news for the clean energy industry. (Vox)
+ An energy-affordability crisis is looming in the US. (The Atlantic $)
+ The President struck deals with House Republican holdouts to get it over the line. (WSJ $)
+ The Trump administration has shut down more than 100 climate studies. (MIT Technology Review)

2 Daniel Gross is joining Meta's superintelligence lab
He's jumping ship from the startup he co-founded with Ilya Sutskever. (Bloomberg $)
+ Sutskever is stepping into the CEO role in his absence. (TechCrunch)
+ Here's what we can infer from Meta's recent hires. (Semafor)

3 AI's energy demands could destabilize the global supply
That's according to the head of the world's largest transformer maker. (FT $)
+ We did the math on AI's energy footprint. Here's the story you haven't heard. (MIT Technology Review)

4 Elon Musk is threatening to start his own political party
Would anyone vote for him, though? (WP $)
+ You'd think his bruising experience in the White House would have put him off. (NY Mag $)

5 The US has lifted export curbs on chip design software to China
It suggests that frosty relations between the nations may be thawing. (Reuters)

6 Trump officials are going after this ICE warning app
But lawyers say there's nothing illegal about it. (Wired $)
+ Downloads of ICEBlock are rising. (NBC News)

7 Wildfires are making it harder to monitor air pollutants
Current tracking technology isn't built to accommodate shifting smoke. (Undark)
+ How AI can help spot wildfires. (MIT Technology Review)
8 Apple's iOS 26 software can detect nudity on FaceTime calls
The feature will pause the call and ask if you want to continue. (Gizmodo)

9 Threads has finally launched DMs
But users are arguing there should be a way to opt out of them entirely. (TechCrunch)

10 You can hire a robot to write a handwritten note
Or, y'know, pick up a pen and write it yourself. (Insider $)

Quote of the day

"It's almost like we never even spoke."

—Richard Wilson, an online dater who is convinced his most recent love interest used a chatbot to converse with him online before they awkwardly met in person, tells the Washington Post about his disappointment.

One more thing

Deepfakes of your dead loved ones are a booming Chinese business

Once a week, Sun Kai has a video call with his mother, and they discuss his day-to-day life. But Sun's mother died five years ago, and the person he's talking to isn't actually a person, but a digital replica he made of her.

There are plenty of people like Sun who want to use AI to preserve, animate, and interact with lost loved ones as they mourn and try to heal. The market is particularly strong in China, where at least half a dozen companies are now offering such technologies and thousands of people have already paid for them. But some question whether interacting with AI replicas of the dead is truly a healthy way to process grief, and it's not entirely clear what the legal and ethical implications of this technology may be. Read the full story.

—Zeyi Yang

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet 'em at me.)

+ There's nothing cooler than wooden interiors right now.
+ Talented artist Ian Robinson creates beautiful paintings of people's vinyl collections.
+ You'll find me in every one of Europe's top wine destinations this summer.
+ Here's everything you need to remember before Stranger Things returns this fall.


Can We Improve Llama 3’s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains

Improving the reasoning capabilities of large language models (LLMs) without architectural changes is a core challenge in advancing AI alignment and usability. Researchers at Meta AI and the University of Washington have introduced ASTRO (Autoregressive Search-Taught Reasoner), a novel post-training framework designed to enhance reasoning in Llama-3.1-70B-Instruct. ASTRO is unique in teaching models to perform in-context search, self-reflection, and backtracking, mechanisms often associated with human problem-solving and traditional symbolic search algorithms. Through this approach, ASTRO boosts Llama 3's math performance on several competitive benchmarks, with significant improvements:

- MATH 500: 65.8% ➝ 81.8%
- AMC 2023: 37.5% ➝ 64.4%
- AIME 2024: 10.0% ➝ 30.0%

Search-Guided Chain-of-Thought Generation

ASTRO's methodology begins with a Monte Carlo Tree Search (MCTS) over mathematical problem-solving trajectories. The search explores both correct and incorrect reasoning paths. The key innovation is procedure cloning: entire search trees are linearized into long chains of thought (CoT) that naturally encode both failures and recoveries via self-reflection and backtracking. These linearized traces are rewritten in natural language and used as the basis for supervised fine-tuning (SFT).

This results in a model that doesn't just solve problems step by step but reevaluates its trajectory, often backtracking after self-assessment to correct intermediate reasoning mistakes. For instance, the model may interject with phrases like "Let's go back to where we set up the equation" when its internal confidence drops.

Supervised Fine-Tuning: Injecting Search Priors

ASTRO fine-tunes Llama-3.1-70B-Instruct on 36.1K curated CoT solutions from MATH, AMC/AIME, and AoPS-style datasets. The model trained with ASTRO-SFT achieves:

- MATH 500: 69.6%
- AMC 2023: 51.9%
- AIME 2024: 16.3%

These scores are competitive with or exceed those of baseline and SPOC/Step-KTO variants trained without explicit search priors. Importantly, even SFT alone, without reinforcement learning, yields performance gains by exposing the model to search-structured reasoning data.

Reinforcement Learning with Search-Aware Initialization

ASTRO proceeds to reinforcement learning (RL) by initializing from the SFT checkpoint and running an RL loop using a modified Group Relative Policy Optimization (GRPO). Unlike standard preference-based RL, ASTRO employs verifiable reward signals (+1 for correct, -1 for incorrect) on 8.7K moderately difficult prompts. During training, the model's CoT generation grows longer, from roughly 1.8K to 6K tokens, demonstrating deeper internal exploration. The resulting ASTRO-RL model achieves:

- MATH 500: 81.8%
- AMC 2023: 64.4%
- AIME 2024: 30.0%

These results rival or exceed those of models with larger parameter counts and confirm the importance of ASTRO's search-aware initialization.

Backtracking Behavior Correlates with Reasoning Success

A striking empirical observation is the positive correlation between backtracking frequency and performance. As training progresses, ASTRO-RL exhibits more self-corrective actions and deeper exploration. Pearson correlation coefficients across benchmarks exceed 0.8, indicating that self-reflection and backtracking are not merely cosmetic behaviors but are functionally tied to better accuracy.
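A minimal sketch of the verifiable-reward signal described above might look like the following. The group-relative advantage computation is a simplified reading of GRPO (normalize each sampled completion's reward against its group's mean and standard deviation), and the +1/-1 reward follows the article's description; this is an illustration, not Meta's implementation.

```python
# Illustrative sketch of GRPO-style training signals with verifiable rewards
# (+1 correct, -1 incorrect), as described above; not Meta's implementation.
import statistics

def verifiable_reward(model_answer: str, gold_answer: str) -> float:
    return 1.0 if model_answer.strip() == gold_answer.strip() else -1.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# One prompt, a group of four sampled solutions, gold answer "30":
answers = ["30", "28", "30", "12"]
rewards = [verifiable_reward(a, "30") for a in answers]
print(group_relative_advantages(rewards))  # positive for correct, negative otherwise
```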
Comparative Insights and Broader Impact

Control experiments comparing ASTRO with models trained on direct CoT solutions (no search priors) reveal that even when trained on the same problem sets and search trees, ASTRO consistently outperforms. For instance, ASTRO-RL beats Direct-RL by:

- +2% on MATH 500
- +3.9% on AMC 2023
- +2.9% on AIME 2024

Moreover, ASTRO's outputs can be visualized as directed graphs, with nodes as reasoning steps and edges capturing transitions, reflections, and corrections, facilitating better interpretability.

Conclusion

ASTRO demonstrates that LLMs like Llama 3 can learn to reason more effectively, not through larger models or longer pretraining, but via principled post-training techniques. By mimicking search algorithms in natural language, ASTRO enables models to think before answering, doubt their own steps, and correct themselves mid-reasoning. This framework sets a new benchmark for fine-tuning open LLMs to approach human-like reasoning through search-inspired behaviors.

Check out the Paper. All credit for this research goes to the researchers of this project. The post Can We Improve Llama 3's Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains appeared first on MarkTechPost.


Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis

arXiv:2507.02176v1 Announce Type: cross Abstract: Modeling voice identity is challenging due to its multifaceted nature. In generative speech systems, identity is often assessed using automatic speaker verification (ASV) embeddings, designed for discrimination rather than characterizing identity. This paper investigates which aspects of a voice are captured in such representations. We find that widely used ASV embeddings focus mainly on static features like timbre and pitch range, while neglecting dynamic elements such as rhythm. We also identify confounding factors that compromise speaker similarity measurements and suggest mitigation strategies. To address these gaps, we propose U3D, a metric that evaluates speakers’ dynamic rhythm patterns. This work contributes to the ongoing challenge of assessing speaker identity consistency in the context of ever-better voice cloning systems. We publicly release our code.
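For context on the abstract above: speaker similarity with ASV embeddings is conventionally scored as the cosine similarity between two fixed-dimensional vectors. The sketch below is a generic illustration of that baseline with random vectors, not the paper's proposed U3D rhythm metric.

```python
# Generic illustration of ASV-style speaker similarity scoring:
# cosine similarity between two fixed-dimensional speaker embeddings.
# Background for the abstract above, not the paper's U3D metric.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
emb_ref = rng.normal(size=192)                    # e.g., an ECAPA-TDNN-style 192-dim embedding
emb_test = emb_ref + 0.1 * rng.normal(size=192)   # same speaker, slight variation
print(f"similarity: {cosine_similarity(emb_ref, emb_test):.3f}")
```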


Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning

arXiv:2311.08010v3 Announce Type: replace Abstract: Distantly-Supervised Named Entity Recognition (DS-NER) is widely used in real-world scenarios. It can effectively alleviate the burden of annotation by matching entities in existing knowledge bases with snippets in the text, but it suffers from label noise. Recent works attempt to adopt the teacher-student framework to gradually refine the training labels and improve overall robustness. However, these teacher-student methods achieve limited performance because the poor calibration of the teacher network produces incorrectly pseudo-labeled samples, leading to error propagation. Therefore, we propose: (1) Uncertainty-Aware Teacher Learning, which leverages prediction uncertainty to reduce the number of incorrect pseudo labels in the self-training stage; and (2) Student-Student Collaborative Learning, which allows the transfer of reliable labels between two student networks instead of indiscriminately relying on all pseudo labels from the teacher, and further enables a full exploration of mislabeled samples rather than simply filtering out unreliable pseudo-labeled samples. We evaluate our proposed method on five DS-NER datasets, demonstrating that it is superior to state-of-the-art DS-NER methods.
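To make the first idea concrete, here is a minimal, hypothetical sketch of uncertainty-aware pseudo-label filtering: the teacher's per-token predictive entropy gates which pseudo-labels reach the student. The entropy threshold and tag set are illustrative assumptions; this shows the general mechanism, not the authors' implementation.

```python
# Hypothetical sketch of uncertainty-aware pseudo-label filtering:
# keep a teacher pseudo-label only when predictive entropy is low.
# Illustrates the general mechanism, not the authors' implementation.
import math

def entropy(probs: list[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)

def filter_pseudo_labels(token_probs, labels, max_entropy=0.5):
    """token_probs: per-token class distributions from the teacher."""
    kept = []
    for probs, label in zip(token_probs, labels):
        if entropy(probs) <= max_entropy:
            kept.append(label)   # confident: train the student on it
        else:
            kept.append(None)    # uncertain: excluded from the student loss
    return kept

# Toy example: three tokens over a tiny tag set {O, B-PER, B-ORG}
teacher_probs = [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25], [0.05, 0.90, 0.05]]
print(filter_pseudo_labels(teacher_probs, ["O", "B-PER", "B-PER"]))
# -> ['O', None, 'B-PER']: the ambiguous middle token is dropped
```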
