The Download: India’s AI independence, and predicting future epidemics

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Inside India’s scramble for AI independence

Despite its status as a global tech hub, India lags far behind the likes of the US and China when it comes to homegrown AI. That gap has opened largely because India has chronically underinvested in R&D, institutions, and invention. Meanwhile, since no one native language is spoken by the majority of the population, training language models is far more complicated than it is elsewhere.

So when the open-source foundation model DeepSeek-R1 suddenly outperformed many global peers, it struck a nerve. This launch by a Chinese startup prompted Indian policymakers to confront just how far behind the country was in AI infrastructure—and how urgently it needed to respond. Read the full story.

—Shadma Shaikh

Job titles of the future: Pandemic oracle

Officially, Conor Browne is a biorisk consultant. Based in Belfast, Northern Ireland, he has advanced degrees in security studies and medical and business ethics, along with United Nations certifications in counterterrorism and conflict resolution. Early in the emergence of SARS-CoV-2, international energy conglomerates seeking expert guidance on navigating the potential turmoil in markets and transportation became his main clients.

Having studied the 2002 SARS outbreak, he predicted the exponential spread of the new airborne virus. In fact, he forecast the epidemic’s broadscale impact and its implications for business so accurately that he has come to be seen as a pandemic oracle. Read the full story.

—Britta Shoot

This story is from the most recent print edition of MIT Technology Review, which explores power—who has it, and who wants it. Subscribe here to receive future copies once they drop.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Donald Trump’s ‘big beautiful bill’ has passed
Which is terrible news for the clean energy industry. (Vox)
+ An energy-affordability crisis is looming in the US. (The Atlantic $)
+ The President struck deals with House Republican holdouts to get it over the line. (WSJ $)
+ The Trump administration has shut down more than 100 climate studies. (MIT Technology Review)

2 Daniel Gross is joining Meta’s superintelligence lab
He’s jumping ship from the startup he co-founded with Ilya Sutskever. (Bloomberg $)
+ Sutskever is stepping into the CEO role in his absence. (TechCrunch)
+ Here’s what we can infer from Meta’s recent hires. (Semafor)

3 AI’s energy demands could destabilize the global supply
That’s according to the head of the world’s largest transformer maker. (FT $)
+ We did the math on AI’s energy footprint. Here’s the story you haven’t heard. (MIT Technology Review)

4 Elon Musk is threatening to start his own political party
Would anyone vote for him, though? (WP $)
+ You’d think his bruising experience in the White House would have put him off. (NY Mag $)

5 The US has lifted export restrictions on chip design software for China
It suggests that frosty relations between the nations may be thawing. (Reuters)

6 Trump officials are going after this ICE warning app
But lawyers say there’s nothing illegal about it. (Wired $)
+ Downloads of ICEBlock are rising. (NBC News)

7 Wildfires are making it harder to monitor air pollutants
Current tracking technology isn’t built to accommodate shifting smoke. (Undark)
+ How AI can help spot wildfires. (MIT Technology Review)

8 Apple’s iOS 26 software can detect nudity on FaceTime calls
The feature will pause the call and ask if you want to continue. (Gizmodo)

9 Threads has finally launched DMs
But users are arguing there should be a way to opt out of them entirely. (TechCrunch)

10 You can hire a robot to write a handwritten note
Or, y’know, pick up a pen and write it yourself. (Insider $)

Quote of the day

“It’s almost like we never even spoke.”

Richard Wilson, an online dater who is convinced his most recent love interest used a chatbot to converse with him online before they awkwardly met in person, tells the Washington Post about his disappointment.

One more thing

Deepfakes of your dead loved ones are a booming Chinese business

Once a week, Sun Kai has a video call with his mother, and they discuss his day-to-day life. But Sun’s mother died five years ago, and the person he’s talking to isn’t actually a person, but a digital replica he made of her.

There are plenty of people like Sun who want to use AI to preserve, animate, and interact with lost loved ones as they mourn and try to heal. The market is particularly strong in China, where at least half a dozen companies are now offering such technologies and thousands of people have already paid for them. But some question whether interacting with AI replicas of the dead is truly a healthy way to process grief, and it’s not entirely clear what the legal and ethical implications of this technology may be. Read the full story.

—Zeyi Yang

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ There’s nothing cooler than wooden interiors right now.
+ Talented artist Ian Robinson creates beautiful paintings of people’s vinyl collections.
+ You’ll find me in every one of Europe’s top wine destinations this summer.
+ Here’s everything you need to remember before Stranger Things returns this fall.


Can We Improve Llama 3’s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains

Improving the reasoning capabilities of large language models (LLMs) without architectural changes is a core challenge in advancing AI alignment and usability. Researchers at Meta AI and the University of Washington have introduced ASTRO (Autoregressive Search-Taught Reasoner), a post-training framework designed to enhance reasoning in Llama-3.1-70B-Instruct. ASTRO teaches models to perform in-context search, self-reflection, and backtracking—mechanisms often associated with human problem-solving and traditional symbolic search algorithms. Through this approach, ASTRO boosts Llama 3’s math performance on several competitive benchmarks:

+ MATH 500: 65.8% ➝ 81.8%
+ AMC 2023: 37.5% ➝ 64.4%
+ AIME 2024: 10.0% ➝ 30.0%

Search-Guided Chain-of-Thought Generation

ASTRO’s methodology begins with a Monte Carlo Tree Search (MCTS) over mathematical problem-solving trajectories. This search explores both correct and incorrect reasoning paths. The key innovation is procedure cloning: entire search trees are linearized into long chains of thought (CoT) that naturally encode both failures and recoveries via self-reflection and backtracking. These linearized traces are rewritten in natural language and used as the basis for supervised fine-tuning (SFT).

This results in a model that doesn’t just solve problems step by step but reevaluates its trajectory—often backtracking after self-assessment to correct intermediate reasoning mistakes. For instance, the model may interject with phrases like “Let’s go back to where we set up the equation” when its internal confidence drops.

Supervised Fine-Tuning: Injecting Search Priors

ASTRO fine-tunes Llama-3.1-70B-Instruct on 36.1K curated CoT solutions from MATH, AMC/AIME, and AoPS-style datasets. The model trained with ASTRO-SFT achieves:

+ MATH 500: 69.6%
+ AMC 2023: 51.9%
+ AIME 2024: 16.3%

These scores are competitive with or exceed those of baseline and SPOC/Step-KTO variants trained without explicit search priors. Importantly, even SFT alone—without reinforcement learning—yields performance boosts by exposing the model to search-structured reasoning data.

Reinforcement Learning with Search-Aware Initialization

ASTRO proceeds to reinforcement learning (RL) by initializing with the SFT checkpoint and running an RL loop using a modified Group Relative Policy Optimization (GRPO). Unlike standard preference-based RL, ASTRO employs verifiable reward signals (+1 for correct, -1 for incorrect) on 8.7K moderately difficult prompts. During training, the model’s CoT generation grows longer—from roughly 1.8K to 6K tokens—demonstrating deeper internal exploration. The resulting ASTRO-RL model achieves:

+ MATH 500: 81.8%
+ AMC 2023: 64.4%
+ AIME 2024: 30.0%

These results rival or exceed models with larger parameter counts and confirm the importance of ASTRO’s search-aware initialization.

Backtracking Behavior Correlates with Reasoning Success

A striking empirical observation is the positive correlation between backtracking frequency and performance. As training progresses, ASTRO-RL exhibits more self-corrective actions and deeper exploration. Pearson correlation coefficients across benchmarks exceed 0.8, indicating that self-reflection and backtracking are not merely cosmetic behaviors but functionally tied to better accuracy.

Comparative Insights and Broader Impact

Control experiments comparing ASTRO with models trained on direct CoT solutions (no search priors) reveal that even when trained on the same problem sets and search trees, ASTRO consistently outperforms. For instance, ASTRO-RL beats Direct-RL by:

+ 2.0% on MATH 500
+ 3.9% on AMC 2023
+ 2.9% on AIME 2024

Moreover, ASTRO’s outputs can be visualized as directed graphs, with nodes as reasoning steps and edges capturing transitions, reflections, and corrections—facilitating better interpretability.

Conclusion

ASTRO demonstrates that LLMs like Llama 3 can learn to reason more effectively—not through larger models or longer pretraining, but via principled post-training techniques. By mimicking search algorithms in natural language, ASTRO enables models to think before answering, doubt their own steps, and correct themselves mid-reasoning. This framework sets a new benchmark for fine-tuning open LLMs to approach human-like reasoning through search-inspired behaviors.

Check out the Paper. All credit for this research goes to the researchers of this project. The post Can We Improve Llama 3’s Reasoning Through Post-Training Alone? ASTRO Shows +16% to +20% Benchmark Gains appeared first on MarkTechPost.
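The procedure-cloning idea can be sketched in a few lines: a depth-first walk linearizes a search tree, failed branches included, into a single chain of thought whose backtracking phrases mirror the search's self-correction. The tree structure and the exact phrasing below are illustrative assumptions, not the paper's actual data format.

```python
# Sketch of ASTRO-style procedure cloning: flatten a search tree
# (with its dead ends) into one chain-of-thought trace.

def linearize(node, steps=None):
    """Depth-first walk that records every visited step and inserts an
    explicit backtracking phrase after each failed branch."""
    if steps is None:
        steps = []
    steps.append(node["step"])
    for child in node.get("children", []):
        linearize(child, steps)
        if not child.get("correct", True):
            # A failed branch becomes a self-correction in the trace.
            steps.append(f"That leads nowhere. Let's go back to: {node['step']}")
    return steps

tree = {
    "step": "Set up the equation x^2 - 5x + 6 = 0",
    "children": [
        {"step": "Try x = 1: 1 - 5 + 6 = 2, not zero", "correct": False},
        {"step": "Factor as (x-2)(x-3) = 0, so x = 2 or x = 3", "correct": True},
    ],
}

cot = "\n".join(linearize(tree))
print(cot)
```

Traces like this, rewritten in natural language, are what give the SFT model its backtracking vocabulary before any reinforcement learning.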


Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis

arXiv:2507.02176v1 Announce Type: cross Abstract: Modeling voice identity is challenging due to its multifaceted nature. In generative speech systems, identity is often assessed using automatic speaker verification (ASV) embeddings, designed for discrimination rather than characterizing identity. This paper investigates which aspects of a voice are captured in such representations. We find that widely used ASV embeddings focus mainly on static features like timbre and pitch range, while neglecting dynamic elements such as rhythm. We also identify confounding factors that compromise speaker similarity measurements and suggest mitigation strategies. To address these gaps, we propose U3D, a metric that evaluates speakers’ dynamic rhythm patterns. This work contributes to the ongoing challenge of assessing speaker identity consistency in the context of ever-better voice cloning systems. We publicly release our code.
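To make the static-versus-dynamic distinction concrete, here is a toy rhythm comparison: two utterances are compared by their tempo-normalized syllable durations rather than by timbre. This is not the paper's U3D metric (whose definition is not given in the abstract), just an illustration of the kind of dynamic information ASV embeddings miss.

```python
# Toy rhythm-based speaker comparison (illustrative, not U3D).

def rhythm_distance(durs_a, durs_b):
    """Mean absolute difference between syllable-duration sequences
    after tempo normalization (divide by each sequence's mean)."""
    norm_a = [d / (sum(durs_a) / len(durs_a)) for d in durs_a]
    norm_b = [d / (sum(durs_b) / len(durs_b)) for d in durs_b]
    n = min(len(norm_a), len(norm_b))
    return sum(abs(a - b) for a, b in zip(norm_a[:n], norm_b[:n])) / n

# Similar rhythm (plausibly the same speaker) vs. a very different one.
same_speaker = rhythm_distance([0.2, 0.3, 0.2, 0.4], [0.22, 0.31, 0.19, 0.41])
diff_speaker = rhythm_distance([0.2, 0.3, 0.2, 0.4], [0.5, 0.1, 0.5, 0.1])
print(same_speaker, diff_speaker)
```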


Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning

arXiv:2311.08010v3 Announce Type: replace Abstract: Distantly-Supervised Named Entity Recognition (DS-NER) is widely used in real-world scenarios. It can effectively alleviate the burden of annotation by matching entities in existing knowledge bases with snippets in the text, but it suffers from label noise. Recent works adopt the teacher-student framework to gradually refine the training labels and improve overall robustness. However, these teacher-student methods achieve limited performance because the poor calibration of the teacher network produces incorrectly pseudo-labeled samples, leading to error propagation. Therefore, we propose: (1) Uncertainty-Aware Teacher Learning, which leverages prediction uncertainty to reduce the number of incorrect pseudo-labels in the self-training stage; (2) Student-Student Collaborative Learning, which allows the transfer of reliable labels between two student networks instead of indiscriminately relying on all pseudo-labels from the teacher, and further enables a full exploration of mislabeled samples rather than simply filtering out unreliable pseudo-labeled samples. We evaluate our proposed method on five DS-NER datasets, demonstrating that it is superior to state-of-the-art DS-NER methods.
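The uncertainty-aware filtering idea can be sketched simply: keep a teacher's pseudo-label only when its predictive entropy is low. The threshold, label set, and entropy-based estimate below are illustrative assumptions; the paper's exact uncertainty measure may differ.

```python
import math

# Sketch: entropy-gated pseudo-label selection for self-training.

def entropy(probs):
    """Shannon entropy of a class-probability distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def filter_pseudo_labels(predictions, max_entropy=0.5):
    """predictions: list of (token, class-probability list) pairs.
    Returns (token, argmax label index) only where the teacher is
    confident, i.e. entropy is at or below the threshold."""
    kept = []
    for token, probs in predictions:
        if entropy(probs) <= max_entropy:
            kept.append((token, probs.index(max(probs))))
    return kept

preds = [
    ("Paris", [0.9, 0.05, 0.05]),   # low entropy: pseudo-label kept
    ("bank",  [0.4, 0.35, 0.25]),   # high entropy: dropped from training
]
print(filter_pseudo_labels(preds))
```

Student-student collaboration would then exchange the surviving labels between two such students instead of trusting one teacher wholesale.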


Layered Insights: Generalizable Analysis of Authorial Style by Leveraging All Transformer Layers

arXiv:2503.00958v2 Announce Type: replace Abstract: We propose a new approach to the authorship attribution task that leverages the various linguistic representations learned at different layers of pre-trained transformer-based models. We evaluate our approach on three datasets, comparing it to a state-of-the-art baseline in in-domain and out-of-domain scenarios. We find that utilizing multiple transformer layers improves the robustness of authorship attribution models when tested on out-of-domain data, resulting in new state-of-the-art results. Our analysis gives further insight into how the model’s different layers specialize in representing certain stylistic features that benefit the model when tested out of domain.
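The core mechanism, combining representations from every layer instead of only the last, can be sketched as a weighted sum of per-layer document embeddings. The mock vectors and fixed weights below are stand-ins; a real setup would take hidden states from a pretrained transformer and learn the weights.

```python
# Sketch: weighted combination of per-layer document embeddings.

def combine_layers(layer_vectors, weights):
    """Weighted average over layers; each layer contributes one
    fixed-size embedding for the document."""
    total = sum(weights)
    dims = len(layer_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, layer_vectors)) / total
        for d in range(dims)
    ]

# Three mock layer embeddings for one document. Lower layers tend to
# carry surface/stylistic cues, upper layers more semantic content.
layers = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = [0.5, 0.3, 0.2]  # illustrative; learned in practice
print(combine_layers(layers, weights))
```

Weighting lower layers more heavily, as in this toy example, is one way stylistic (rather than purely semantic) signal can be emphasized.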


Decision-Oriented Text Evaluation

arXiv:2507.01923v2 Announce Type: replace Abstract: Natural language generation (NLG) is increasingly deployed in high-stakes domains, yet common intrinsic evaluation methods, such as n-gram overlap or sentence plausibility, weakly correlate with actual decision-making efficacy. We propose a decision-oriented framework for evaluating generated text by directly measuring its influence on human and large language model (LLM) decision outcomes. Using market digest texts (including objective morning summaries and subjective closing-bell analyses) as test cases, we assess decision quality based on the financial performance of trades executed by human investors and autonomous LLM agents informed exclusively by these texts. Our findings reveal that neither humans nor LLM agents consistently surpass random performance when relying solely on summaries. However, richer analytical commentaries enable collaborative human-LLM teams to outperform individual human or agent baselines significantly. Our approach underscores the importance of evaluating generated text by its ability to facilitate synergistic decision-making between humans and LLMs, highlighting critical limitations of traditional intrinsic metrics.
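The evaluation principle can be sketched concretely: score a market-digest text not by n-gram overlap but by the average return of trades an agent makes after reading it, against a random-trading baseline. The decisions and returns below are synthetic stand-ins, not data from the paper.

```python
import random

# Sketch: decision-oriented text scoring via realized trade returns.

def decision_score(decisions, returns):
    """Average realized return: +r when the agent went long, -r when
    it went short on the corresponding asset-day."""
    return sum(r if d == "long" else -r
               for d, r in zip(decisions, returns)) / len(returns)

returns = [0.02, -0.01, 0.03, -0.02]           # next-day asset returns
informed = ["long", "short", "long", "short"]   # decisions after reading text
rng = random.Random(0)
baseline = [rng.choice(["long", "short"]) for _ in returns]

print(decision_score(informed, returns), decision_score(baseline, returns))
```

A text earns a high score only when it improves decisions over the random baseline, which is exactly what the abstract reports summaries fail to do.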


SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

arXiv:2507.02822v1 Announce Type: new Abstract: With the widespread adoption of large language models (LLMs) in practical applications, selecting an appropriate model requires balancing not only performance but also operational cost. The emergence of reasoning-capable models has further widened the cost gap between “thinking” (high reasoning) and “non-thinking” (fast, low-cost) modes. In this work, we reveal that approximately 58% of medical questions can be accurately answered by the non-thinking mode alone, without requiring the high-cost reasoning process. This highlights a clear dichotomy in problem complexity and suggests that dynamically routing queries to the appropriate mode based on complexity could optimize accuracy, cost-efficiency, and overall user experience. Based on this, we propose SynapseRoute, a machine learning-based dynamic routing framework that intelligently assigns input queries to either thinking or non-thinking modes. Experimental results on several medical datasets demonstrate that SynapseRoute not only improves overall accuracy (0.8390 vs. 0.8272) compared to the thinking mode alone but also reduces inference time by 36.8% and token consumption by 39.66%. Importantly, qualitative analysis indicates that over-reasoning on simpler queries can lead to unnecessary delays and even decreased accuracy, a pitfall avoided by our adaptive routing. Finally, this work introduces the Accuracy-Inference-Token (AIT) index to comprehensively evaluate the trade-offs among accuracy, latency, and token cost.
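The routing idea reduces to a threshold on a predicted complexity score. SynapseRoute itself uses a learned, machine-learning-based router; the crude length-and-clause heuristic below is only a stand-in to show the control flow.

```python
# Sketch: complexity-thresholded routing between a cheap "non-thinking"
# mode and an expensive "thinking" mode.

def complexity_score(question):
    """Stub difficulty estimate: longer, multi-clause questions score
    higher. A real router would use a trained classifier."""
    return 0.1 * len(question.split()) + 0.3 * question.count(",")

def route(question, threshold=1.0):
    """Send the query to the reasoning mode only when it looks hard."""
    return "thinking" if complexity_score(question) >= threshold else "non-thinking"

print(route("What is the normal adult heart rate?"))
print(route("Given a 67-year-old with diabetes, hypertension, and chest pain, "
            "which diagnostic pathway balances sensitivity against cost?"))
```

Under the abstract's finding, roughly 58% of medical queries would take the cheap branch, which is where the reported latency and token savings come from.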


Batch-Max: Higher LLM Throughput using Larger Batch Sizes and KV Cache Compression

arXiv:2412.05693v3 Announce Type: replace Abstract: Several works have developed eviction policies to remove key-value (KV) pairs from the KV cache for more efficient inference. The focus has been on compressing the KV cache after the input prompt has been processed for faster token generation. In settings with limited GPU memory, and when the input context is longer than the generation length, we show that by also compressing the KV cache during the input processing phase, larger batch sizes can be used resulting in significantly higher throughput while still maintaining the original model’s accuracy.
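The memory argument behind the throughput gain can be sketched with back-of-the-envelope arithmetic: if the prompt's KV cache is compressed during input processing, the freed GPU memory admits a larger batch. All numbers below (budget, prompt length, bytes per token, kept fraction) are illustrative assumptions, not figures from the paper.

```python
# Sketch: batch size admitted by a fixed KV-cache memory budget, with
# and without compressing the KV cache during prompt processing.

def max_batch_size(mem_budget_bytes, prompt_len, kept_fraction,
                   kv_bytes_per_token=160_000):
    """KV memory per sequence scales with the tokens actually kept
    after eviction; the budget divided by that gives the batch size."""
    kept_tokens = int(prompt_len * kept_fraction)
    return mem_budget_bytes // (kept_tokens * kv_bytes_per_token)

budget = 40 * 10**9  # 40 GB of GPU memory free for the KV cache
full = max_batch_size(budget, prompt_len=4000, kept_fraction=1.0)
compressed = max_batch_size(budget, prompt_len=4000, kept_fraction=0.25)
print(full, compressed)  # keeping 25% of KV pairs admits ~4x the batch
```

Since decoding throughput grows with batch size until memory runs out, this larger admitted batch is the source of the higher throughput the abstract reports.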


Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading

arXiv:2507.01431v1 Announce Type: cross Abstract: Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline, from scanned student submissions to final feedback, within a human-in-the-loop interface. Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.
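Note that the 95.4% agreement figure is conditioned on high-confidence predictions. A small sketch of how such a confidence-gated agreement rate is computed (the records and threshold below are made up, not Pensieve's data):

```python
# Sketch: agreement rate measured only over high-confidence predictions.

def agreement_rate(records, min_confidence=0.8):
    """records: (ai_score, instructor_score, confidence) triples.
    Low-confidence items are excluded (in practice, routed to a human)."""
    gated = [(a, h) for a, h, c in records if c >= min_confidence]
    if not gated:
        return None
    return sum(a == h for a, h in gated) / len(gated)

records = [
    (10, 10, 0.95),  # confident, matches the instructor
    (7, 7, 0.90),    # confident, matches
    (5, 6, 0.92),    # confident, disagrees
    (3, 8, 0.40),    # low confidence: excluded from the metric
]
print(agreement_rate(records))
```

The human-in-the-loop design matters here: the excluded low-confidence items are precisely the ones an instructor reviews by hand.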
