
This AI Paper from Microsoft Introduces WINA: A Training-Free Sparse Activation Framework for Efficient Large Language Model Inference

Large language models (LLMs), with billions of parameters, power many AI-driven services across industries. However, their massive size and complex architectures make their computational costs during inference a significant challenge. As these models evolve, optimizing the balance between computational efficiency and output quality has become a crucial area of research. The core challenge lies in how LLMs handle inference. Every time an input is processed, the entire model is activated, which consumes extensive computational resources. This full activation is unnecessary for most tasks, as only a small subset of neurons contribute meaningfully to the final output. Existing sparse activation methods attempt to address this by selectively deactivating less important neurons. However, these approaches often focus only on the magnitude of hidden states while ignoring the critical role of weight matrices in propagating errors through the network. This oversight leads to high approximation errors and deteriorates model performance, particularly at higher sparsity levels.

Sparse activation techniques have included methods like Mixture-of-Experts (MoE), used in models such as GPT-4 and Mistral, which rely on additional training to learn which experts to activate for each input. Other approaches, such as TEAL and CATS, aim to reduce computation by using the size of hidden activations to prune neurons, but they still leave room for improvement. These methods often struggle with balancing sparsity and accuracy, as they can mistakenly deactivate important neurons or retain those with minimal influence. Moreover, they require model-specific threshold tuning, making them less flexible across different architectures.

Researchers from Microsoft, Renmin University of China, New York University, and the South China University of Technology proposed a new method called WINA (Weight Informed Neuron Activation) to address these issues.
WINA introduces a training-free sparse activation technique that uses both hidden state magnitudes and column-wise ℓ2 norms of weight matrices to determine which neurons to activate during inference. By considering the combined impact of input magnitudes and weight importance, WINA creates a more effective sparsification strategy that adapts to different layers of the model without requiring retraining or fine-tuning.

The WINA method is built on a simple yet powerful idea: neurons that have strong activations and large weight magnitudes are more likely to influence downstream computations. To operationalize this, WINA calculates the element-wise product of hidden states and weight norms, selecting the top-K components based on this combined metric. This strategy allows WINA to construct a sparse sub-network that preserves the most important signals while ignoring redundant activations. The method also includes a tensor transformation step that enforces column-wise orthogonality in weight matrices, ensuring theoretical error bounds translate effectively to real-world performance. By combining these steps, WINA maintains a tight approximation error while delivering significant computational savings.

The research team evaluated WINA on several large language models, including Qwen-2.5-7B, LLaMA-2-7B, LLaMA-3-8B, and Phi-4-14B, across various tasks and sparsity levels. WINA outperformed TEAL and CATS across all tested models and sparsity settings. For example, on Qwen-2.5-7B at 65% sparsity, WINA achieved up to 2.94% higher average performance than TEAL and 1.41% better than TEAL-Transform. On LLaMA-3-8B, WINA delivered gains of 1.06% at 50% sparsity and 2.41% at 65% sparsity. Even at high sparsity levels, WINA retained stronger performance on reasoning-intensive tasks like GSM8K and ARC Challenge. WINA also delivered consistent computational savings, reducing floating-point operations by up to 63.7% on LLaMA-2-7B and 62.7% on Phi-4-14B.
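The selection rule described above reduces to a weight-aware top-K over the hidden state. The sketch below is a minimal NumPy illustration of that criterion; shapes and variable names are ours rather than the paper's reference implementation, and the tensor-transformation (orthogonalization) step is omitted:

```python
import numpy as np

def wina_mask(x, W, k):
    # Score each input component by |x_i| * ||W[:, i]||_2, i.e. the hidden
    # state magnitude weighted by the column-wise L2 norm of the weight
    # matrix, and keep only the top-k components active.
    scores = np.abs(x) * np.linalg.norm(W, axis=0)  # shape (d,)
    mask = np.zeros_like(x)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask

rng = np.random.default_rng(0)
d, m, k = 8, 4, 3
x = rng.normal(size=d)       # hidden state entering the layer
W = rng.normal(size=(m, d))  # the layer computes y = W @ x

dense = W @ x
sparse = W @ (x * wina_mask(x, W, k))  # only k input components survive
print("approximation error:", float(np.linalg.norm(dense - sparse)))
```

A magnitude-only scheme in the TEAL/CATS style would score components by `np.abs(x)` alone; weighting by the column norms is what lets the criterion account for how strongly each component propagates through the weight matrix.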
In summary, WINA offers a robust, training-free solution for sparse activation in large language models by combining hidden state magnitudes with weight matrix norms. This approach addresses the limitations of prior methods such as TEAL, resulting in lower approximation errors, improved accuracy, and significant computational savings. The research team's work represents an important step forward in developing more efficient LLM inference methods that can adapt to diverse models without requiring additional training.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. The post This AI Paper from Microsoft Introduces WINA: A Training-Free Sparse Activation Framework for Efficient Large Language Model Inference appeared first on MarkTechPost.


Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation

Scientific research across fields like chemistry, biology, and artificial intelligence has long relied on human experts to explore knowledge, generate ideas, design experiments, and refine results. Yet, as problems grow more complex and data-intensive, discovery slows. While AI tools such as language models and robotics can handle specific tasks like literature search or code analysis, they rarely encompass the entire research cycle. Bridging the gap between idea generation and experimental validation remains a key challenge. For AI to autonomously advance science, it must propose hypotheses, design and execute experiments, analyze outcomes, and refine approaches in an iterative loop. Without this integration, AI risks producing disconnected ideas that depend on human supervision for validation.

Before the introduction of a unified system, researchers relied on separate tools for each stage of the process. Large language models could help find relevant scientific papers, but they didn't directly feed into experiment design or result analysis. Robotics could assist in automating physical experiments, and coding libraries like PyTorch could help build models; however, these tools operated independently of each other. There was no single system capable of handling the entire process, from forming ideas to verifying them through experiments. This led to bottlenecks, where researchers had to connect the dots manually, slowing progress and leaving room for errors or missed opportunities. The need for an integrated system that could handle the entire research cycle became clear.

Researchers from the NovelSeek Team at the Shanghai Artificial Intelligence Laboratory developed NovelSeek, an AI system designed to run the entire scientific discovery process autonomously.
NovelSeek comprises four main modules that work in tandem: a system that generates and refines research ideas, a feedback loop where human experts can interact with and refine these ideas, a method for translating ideas into code and experiment plans, and a process for conducting multiple rounds of experiments. What makes NovelSeek stand out is its versatility; it works across 12 scientific research tasks, including predicting chemical reaction yields, understanding molecular dynamics, forecasting time-series data, and handling functions like 2D semantic segmentation and 3D object classification. The team designed NovelSeek to minimize human involvement, expedite discoveries, and deliver consistent, high-quality results.

The system behind NovelSeek involves multiple specialized agents, each focused on a specific part of the research workflow. The "Survey Agent" helps the system understand the problem by searching scientific papers and identifying relevant information based on keywords and task definitions. It adapts its search strategy by first doing a broad survey of papers, then going deeper by analyzing full-text documents for detailed insights. This ensures that the system captures both general trends and specific technical knowledge. The "Code Review Agent" examines existing codebases, whether user-uploaded or sourced from public repositories like GitHub, to understand how current methods work and identify areas for improvement. It checks how code is structured, looks for errors, and creates summaries that help the system build on past work. The "Idea Innovation Agent" generates creative research ideas, pushing the system to explore different approaches and refine them by comparing them to related studies and previous results. The system even includes a "Planning and Execution Agent" that turns ideas into detailed experiments, handles errors during the testing process, and ensures smooth execution of multi-step research plans.
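The agent workflow described above can be pictured as a survey-then-iterate loop. The following is a purely illustrative Python sketch; the class names, method signatures, and the fixed metric value are our placeholders, not NovelSeek's actual API:

```python
# Hypothetical sketch of the four-stage agent loop: survey the
# literature, review existing code, propose ideas, then plan and
# execute experiments, keeping the best result across rounds.

class SurveyAgent:
    def run(self, task):
        return f"survey notes for: {task}"

class CodeReviewAgent:
    def run(self, task):
        return f"codebase summary for: {task}"

class IdeaAgent:
    def run(self, context):
        return f"candidate idea grounded in [{context}]"

class PlanExecuteAgent:
    def run(self, idea):
        # In the real system this step writes code, runs experiments,
        # and reports metrics; here we just echo a result record.
        return {"idea": idea, "metric": 0.79}

def research_loop(task, rounds=2):
    survey = SurveyAgent().run(task)
    review = CodeReviewAgent().run(task)
    best = None
    for _ in range(rounds):  # iterative refine-and-verify loop
        idea = IdeaAgent().run(f"{survey}; {review}")
        result = PlanExecuteAgent().run(idea)
        if best is None or result["metric"] >= best["metric"]:
            best = result
    return best

print(research_loop("enhancer activity prediction")["metric"])
```

The design point is the closed loop: each round's experimental outcome can feed back into idea generation, rather than ideas and validation living in disconnected tools.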
NovelSeek delivered impressive results across various tasks. In chemical reaction yield prediction, NovelSeek improved performance from a baseline of 24.2% (with a variation of ±4.2) to 34.8% (with a much smaller variation of ±1.1) in just 12 hours, progress that human researchers typically need months to achieve. In enhancer activity prediction, a key task in biology, NovelSeek raised the Pearson correlation coefficient from 0.65 to 0.79 within 4 hours. For 2D semantic segmentation, a task used in computer vision, precision improved from 78.8% to 81.0% in just 30 hours. These performance boosts, achieved in a fraction of the time typically needed, highlight the system's efficiency. NovelSeek also successfully managed large, complex codebases with multiple files, demonstrating its ability to handle research tasks at a project level, not just in small, isolated tests. The team has made the code open-source, allowing others to use, test, and contribute to its improvement.

Several key takeaways from the research on NovelSeek include:

- NovelSeek supports 12 research tasks, including chemical reaction prediction, molecular dynamics, and 3D object classification.
- Reaction yield prediction accuracy improved from 24.2% to 34.8% in 12 hours.
- Enhancer activity prediction performance increased from 0.65 to 0.79 in 4 hours.
- 2D semantic segmentation precision improved from 78.8% to 81.0% in 30 hours.
- NovelSeek includes agents for literature search, code analysis, idea generation, and experiment execution.
- The system is open-source, enabling reproducibility and collaboration across scientific fields.

In conclusion, NovelSeek demonstrates how combining AI tools into a single system can accelerate scientific discovery and reduce its dependence on human effort. It ties together the key steps, generating ideas, turning them into methods, and testing them through experiments, into one streamlined process.
What once took researchers months or years can now be done in days or even hours. By linking every stage of research into a continuous loop, NovelSeek helps teams move from rough ideas to real-world results more quickly. This system highlights the power of AI not just to assist but to drive scientific research in a way that could reshape how discoveries are made across many fields.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. The post Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation appeared first on MarkTechPost.


BOND 2025 AI Trends Report Shows AI Ecosystem Growing Faster than Ever with Explosive User and Developer Adoption

BOND's latest report on Trends – Artificial Intelligence (May 2025) presents a comprehensive, data-driven snapshot of the current state and rapid evolution of AI technology. The report highlights some striking trends underscoring the unprecedented velocity of AI adoption, technological improvement, and market impact. This article reviews several key findings from the report and explores their implications for the AI ecosystem.

Explosive Adoption of Open-Source Large Language Models

One of the standout observations is the remarkable uptake of Meta's Llama models. Over an eight-month span, Llama downloads surged by a factor of 3.4×, marking an unprecedented developer adoption curve for any open-source large language model (LLM). This acceleration highlights the expanding democratization of AI capabilities beyond proprietary platforms, enabling a broad spectrum of developers to integrate and innovate with advanced models.

Source: https://www.bondcap.com/reports/tai

The rapid acceptance of Llama illustrates a growing trend in the industry: open-source AI projects are becoming competitive alternatives to proprietary models, fueling a more distributed ecosystem. This proliferation accelerates innovation cycles and lowers barriers to entry for startups and research groups.

AI Chatbots Achieving Human-Level Conversational Realism

The report also documents significant advances in conversational AI. In Q1 2025, Turing-style tests showed that human evaluators mistook AI chatbot responses for human replies 73% of the time, a substantial jump from approximately 50% only six months prior. This rapid improvement reflects the growing sophistication of LLMs in mimicking human conversational nuances such as context retention, emotional resonance, and colloquial expression.

Source: https://www.bondcap.com/reports/tai

This trend has profound implications for industries reliant on customer interaction, including support, sales, and personal assistants.
As chatbots approach indistinguishability from humans in conversation, businesses will need to rethink user experience design, ethical considerations, and transparency standards to maintain trust.

ChatGPT's Search Volume Surpasses Google's Early Growth by 5.5×

ChatGPT reached an estimated 365 billion annual searches within just two years of its public launch in November 2022. This growth rate outpaces Google's trajectory, which took 11 years (1998–2009) to reach the same volume of annual searches. In essence, ChatGPT's search volume ramped up about 5.5 times faster than Google's did.

Source: https://www.bondcap.com/reports/tai

This comparison underscores the transformative shift in how users interact with information retrieval systems. The conversational and generative nature of ChatGPT has fundamentally altered expectations for search and discovery, accelerating adoption and daily engagement.

NVIDIA's GPUs Power Massive AI Throughput Gains While Reducing Power Draw

Between 2016 and 2024, NVIDIA GPUs achieved a 225× increase in AI inference throughput while simultaneously cutting data center power consumption by 43%. This impressive dual improvement has yielded an astounding >30,000× increase in theoretical annual token processing capacity per $1 billion of data center investment.

Source: https://www.bondcap.com/reports/tai

This leap in efficiency underpins the scalability of AI workloads and dramatically lowers the operational cost of AI deployments. As a result, enterprises can now deploy larger, more complex AI models at scale with reduced environmental impact and better cost-effectiveness.

DeepSeek's Rapid User Growth Captures a Third of China's Mobile AI Market

In the span of just four months, from January to April 2025, DeepSeek scaled from zero to 54 million monthly active mobile AI users in China, securing over 34% market share in the mobile AI segment.
This rapid growth reflects both the enormous demand in China's mobile AI ecosystem and DeepSeek's ability to capitalize on it through local market understanding and product fit.

Source: https://www.bondcap.com/reports/tai

The speed and scale of DeepSeek's adoption also highlight the growing global competition in AI innovation, particularly between China and the U.S., with localized ecosystems developing rapidly in parallel.

The Revenue Opportunity for AI Inference Has Skyrocketed

The report outlines a massive shift in the potential revenue from AI inference tokens processed in large data centers. In 2016, a $1 billion-scale data center could process roughly 5 trillion inference tokens annually, generating about $24 million in token-related revenue. By 2024, that same investment could handle an estimated 1,375 trillion tokens per year, translating to nearly $7 billion in theoretical revenue, a 30,000× increase.

Source: https://www.bondcap.com/reports/tai

This enormous leap stems from improvements in both hardware efficiency and algorithmic optimizations that dramatically reduce inference costs.

The Plunge in AI Inference Costs

One of the key enablers of these trends is the steep decline in inference costs per million tokens. For example, the cost to generate a million tokens using GPT-3.5 dropped from over $10 in September 2022 to around $1 by mid-2023. ChatGPT's cost per 75-word response approached near zero within its first year. This precipitous fall in pricing closely mirrors historical cost declines in other technologies, such as computer memory, which fell to near zero over two decades, and electric power, which dropped to about 2–3% of its initial price after 60–70 years. In contrast, more static costs like that of light bulbs have remained largely flat over time.

The IT Consumer Price Index vs. Compute Demand

BOND's report also examines the relationship between IT consumer price trends and compute demand.
Since 2010, compute requirements for AI have increased by approximately 360% per year, leading to an estimated total of 10²⁶ floating point operations (FLOPs) in 2024. During the same period, the IT consumer price index fell from 100 to below 10, indicating dramatically cheaper hardware costs. This decoupling means organizations can train larger and more complex AI models while spending significantly less on compute infrastructure, further accelerating AI innovation cycles.

Conclusion

BOND's Trends – Artificial Intelligence report offers compelling quantitative evidence that AI is evolving at an unprecedented pace. The combination of rapid user adoption, explosive developer engagement, hardware efficiency breakthroughs, and falling inference costs is reshaping the AI landscape globally. From Meta's Llama open-source surge to DeepSeek's rapid market capture in China, and from ChatGPT's hyper-accelerated search growth to NVIDIA's remarkable GPU performance gains, the data reflect a highly dynamic ecosystem. The steep decline in AI inference costs amplifies this effect, enabling new applications and business models. The key takeaway for AI practitioners and industry watchers is clear: AI's technological and economic momentum is accelerating, demanding continuous innovation and strategic agility.
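The inference-economics figures quoted above (5 trillion tokens and $24M in 2016 versus 1,375 trillion tokens and ~$7B in 2024) can be sanity-checked with a few lines of arithmetic. The implied per-million-token price is our own derivation from those numbers, not a figure stated in the report:

```python
# Figures as quoted from the BOND report summary above.
tokens_2016 = 5e12       # ~5 trillion inference tokens / year
revenue_2016 = 24e6      # ~$24M token-related revenue
tokens_2024 = 1375e12    # ~1,375 trillion tokens / year
revenue_2024 = 7e9       # ~$7B theoretical revenue

# Implied revenue per million tokens (our derivation).
price_2016 = revenue_2016 / tokens_2016 * 1e6
price_2024 = revenue_2024 / tokens_2024 * 1e6

print(f"implied $/M tokens: 2016 ~ {price_2016:.2f}, 2024 ~ {price_2024:.2f}")
print(f"token volume growth: {tokens_2024 / tokens_2016:.0f}x")
```

On these figures the annual token volume grows by a factor of about 275 at roughly comparable per-million-token revenue; the report's larger headline multipliers fold in additional efficiency dimensions beyond raw token counts.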


Yandex Releases Yambda: The World’s Largest Event Dataset to Accelerate Recommender Systems

Yandex has recently made a significant contribution to the recommender systems community by releasing Yambda, the world's largest publicly available dataset for recommender system research and development. This dataset is designed to bridge the gap between academic research and industry-scale applications, offering nearly 5 billion anonymized user interaction events from Yandex Music, one of the company's flagship streaming services with over 28 million monthly users.

Why Yambda Matters: Addressing a Critical Data Gap in Recommender Systems

Recommender systems underpin the personalized experiences of many digital services today, from e-commerce and social networks to streaming platforms. These systems rely heavily on massive volumes of behavioral data, such as clicks, likes, and listens, to infer user preferences and deliver tailored content. However, the field of recommender systems has lagged behind other AI domains, like natural language processing, largely due to the scarcity of large, openly accessible datasets. Unlike large language models (LLMs), which learn from publicly available text sources, recommender systems need sensitive behavioral data, which is commercially valuable and hard to anonymize. As a result, companies have traditionally guarded this data closely, limiting researchers' access to real-world-scale datasets.

Existing datasets such as Spotify's Million Playlist Dataset, Netflix Prize data, and Criteo's click logs are either too small, lack temporal detail, or are poorly documented for developing production-grade recommender models. Yandex's release of Yambda addresses these challenges by providing a high-quality, extensive dataset with a rich set of features and anonymization safeguards.

What Yambda Contains: Scale, Richness, and Privacy

The Yambda dataset comprises 4.79 billion anonymized user interactions collected over a 10-month period.
These events come from roughly 1 million users interacting with nearly 9.4 million tracks on Yandex Music. The dataset includes:

- User interactions: both implicit feedback (listens) and explicit feedback (likes, dislikes, and their removals).
- Anonymized audio embeddings: vector representations of tracks derived from convolutional neural networks, enabling models to leverage audio content similarity.
- Organic interaction flags: an "is_organic" flag indicates whether users discovered a track independently or via recommendations, facilitating behavioral analysis.
- Precise timestamps: each event is timestamped to preserve temporal ordering, crucial for modeling sequential user behavior.

All user and track identifiers are anonymized using numeric IDs to comply with privacy standards, ensuring no personally identifiable information is exposed. The dataset is provided in Apache Parquet format, which is optimized for big data processing frameworks like Apache Spark and Hadoop, and also compatible with analytical libraries such as Pandas and Polars. This makes Yambda accessible for researchers and developers working in diverse environments.

Evaluation Method: Global Temporal Split

A key innovation in Yandex's dataset is the adoption of a Global Temporal Split (GTS) evaluation strategy. In typical recommender system research, the widely used Leave-One-Out method removes the last interaction of each user for testing. However, this approach disrupts the temporal continuity of user interactions, creating unrealistic training conditions. GTS, on the other hand, splits the data based on timestamps, preserving the entire sequence of events. This approach mimics real-world recommendation scenarios more closely because it prevents any future data from leaking into training and allows models to be tested on truly unseen, chronologically later interactions.
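A Global Temporal Split amounts to choosing one cutoff timestamp for the whole log. The toy pandas sketch below illustrates the idea; the column names are illustrative and not necessarily Yambda's actual schema:

```python
import pandas as pd

# Toy interaction log standing in for an event dataset.
events = pd.DataFrame({
    "uid":       [1, 1, 1, 2, 2, 3],
    "item_id":   [10, 11, 12, 10, 13, 11],
    "timestamp": [100, 200, 300, 150, 250, 400],
})

# Global Temporal Split: a single cutoff timestamp for every user,
# so no chronologically later interaction can leak into training.
cutoff = events["timestamp"].quantile(0.8)
train = events[events["timestamp"] <= cutoff]
test = events[events["timestamp"] > cutoff]

print(len(train), len(test))
```

Contrast this with leave-one-out, which would move each user's last event into the test set regardless of when it happened, allowing a model to train on events that occur after some test events.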
This temporal-aware evaluation is essential for benchmarking algorithms under realistic constraints and understanding their practical effectiveness.

Baseline Models and Metrics Included

To support benchmarking and accelerate innovation, Yandex provides baseline recommender models implemented on the dataset, including:

- MostPop: a popularity-based model recommending the most popular items.
- DecayPop: a time-decayed popularity model.
- ItemKNN: a neighborhood-based collaborative filtering method.
- iALS: implicit alternating least squares matrix factorization.
- BPR: Bayesian Personalized Ranking, a pairwise ranking method.
- SANSA and SASRec: sequence-aware models leveraging self-attention mechanisms.

These baselines are evaluated using standard recommender metrics such as:

- NDCG@k (Normalized Discounted Cumulative Gain): measures ranking quality, emphasizing the position of relevant items.
- Recall@k: assesses the fraction of relevant items retrieved.
- Coverage@k: indicates the diversity of recommendations across the catalog.

Providing these benchmarks helps researchers quickly gauge the performance of new algorithms relative to established methods.

Broad Applicability Beyond Music Streaming

While the dataset originates from a music streaming service, its value extends far beyond that domain. The interaction types, user behavior dynamics, and large scale make Yambda a universal benchmark for recommender systems across sectors like e-commerce, video platforms, and social networks. Algorithms validated on this dataset can be generalized or adapted to various recommendation tasks.

Benefits for Different Stakeholders

- Academia: enables rigorous testing of theories and new algorithms at an industry-relevant scale.
- Startups and SMBs: offers a resource comparable to what tech giants possess, leveling the playing field and accelerating the development of advanced recommendation engines.
- End users: indirectly benefit from smarter recommendation algorithms that improve content discovery, reduce search time, and increase engagement.

My Wave: Yandex's Personalized Recommender System

Yandex Music leverages a proprietary recommender system called My Wave, which incorporates deep neural networks and AI to personalize music suggestions. My Wave analyzes thousands of factors, including:

- User interaction sequences and listening history.
- Customizable preferences such as mood and language.
- Real-time music analysis of spectrograms, rhythm, vocal tone, frequency ranges, and genres.

This system dynamically adapts to individual tastes by identifying audio similarities and predicting preferences, demonstrating the kind of complex recommendation pipeline that benefits from large-scale datasets like Yambda.

Ensuring Privacy and Ethical Use

The release of Yambda underscores the importance of privacy in recommender system research. Yandex anonymizes all data with numeric IDs and omits personally identifiable information. The dataset contains only interaction signals without revealing exact user identities or sensitive attributes. This balance between openness and privacy allows for robust research while protecting individual user data, a critical consideration for the ethical advancement of AI technologies.

Access and Versions

Yandex offers the Yambda dataset in three sizes to accommodate different research and computational capacities:

- Full version: ~5 billion events.
- Medium version: ~500 million events.
- Small version: ~50 million events.

All versions are accessible via Hugging Face, a popular platform for hosting datasets and machine learning models, enabling easy integration into research workflows.

Conclusion

Yandex's release of the Yambda dataset marks a pivotal moment for open, industry-scale recommender systems research.
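For reference, the Recall@k and NDCG@k metrics used to evaluate the baselines above can be computed as follows. These are generic binary-relevance definitions; the benchmark's exact implementation may differ in details:

```python
import math

def recall_at_k(ranked, relevant, k):
    # Fraction of the relevant items that appear in the top-k
    # (assumes the relevant set is non-empty).
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    # Binary-relevance NDCG@k: DCG of the ranking divided by the DCG
    # of an ideal ranking that puts all relevant items first.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal

ranked = ["a", "b", "c", "d"]  # a model's top-4 recommendation list
relevant = {"b", "d"}          # items the user actually interacted with
print(round(recall_at_k(ranked, relevant, 4), 3),
      round(ndcg_at_k(ranked, relevant, 4), 3))
```

NDCG rewards placing relevant items near the top of the list, which is why it complements Recall@k: here all relevant items are retrieved (recall 1.0), but NDCG is well below 1 because they sit at ranks 2 and 4.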
