YouZum

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

arXiv:2503.22879v3 Announce Type: replace-cross Abstract: State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance. Despite this, scaling up SSMs on cloud services or limited-resource devices is challenging due to their storage and computational requirements. To overcome this, quantizing SSMs with low bit-width data formats can reduce model size and benefit from hardware acceleration. As SSMs are prone to quantization-induced errors, recent efforts have focused on optimizing a particular model or bit-width for efficiency without sacrificing performance. However, distinct bit-width configurations are essential for different scenarios, like W4A8 for boosting large-batch decoding speed, and W4A16 for enhancing generation speed in short-prompt applications for a single user. To this end, we present Quamba2, compatible with W8A8, W4A8, and W4A16 for both Mamba1 and Mamba2 backbones, addressing the growing demand for SSM deployment on various platforms. Based on the channel-order preservation and activation persistence of SSMs, we propose an offline approach to quantize the inputs of a linear recurrence to 8 bits by sorting and clustering the input $x$, combined with a per-state-group quantization for the input-dependent parameters $B$ and $C$. To ensure compute-invariance in the SSM output, we rearrange weights offline according to the clustering sequence. The experiments show that Quamba2-8B outperforms two state-of-the-art SSM quantization methods and delivers 1.3$\times$ and 3$\times$ speed-ups in the pre-filling and generation stages, respectively, while offering a 4$\times$ memory reduction with only a $1.6\%$ average accuracy drop. The evaluation on MMLU shows the generalizability and robustness of our framework. The code and quantized models will be released at: https://github.com/enyac-group/Quamba.
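
To make the sort-and-cluster idea above concrete, here is a minimal NumPy sketch (not the authors' implementation) of grouping channels by their calibration ranges and assigning each cluster its own 8-bit scale; the function names, cluster count, and calibration shapes are illustrative assumptions.

import numpy as np

def sort_cluster_quantize(x_calib, n_clusters=8):
    """x_calib: (tokens, channels) calibration activations for the SSM input x."""
    ch_max = np.abs(x_calib).max(axis=0)          # per-channel dynamic range
    order = np.argsort(ch_max)                    # channel-order-preserving sort
    clusters = np.array_split(order, n_clusters)  # groups of channels with similar range
    scales = np.empty(x_calib.shape[1])
    for idx in clusters:
        scales[idx] = max(ch_max[idx].max(), 1e-8) / 127.0   # one 8-bit scale per cluster
    return order, scales

def quantize_int8(x, scales):
    return np.clip(np.round(x / scales), -128, 127).astype(np.int8)

# Offline, the weights consuming x would be permuted with the same `order`
# so that the rearrangement leaves the SSM output unchanged.
calib = np.random.randn(1024, 64) * np.random.rand(64)      # toy calibration data
order, scales = sort_cluster_quantize(calib)
x_q = quantize_int8(calib, scales)
print(x_q.dtype, x_q.shape)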

AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists

arXiv:2506.08140v1 Announce Type: cross Abstract: Despite long-standing efforts in accelerating scientific discovery with AI, building AI co-scientists remains challenging due to limited high-quality data for training and evaluation. To tackle this data scarcity issue, we present AutoSDT, an automatic pipeline that collects high-quality coding tasks in real-world data-driven discovery workflows. AutoSDT leverages the coding capabilities and parametric knowledge of LLMs to search for diverse sources, select ecologically valid tasks, and synthesize accurate task instructions and code solutions. Using our pipeline, we construct AutoSDT-5K, a dataset of 5,404 coding tasks for data-driven discovery that covers four scientific disciplines and 756 unique Python packages. To the best of our knowledge, AutoSDT-5K is the largest open dataset for data-driven scientific discovery, and the only one collected fully automatically. Expert feedback on a subset of 256 tasks shows the effectiveness of AutoSDT: 93% of the collected tasks are ecologically valid, and 92.2% of the synthesized programs are functionally correct. Trained on AutoSDT-5K, the Qwen2.5-Coder-Instruct LLM series, dubbed AutoSDT-Coder, shows substantial improvements on two challenging data-driven discovery benchmarks, ScienceAgentBench and DiscoveryBench. Most notably, AutoSDT-Coder-32B reaches the same level of performance as GPT-4o on ScienceAgentBench with a success rate of 7.8%, doubling the performance of its base model. On DiscoveryBench, it lifts the hypothesis matching score to 8.1, bringing a 17.4% relative improvement and closing the gap between open-weight models and GPT-4o.
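
The pipeline stages (search for sources, select ecologically valid tasks, synthesize instruction/solution pairs) can be sketched schematically as below; the `judge_fn` and `generate_fn` callables stand in for LLM calls, and the data structures are assumptions, not the AutoSDT code.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class DiscoveryTask:
    instruction: str
    solution_code: str
    source: str

def build_tasks(scripts: Iterable[dict],
                judge_fn: Callable[[str], bool],
                generate_fn: Callable[[str], str]) -> list:
    tasks = []
    for s in scripts:
        # Selection: keep only scripts judged to be genuine data-driven analyses.
        if not judge_fn(s["code"]):
            continue
        # Synthesis: derive a task instruction, then a clean reference solution.
        instruction = generate_fn("Describe the task this script solves:\n" + s["code"])
        solution = generate_fn("Write a self-contained program for: " + instruction)
        tasks.append(DiscoveryTask(instruction, solution, s["source"]))
    return tasks

# Toy usage with stub callables in place of real LLM calls:
demo = build_tasks(
    [{"code": "import pandas as pd  # load and analyze survey data", "source": "example/repo"}],
    judge_fn=lambda code: "pandas" in code,
    generate_fn=lambda prompt: "# synthesized by an LLM in the real pipeline",
)
print(demo)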

Olica: Efficient Structured Pruning of Large Language Models without Retraining

arXiv:2506.08436v1 Announce Type: new Abstract: Most existing structured pruning methods for Large Language Models (LLMs) require substantial computational and data resources for retraining to reestablish the corrupted correlations, making them prohibitively expensive. To address this, we propose a pruning framework for LLMs called Orthogonal decomposition and Linear Calibration (Olica), which eliminates the need for retraining. A key observation is that the multi-head attention (MHA) layer depends on two types of matrix products. By treating these matrix products as unified entities and applying principal component analysis (PCA), we extract the most important information to compress LLMs without sacrificing accuracy or disrupting their original structure. Consequently, retraining becomes unnecessary. A fast decomposition method is devised, reducing the complexity of PCA by a factor of the square of the number of attention heads. Additionally, to mitigate the error accumulation problem caused by pruning the feed-forward network (FFN) layer, we introduce a linear calibration method to reconstruct the residual errors of pruned layers using low-rank matrices. By leveraging singular value decomposition (SVD) on the solution of the least-squares problem, these matrices are obtained without requiring retraining. Extensive experiments show that the proposed Olica is efficient in terms of data usage, GPU memory, and running time, while delivering superior performance across multiple benchmarks.
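
A toy numerical sketch of the two ingredients described above, under assumed shapes and ranks: a truncated SVD compresses a fused attention product without retraining, and a low-rank least-squares correction (the linear calibration) absorbs the residual error on calibration data. This illustrates the idea, not the Olica implementation.

import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8                                   # hidden size, target rank

# (1) Treat the product W_q @ W_k.T as one entity and keep its top-r directions.
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
fused = W_q @ W_k.T
U, S, Vt = np.linalg.svd(fused, full_matrices=False)
A = U[:, :r] * S[:r]                           # d x r
B = Vt[:r]                                     # r x d  -> fused is approximated by A @ B

# (2) Linear calibration: on calibration inputs X, fit a low-rank correction L so that
# the pruned layer plus X @ L matches the original layer's output in least squares.
X = rng.normal(size=(256, d))
Y_orig = X @ fused                             # original layer output
Y_pruned = X @ (A @ B)                         # output after compression
L_full, *_ = np.linalg.lstsq(X, Y_orig - Y_pruned, rcond=None)
Ul, Sl, Vtl = np.linalg.svd(L_full, full_matrices=False)
L_lowrank = (Ul[:, :r] * Sl[:r]) @ Vtl[:r]     # store only a rank-r correction

err = np.linalg.norm(Y_orig - (Y_pruned + X @ L_lowrank)) / np.linalg.norm(Y_orig)
print(f"relative reconstruction error: {err:.3f}")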

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

arXiv:2504.15585v4 Announce Type: replace-cross Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concern, not only for researchers and corporations but also for every nation. Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, lacking a comprehensive understanding of the entire “lifechain” of LLMs. To address this gap, this paper introduces, for the first time, the concept of “full-stack” safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and eventual commercialization. Compared to existing LLM safety surveys, our work demonstrates several distinctive advantages: (I) Comprehensive Perspective. We define the complete LLM lifecycle as encompassing data preparation, pre-training, post-training, deployment and final commercialization. To our knowledge, this represents the first safety survey to encompass the entire lifecycle of LLMs. (II) Extensive Literature Support. Our research is grounded in an exhaustive review of over 800 papers, ensuring comprehensive coverage and systematic organization of security issues within a more holistic understanding. (III) Unique Insights. Through systematic literature analysis, we have developed reliable roadmaps and perspectives for each chapter. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems. These insights provide valuable guidance for researchers pursuing future work in this field.

Representation Decomposition for Learning Similarity and Contrastness Across Modalities for Affective Computing

arXiv:2506.07086v1 Announce Type: new Abstract: Multi-modal affective computing aims to automatically recognize and interpret human attitudes from diverse data sources such as images and text, thereby enhancing human-computer interaction and emotion understanding. Existing approaches typically rely on unimodal analysis or straightforward fusion of cross-modal information, failing to capture the complex and conflicting evidence presented across different modalities. In this paper, we propose a novel LLM-based approach for affective computing that explicitly deconstructs visual and textual representations into shared (modality-invariant) and modality-specific components. Specifically, our approach first encodes and aligns input modalities using pre-trained multi-modal encoders, then employs a representation decomposition framework to separate common emotional content from unique cues, and finally integrates these decomposed signals via an attention mechanism to form a dynamic soft prompt for a multi-modal LLM. Extensive experiments on three representative tasks for affective computing, namely, multi-modal aspect-based sentiment analysis, multi-modal emotion analysis, and hateful meme detection, demonstrate the effectiveness of our approach, which consistently outperforms strong baselines and state-of-the-art models.
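
As an illustration of the decomposition-plus-soft-prompt idea, the PyTorch sketch below projects each modality into a shared and a modality-specific component and lets learned queries attend over the decomposed pieces to form soft-prompt vectors; the dimensions, module layout, and fusion details are assumptions rather than the paper's architecture.

import torch
import torch.nn as nn

class DecomposedSoftPrompt(nn.Module):
    def __init__(self, enc_dim=512, prompt_dim=768, n_prompt=8):
        super().__init__()
        self.shared = nn.Linear(enc_dim, prompt_dim)        # modality-invariant projection
        self.img_private = nn.Linear(enc_dim, prompt_dim)   # image-specific cues
        self.txt_private = nn.Linear(enc_dim, prompt_dim)   # text-specific cues
        self.queries = nn.Parameter(torch.randn(n_prompt, prompt_dim))
        self.attn = nn.MultiheadAttention(prompt_dim, num_heads=8, batch_first=True)

    def forward(self, img_feat, txt_feat):
        # img_feat, txt_feat: (batch, enc_dim) from frozen pre-trained encoders.
        parts = torch.stack(
            [self.shared(img_feat), self.shared(txt_feat),              # shared content
             self.img_private(img_feat), self.txt_private(txt_feat)],   # unique cues
            dim=1)                                                       # (batch, 4, prompt_dim)
        q = self.queries.unsqueeze(0).expand(img_feat.size(0), -1, -1)
        prompt, _ = self.attn(q, parts, parts)                           # (batch, n_prompt, prompt_dim)
        return prompt                                                    # prepended to the LLM input embeddings

prompt = DecomposedSoftPrompt()(torch.randn(2, 512), torch.randn(2, 512))
print(prompt.shape)  # torch.Size([2, 8, 768])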

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

arXiv:2501.18858v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model’s parameters. Theoretically, we demonstrate BRiTE’s convergence at a rate of $1/T$, with $T$ representing the number of iterations. Empirical evaluations on math and coding benchmarks demonstrate that our approach consistently improves performance across different base models without requiring human-annotated thinking processes. In addition, BRiTE demonstrates superior performance compared to existing algorithms that bootstrap thinking processes using alternative methods such as rejection sampling, and can even match or exceed the results achieved through supervised fine-tuning with human-annotated data.
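
The two-step loop can be pictured with the heavily simplified stub sketch below: sample latent rationales, score them with a shaped reward, and pass the reward-weighted (rationale, answer) pairs to a likelihood-maximizing update. The reward-weighted update is only a stand-in for the paper's reinforcement-learning step, and all names here are illustrative.

def brite_step(prompts, sample_k, shaped_reward, weighted_update):
    """One bootstrap iteration: collect reward-weighted rationales, then update."""
    batch = []
    for x in prompts:
        for rationale, answer in sample_k(x):            # Step 1: sample latent thinking traces
            w = shaped_reward(x, rationale, answer)      # shaped reward for each trace
            batch.append((x, rationale, answer, w))
    weighted_update(batch)                               # Step 2: maximize the reward-weighted
    return batch                                         # joint likelihood of (rationale, answer)

# Toy usage with stubs standing in for an actual LLM and trainer:
out = brite_step(
    ["What is 2+2?"],
    sample_k=lambda x: [("add the two numbers", "4"), ("guess", "5")],
    shaped_reward=lambda x, r, a: 1.0 if a == "4" else 0.0,
    weighted_update=lambda batch: None,
)
print(out)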

Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models

arXiv:2506.07106v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong performance across natural language reasoning tasks, yet their reasoning processes remain brittle and difficult to interpret. Prompting techniques like Chain-of-Thought (CoT) enhance reliability by eliciting intermediate reasoning steps or aggregating multiple outputs. However, they lack mechanisms for enforcing logical structure and assessing internal coherence. We introduce Theorem-of-Thought (ToTh), a novel framework that models reasoning as collaboration among three parallel agents, each simulating a distinct mode of inference: abductive, deductive, and inductive. Each agent produces a reasoning trace, which is structured into a formal reasoning graph. To evaluate consistency, we apply Bayesian belief propagation guided by natural language inference (NLI), assigning confidence scores to each step. The most coherent graph is selected to derive the final answer. Experiments on symbolic (WebOfLies) and numerical (MultiArith) reasoning benchmarks show that ToTh consistently outperforms CoT, Self-Consistency, and CoT-Decoding across multiple LLMs, while producing interpretable and logically grounded reasoning chains. Our findings suggest a promising direction for building more robust and cognitively inspired LLM reasoning. The implementation is available at https://github.com/KurbanIntelligenceLab/theorem-of-thought.
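
The selection step can be illustrated with the small sketch below: each agent's trace is a chain of steps, an NLI-style confidence links consecutive steps, and the chain with the highest propagated confidence supplies the answer. The stub scorer and toy traces are assumptions; the real framework runs Bayesian belief propagation over a reasoning graph with an NLI model.

from math import prod

def chain_coherence(steps, nli_confidence):
    # Propagate step confidences multiplicatively along the chain.
    return prod(nli_confidence(steps[i], steps[i + 1]) for i in range(len(steps) - 1))

def select_most_coherent(traces, nli_confidence):
    scored = {mode: chain_coherence(steps, nli_confidence) for mode, steps in traces.items()}
    best_mode = max(scored, key=scored.get)
    return best_mode, traces[best_mode][-1], scored   # the final step holds the answer

traces = {
    "abductive": ["lamp is off", "the bulb may be dead", "answer: replace bulb"],
    "deductive": ["no power in the room", "lamps need power", "answer: check the breaker"],
    "inductive": ["other lamps also fail", "outages affect everything", "answer: check the breaker"],
}
stub_nli = lambda premise, hypothesis: 0.9 if "power" in premise or "fail" in premise else 0.6
print(select_most_coherent(traces, stub_nli))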

Introspective Growth: Automatically Advancing LLM Expertise in Technology Judgment

arXiv:2505.12452v2 Announce Type: replace Abstract: Large language models (LLMs) increasingly demonstrate signs of conceptual understanding, yet much of their internal knowledge remains latent, loosely structured, and difficult to access or evaluate. We propose self-questioning as a lightweight and scalable strategy to improve LLMs’ understanding, particularly in domains where success depends on fine-grained semantic distinctions. To evaluate this approach, we introduce a challenging new benchmark of 1.3 million post-2015 computer science patent pairs, characterized by dense technical jargon and strategically complex writing. The benchmark centers on a pairwise differentiation task: can a model distinguish between closely related but substantively different inventions? We show that compared to placebo scientific information, prompting LLMs to generate and answer their own questions – targeting the background knowledge required for the task – significantly improves performance. These self-generated questions and answers activate otherwise underutilized internal knowledge. Allowing LLMs to retrieve answers from external scientific texts further enhances performance, suggesting that model knowledge is compressed and lacks the full richness of the training data. We also find that chain-of-thought prompting and self-questioning converge, though self-questioning remains more effective for improving understanding of technical concepts. Notably, we uncover an asymmetry in prompting: smaller models often generate more fundamental, more open-ended, better-aligned questions for mid-sized models than large models do, revealing a new strategy for cross-model collaboration. Altogether, our findings establish self-questioning as both a practical mechanism for automatically improving LLM comprehension, especially in domains with sparse and underrepresented knowledge, and a diagnostic probe of how internal and external knowledge are organized.
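
A minimal sketch of the self-questioning strategy as described: ask the model for background questions about the two patents, let it answer its own questions, and fold the Q&A into the final differentiation prompt. The `ask_llm` stub and prompt wording are assumptions, not the paper's protocol.

def self_questioning_compare(patent_a, patent_b, ask_llm, n_questions=3):
    # The model proposes the background questions it needs for the task.
    qs = ask_llm(
        f"List {n_questions} background questions needed to tell these inventions apart:\n"
        f"A: {patent_a}\nB: {patent_b}"
    )
    qa = [(q, ask_llm(q)) for q in qs]                       # the model answers itself
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa)
    # The self-generated Q&A becomes context for the pairwise differentiation task.
    return ask_llm(
        f"{context}\n\nUsing the background above, are A and B substantively "
        f"different inventions? Answer yes or no, then justify.\nA: {patent_a}\nB: {patent_b}"
    )

# Toy usage with a stub in place of a real model call:
stub = lambda prompt: ["What problem does each patent solve?"] if prompt.startswith("List") else "yes"
print(self_questioning_compare("adaptive battery charging", "thermal battery housing", stub))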

The Download: an inspiring toy robot arm, and why AM radio matters

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How a 1980s toy robot arm inspired modern robotics

—Jon Keegan

As a child of an electronic engineer, I spent a lot of time in our local Radio Shack as a kid. While my dad was locating capacitors and resistors, I was in the toy section. It was there, in 1984, that I discovered the best toy of my childhood: the Armatron robotic arm.

Described as a “robot-like arm to aid young masterminds in scientific and laboratory experiments,” it was a legit robotic arm. And the bold look and function of Armatron made quite an impression on many young kids who would one day have a career in robotics. Read the full story.

If you’re interested in the future of robots, why not check out:

+ Will we ever trust robots? If most robots still need remote human operators to be safe and effective, why should we welcome them into our homes? Read the full story.
+ When you might start speaking to robots. Google is only the latest to fuse large language models with robots. Here’s why the trend has big implications.
+ How AI models let robots carry out tasks in unfamiliar environments. Read the full story.
+ China’s EV giants are betting big on humanoid robots. Technical know-how and existing supply chains give Chinese electric-vehicle makers a significant head start in the sector. Read the full story.

Why we still need AM radio

The most reliable way to keep us informed in times of disaster is being threatened. Check out Ariel Aberg-Riger’s beautiful visual story illustrating AM radio’s importance in uncertain times.

Both of these stories are from the most recent edition of our print magazine, which is all about how technology is changing creativity. Subscribe now to get future copies before they land.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Protestors set Waymo robotaxis alight in Los Angeles
The groups clashed with police over the Trump administration’s immigration raids. (LA Times $)
+ Much of the technology that fuels deportation orders is error-ridden. (Slate $)
+ Immigrants are using a swathe of new apps to stay ahead of deportation. (Rest of World)

2 What’s next for Elon Musk and Donald Trump
A full breakdown in relations could be much worse for Musk in the long run. (NY Mag $)
+ Trump’s backers are rapidly turning on Musk, too. (New Yorker $)
+ The biggest winner from their falling-out? Jeff Bezos. (The Information $)

3 DOGE used an inaccurate AI tool to terminate Veterans Affairs contracts
Its code frequently produced glaring mistakes. (ProPublica)
+ Undeterred, the department is on a hiring spree. (Wired $)
+ Can AI help DOGE slash government budgets? It’s complex. (MIT Technology Review)

4 Europe’s shrinking forests could cause it to miss net-zero targets
Its trees aren’t soaking up as much carbon as they used to. (New Scientist $)
+ Inside the controversial tree farms powering Apple’s carbon neutral goal. (MIT Technology Review)

5 OpenAI wants to embed ChatGPT into college campuses
The ultimate goal? A personalized AI account for every student. (NYT $)
+ Meanwhile, other universities are experimenting with tech-free classes. (The Atlantic $)
+ ChatGPT is going to change education, not destroy it. (MIT Technology Review)

6 Chinese regulators are pumping the brakes on self-driving cars
They’re developing a new framework to assess the safety of autonomous features. (FT $)
+ The country’s robotaxis are rapidly catching up with the west. (Rest of World)
+ How China is regulating robotaxis. (MIT Technology Review)

7 Desalination is finally becoming a reality
Removing salt from seawater is one way to combat water scarcity. (WSJ $)
+ If you can make it through tons of plastic, that is. (The Atlantic $)

8 We’re getting better at fighting cancer
Deaths from the disease in the US have dropped by a third since 1991. (Vox)
+ Why it’s so hard to use AI to diagnose cancer. (MIT Technology Review)

9 Teenage TikTokers’ skin regimes offer virtually no benefit
And could even be potentially harmful. (The Guardian)
+ The fight for “Instagram face”. (MIT Technology Review)

10 Tech’s layoff groups are providing much-needed support
Workers who have been let go by their employers are forming little communities. (Insider $)

Quote of the day

“Every tech company is doing similar things but we were open about it.”

—Luis von Ahn, chief executive of the language-learning app Duolingo, tells the Financial Times that his company is far from the only one adopting an AI-first strategy.

One more thing

How to break free of Spotify’s algorithm

Since the heyday of radio, the branding of sound has evolved from broad genres like rock and hip-hop to “paranormal dark cabaret afternoon” and “synth space,” and streaming has become the default. Meanwhile, the ritual of discovering something new is now neatly packaged in a 30-song playlist.

The only rule in music streaming is personalization. What we’ve gained in convenience, we’ve lost in curiosity. But it doesn’t have to be this way. Read the full story.

—Tiffany Ng

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Happy birthday to Michael J Fox, who turns 64 today!
+ Whenever you need to play the world’s smallest violin, these scientists can help you out.
+ An early JMW Turner oil painting has been rediscovered.
+ Watching robots attempt to kickbox is pretty amusing.
