YouZum

AI, Committee, News, Uncategorized

Olica: Efficient Structured Pruning of Large Language Models without Retraining

arXiv:2506.08436v1 Announce Type: new Abstract: Most existing structured pruning methods for Large Language Models (LLMs) require substantial computational and data resources for retraining to reestablish the corrupted correlations, making them prohibitively expensive. To address this, we propose a pruning framework for LLMs called Orthogonal decomposition and Linear Calibration (Olica), which eliminates the need for retraining. A key observation is that the multi-head attention (MHA) layer depends on two types of matrix products. By treating these matrix products as unified entities and applying principal component analysis (PCA), we extract the most important information to compress LLMs without sacrificing accuracy or disrupting their original structure. Consequently, retraining becomes unnecessary. A fast decomposition method is devised, reducing the complexity of PCA by a factor of the square of the number of attention heads. Additionally, to mitigate the error accumulation problem caused by pruning the feed-forward network (FFN) layer, we introduce a linear calibration method to reconstruct the residual errors of pruned layers using low-rank matrices. By leveraging singular value decomposition (SVD) on the solution of the least-squares problem, these matrices are obtained without requiring retraining. Extensive experiments show that the proposed Olica is efficient in terms of data usage, GPU memory, and running time, while delivering superior performance across multiple benchmarks.
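As a rough illustration of the two ingredients, the sketch below (toy dimensions, generic numpy, not the paper's exact algorithm) compresses a fused attention matrix product with truncated SVD/PCA and fits a low-rank calibration matrix by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 16  # hidden size and retained rank (toy values)

# Ingredient 1: treat a matrix product such as W_q @ W_k.T as a single
# entity and compress it with PCA / truncated SVD, instead of pruning
# each factor separately and corrupting their correlation.
W_q, W_k = rng.standard_normal((d, d)), rng.standard_normal((d, d))
P = W_q @ W_k.T
U, S, Vt = np.linalg.svd(P, full_matrices=False)
W_q_c = U[:, :r] * S[:r]   # compressed left factor,  shape (d, r)
W_k_c = Vt[:r].T           # compressed right factor, shape (d, r)
P_hat = W_q_c @ W_k_c.T    # best rank-r approximation of the product

# Ingredient 2: linear calibration -- reconstruct the residual error of
# a pruned layer with a low-rank map fitted by least squares (no
# gradient-based retraining involved).
X = rng.standard_normal((256, d))          # calibration activations
E = X @ (P - P_hat)                        # residual error to recover
A, *_ = np.linalg.lstsq(X, E, rcond=None)  # least-squares solution
Ua, Sa, Vta = np.linalg.svd(A, full_matrices=False)
A_lr = (Ua[:, :r] * Sa[:r]) @ Vta[:r]      # low-rank calibration matrix
```

The design point is that both steps use closed-form linear algebra, which is why no retraining pass over the data is needed.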

Representation Decomposition for Learning Similarity and Contrastness Across Modalities for Affective Computing

arXiv:2506.07086v1 Announce Type: new Abstract: Multi-modal affective computing aims to automatically recognize and interpret human attitudes from diverse data sources such as images and text, thereby enhancing human-computer interaction and emotion understanding. Existing approaches typically rely on unimodal analysis or straightforward fusion of cross-modal information, and thus fail to capture the complex and conflicting evidence presented across different modalities. In this paper, we propose a novel LLM-based approach for affective computing that explicitly deconstructs visual and textual representations into shared (modality-invariant) and modality-specific components. Specifically, our approach first encodes and aligns input modalities using pre-trained multi-modal encoders, then employs a representation decomposition framework to separate common emotional content from unique cues, and finally integrates these decomposed signals via an attention mechanism to form a dynamic soft prompt for a multi-modal LLM. Extensive experiments on three representative tasks for affective computing, namely, multi-modal aspect-based sentiment analysis, multi-modal emotion analysis, and hateful meme detection, demonstrate the effectiveness of our approach, which consistently outperforms strong baselines and state-of-the-art models.
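The decomposition-and-fusion idea can be sketched numerically. The sketch below is a loose illustration with a fixed projection direction and softmax weights, whereas the paper learns the decomposition and attention end to end:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
v = rng.standard_normal(d)  # aligned image embedding (toy stand-in)
t = rng.standard_normal(d)  # aligned text embedding  (toy stand-in)

# Shared (modality-invariant) part: projection of each embedding onto a
# common direction; modality-specific part: the orthogonal residual.
u = (v + t) / np.linalg.norm(v + t)        # crude shared direction
v_shared, t_shared = (v @ u) * u, (t @ u) * u
v_spec, t_spec = v - v_shared, t - t_shared

# Fuse the decomposed signals with softmax attention weights to form a
# dynamic soft prompt (weights here are illustrative, not learned).
parts = np.stack([v_shared + t_shared, v_spec, t_spec])
scores = parts @ (v + t)                   # relevance of each component
weights = np.exp(scores - scores.max())
weights /= weights.sum()                   # attention weights, sum to 1
soft_prompt = weights @ parts              # shape (d,)
```

The key property the real framework shares with this toy version is that the specific components are, by construction, disjoint from the shared one, so conflicting cross-modal evidence is not averaged away.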

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

arXiv:2504.15585v4 Announce Type: replace-cross Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concern, not only for researchers and corporations but also for every nation. Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or fine-tuning phase, lacking a comprehensive understanding of the entire "lifechain" of LLMs. To address this gap, this paper introduces, for the first time, the concept of "full-stack" safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and eventual commercialization. Compared to off-the-shelf LLM safety surveys, our work demonstrates several distinctive advantages: (I) Comprehensive Perspective. We define the complete LLM lifecycle as encompassing data preparation, pre-training, post-training, deployment, and final commercialization. To our knowledge, this represents the first safety survey to encompass the entire lifecycle of LLMs. (II) Extensive Literature Support. Our research is grounded in an exhaustive review of over 800 papers, ensuring comprehensive coverage and systematic organization of security issues within a more holistic understanding. (III) Unique Insights. Through systematic literature analysis, we have developed reliable roadmaps and perspectives for each chapter. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems. These insights provide valuable guidance for researchers pursuing future work in this field.

Introspective Growth: Automatically Advancing LLM Expertise in Technology Judgment

arXiv:2505.12452v2 Announce Type: replace Abstract: Large language models (LLMs) increasingly demonstrate signs of conceptual understanding, yet much of their internal knowledge remains latent, loosely structured, and difficult to access or evaluate. We propose self-questioning as a lightweight and scalable strategy to improve LLMs’ understanding, particularly in domains where success depends on fine-grained semantic distinctions. To evaluate this approach, we introduce a challenging new benchmark of 1.3 million post-2015 computer science patent pairs, characterized by dense technical jargon and strategically complex writing. The benchmark centers on a pairwise differentiation task: can a model distinguish between closely related but substantively different inventions? We show that compared to placebo scientific information, prompting LLMs to generate and answer their own questions – targeting the background knowledge required for the task – significantly improves performance. These self-generated questions and answers activate otherwise underutilized internal knowledge. Allowing LLMs to retrieve answers from external scientific texts further enhances performance, suggesting that model knowledge is compressed and lacks the full richness of the training data. We also find that chain-of-thought prompting and self-questioning converge, though self-questioning remains more effective for improving understanding of technical concepts. Notably, we uncover an asymmetry in prompting: smaller models often generate more fundamental, more open-ended, better-aligned questions for mid-sized models than large models do, revealing a new strategy for cross-model collaboration. Altogether, our findings establish self-questioning as both a practical mechanism for automatically improving LLM comprehension, especially in domains with sparse and underrepresented knowledge, and a diagnostic probe of how internal and external knowledge are organized.
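The self-questioning loop described above can be sketched as a simple prompting pipeline. The `llm` function below is a hypothetical stub standing in for any chat-model API, so only the control flow is shown, not real model behavior:

```python
# `llm` is a hypothetical stand-in for any chat-model API; it is stubbed
# here so the control flow runs end to end.
def llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"

def self_questioning_judgment(patent_a: str, patent_b: str,
                              n_questions: int = 3) -> str:
    # Step 1: the model generates its own background questions.
    q_prompt = (f"List background questions needed to compare these "
                f"patents:\nA: {patent_a}\nB: {patent_b}")
    questions = [llm(f"{q_prompt} (question {i + 1})")
                 for i in range(n_questions)]
    # Step 2: the model answers its own questions, surfacing latent
    # internal knowledge (or retrieving from external texts).
    qa_pairs = [(q, llm(f"Answer concisely: {q}")) for q in questions]
    # Step 3: the QA pairs become context for the pairwise judgment.
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
    return llm(f"{context}\nAre A and B substantively different "
               f"inventions?\nA: {patent_a}\nB: {patent_b}")

verdict = self_questioning_judgment("patent text A", "patent text B")
```

Per the paper's asymmetry finding, step 1 could even be delegated to a smaller model while a mid-sized model performs steps 2 and 3.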

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

arXiv:2501.18858v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model's parameters. Theoretically, we demonstrate BRiTE's convergence at a rate of $1/T$ with $T$ representing the number of iterations. Empirical evaluations on math and coding benchmarks demonstrate that our approach consistently improves performance across different base models without requiring human-annotated thinking processes. In addition, BRiTE demonstrates superior performance compared to existing algorithms that bootstrap thinking processes using alternative methods such as rejection sampling, and can even match or exceed the results achieved through supervised fine-tuning with human-annotated data.
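In generic notation (not the paper's exact symbols), the two steps can be written as an alternating, EM-style pair of updates over a prompt $x$, latent rationale $z$, and answer $y$:

```latex
% Step 1: reinforce a rationale sampler toward high-reward thinking
% processes (reward shaping approximates the optimal thinking process):
q_t(z \mid x, y) \;\approx\; \arg\max_{q}\; \mathbb{E}_{z \sim q}\!\left[ r_{\theta_t}(z; x, y) \right]
% Step 2: update the base model by maximizing the joint probability of
% the sampled rationale and answer:
\theta_{t+1} \;=\; \arg\max_{\theta}\; \mathbb{E}_{z \sim q_t}\!\left[ \log p_{\theta}(y, z \mid x) \right]
```

Iterating the pair for $T$ rounds is what the stated $1/T$ convergence rate refers to.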

Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models

arXiv:2506.07106v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong performance across natural language reasoning tasks, yet their reasoning processes remain brittle and difficult to interpret. Prompting techniques like Chain-of-Thought (CoT) enhance reliability by eliciting intermediate reasoning steps or aggregating multiple outputs. However, they lack mechanisms for enforcing logical structure and assessing internal coherence. We introduce Theorem-of-Thought (ToTh), a novel framework that models reasoning as collaboration among three parallel agents, each simulating a distinct mode of inference: abductive, deductive, and inductive. Each agent produces a reasoning trace, which is structured into a formal reasoning graph. To evaluate consistency, we apply Bayesian belief propagation guided by natural language inference (NLI), assigning confidence scores to each step. The most coherent graph is selected to derive the final answer. Experiments on symbolic (WebOfLies) and numerical (MultiArith) reasoning benchmarks show that ToTh consistently outperforms CoT, Self-Consistency, and CoT-Decoding across multiple LLMs, while producing interpretable and logically grounded reasoning chains. Our findings suggest a promising direction for building more robust and cognitively inspired LLM reasoning. The implementation is available at https://github.com/KurbanIntelligenceLab/theorem-of-thought.
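A drastically simplified view of the selection step: if each agent's reasoning graph were just a chain with one NLI-derived confidence per step, coherence would reduce to a product of beliefs. The paper's Bayesian belief propagation handles general graphs; the numbers below are toy values:

```python
import math

# Toy traces: one NLI-derived confidence per reasoning step for each of
# the three agents (abductive, deductive, inductive).
traces = {
    "abductive": [0.90, 0.70, 0.80],
    "deductive": [0.95, 0.90, 0.85],
    "inductive": [0.60, 0.80, 0.50],
}

def coherence(step_confidences):
    # Sum of log-confidences = log of the chain's joint belief.
    return sum(math.log(c) for c in step_confidences)

# The most coherent reasoning trace supplies the final answer.
best_agent = max(traces, key=lambda name: coherence(traces[name]))
```

Working in log space avoids underflow when traces grow long, which is the standard choice for chained probabilities.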

The Download: an inspiring toy robot arm, and why AM radio matters

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How a 1980s toy robot arm inspired modern robotics

—Jon Keegan

As a child of an electronic engineer, I spent a lot of time in our local Radio Shack as a kid. While my dad was locating capacitors and resistors, I was in the toy section. It was there, in 1984, that I discovered the best toy of my childhood: the Armatron robotic arm. Described as a “robot-like arm to aid young masterminds in scientific and laboratory experiments,” it was a legit robotic arm. And the bold look and function of Armatron made quite an impression on many young kids who would one day have a career in robotics. Read the full story.

If you’re interested in the future of robots, why not check out:

+ Will we ever trust robots? If most robots still need remote human operators to be safe and effective, why should we welcome them into our homes? Read the full story.
+ When you might start speaking to robots. Google is only the latest to fuse large language models with robots. Here’s why the trend has big implications.
+ How AI models let robots carry out tasks in unfamiliar environments. Read the full story.
+ China’s EV giants are betting big on humanoid robots. Technical know-how and existing supply chains give Chinese electric-vehicle makers a significant head start in the sector. Read the full story.

Why we still need AM radio

The most reliable way to keep us informed in times of disaster is being threatened. Check out Ariel Aberg-Riger’s beautiful visual story illustrating AM radio’s importance in uncertain times.

Both of these stories are from the most recent edition of our print magazine, which is all about how technology is changing creativity. Subscribe now to get future copies before they land.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Protesters set Waymo robotaxis alight in Los Angeles
The groups clashed with police over the Trump administration’s immigration raids. (LA Times $)
+ Much of the technology that fuels deportation orders is error-ridden. (Slate $)
+ Immigrants are using a swathe of new apps to stay ahead of deportation. (Rest of World)

2 What’s next for Elon Musk and Donald Trump
A full breakdown in relations could be much worse for Musk in the long run. (NY Mag $)
+ Trump’s backers are rapidly turning on Musk, too. (New Yorker $)
+ The biggest winner from their falling-out? Jeff Bezos. (The Information $)

3 DOGE used an inaccurate AI tool to terminate Veterans Affairs contracts
Its code frequently produced glaring mistakes. (ProPublica)
+ Undeterred, the department is on a hiring spree. (Wired $)
+ Can AI help DOGE slash government budgets? It’s complex. (MIT Technology Review)

4 Europe’s shrinking forests could cause it to miss net-zero targets
Its trees aren’t soaking up as much carbon as they used to. (New Scientist $)
+ Inside the controversial tree farms powering Apple’s carbon neutral goal. (MIT Technology Review)

5 OpenAI wants to embed ChatGPT into college campuses
The ultimate goal? A personalized AI account for every student. (NYT $)
+ Meanwhile, other universities are experimenting with tech-free classes. (The Atlantic $)
+ ChatGPT is going to change education, not destroy it. (MIT Technology Review)

6 Chinese regulators are pumping the brakes on self-driving cars
They’re developing a new framework to assess the safety of autonomous features. (FT $)
+ The country’s robotaxis are rapidly catching up with the west. (Rest of World)
+ How China is regulating robotaxis. (MIT Technology Review)

7 Desalination is finally becoming a reality
Removing salt from seawater is one way to combat water scarcity. (WSJ $)
+ If you can make it through tons of plastic, that is. (The Atlantic $)

8 We’re getting better at fighting cancer
Deaths from the disease in the US have dropped by a third since 1991. (Vox)
+ Why it’s so hard to use AI to diagnose cancer. (MIT Technology Review)

9 Teenage TikTokers’ skin regimes offer virtually no benefit
And could even be potentially harmful. (The Guardian)
+ The fight for “Instagram face”. (MIT Technology Review)

10 Tech’s layoff groups are providing much-needed support
Workers who have been let go by their employers are forming little communities. (Insider $)

Quote of the day

“Every tech company is doing similar things but we were open about it.”

—Luis von Ahn, chief executive of the language-learning app Duolingo, tells the Financial Times that his company is far from the only one adopting an AI-first strategy.

One more thing

How to break free of Spotify’s algorithm

Since the heyday of radio, the branding of sound has evolved from broad genres like rock and hip-hop to “paranormal dark cabaret afternoon” and “synth space,” and streaming has become the default. Meanwhile, the ritual of discovering something new is now neatly packaged in a 30-song playlist. The only rule in music streaming is personalization. What we’ve gained in convenience, we’ve lost in curiosity. But it doesn’t have to be this way. Read the full story.

—Tiffany Ng

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Happy birthday to Michael J. Fox, who turns 64 today!
+ Whenever you need to play the world’s smallest violin, these scientists can help you out.
+ An early JMW Turner oil painting has been rediscovered.
+ Watching robots attempt to kickbox is pretty amusing.

Yandex Releases Alchemist: A Compact Supervised Fine-Tuning Dataset for Enhancing Text-to-Image T2I Model Quality

Despite the substantial progress in text-to-image (T2I) generation brought about by models such as DALL-E 3, Imagen 3, and Stable Diffusion 3, achieving consistent output quality — both in aesthetic and alignment terms — remains a persistent challenge. While large-scale pretraining provides general knowledge, it is insufficient to achieve high aesthetic quality and alignment. Supervised fine-tuning (SFT) serves as a critical post-training step, but its effectiveness is strongly dependent on the quality of the fine-tuning dataset. Current public datasets used in SFT either target narrow visual domains (e.g., anime or specific art genres) or rely on basic heuristic filters over web-scale data. Human-led curation is expensive, non-scalable, and frequently fails to identify samples that yield the greatest improvements. Moreover, recent T2I models use internal proprietary datasets with minimal transparency, limiting the reproducibility of results and slowing collective progress in the field.

Approach: A Model-Guided Dataset Curation

To mitigate these issues, Yandex has released Alchemist, a publicly available, general-purpose SFT dataset composed of 3,350 carefully selected image-text pairs. Unlike conventional datasets, Alchemist is constructed using a novel methodology that leverages a pre-trained diffusion model to act as a sample quality estimator. This approach enables the selection of training data with high impact on generative model performance without relying on subjective human labeling or simplistic aesthetic scoring. Alchemist is designed to improve the output quality of T2I models through targeted fine-tuning. The release also includes fine-tuned versions of five publicly available Stable Diffusion models. The dataset and models are accessible on Hugging Face under an open license. More details on the methodology and experiments are available in the preprint.
Technical Design: Filtering Pipeline and Dataset Characteristics

The construction of Alchemist involves a multi-stage filtering pipeline starting from ~10 billion web-sourced images. The pipeline is structured as follows:

Initial Filtering: Removal of NSFW content and low-resolution images (keeping only images larger than 1024×1024 pixels).

Coarse Quality Filtering: Application of classifiers to exclude images with compression artifacts, motion blur, watermarks, and other defects. These classifiers were trained on standard image quality assessment datasets such as KonIQ-10k and PIPAL.

Deduplication and IQA-Based Pruning: SIFT-like features are used for clustering similar images, retaining only high-quality ones. Images are further scored using the TOPIQ model, ensuring retention of clean samples.

Diffusion-Based Selection: A key contribution is the use of a pre-trained diffusion model’s cross-attention activations to rank images. A scoring function identifies samples that strongly activate features associated with visual complexity, aesthetic appeal, and stylistic richness. This enables the selection of samples most likely to enhance downstream model performance.

Caption Rewriting: The final selected images are re-captioned using a vision-language model fine-tuned to produce prompt-style textual descriptions. This step ensures better alignment and usability in SFT workflows.

Through ablation studies, the authors determine that increasing the dataset size beyond 3,350 (e.g., to 7k or 19k samples) results in lower quality of fine-tuned models, reinforcing the value of targeted, high-quality data over raw volume.

Results Across Multiple T2I Models

The effectiveness of Alchemist was evaluated across five Stable Diffusion variants: SD1.5, SD2.1, SDXL, SD3.5 Medium, and SD3.5 Large. Each model was fine-tuned using three datasets: (i) the Alchemist dataset, (ii) a size-matched subset from LAION-Aesthetics v2, and (iii) their respective baselines.
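The filtering stages can be sketched as composable predicates. The thresholds and field names below are hypothetical stand-ins for the real classifiers and scorers, shown purely to illustrate the pipeline shape:

```python
# Each stage is a predicate on a toy image record; thresholds and field
# names are hypothetical, not taken from the paper.
def min_resolution(img):       # 1. drop low-resolution images
    return img["width"] > 1024 and img["height"] > 1024

def coarse_quality(img):       # 2. defect classifiers (stubbed)
    return img["defect_score"] < 0.2

def topiq_ok(img):             # 3. IQA scoring after deduplication
    return img["topiq"] > 0.7

def diffusion_score_ok(img):   # 4. cross-attention-based ranking
    return img["diffusion_score"] > 0.9

STAGES = [min_resolution, coarse_quality, topiq_ok, diffusion_score_ok]

def survives(img):
    # An image is kept only if every stage in order accepts it.
    return all(stage(img) for stage in STAGES)

pool = [
    {"width": 2048, "height": 1536, "defect_score": 0.05,
     "topiq": 0.90, "diffusion_score": 0.95},
    {"width": 800, "height": 600, "defect_score": 0.01,
     "topiq": 0.95, "diffusion_score": 0.99},
]
kept = [img for img in pool if survives(img)]  # recaptioning follows
```

Ordering cheap checks (resolution) before expensive ones (diffusion scoring) is what makes a ~10-billion-image starting pool tractable.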
Human Evaluation: Expert annotators performed side-by-side assessments across four criteria: text-image relevance, aesthetic quality, image complexity, and fidelity. Alchemist-tuned models showed statistically significant improvements in aesthetic and complexity scores, often outperforming both baselines and LAION-Aesthetics-tuned versions by margins of 12–20%. Importantly, text-image relevance remained stable, suggesting that prompt alignment was not negatively affected.

Automated Metrics: Across metrics such as FD-DINOv2, CLIP Score, ImageReward, and HPS-v2, Alchemist-tuned models generally scored higher than their counterparts. Notably, improvements were more consistent when compared to size-matched LAION-based models than to baseline models.

Dataset Size Ablation: Fine-tuning with larger variants of Alchemist (7k and 19k samples) led to lower performance, underscoring that stricter filtering and higher per-sample quality are more impactful than dataset size.

Yandex has used the dataset to train its proprietary text-to-image generative model, YandexART v2.5, and plans to continue leveraging it for future model updates.

Conclusion

Alchemist provides a well-defined and empirically validated pathway to improving the quality of text-to-image generation via supervised fine-tuning. The approach emphasizes sample quality over scale and introduces a replicable methodology for dataset construction without reliance on proprietary tools. While the improvements are most notable in perceptual attributes like aesthetics and image complexity, the framework also highlights the trade-offs that arise in fidelity, particularly for newer base models already optimized through internal SFT. Nevertheless, Alchemist establishes a new standard for general-purpose SFT datasets and offers a valuable resource for researchers and developers working to advance the output quality of generative vision models.

Check out the Paper here and Alchemist Dataset on Hugging Face.
Thanks to the Yandex team for the thought leadership and resources for this article. The post Yandex Releases Alchemist: A Compact Supervised Fine-Tuning Dataset for Enhancing Text-to-Image T2I Model Quality appeared first on MarkTechPost.
