YouZum

Committee

AI, Committee, News, Uncategorized

AI That Teaches Itself: Tsinghua University’s ‘Absolute Zero’ Trains LLMs With Zero External Data

LLMs have improved their reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR), which relies on outcome-based feedback rather than imitation of intermediate reasoning steps. Current RLVR approaches face a critical scalability challenge: they depend heavily on manually curated collections of questions and answers for training. As reasoning models advance, constructing large-scale, high-quality datasets becomes increasingly unsustainable, similar to the bottlenecks already identified in LLM pretraining. Moreover, exclusive dependence on human-designed tasks may constrain AI systems’ capacity for autonomous learning and development, especially as they evolve beyond human intellectual capabilities.

Researchers have explored various approaches to enhance LLM reasoning. STaR pioneered self-bootstrapping, using expert iteration and rejection sampling of outcome-verified responses to improve chain-of-thought (CoT) reasoning. The o1 model deployed this concept at scale and achieved state-of-the-art results, and R1 later became the first open-weight model to match or surpass o1’s performance by introducing the “zero” setting, in which RL is applied directly to the base LLM. Self-play paradigms, meanwhile, have evolved from Schmidhuber’s early two-agent setups to more complex implementations such as AlphaGo and AlphaZero. Recent methods such as SPIN, Self-Rewarding Language Models, SPC, and SPAG have applied self-play to language models for alignment and reasoning.

Researchers from Tsinghua University, the Beijing Institute for General Artificial Intelligence, and Pennsylvania State University have proposed an RLVR paradigm called Absolute Zero, in which a single model autonomously generates and solves tasks that maximize its own learning progress without relying on any external data.
Under this paradigm, the researchers introduce the Absolute Zero Reasoner (AZR), which self-evolves its training curriculum and reasoning ability. A code executor validates proposed code reasoning tasks and verifies answers, providing a unified source of verifiable reward to guide open-ended yet grounded learning. AZR can be implemented across different model scales and remains compatible with various model classes, suggesting broad applicability.

LLMs provide a natural framework for implementing AZR in multitask learning contexts. In each online rollout iteration of the Absolute Zero objective, AZR proposes new reasoning tasks conditioned on the task type and on past self-generated examples, with explicit prompting to generate diverse tasks. It then attempts to solve them and receives grounded feedback on its responses. The code executor serves as both a flexible interface and a verifiable environment, enabling automatic construction, execution, and validation of code reasoning tasks. The AZR algorithm comprises buffer initialization, task proposal inputs and buffer management, valid task construction, solution validation, and advantage estimation through Task-Relative REINFORCE++.

Absolute Zero Reasoner-Coder-7B achieves state-of-the-art performance in the 7B overall average and coding average categories, surpassing the previous best models by 1.8 absolute percentage points even though both the math and code reasoning benchmarks are entirely out-of-distribution for it. In coding, it outperforms models trained with expert-curated human data by 0.3 absolute percentage points without ever accessing such data. Scaling analysis shows that AZR delivers greater gains on larger models: the 7B and 14B models continue to improve beyond 200 training steps, while the 3B model plateaus. Out-of-distribution performance gains increase with model size: +5.7, +10.2, and +13.2 for the 3B, 7B, and 14B models, respectively.
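The executor-grounded propose-and-solve loop described above can be illustrated with a small Python sketch. It is not the authors' implementation: the function names (`run_program`, `validate_task`, `solver_reward`), the convention that every task defines a function `f(x)`, the determinism check, and the binary reward are all assumptions made for illustration, and the model's proposals and answers are replaced by hard-coded stand-ins.

```python
from typing import Any, List, Tuple


def run_program(source: str, arg: Any) -> Tuple[bool, Any]:
    """Run `source`, which must define f(x), and return (ok, f(arg)).

    Plain exec() is NOT a sandbox and has no timeout; a real executor
    would isolate and time-limit this step.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        return True, namespace["f"](arg)
    except Exception:
        return False, None


def validate_task(source: str, arg: Any) -> Tuple[bool, Any]:
    """Check a proposed (program, input) pair and compute its reference output.

    Assumed validity rule: the program runs without error and returns the
    same value on two runs (a cheap determinism check).
    """
    ok_first, out_first = run_program(source, arg)
    ok_second, out_second = run_program(source, arg)
    valid = ok_first and ok_second and out_first == out_second
    return valid, out_first if valid else None


def solver_reward(predicted: Any, reference: Any) -> float:
    """Binary outcome reward: 1.0 only if the answer matches the executor."""
    return 1.0 if predicted == reference else 0.0


# Hypothetical proposals standing in for model-generated tasks.
proposals = [
    ("def f(x):\n    return sorted(x, reverse=True)", [3, 1, 2]),
    ("def f(x):\n    return x / 0", 1),  # raises, so it is rejected
]

# Only tasks that pass validation enter the buffer of self-generated examples.
task_buffer: List[Tuple[str, Any, Any]] = []
for program, task_input in proposals:
    valid, reference = validate_task(program, task_input)
    if valid:
        task_buffer.append((program, task_input, reference))

for program, task_input, reference in task_buffer:
    model_answer = [3, 2, 1]  # stand-in for the solver's predicted output
    print(solver_reward(model_answer, reference))
```

The point of the sketch is that the executor, not a human-written answer key, supplies both the validity check for proposed tasks and the reference output used to score solutions.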
In conclusion, the researchers introduced the Absolute Zero paradigm to address data limitations in existing RLVR frameworks. Within this paradigm, AZR trains models to propose and solve code-related reasoning tasks grounded by a code executor. One limitation remains: safety management in self-improving systems. The team observed several instances of safety-concerning CoT reasoning from the Llama-3.1-8B model, which they term “uh-oh moments.” The findings indicate that although the Absolute Zero paradigm reduces the need for human intervention in task curation, ongoing oversight remains necessary to address lingering safety concerns, which marks a critical direction for future research.

The post AI That Teaches Itself: Tsinghua University’s ‘Absolute Zero’ Trains LLMs With Zero External Data appeared first on MarkTechPost.

AI, Committee, News, Uncategorized

Multimodal LLMs Without Compromise: Researchers from UCLA, UW–Madison, and Adobe Introduce X-Fusion to Add Vision to Frozen Language Models Without Losing Language Capabilities

LLMs have made significant strides in language-related tasks such as conversational AI, reasoning, and code generation. However, human communication extends beyond text and often incorporates visual elements to enhance understanding. A truly versatile AI therefore needs to process and generate text and visual information together. Training such unified vision-language models from scratch, using methods like autoregressive token prediction or a hybrid approach that combines diffusion and language losses, has shown strong performance, but it requires vast computational resources and retraining for each new modality. An alternative approach adapts pretrained LLMs with vision capabilities, which offers a more efficient path but often compromises the language model’s original performance.

Current research has focused on three main strategies: merging LLMs with standalone image generation models, training large multimodal models end-to-end, or combining diffusion and autoregressive losses. While these methods have achieved state-of-the-art results, they either require retraining large models or degrade the LLM’s core capabilities. Even so, adding vision components to pretrained LLMs has shown significant potential, particularly in image understanding and generation, although existing methods remain limited in efficiency and flexibility.

Researchers from UCLA, the University of Wisconsin-Madison, and Adobe Research propose X-Fusion, which adapts pretrained LLMs for multimodal tasks while preserving language capabilities. X-Fusion uses a dual-tower architecture, freezing the LLM’s language weights while adding a vision-specific tower to process visual information. The approach aligns text and vision features at multiple levels, improving performance in image-to-text and text-to-image tasks.
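The frozen-plus-trainable split behind the dual-tower idea can be made concrete with a deliberately tiny Python sketch. Everything here is an assumption for illustration: each "layer" is reduced to a scalar scale-and-shift, attention across tokens is omitted, tokens are routed purely by a modality tag, and the names `Block`, `DualTowerLayer`, `build_dual_tower`, `forward`, and `trainable_blocks` are hypothetical rather than taken from the X-Fusion code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[str, float]  # (modality, feature); modality is "text" or "image"


@dataclass
class Block:
    """Toy stand-in for one transformer layer: a scalar scale-and-shift."""
    weight: float
    bias: float
    trainable: bool

    def __call__(self, x: float) -> float:
        return self.weight * x + self.bias


@dataclass
class DualTowerLayer:
    text_block: Block    # taken from the pretrained LLM and kept frozen
    vision_block: Block  # newly added and trainable

    def __call__(self, tokens: List[Token]) -> List[Token]:
        routed = []
        for modality, feature in tokens:
            block = self.text_block if modality == "text" else self.vision_block
            routed.append((modality, block(feature)))
        return routed


def build_dual_tower(llm_blocks: List[Block]) -> List[DualTowerLayer]:
    """Pair each frozen LLM block with a fresh trainable vision block."""
    layers = []
    for block in llm_blocks:
        frozen = Block(block.weight, block.bias, trainable=False)
        vision = Block(weight=1.0, bias=0.0, trainable=True)  # placeholder init
        layers.append(DualTowerLayer(frozen, vision))
    return layers


def forward(layers: List[DualTowerLayer], tokens: List[Token]) -> List[Token]:
    for layer in layers:
        tokens = layer(tokens)
    return tokens


def trainable_blocks(layers: List[DualTowerLayer]) -> List[Block]:
    """Collect the blocks an optimizer would be allowed to update."""
    return [
        block
        for layer in layers
        for block in (layer.text_block, layer.vision_block)
        if block.trainable
    ]


# A two-layer "pretrained LLM" with made-up weights.
llm_blocks = [Block(0.5, 1.0, trainable=True), Block(2.0, -1.0, trainable=True)]
model = build_dual_tower(llm_blocks)

print(forward(model, [("text", 1.0), ("text", 2.0)]))
print(len(trainable_blocks(model)))
```

In this toy, a text-only sequence only ever touches the frozen blocks, so its output is identical to what the original blocks would produce, which is the sense in which freezing the language weights preserves language behavior.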
Through ablation studies, the researchers emphasize the importance of clean image data for training and show that aligning vision features with pretrained representations accelerates convergence, especially for smaller models.

X-Fusion is a unified framework that adapts pretrained LLMs for vision tasks while retaining their language capabilities. It uses a dual-tower design, freezing the LLM’s text weights while introducing a separate vision tower for processing visual information. Images are tokenized using a pretrained encoder, and image and text tokens are jointly optimized. The model incorporates an optional X-Fuse operation that merges features from both towers for enhanced performance. X-Fusion is trained with autoregressive and image denoising losses and is evaluated on image generation (text-to-image) and image understanding (image-to-text) tasks.

The study evaluates the Dual Tower architecture against alternative transformer variants for multimodal integration, namely the Single Tower, Gated Tower, and Dual Projection designs, and highlights the flexibility of the Dual Tower for image and text tasks. The Dual Tower performs best in both image generation and understanding, outperforming the other designs by 23% in FID without increasing training parameters. The study also investigates the effects of noise and data ratios on performance, finding that clean images improve both understanding and generation. Additionally, aligning vision features with a pretrained encoder such as CLIP boosts performance, especially for smaller models.

In conclusion, X-Fusion is a framework that adapts pretrained LLMs to multimodal tasks, such as image understanding and generation, while preserving language capabilities. It introduces a Dual Tower architecture in which the language weights remain fixed and a separate trainable vision tower processes visual features. Experimental results show that X-Fusion outperforms alternative designs in image-to-text and text-to-image tasks.
Key findings include the benefits of incorporating understanding-focused data, reducing noise in image data, and aligning features, especially for smaller models. The research contributes valuable insights into building efficient multimodal models.

The post Multimodal LLMs Without Compromise: Researchers from UCLA, UW–Madison, and Adobe Introduce X-Fusion to Add Vision to Frozen Language Models Without Losing Language Capabilities appeared first on MarkTechPost.

AI, Committee, News, Uncategorized

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

arXiv:2505.04588v1

Abstract: Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs’ search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a reinforcement learning framework that incentivizes the search capabilities of LLMs without interacting with real search engines. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model’s reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
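The curriculum-based rollout idea in the abstract, where generated documents get noisier as training proceeds, can be sketched in Python as a schedule for the probability of requesting a noisy document. The abstract does not give the schedule, so the exponential ramp, the `base` parameter, the default start and end probabilities, and the function names below are assumptions for illustration only.

```python
import random
from typing import List


def noise_probability(
    step: int,
    total_steps: int,
    p_start: float,
    p_end: float,
    base: float = 4.0,
) -> float:
    """Probability of generating a noisy document at a given training step.

    Assumed schedule: an exponential ramp from p_start to p_end, so quality
    degrades slowly at first and faster toward the end of training.
    """
    if total_steps <= 0 or base <= 1.0:
        raise ValueError("total_steps must be positive and base must exceed 1")
    fraction = min(max(step / total_steps, 0.0), 1.0)
    ramp = (base ** fraction - 1.0) / (base - 1.0)
    return p_start + ramp * (p_end - p_start)


def sample_document_kinds(
    step: int,
    total_steps: int,
    num_docs: int,
    rng: random.Random,
    p_start: float = 0.0,
    p_end: float = 0.75,
) -> List[str]:
    """Decide, per simulated document, whether to request a noisy or relevant one."""
    p_noisy = noise_probability(step, total_steps, p_start, p_end)
    return ["noisy" if rng.random() < p_noisy else "relevant" for _ in range(num_docs)]


rng = random.Random(0)
for step in (0, 50, 100):
    p = noise_probability(step, 100, 0.0, 0.75)
    print(step, round(p, 3), sample_document_kinds(step, 100, 5, rng))
```

Early rollouts in this sketch see mostly relevant documents, and later rollouts see an increasing share of noisy ones, which matches the "increasingly challenging retrieval scenarios" described in the abstract.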
