Uncategorized Archives - Página 31 de 60

EPFL Researchers Unveil FG2 at CVPR: A New AI Model That Slashes Localization Errors by 28% for Autonomous Vehicles in GPS-Denied Environments

admin NU / junio 16, 2025

Navigating the dense urban canyons of cities like San Francisco or New York can be a nightmare for GPS systems. The towering skyscrapers block and reflect satellite signals, leading to location errors of tens of meters. For you and me, that might mean a missed turn. But for an autonomous vehicle or a delivery robot, that level of imprecision is the difference between a successful mission and a costly failure. These machines require pinpoint accuracy to operate safely and efficiently. Addressing this critical challenge, researchers from the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland have introduced a groundbreaking new method for visual localization during CVPR 2025 Their new paper, “FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching,” presents a novel AI model that significantly enhances the ability of a ground-level system, like an autonomous car, to determine its exact position and orientation using only a camera and a corresponding aerial (or satellite) image. The new approach has demonstrated a remarkable 28% reduction in mean localization error compared to the previous state-of-the-art on a challenging public dataset. Key Takeaways: Superior Accuracy: The FG2 model reduces the average localization error by a significant 28% on the VIGOR cross-area test set, a challenging benchmark for this task. Human-like Intuition: Instead of relying on abstract descriptors, the model mimics human reasoning by matching fine-grained, semantically consistent features—like curbs, crosswalks, and buildings—between a ground-level photo and an aerial map. Enhanced Interpretability: The method allows researchers to “see” what the AI is “thinking” by visualizing exactly which features in the ground and aerial images are being matched, a major step forward from previous “black box” models. Weakly Supervised Learning: Remarkably, the model learns these complex and consistent feature matches without any direct labels for correspondences. It achieves this using only the final camera pose as a supervisory signal. Challenge: Seeing the World from Two Different Angles The core problem of cross-view localization is the dramatic difference in perspective between a street-level camera and an overhead satellite view. A building facade seen from the ground looks completely different from its rooftop signature in an aerial image. Existing methods have struggled with this. Some create a general “descriptor” for the entire scene, but this is an abstract approach that doesn’t mirror how humans naturally localize themselves by spotting specific landmarks. Other methods transform the ground image into a Bird’s-Eye-View (BEV) but are often limited to the ground plane, ignoring crucial vertical structures like buildings. FG2: Matching Fine-Grained Features The EPFL team’s FG2 method introduces a more intuitive and effective process. It aligns two sets of points: one generated from the ground-level image and another sampled from the aerial map. Here’s a breakdown of their innovative pipeline: Mapping to 3D: The process begins by taking the features from the ground-level image and lifting them into a 3D point cloud centered around the camera. This creates a 3D representation of the immediate environment. Smart Pooling to BEV: This is where the magic happens. Instead of simply flattening the 3D data, the model learns to intelligently select the most important features along the vertical (height) dimension for each point. It essentially asks, “For this spot on the map, is the ground-level road marking more important, or is the edge of that building’s roof the better landmark?” This selection process is crucial, as it allows the model to correctly associate features like building facades with their corresponding rooftops in the aerial view. Feature Matching and Pose Estimation: Once both the ground and aerial views are represented as 2D point planes with rich feature descriptors, the model computes the similarity between them. It then samples a sparse set of the most confident matches and uses a classic geometric algorithm called Procrustes alignment to calculate the precise 3-DoF (x, y, and yaw) pose. Unprecedented Performance and Interpretability The results speak for themselves. On the challenging VIGOR dataset, which includes images from different cities in its cross-area test, FG2 reduced the mean localization error by 28% compared to the previous best method. It also demonstrated superior generalization capabilities on the KITTI dataset, a staple in autonomous driving research. Perhaps more importantly, the FG2 model offers a new level of transparency. By visualizing the matched points, the researchers showed that the model learns semantically consistent correspondences without being explicitly told to. For example, the system correctly matches zebra crossings, road markings, and even building facades in the ground view to their corresponding locations on the aerial map. This interpretability is extremenly valuable for building trust in safety-critical autonomous systems. “A Clearer Path” for Autonomous Navigation The FG2 method represents a significant leap forward in fine-grained visual localization. By developing a model that intelligently selects and matches features in a way that mirrors human intuition, the EPFL researchers have not only shattered previous accuracy records but also made the decision-making process of the AI more interpretable. This work paves the way for more robust and reliable navigation systems for autonomous vehicles, drones, and robots, bringing us one step closer to a future where machines can confidently navigate our world, even when GPS fails them. Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. The post EPFL Researchers Unveil FG2 at CVPR: A New AI Model That Slashes Localization Errors by 28% for Autonomous Vehicles in GPS-Denied Environments appeared first on MarkTechPost.

EPFL Researchers Unveil FG2 at CVPR: A New AI Model That Slashes Localization Errors by 28% for Autonomous Vehicles in GPS-Denied Environments Leer entrada »

AI, Committee, Noticias, Uncategorized

Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization

admin NU / junio 16, 2025

arXiv:2506.11109v1 Announce Type: new Abstract: The widespread adoption of location-based services has led to the generation of vast amounts of mobility data, providing significant opportunities to model user movement dynamics within urban environments. Recent advancements have focused on adapting Large Language Models (LLMs) for mobility analytics. However, existing methods face two primary limitations: inadequate semantic representation of locations (i.e., discrete IDs) and insufficient modeling of mobility signals within LLMs (i.e., single templated instruction fine-tuning). To address these issues, we propose QT-Mob, a novel framework that significantly enhances LLMs for mobility analytics. QT-Mob introduces a location tokenization module that learns compact, semantically rich tokens to represent locations, preserving contextual information while ensuring compatibility with LLMs. Furthermore, QT-Mob incorporates a series of complementary fine-tuning objectives that align the learned tokens with the internal representations in LLMs, improving the model’s comprehension of sequential movement patterns and location semantics. The proposed QT-Mob framework not only enhances LLMs’ ability to interpret mobility data but also provides a more generalizable approach for various mobility analytics tasks. Experiments on three real-world dataset demonstrate the superior performance in both next-location prediction and mobility recovery tasks, outperforming existing deep learning and LLM-based methods.

Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization Leer entrada »

AI, Committee, Noticias, Uncategorized

Enhancing multimodal analogical reasoning with Logic Augmented Generation

admin NU / junio 16, 2025

arXiv:2504.11190v2 Announce Type: replace-cross Abstract: Recent advances in Large Language Models have demonstrated their capabilities across a variety of tasks. However, automatically extracting implicit knowledge from natural language remains a significant challenge, as machines lack active experience with the physical world. Given this scenario, semantic knowledge graphs can serve as conceptual spaces that guide the automated text generation reasoning process to achieve more efficient and explainable results. In this paper, we apply a logic-augmented generation (LAG) framework that leverages the explicit representation of a text through a semantic knowledge graph and applies it in combination with prompt heuristics to elicit implicit analogical connections. This method generates extended knowledge graph triples representing implicit meaning, enabling systems to reason on unlabeled multimodal data regardless of the domain. We validate our work through three metaphor detection and understanding tasks across four datasets, as they require deep analogical reasoning capabilities. The results show that this integrated approach surpasses current baselines, performs better than humans in understanding visual metaphors, and enables more explainable reasoning processes, though still has inherent limitations in metaphor understanding, especially for domain-specific metaphors. Furthermore, we propose a thorough error analysis, discussing issues with metaphorical annotations and current evaluation methods.

Enhancing multimodal analogical reasoning with Logic Augmented Generation Leer entrada »

AI, Committee, Noticias, Uncategorized

Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache

admin NU / junio 16, 2025

arXiv:2506.11886v1 Announce Type: new Abstract: Large Language Models struggle with memory demands from the growing Key-Value (KV) cache as context lengths increase. Existing compression methods homogenize head dimensions or rely on attention-guided token pruning, often sacrificing accuracy or introducing computational overhead. We propose FourierAttention, a training-free framework that exploits the heterogeneous roles of transformer head dimensions: lower dimensions prioritize local context, while upper ones capture long-range dependencies. By projecting the long-context-insensitive dimensions onto orthogonal Fourier bases, FourierAttention approximates their temporal evolution with fixed-length spectral coefficients. Evaluations on LLaMA models show that FourierAttention achieves the best long-context accuracy on LongBench and Needle-In-A-Haystack (NIAH). Besides, a custom Triton kernel, FlashFourierAttention, is designed to optimize memory via streamlined read-write operations, enabling efficient deployment without performance compromise.

Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Leer entrada »

AI, Committee, Noticias, Uncategorized

LLM-as-a-Judge for Reference-less Automatic Code Validation and Refinement for Natural Language to Bash in IT Automation

admin NU / junio 16, 2025

arXiv:2506.11237v1 Announce Type: cross Abstract: In an effort to automatically evaluate and select the best model and improve code quality for automatic incident remediation in IT Automation, it is crucial to verify if the generated code for remediation action is syntactically and semantically correct and whether it can be executed correctly as intended. There are three approaches: 1) conventional methods use surface form similarity metrics (token match, exact match, etc.) which have numerous limitations, 2) execution-based evaluation focuses more on code functionality based on pass/fail judgments for given test-cases, and 3) LLM-as-a-Judge employs LLMs for automated evaluation to judge if it is a correct answer for a given problem based on pre-defined metrics. In this work, we focused on enhancing LLM-as-a-Judge using bidirectional functionality matching and logic representation for reference-less automatic validation and refinement for Bash code generation to select the best model for automatic incident remediation in IT Automation. We used execution-based evaluation as ground-truth to evaluate our LLM-as-a-Judge metrics. Results show high accuracy and agreement with execution-based evaluation (and up to 8% over baseline). Finally, we built Reflection code agents to utilize judgments and feedback from our evaluation metrics which achieved significant improvement (up to 24% increase in accuracy) for automatic code refinement.

LLM-as-a-Judge for Reference-less Automatic Code Validation and Refinement for Natural Language to Bash in IT Automation Leer entrada »

AI, Committee, Noticias, Uncategorized

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

admin NU / junio 16, 2025

arXiv:2506.11111v1 Announce Type: new Abstract: Large Language Models (LLMs) have gained enormous attention in recent years due to their capability of understanding and generating natural languages. With the rapid development and wild-range applications (e.g., Agents, Embodied Intelligence), the robustness of LLMs has received increased attention. As the core brain of many AI applications, the robustness of LLMs requires that models should not only generate consistent contents, but also ensure the correctness and stability of generated content when dealing with unexpeted application scenarios (e.g., toxic prompts, limited noise domain data, outof-distribution (OOD) applications, etc). In this survey paper, we conduct a thorough review of the robustness of LLMs, aiming to provide a comprehensive terminology of concepts and methods around this field and facilitate the community. Specifically, we first give a formal definition of LLM robustness and present the collection protocol of this survey paper. Then, based on the types of perturbated inputs, we organize this survey from the following perspectives: 1) Adversarial Robustness: tackling the problem that prompts are manipulated intentionally, such as noise prompts, long context, data attack, etc; 2) OOD Robustness: dealing with the unexpected real-world application scenarios, such as OOD detection, zero-shot transferring, hallucinations, etc; 3) Evaluation of Robustness: summarizing the new evaluation datasets, metrics, and tools for verifying the robustness of LLMs. After reviewing the representative work from each perspective, we discuss and highlight future opportunities and research directions in this field. Meanwhile, we also organize related works and provide an easy-to-search project (https://github.com/zhangkunzk/Awesome-LLM-Robustness-papers) to support the community.

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions Leer entrada »

AI, Committee, Noticias, Uncategorized

Sakana AI Introduces Text-to-LoRA (T2L): A Hypernetwork that Generates Task-Specific LLM Adapters (LoRAs) based on a Text Description of the Task

admin NU / junio 15, 2025

Transformer models have significantly influenced how AI systems approach tasks in natural language understanding, translation, and reasoning. These large-scale models, particularly large language models (LLMs), have grown in size and complexity to the point where they encompass broad capabilities across various domains. However, applying these models to new, specialized tasks remains a complex operation. Each new application typically demands careful dataset selection, hours of fine-tuning, and a high degree of computational power. Although these models offer a strong foundation in knowledge, their rigidity in handling new domains with minimal data remains a core limitation. As researchers aim to bring AI closer to human-like adaptability, the focus has shifted toward more efficient methods that allow such models to modify their behavior without retraining every parameter. The Challenge of Customizing LLMs for New Tasks The central difficulty lies in adapting foundation models to unique applications without repeating costly and time-intensive training cycles. Most solutions today rely on creating new adapters for each task, which are separate components trained to steer the model’s behavior. These adapters must be made from scratch for every task, and any benefits learned from one application often cannot be transferred to another. This adaptation process is time-consuming and lacks scalability. Moreover, tuning models on specific datasets usually requires a high level of precision in hyperparameter choices, and failing to find the right configuration can lead to poor results. Even when adaptation is successful, the result is often a large collection of isolated task-specific components that are not easy to integrate or reuse. In response to these limitations, researchers have adopted Low-Rank Adaptation (LoRA), a technique that modifies only a small set of parameters rather than the entire model. LoRA injects low-rank matrices into specific layers of a frozen LLM, allowing the base weights to remain unchanged while enabling task-specific customization. This method reduces the number of trainable parameters. However, for each task, a new LoRA adapter still needs to be trained from scratch. While more efficient than full fine-tuning, this method does not allow for fast, on-the-fly adaptation. Recent advancements have attempted to compress these adapters further or combine multiple adapters during inference; however, they still rely heavily on prior training and cannot generate new adapters dynamically. Introducing Text-to-LoRA: Instant Adapter Generation from Task Descriptions Researchers at Sakana AI introduced Text-to-LoRA (T2L), designed to instantly generate task-specific LoRA adapters from textual descriptions of the target task, instead of creating and training new adapters for each task. T2L functions as a hypernetwork capable of outputting adapter weights in a single forward pass. It learns from a library of pre-existing LoRA adapters covering various domains, including GSM8K, Arc-challenge, BoolQ, and others. Once trained, T2L can interpret a task’s description and generate the required adapter without additional training. This ability not only eliminates the need for manual adapter generation but also enables the system to generalize to tasks it has never encountered before. The T2L architecture uses a combination of module-specific and layer-specific embeddings to guide the generation process. Three architectural variants were tested: a large version with 55 million parameters, a medium with 34 million, and a small with just 5 million. Despite their differences in size, all models were capable of generating the necessary low-rank matrices for adapter functionality. The training utilized the Super Natural Instructions dataset across 479 tasks, with each task described in natural language and encoded into vector form. By merging these descriptions with learned layer and module embeddings, T2L creates the low-rank A and B matrices needed for adapter functionality. This allows one model to replace hundreds of hand-crafted LoRAs, producing consistent results with a much smaller computational footprint. Benchmark Performance and Scalability of T2L On benchmarks such as Arc-easy and GSM8K, T2L matched or surpassed the performance of task-specific LoRAs. For instance, the accuracy on Arc-easy using T2L was 76.6%, matching the accuracy of the best manually tuned adapter. On BoolQ, it reached 89.9%, slightly outperforming the original adapter. Even on more difficult benchmarks like PIQA and Winogrande, where overfitting typically hurts performance, T2L delivered better results than manually trained adapters. These improvements are believed to stem from the lossy compression inherent in the hypernetwork training, which acts as a form of regularization. When increasing the number of training datasets from 16 to 479, the performance in zero-shot settings improved substantially, showing T2L’s capability to generalize with broader exposure during training. Several Key Takeaways from the Research include: T2L allows instant adaptation of LLMs using only natural language descriptions. It supports zero-shot generalization to tasks not seen during training. Three architectural variants of T2L were tested with parameter counts of 55M, 34M, and 5M. Benchmarks include ArcE, BoolQ, GSM8K, Hellaswag, PIQA, MBPP, and more. T2L achieved benchmark accuracies of 76.6% (ArcE), 89.9% (BoolQ), and 92.6% (Hellaswag). It matched or exceeded manually trained LoRAs in performance on multiple tasks. Trained using 479 tasks from the Super Natural Instructions dataset. T2L uses the gte-large-en-v1.5 model for generating task embeddings. LoRA adapters produced by T2L target only query and value projections in attention blocks, totaling 3.4M parameters. Performance remained consistent even with higher reconstruction loss, showing resilience to compression. In conclusion, this research highlights a major step forward in flexible and efficient model adaptation. Instead of relying on repetitive, resource-heavy procedures, T2L uses natural language itself as a control mechanism, enabling models to specialize using simple task descriptions. This capability dramatically reduces the time and cost required to adapt LLMs to new domains. Moreover, it suggests that as long as enough prior adapters are available for training, future models could potentially adapt in seconds to any task described in plain English. The use of hypernetworks to dynamically construct adapters also means less storage is needed for model specialization, further increasing the practicality of this method in production environments. Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. The post Sakana AI Introduces

Sakana AI Introduces Text-to-LoRA (T2L): A Hypernetwork that Generates Task-Specific LLM Adapters (LoRAs) based on a Text Description of the Task Leer entrada »

AI, Committee, Noticias, Uncategorized

Rethinking AI: DeepSeek’s playbook shakes up the high-spend, high-compute paradigm

admin NU / junio 15, 2025

DeepSeek’s advancements were inevitable, but the company brought them forward a few years earlier than would have been possible otherwise.Read More

Rethinking AI: DeepSeek’s playbook shakes up the high-spend, high-compute paradigm Leer entrada »

AI, Committee, Noticias, Uncategorized

MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language Models

admin NU / junio 15, 2025

LLMs are increasingly seen as key to achieving Artificial General Intelligence (AGI), but they face major limitations in how they handle memory. Most LLMs rely on fixed knowledge stored in their weights and short-lived context during use, making it hard to retain or update information over time. Techniques like RAG attempt to incorporate external knowledge but lack structured memory management. This leads to problems such as forgetting past conversations, poor adaptability, and isolated memory across platforms. Fundamentally, today’s LLMs don’t treat memory as a manageable, persistent, or sharable system, limiting their real-world usefulness. To address the limitations of memory in current LLMs, researchers from MemTensor (Shanghai) Technology Co., Ltd., Shanghai Jiao Tong University, Renmin University of China, and the Research Institute of China Telecom have developed MemO. This memory operating system makes memory a first-class resource in language models. At its core is MemCube, a unified memory abstraction that manages parametric, activation, and plaintext memory. MemOS enables structured, traceable, and cross-task memory handling, allowing models to adapt continuously, internalize user preferences, and maintain behavioral consistency. This shift transforms LLMs from passive generators into evolving systems capable of long-term learning and cross-platform coordination. As AI systems grow more complex—handling multiple tasks, roles, and data types—language models must evolve beyond understanding text to also retaining memory and learning continuously. Current LLMs lack structured memory management, which limits their ability to adapt and grow over time. MemOS, a new system that treats memory as a core, schedulable resource. It enables long-term learning through structured storage, version control, and unified memory access. Unlike traditional training, MemOS supports a continuous “memory training” paradigm that blurs the line between learning and inference. It also emphasizes governance, ensuring traceability, access control, and safe use in evolving AI systems. MemOS is a memory-centric operating system for language models that treats memory not just as stored data but as an active, evolving component of the model’s cognition. It organizes memory into three distinct types: Parametric Memory (knowledge baked into model weights via pretraining or fine-tuning), Activation Memory (temporary internal states, such as KV caches and attention patterns, used during inference), and Plaintext Memory (editable, retrievable external data, like documents or prompts). These memory types interact within a unified framework called the MemoryCube (MemCube), which encapsulates both content and metadata, allowing dynamic scheduling, versioning, access control, and transformation across types. This structured system enables LLMs to adapt, recall relevant information, and efficiently evolve their capabilities, transforming them into more than just static generators. At the core of MemOS is a three-layer architecture: the Interface Layer handles user inputs and parses them into memory-related tasks; the Operation Layer manages the scheduling, organization, and evolution of different types of memory; and the Infrastructure Layer ensures safe storage, access governance, and cross-agent collaboration. All interactions within the system are mediated through MemCubes, allowing traceable, policy-driven memory operations. Through modules like MemScheduler, MemLifecycle, and MemGovernance, MemOS maintains a continuous and adaptive memory loop—from the moment a user sends a prompt, to memory injection during reasoning, to storing useful data for future use. This design not only enhances the model’s responsiveness and personalization but also ensures that memory remains structured, secure, and reusable. In conclusion, MemOS is a memory operating system designed to make memory a central, manageable component in LLMs. Unlike traditional models that depend mostly on static model weights and short-term runtime states, MemOS introduces a unified framework for handling parametric, activation, and plaintext memory. At its core is MemCube, a standardized memory unit that supports structured storage, lifecycle management, and task-aware memory augmentation. The system enables more coherent reasoning, adaptability, and cross-agent collaboration. Future goals include enabling memory sharing across models, self-evolving memory blocks, and building a decentralized memory marketplace to support continual learning and intelligent evolution. Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. The post MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language Models appeared first on MarkTechPost.

MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language Models Leer entrada »

AI, Committee, Noticias, Uncategorized

Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

admin NU / junio 15, 2025

Post-training methods for pre-trained language models (LMs) depend on human supervision through demonstrations or preference feedback to specify desired behaviors. However, this approach faces critical limitations as tasks and model behaviors become very complex. Human supervision is unreliable in these scenarios as LMs learn to mimic mistakes in demonstrations or exploit inherent flaws in feedback systems. The core challenge lies in training LMs for tasks that exceed human capability in reliability in demonstrations or evaluations. Recent research has identified diverse failure modes, including reward-hacking of human-designed supervision signals or real humans themselves. Limitations of Human Supervision in LLM Post-Training Researchers have explored several approaches to scale beyond human supervision. One standard method utilizes high-quality verifiable rewards, such as matching model outputs with ground-truth solutions in mathematical domains. Despite evidence that pre-trained base models have strong latent capabilities for downstream tasks, with post-training adding minimal improvements, effective elicitation remains challenging. The Contrast Consistent Search (CCS) method is an unsupervised elicitation approach that uses logical consistency to find latent knowledge without supervision. However, CCS underperforms supervised approaches and often fails to identify knowledge due to other prominent features satisfying consistency properties. Introducing Internal Coherence Maximization (ICM) Researchers from Anthropic, Schmidt Sciences, Independent, Constellation, New York University, and George Washington University have proposed Internal Coherence Maximization (ICM), which fine-tunes pre-trained models on their own generated labels without using any provided labels. ICM solves this by searching for label sets that are both logically consistent and mutually predictable according to the pre-trained model. Since optimal label set identification remains computationally infeasible, ICM uses a simulated annealing-inspired search algorithm to approximate the maximum objective. Moreover, this method matches the performance of training on golden labels on TruthfulQA and GSM8K, and outperforms training on crowdsourced human labels on Alpaca. How the ICM Algorithm Works The ICM algorithm follows an iterative three-step process: (a) the system samples a new unlabeled example from the dataset for potential inclusion, (b) it determines the optimal label for this example while simultaneously resolving any logical inconsistencies, and (c) the algorithm evaluates whether to accept this new labeled example based on the scoring function. ICM is evaluated across three datasets: TruthfulQA for truthfulness assessment, GSM8K-verification for mathematical correctness, and Alpaca for helpfulness and harmlessness. Researchers used four baselines in their experiments: Zero-shot, Zero-shot (Chat), Golden Label, and Human Label. Moreover, Experiments used two open-weight models, Llama 3.1 8B and 70B, and two proprietary models: Claude 3 Haiku and Claude 3.5 Haiku. Benchmark Performance and Model Comparisons In superhuman capability elicitation tasks, ICM matches golden supervision accuracy at 80%, outperforming the estimated human accuracy of 60%. Using ICM-generated reward models, researchers successfully trained an assistant chatbot without human supervision. The unsupervised reward model achieves 75.0% accuracy on RewardBench, compared to 72.2% for human-supervised alternatives trained on production data. Moreover, using both the unsupervised and human-supervised RM, two policies are trained with RL to create helpful, harmless, and honest assistants. The policy trained with the unsupervised RM achieves a 60% win rate. However, these policies still lag behind the publicly released Claude 3.5 Haiku, which achieves 92% win rates. Conclusion and Future Outlook This paper introduces Internal Coherence Maximization (ICM), an advancement in unsupervised LM for fine-tuning pre-trained models on self-generated labels. The method consistently matches golden supervision performance and surpasses crowdsourced human supervision across GSM8K-verification, TruthfulQA, and Alpaca reward modeling tasks. However, ICM’s limitations include dependency on concept salience within pre-trained models and ineffectiveness with long inputs due to context window constraints. As LMs advance beyond human evaluation capabilities, ICM offers promising alternatives to traditional RLHF, ensuring model alignment with human intent without human supervision boundaries. Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. The post Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs appeared first on MarkTechPost.

Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs Leer entrada »

Uncategorized

EPFL Researchers Unveil FG2 at CVPR: A New AI Model That Slashes Localization Errors by 28% for Autonomous Vehicles in GPS-Denied Environments

Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization

Enhancing multimodal analogical reasoning with Logic Augmented Generation

Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache

LLM-as-a-Judge for Reference-less Automatic Code Validation and Refinement for Natural Language to Bash in IT Automation

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

Sakana AI Introduces Text-to-LoRA (T2L): A Hypernetwork that Generates Task-Specific LLM Adapters (LoRAs) based on a Text Description of the Task

Rethinking AI: DeepSeek’s playbook shakes up the high-spend, high-compute paradigm

MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language Models

Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Nuestros servicios

Inicio

Cómo funciona

Noticias

Precios

Soporte

Centro de ayuda

Reportar un problema

Dar comentarios

Política de privacidad

Cuenta de usuario

Síguenos