YouZum

Committee

AI, Committee, Notizie, Uncategorized

A Matter of Representation: Towards Graph-Based Abstract Code Generation

arXiv:2510.13163v1 Announce Type: new Abstract: Most large language models (LLMs) today excel at generating raw, sequential code with minimal abstractions and custom structures. However, there has been little work on graph-based abstract code generation, where significant logic is encapsulated in predefined nodes and execution flow is determined by edges. This is relevant for visual programming languages, and in cases where raw source code is inaccessible to users and LLM training sets. In this work, we propose and evaluate JSON representations for graphs to enable high accuracy graph-based abstract code generation. We evaluate these representations on ScratchTest, a mini-benchmark based on our custom Python re-implementation of Scratch, which tests the LLM in code graph space. Our findings demonstrate that LLMs can indeed perform the aforementioned generation task in a single pass without relying on specialized or complex pipelines, given the correct graph representations. We also show that different representations induce significantly different accuracies, highlighting the instrumental role of representations in this generation task. All in all, this work establishes the first steps towards representation learning for graph-based abstract code generation.

A Matter of Representation: Towards Graph-Based Abstract Code Generation Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100—While Improving Exploration

What would you build if you could run Reinforcement Learning (RL) post-training on a 32B LLM in 4-bit NVFP4—on a single H100—with BF16-level accuracy and 1.2–1.5× step speedups? NVIDIA researchers (with collaborators from MIT, HKU, and Tsinghua) have open-sourced QeRL (Quantization-enhanced Reinforcement Learning), a training framework that pushes Reinforcement Learning (RL) post-training into 4-bit FP4 (NVFP4) while keeping gradient math in higher precision via LoRA. The research team reports >1.5× speedups in the rollout phase, ~1.8× end-to-end vs QLoRA in one setting, and the first demonstration of RL training for a 32B policy on a single H100-80GB GPU. https://arxiv.org/pdf/2510.11696 What QeRL changes in the Reinforcement Learning (RL) loop? Most RLHF/GRPO/DAPO pipelines spend the bulk of wall-clock time in rollouts (token generation). QeRL shifts the policy’s weight path to NVFP4 (FP4) with dual-level scaling and keeps logits/gradients in higher precision via LoRA, so backprop remains stable while the sampling path hits hardware-efficient FP4×BF16 kernels (Marlin). The result is faster prefill/decoding during rollouts without maintaining a separate full-precision policy. Mechanically, the research team integrates Marlin-based FP4 kernels in both rollout and prefill, while LoRA limits trainable parameters. This directly targets the stage that dominates RL cost and latency for long reasoning traces. https://arxiv.org/pdf/2510.11696 Quantization as exploration, made schedulable A core empirical finding: deterministic FP4 quantization raises policy entropy, flattening token distributions early in training and improving exploration versus 16-bit LoRA and NF4-based QLoRA baselines. To control that effect over time, QeRL introduces Adaptive Quantization Noise (AQN)—channel-wise Gaussian perturbations mapped into LayerNorm scale parameters and annealed with an exponential schedule. This keeps kernel fusion intact (no extra weight tensors) while transitioning from exploration to exploitation. In ablations, QeRL shows faster reward growth and higher final scores on math-reasoning tasks under both GRPO and DAPO, aligning with the hypothesis that structured noise in parameter space can be a useful exploration driver in RL, even though such noise is typically detrimental in supervised fine-tuning. Reported results On Qwen2.5 backbone model, the research team show that NVFP4+LoRA outperforms vanilla LoRA and QLoRA in rollout throughput and overall training time, with >2× rollout throughput on 14B/32B models against QLoRA and ~1.8× end-to-end vs QLoRA in a representative setup. They also demonstrate training a 32B policy with GRPO on a single H100-80GB, enabled by the lower memory footprint of weight-only FP4. Accuracy is competitive with higher-precision baselines. For a 7B model, the research team reports GSM8K = 90.8% and MATH500 = 77.4%, surpassing 16-bit LoRA and QLoRA under their setup and matching full-parameter fine-tuning. Across broader math benchmarks (e.g., BigMath), QeRL maintains parity or advantage, while converging faster due to improved exploration. https://arxiv.org/pdf/2510.11696 What this is—and isn’t? QeRL is weight-only FP4 with LoRA updates; it does not claim FP4 precision for logits/gradients. The benefits concentrate in rollout/prefill throughput and memory footprint, with empirical evidence that quantization-induced entropy aids RL exploration when AQN modulates it over training. Generalization to modalities beyond math-reasoning tasks or to safety/tool-use RL depends on reward design and sequence lengths. Key Takeaways QeRL combines NVFP4 4-bit weight quantization with LoRA to accelerate the rollout phase and cut memory, enabling RL for a 32B LLM on a single H100-80GB. Quantization acts as exploration: FP4 increases policy entropy, while Adaptive Quantization Noise (AQN) schedules channel-wise noise via LayerNorm scales. Reported efficiency: >1.5× rollout speedups vs 16-bit LoRA and ~1.8× end-to-end vs QLoRA; >2× rollout throughput vs QLoRA on 14B/32B setups. Accuracy holds: Qwen2.5-7B reaches 90.8% on GSM8K and 77.4% on MATH500, matching full-parameter fine-tuning under the paper’s setup. NVFP4 is a hardware-optimized 4-bit floating format with two-level scaling (FP8 E4M3 block scalers + FP32 tensor scale), enabling efficient Marlin-based kernels. Editorial Comments QeRL speeds up the RL rollout stage. It quantizes weights to NVFP4 and keeps updates and logits in higher precision using LoRA. It reports >1.5× rollout speedups and can train a 32B policy on a single H100-80GB GPU. It adds Adaptive Quantization Noise to make exploration a controlled signal during training. Results are shown mainly on math-reasoning tasks using GRPO and DAPO. The gains rely on NVFP4 kernel support such as Marlin. Check out the FULL CODES here and Paper. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well. The post QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100—While Improving Exploration appeared first on MarkTechPost.

QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100—While Improving Exploration Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing

arXiv:2510.12121v1 Announce Type: cross Abstract: Precise attribute intensity control–generating Large Language Model (LLM) outputs with specific, user-defined attribute intensities–is crucial for AI systems adaptable to diverse user expectations. Current LLM alignment methods, however, typically provide only directional or open-ended guidance, failing to reliably achieve exact attribute intensities. We address this limitation with three key designs: (1) reformulating precise attribute intensity control as a target-reaching problem, rather than simple maximization; (2) training a lightweight value function via temporal-difference learning to predict final attribute intensity scores from partial generations, thereby steering LLM outputs; and (3) employing gradient-based interventions on hidden representations to navigate the model precisely towards specific attribute intensity targets. Our method enables fine-grained, continuous control over attribute intensities, moving beyond simple directional alignment. Experiments on LLaMA-3.2-3b and Phi-4-mini confirm our method’s ability to steer text generation to user-specified attribute intensities with high accuracy. Finally, we demonstrate efficiency enhancements across three downstream tasks: preference data synthesis, Pareto frontier approximation and optimization, and distillation of aligned behaviors for intervention-free inference. Our code is available on https://github.com/Pre-Control/pre-control

Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models

arXiv:2510.12137v1 Announce Type: new Abstract: Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We argue this stems from the Transformer’s Softmax function, which creates “Artificial Certainty” by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To fix this, we introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a “credal set” (a set of distributions) instead of a single attention vector, with the set’s size directly measuring model uncertainty. We implement this by re-conceptualizing attention scores as evidence masses for a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence yields a diffuse distribution, representing ambiguity. Empirically, the Credal Transformer identifies out-of-distribution inputs, quantifies ambiguity, and significantly reduces confident errors on unanswerable questions by abstaining. Our contribution is a new architecture to mitigate hallucinations and a design paradigm that integrates uncertainty quantification directly into the model, providing a foundation for more reliable AI.

Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

Agent Learning via Early Experience

arXiv:2510.08558v2 Announce Type: replace-cross Abstract: A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent’s own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm we study two strategies of using such data: (1) Implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) Self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. We evaluate across eight diverse environments and multiple model families. Our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, positioning it as a practical bridge between imitation learning and fully experience-driven agents.

Agent Learning via Early Experience Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

SafeMT: Multi-turn Safety for Multimodal Language Models

arXiv:2510.12133v1 Announce Type: new Abstract: With the widespread use of multi-modal Large Language models (MLLMs), safety issues have become a growing concern. Multi-turn dialogues, which are more common in everyday interactions, pose a greater risk than single prompts; however, existing benchmarks do not adequately consider this situation. To encourage the community to focus on the safety issues of these models in multi-turn dialogues, we introduce SafeMT, a benchmark that features dialogues of varying lengths generated from harmful queries accompanied by images. This benchmark consists of 10,000 samples in total, encompassing 17 different scenarios and four jailbreak methods. Additionally, we propose Safety Index (SI) to evaluate the general safety of MLLMs during conversations. We assess the safety of 17 models using this benchmark and discover that the risk of successful attacks on these models increases as the number of turns in harmful dialogues rises. This observation indicates that the safety mechanisms of these models are inadequate for recognizing the hazard in dialogue interactions. We propose a dialogue safety moderator capable of detecting malicious intent concealed within conversations and providing MLLMs with relevant safety policies. Experimental results from several open-source models indicate that this moderator is more effective in reducing multi-turn ASR compared to existed guard models.

SafeMT: Multi-turn Safety for Multimodal Language Models Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery

arXiv:2503.21080v5 Announce Type: replace Abstract: The emergence of autonomous Large Language Model (LLM) agents has created a new ecosystem of strategic, agent-to-agent interactions. However, a critical challenge remains unaddressed: in high-stakes, emotion-sensitive domains like debt collection, LLM agents pre-trained on human dialogue are vulnerable to exploitation by adversarial counterparts who simulate negative emotions to derail negotiations. To fill this gap, we first contribute a novel dataset of simulated debt recovery scenarios and a multi-agent simulation framework. Within this framework, we introduce EmoDebt, an LLM agent architected for robust performance. Its core innovation is a Bayesian-optimized emotional intelligence engine that reframes a model’s ability to express emotion in negotiation as a sequential decision-making problem. Through online learning, this engine continuously tunes EmoDebt’s emotional transition policies, discovering optimal counter-strategies against specific debtor tactics. Extensive experiments on our proposed benchmark demonstrate that EmoDebt achieves significant strategic robustness, substantially outperforming non-adaptive and emotion-agnostic baselines across key performance metrics, including success rate and operational efficiency. By introducing both a critical benchmark and a robustly adaptive agent, this work establishes a new foundation for deploying strategically robust LLM agents in adversarial, emotion-sensitive debt interactions.

EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form-Meaning Mapping

arXiv:2510.08482v2 Announce Type: replace-cross Abstract: Iconicity, the resemblance between linguistic form and meaning, is pervasive in signed languages, offering a natural testbed for visual grounding. For vision-language models (VLMs), the challenge is to recover such essential mappings from dynamic human motion rather than static context. We introduce the Visual Iconicity Challenge, a novel video-based benchmark that adapts psycholinguistic measures to evaluate VLMs on three tasks: (i) phonological sign-form prediction (e.g., handshape, location), (ii) transparency (inferring meaning from visual form), and (iii) graded iconicity ratings. We assess 13 state-of-the-art VLMs in zero- and few-shot settings on Sign Language of the Netherlands and compare them to human baselines. On phonological form prediction, VLMs recover some handshape and location detail but remain below human performance; on transparency, they are far from human baselines; and only top models correlate moderately with human iconicity ratings. Interestingly, models with stronger phonological form prediction correlate better with human iconicity judgment, indicating shared sensitivity to visually grounded structure. Our findings validate these diagnostic tasks and motivate human-centric signals and embodied learning methods for modelling iconicity and improving visual grounding in multimodal models.

The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form-Meaning Mapping Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

Detecting Hallucinations in Authentic LLM-Human Interactions

arXiv:2510.10539v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly applied in sensitive domains such as medicine and law, hallucination detection has become a critical task. Although numerous benchmarks have been proposed to advance research in this area, most of them are artificially constructed–either through deliberate hallucination induction or simulated interactions–rather than derived from genuine LLM-human dialogues. Consequently, these benchmarks fail to fully capture the characteristics of hallucinations that occur in real-world usage. To address this limitation, we introduce AuthenHallu, the first hallucination detection benchmark built entirely from authentic LLM-human interactions. For AuthenHallu, we select and annotate samples from genuine LLM-human dialogues, thereby providing a faithful reflection of how LLMs hallucinate in everyday user interactions. Statistical analysis shows that hallucinations occur in 31.4% of the query-response pairs in our benchmark, and this proportion increases dramatically to 60.0% in challenging domains such as Math & Number Problems. Furthermore, we explore the potential of using vanilla LLMs themselves as hallucination detectors and find that, despite some promise, their current performance remains insufficient in real-world scenarios.

Detecting Hallucinations in Authentic LLM-Human Interactions Leggi l'articolo »

AI, Committee, Notizie, Uncategorized

Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability

arXiv:2505.23703v4 Announce Type: replace-cross Abstract: Enhancing the mathematical reasoning capabilities of LLMs has garnered significant attention in both the mathematical and computer science communities. Recent works have made substantial progress in both Natural Language (NL) reasoning and Formal Language (FL) reasoning by leveraging the potential of pure Reinforcement Learning (RL) methods on base models. However, RL approaches struggle to impart new capabilities not presented in the base model, highlighting the need to integrate more knowledge like FL into NL math reasoning effectively. Yet, this integration is challenging due to inherent disparities in problem structure and reasoning format between NL and FL. To address these challenges, we introduce **NL-FL HybridReasoning (NFL-HR)**, an end-to-end framework designed to incorporate the FL expert into NL math problem-solving. To bridge the NL and FL input format gap, we propose the NL-FL Problem Alignment method, which reformulates the Question-Answering (QA) problems in NL as existence theorems in FL. Subsequently, the Mixed Problem Input technique we provide enables the FL reasoner to handle both QA and existence problems concurrently. Lastly, we mitigate the NL and FL output format gap in reasoning through an LLM-based Answer Extraction mechanism. Comprehensive experiments demonstrate that the NFL-HR framework achieves **89.80**% and **84.34%** accuracy rates on the MATH-500 and the AMC benchmarks, surpassing the NL baseline by **4.60%** and **4.82%**, respectively. Notably, some problems resolved by our framework remain unsolved by the NL baseline model even under a larger number of trials.

Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability Leggi l'articolo »

We use cookies to improve your experience and performance on our website. You can learn more at Politica sulla privacy and manage your privacy settings by clicking Settings.

Privacy Preferences

You can choose your cookie settings by turning on/off each type of cookie as you wish, except for essential cookies.

Allow All
Manage Consent Preferences
  • Always Active

Save
it_IT