YouZum

News


Fun-ASR Technical Report

arXiv:2509.12508v3 Announce Type: replace Abstract: In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination, which can significantly degrade user experience in real-world ASR applications. In this paper, we present Fun-ASR, a large-scale, LLM-based ASR system that synergistically combines massive data, large model capacity, LLM integration, and reinforcement learning to achieve state-of-the-art performance across diverse and complex speech recognition scenarios. Moreover, Fun-ASR is specifically optimized for practical deployment, with enhancements in streaming capability, noise robustness, code-switching, hotword customization, and support for other real-world application requirements. Experimental results show that while most LLM-based ASR systems achieve strong performance on open-source benchmarks, they often underperform on real industry evaluation sets. Thanks to production-oriented optimizations, Fun-ASR achieves state-of-the-art performance on real application datasets, demonstrating its effectiveness and robustness in practical settings.

Fun-ASR Technical Report Read the article »


How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve

arXiv:2505.18350v2 Announce Type: replace-cross Abstract: As Large Language Models (LLMs) are increasingly deployed for narrow tasks in resource-constrained settings, a central question arises: how much of an LLM is truly necessary for a given task? We present LLM-Sieve, a framework that prunes LLMs down to the minimal parameter subset needed to preserve task performance. Our approach introduces two innovations: (i) output-aligned non-orthogonal projections, which yield more faithful low-rank approximations than traditional PCA/SVD by aligning directly with layer outputs; and (ii) adaptive pruning via a Genetic Algorithm, which automatically discovers matrix-specific pruning levels and exposes the uneven distribution of task-relevant knowledge. Across models from 3.8B to 70B parameters, LLM-Sieve removes 20-75% of weights with only 1-5% accuracy loss, substantially ahead of prior pruning methods. Beyond efficiency, our framework reveals bottleneck matrices that concentrate critical knowledge, suggesting architectural implications for future LLM design. LLM-Sieve integrates seamlessly with LoRA fine-tuning and quantization, enabling both efficient deployment and deeper understanding of knowledge organization in LLMs.
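
To make the first innovation concrete, here is a minimal NumPy sketch of output-aligned low-rank compression, contrasted with factoring the weights alone. The calibration matrix X, the shapes, and the least-squares reconstruction are illustrative assumptions, not LLM-Sieve's published algorithm.

```python
import numpy as np

# Illustrative sketch of output-aligned low-rank compression: instead of
# factoring the weight matrix W in isolation (as PCA/SVD would), factor the
# layer *outputs* X @ W on calibration activations X, so approximation error
# is minimized where the layer is actually used. A hedged reconstruction of
# the idea, not LLM-Sieve's exact method.
def output_aligned_lowrank(X, W, rank):
    Y = X @ W                                     # (n, d_out) layer outputs
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_r = (U[:, :rank] * S[:rank]) @ Vt[:rank]    # best rank-r fit to outputs
    # Solve X @ W_r ~= Y_r in least squares; the resulting W_r has rank <= r
    # and is, in general, a non-orthogonal projection of W.
    W_r, *_ = np.linalg.lstsq(X, Y_r, rcond=None)
    return W_r

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))                    # calibration activations
W = rng.normal(size=(64, 32))
W_r = output_aligned_lowrank(X, W, rank=8)
print(np.linalg.norm(X @ W - X @ W_r))            # error in output space
```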

How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve Read the article »


Google DeepMind Introduces CodeMender: A New AI Agent that Uses Gemini Deep Think to Automatically Patch Critical Software Vulnerabilities

What if an AI agent could localize a root cause, prove a candidate fix via automated analysis and testing, and proactively rewrite related code to eliminate an entire vulnerability class, then open an upstream patch for review? Google DeepMind introduces CodeMender, an AI agent that generates, validates, and upstreams fixes for real-world vulnerabilities using Gemini “Deep Think” reasoning and a tool-augmented workflow. In six months of internal deployment, CodeMender contributed 72 security patches to open-source projects, including codebases of up to ~4.5M lines, and is designed to act both reactively (patching known issues) and proactively (rewriting code to remove entire vulnerability classes).

Understanding the Architecture

The agent couples large-scale code reasoning with program-analysis tooling: static and dynamic analysis, differential testing, fuzzing, and satisfiability modulo theories (SMT) solvers. A multi-agent design adds specialized “critique” reviewers that inspect semantic diffs and trigger self-corrections when regressions are detected. These components let the system localize root causes, synthesize candidate patches, and automatically regression-test changes before surfacing them for human review.

https://deepmind.google/discover/blog/introducing-codemender-an-ai-agent-for-code-security/

Validation Pipeline and Human Gate

DeepMind emphasizes automatic validation before any human touches a patch: the system tests for root-cause fixes, functional correctness, absence of regressions, and style compliance; only high-confidence patches are proposed for maintainer review. This workflow is explicitly tied to Gemini Deep Think’s planning-centric reasoning over debugger traces, code-search results, and test outcomes.

Proactive Hardening: Compiler-Level Guards

Beyond patching, CodeMender applies security-hardening transforms at scale. One example is the automated insertion of Clang’s -fbounds-safety annotations in libwebp to enforce compiler-level bounds checks, an approach that would have neutralized the 2023 libwebp heap overflow (CVE-2023-4863) exploited in a zero-click iOS chain, as well as similar buffer over- and underflows wherever the annotations are applied.

Case Studies

DeepMind details two non-trivial fixes: (1) a crash initially flagged as a heap overflow that was traced to incorrect XML stack management; and (2) a lifetime bug that required edits to a custom C-code generator. In both cases, the agent-generated patches passed automated analysis and an LLM-judge check for functional equivalence before being proposed.

Deployment Context and Related Initiatives

Google’s broader announcement frames CodeMender as part of a defensive stack that includes a new AI Vulnerability Reward Program (consolidating AI-related bounties) and the Secure AI Framework 2.0 for agent security. The post reiterates the motivation: as AI-powered vulnerability discovery scales (e.g., via Big Sleep and OSS-Fuzz), automated remediation must scale in tandem.

Our Comments

CodeMender operationalizes Gemini Deep Think plus program-analysis tools (static and dynamic analysis, fuzzing, SMT solvers) to localize root causes and propose patches that pass automated validation before human review. Reported early data: 72 upstreamed security fixes across open-source projects over six months, including codebases on the order of ~4.5M lines. The system also applies proactive hardening (e.g., compiler-enforced bounds via Clang’s -fbounds-safety) to reduce memory-safety bug classes rather than only patching individual instances. No latency or throughput benchmarks have been published yet, so impact is best measured by validated fixes and the scope of hardened code.
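
To make the validation gate concrete, here is a minimal Python sketch of an "all checks must pass before human review" pipeline. The Patch fields and the checks are hypothetical stand-ins; DeepMind has not published CodeMender's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative sketch only: a "validate before human review" gate in the
# spirit of the pipeline described above. Every field and check below is a
# placeholder, not CodeMender's actual interface.

@dataclass
class Patch:
    diff: str
    crash_reproduces: bool    # does the original crash still reproduce?
    tests_pass: bool          # functional correctness
    regressions_found: bool   # differential testing / fuzzing results
    style_ok: bool            # project style compliance

def gate(patch: Patch) -> bool:
    checks: List[Callable[[Patch], bool]] = [
        lambda p: not p.crash_reproduces,   # root cause actually fixed
        lambda p: p.tests_pass,
        lambda p: not p.regressions_found,
        lambda p: p.style_ok,
    ]
    # Only high-confidence patches (all checks green) reach maintainers.
    return all(check(patch) for check in checks)

candidate = Patch("--- a/xml.c\n+++ b/xml.c\n...", False, True, False, True)
print("propose for review" if gate(candidate) else "send back to agent")
```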

Google DeepMind Introduces CodeMender: A New AI Agent that Uses Gemini Deep Think to Automatically Patch Critical Software Vulnerabilities Read the article »


Internal World Models as Imagination Networks in Cognitive Agents

arXiv:2510.04391v1 Announce Type: cross Abstract: What is the computational objective of imagination? While classical interpretations suggest imagination is useful for maximizing rewards, recent findings challenge this view. In this study, we propose that imagination serves to access an internal world model (IWM) and use psychological network analysis to explore IWMs in humans and large language models (LLMs). Specifically, we assessed imagination vividness ratings using two questionnaires and constructed imagination networks from these reports. Imagination networks from human groups showed correlations between different centrality measures, including expected influence, strength, and closeness. However, imagination networks from LLMs showed a lack of clustering and lower correlations between centrality measures under different prompts and conversational memory conditions. Together, these results indicate a lack of similarity between IWMs in human and LLM agents. Overall, our study offers a novel method for comparing internally-generated representations in humans and AI, providing insights for developing human-like imagination in artificial intelligence.
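
As an illustration of the network construction, here is a hedged Python sketch that builds a network from simulated vividness ratings and computes the three centrality measures named above (expected influence, strength, closeness). The correlation-based construction and the simulated data are assumptions; the paper's estimation details may differ.

```python
import numpy as np
import networkx as nx

# Hedged sketch: an "imagination network" over questionnaire items, built
# from item-by-item correlations of vividness ratings. The construction
# (plain correlations, absolute edge weights) is an assumption.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(200, 16)).astype(float)  # subjects x items

corr = np.corrcoef(ratings.T)
np.fill_diagonal(corr, 0.0)

# Expected influence: sum of an item's signed edge weights.
expected_influence = corr.sum(axis=1)

G = nx.from_numpy_array(np.abs(corr))             # weighted, undirected
strength = dict(G.degree(weight="weight"))        # node strength
for u, v, d in G.edges(data=True):                # closeness needs distances
    d["distance"] = 1.0 / d["weight"]
closeness = nx.closeness_centrality(G, distance="distance")

print(expected_influence[0], strength[0], closeness[0])
```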

Internal World Models as Imagination Networks in Cognitive Agents Read the article »


Large Language Models Preserve Semantic Isotopies in Story Continuations

arXiv:2510.04400v1 Announce Type: new Abstract: In this work, we explore the relevance of textual semantics to Large Language Models (LLMs), extending previous insights into the connection between distributional semantics and structural semantics. We investigate whether LLM-generated texts preserve semantic isotopies. We design a story continuation experiment using 10,000 ROCStories prompts completed by five LLMs. We first validate GPT-4o’s ability to extract isotopies from a linguistic benchmark, then apply it to the generated stories. We then analyze structural (coverage, density, spread) and semantic properties of isotopies to assess how they are affected by completion. Results show that LLM completion within a given token horizon preserves semantic isotopies across multiple properties.
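
For intuition, here is a small Python sketch computing plausible versions of the three structural properties. The formulas below are assumptions made for illustration; the paper's formal definitions may differ.

```python
# Hedged sketch: given the sentence indices where an isotopy's semes recur
# in a story, compute assumed versions of coverage (fraction of sentences
# touched), density (occurrences per sentence), and spread (normalized span
# between first and last occurrence).
def isotopy_stats(occurrence_sentences, n_sentences):
    occ = sorted(occurrence_sentences)
    coverage = len(set(occ)) / n_sentences
    density = len(occ) / n_sentences
    spread = (occ[-1] - occ[0]) / max(n_sentences - 1, 1)
    return coverage, density, spread

# e.g., a hypothetical "seafaring" isotopy recurring in sentences 0, 2
# (twice), and 4 of a five-sentence story:
print(isotopy_stats([0, 2, 2, 4], 5))   # (0.6, 0.8, 1.0)
```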

Large Language Models Preserve Semantic Isotopies in Story Continuations Read the article »


SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

arXiv:2510.04398v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in high-risk domains. However, state-of-the-art LLMs often produce hallucinations, raising serious concerns about their reliability. Prior work has explored adversarial attacks for hallucination elicitation in LLMs, but it often produces unrealistic prompts, either by inserting gibberish tokens or by altering the original meaning. As a result, these approaches offer limited insight into how hallucinations may occur in practice. While adversarial attacks in computer vision often involve realistic modifications to input images, the problem of finding realistic adversarial prompts for eliciting LLM hallucinations has remained largely underexplored. To address this gap, we propose Semantically Equivalent and Coherent Attacks (SECA) to elicit hallucinations via realistic modifications to the prompt that preserve its meaning while maintaining semantic coherence. Our contributions are threefold: (i) we formulate finding realistic attacks for hallucination elicitation as a constrained optimization problem over the input prompt space under semantic equivalence and coherence constraints; (ii) we introduce a constraint-preserving zeroth-order method to effectively search for adversarial yet feasible prompts; and (iii) we demonstrate through experiments on open-ended multiple-choice question answering tasks that SECA achieves higher attack success rates while incurring almost no constraint violations compared to existing methods. SECA highlights the sensitivity of both open-source and commercial gradient-inaccessible LLMs to realistic and plausible prompt variations. Code is available at https://github.com/Buyun-Liang/SECA.
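
A minimal Python sketch of the constraint-preserving, zeroth-order search pattern described above. Every scoring function here is a placeholder: a real system would use a paraphrase model, a semantic-equivalence checker, a fluency model, and black-box queries to the target LLM.

```python
import random

# Hedged sketch in the spirit of SECA: mutate the prompt with
# meaning-preserving paraphrases, reject candidates that violate the
# equivalence or coherence constraints, and keep the best
# hallucination-eliciting prompt found. Zeroth-order means only scores of
# the target model are used, never gradients.

def paraphrase(prompt: str) -> str:           # stand-in paraphraser
    words = prompt.split()
    random.shuffle(words)                     # placeholder perturbation
    return " ".join(words)

def equivalent(a: str, b: str) -> bool:       # semantic-equivalence check
    return set(a.split()) == set(b.split())   # placeholder criterion

def coherent(p: str) -> bool:                 # fluency/coherence check
    return len(p.split()) > 3                 # placeholder criterion

def hallucination_score(p: str) -> float:     # black-box target-LLM query
    return random.random()                    # placeholder score

def seca_search(prompt: str, steps: int = 50) -> str:
    best, best_score = prompt, hallucination_score(prompt)
    for _ in range(steps):
        cand = paraphrase(best)
        if not (equivalent(cand, prompt) and coherent(cand)):
            continue                          # constraint-preserving step
        score = hallucination_score(cand)     # zeroth-order: scores only
        if score > best_score:
            best, best_score = cand, score
    return best

print(seca_search("What year did the first moon landing occur ?"))
```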

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations Read the article »


Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation

arXiv:2509.08825v2 Announce Type: replace Abstract: Large language models are rapidly transforming social science research by enabling the automation of labor-intensive tasks like data annotation and text analysis. However, LLM outputs vary significantly depending on the implementation choices made by researchers (e.g., model selection or prompting strategy). Such variation can introduce systematic biases and random errors, which propagate to downstream analyses and cause Type I (false positive), Type II (false negative), Type S (wrong sign), or Type M (exaggerated effect) errors. We call this phenomenon, in which configuration choices lead to incorrect conclusions, LLM hacking. We find that intentional LLM hacking is strikingly simple. By replicating 37 data annotation tasks from 21 published social science studies, we show that, with just a handful of prompt paraphrases, virtually anything can be presented as statistically significant. Beyond intentional manipulation, our analysis of 13 million labels from 18 different LLMs across 2361 realistic hypotheses shows that there is also a high risk of accidental LLM hacking, even when following standard research practices. We find incorrect conclusions in approximately 31% of hypotheses for state-of-the-art LLMs, and in half the hypotheses for smaller language models. While higher task performance and stronger general model capabilities reduce LLM hacking risk, even highly accurate models remain susceptible. The risk of LLM hacking decreases as effect sizes increase, indicating the need for more rigorous verification of LLM-based findings near significance thresholds. We analyze 21 mitigation techniques and find that human annotations provide crucial protection against false positives. Common regression estimator correction techniques can restore valid inference but trade off Type I vs. Type II errors. We publish a list of practical recommendations to prevent LLM hacking.
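
For concreteness, here is a small Python sketch classifying the four error types for a single hypothesis, given a ground-truth effect and an estimate obtained from LLM-annotated data. The significance threshold and the 2x exaggeration cutoff are illustrative assumptions, not the paper's exact criteria.

```python
# Hedged sketch: classify Type I / II / S / M errors for one hypothesis.
def error_type(true_effect, est_effect, p_value, alpha=0.05):
    significant = p_value < alpha
    if true_effect == 0:
        return "Type I (false positive)" if significant else "correct"
    if not significant:
        return "Type II (false negative)"
    if (est_effect > 0) != (true_effect > 0):
        return "Type S (wrong sign)"
    if abs(est_effect) > 2 * abs(true_effect):   # assumed cutoff
        return "Type M (exaggerated effect)"
    return "correct"

print(error_type(true_effect=0.0, est_effect=0.30, p_value=0.01))   # Type I
print(error_type(true_effect=0.2, est_effect=-0.25, p_value=0.02))  # Type S
```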

Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation Read the article »


GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models

arXiv:2510.01252v2 Announce Type: replace Abstract: As large language models (LLMs) are increasingly trained on massive, uncurated corpora, understanding both model representations and the data they internalize has become a major challenge. In this work, we show that pairing LLMs with sparse autoencoders (SAEs) enables interpretation not only of model behavior but also of the deeper structures, themes, and biases embedded in the training data. We train a GPT-style transformer model exclusively on the novels of Jane Austen, a corpus rich in social constructs and narrative patterns. We then apply SAEs to hidden states across multiple layers, uncovering sparse, interpretable features that reflect the key narratives and concepts present in the corpus, including gender, class, and societal duty. Our findings demonstrate that LLMs combined with SAEs can act as scalable probes into complex datasets, offering a new path for corpus exploration, bias discovery, and model interpretability at scale.
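
A minimal PyTorch sketch of the general recipe: an overcomplete, L1-penalized sparse autoencoder trained to reconstruct (stand-in) transformer hidden states. Sizes and hyperparameters are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Hedged sketch of a sparse autoencoder over hidden states; a common SAE
# recipe, not necessarily this paper's exact architecture.
class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)
        self.dec = nn.Linear(d_features, d_model)

    def forward(self, h):
        f = torch.relu(self.enc(h))      # sparse, interpretable features
        return f, self.dec(f)

sae = SparseAutoencoder(d_model=256, d_features=2048)  # overcomplete
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
h = torch.randn(64, 256)                # stand-in for real hidden states
for _ in range(10):
    f, recon = sae(h)
    loss = ((recon - h) ** 2).mean() + 1e-3 * f.abs().mean()  # recon + L1
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float((f > 0).float().mean()))    # fraction of active features
```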

GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models Read the article »


Emission-GPT: A domain-specific language model agent for knowledge retrieval, emission inventory and data analysis

arXiv:2510.02359v1 Announce Type: new Abstract: Improving air quality and addressing climate change relies on accurate understanding and analysis of air pollutant and greenhouse gas emissions. However, emission-related knowledge is often fragmented and highly specialized, while existing methods for accessing and compiling emissions data remain inefficient. These issues hinder the ability of non-experts to interpret emissions information, posing challenges to research and management. To address this, we present Emission-GPT, a knowledge-enhanced large language model agent tailored for the atmospheric emissions domain. Built on a curated knowledge base of over 10,000 documents (including standards, reports, guidebooks, and peer-reviewed literature), Emission-GPT integrates prompt engineering and question completion to support accurate domain-specific question answering. Emission-GPT also enables users to interactively analyze emissions data via natural language, such as querying and visualizing inventories, analyzing source contributions, and recommending emission factors for user-defined scenarios. A case study in Guangdong Province demonstrates that Emission-GPT can extract key insights, such as point source distributions and sectoral trends, directly from raw data with simple prompts. Its modular and extensible architecture facilitates automation of traditionally manual workflows, positioning Emission-GPT as a foundational tool for next-generation emission inventory development and scenario-based assessment.
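
The question-answering loop can be pictured as standard retrieval-augmented prompting over the curated knowledge base. The toy retriever and prompt template below are assumptions for illustration, since the abstract does not give Emission-GPT's implementation details.

```python
# Hedged sketch of retrieval-augmented prompting: retrieve domain documents,
# then prompt an LLM with them. The lexical retriever and template are
# placeholders, not Emission-GPT's actual components.
def retrieve(query, corpus, k=2):
    # Toy lexical retriever: rank documents by query-term overlap.
    scored = sorted(
        corpus.items(),
        key=lambda kv: -sum(w in kv[1].lower() for w in query.lower().split()),
    )
    return [text for _, text in scored[:k]]

def build_prompt(query, passages):
    context = "\n\n".join(passages)
    return (f"Answer using only the emission-inventory context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

corpus = {
    "guidebook": "NOx emission factors for coal-fired power plants ...",
    "report": "Guangdong sectoral emissions: transport, industry ...",
}
print(build_prompt("NOx emission factor for coal power",
                   retrieve("nox coal", corpus)))
```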

Emission-GPT: A domain-specific language model agent for knowledge retrieval, emission inventory and data analysis Read the article »


Beyond Imitation: Recovering Dense Rewards from Demonstrations

arXiv:2510.02493v1 Announce Type: cross Abstract: Conventionally, supervised fine-tuning (SFT) is treated as a simple imitation learning process that only trains a policy to imitate expert behavior on demonstration datasets. In this work, we challenge this view by establishing a fundamental equivalence between SFT and Inverse Reinforcement Learning. We prove that the SFT objective is a special case of Inverse Q-Learning, which implies that the SFT process does not just learn a policy, but also an implicit, dense, token-level reward model that explains the expert demonstrations. We then show how to recover this dense reward signal directly from the SFT model by formulating a baseline-relative reward function. The availability of such a dense reward model offers numerous benefits, providing granular credit assignment for each token generated. We demonstrate one key application by using these recovered rewards to further improve the policy with reinforcement learning. Our method, Dense-Path REINFORCE, consistently outperforms the original SFT models on instruction-following benchmarks. This work reframes SFT not merely as policy imitation but as a powerful reward learning mechanism, opening new possibilities for leveraging expert demonstrations.
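
One plausible reading of the baseline-relative reward is a per-token log-probability difference between the SFT policy and a baseline model, sketched below in PyTorch. This is an illustrative reconstruction; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

# Hedged sketch: recover a dense, token-level reward by scoring each
# generated token with its log-probability under the SFT policy relative
# to a baseline model. An assumed form of the baseline-relative reward.
def dense_rewards(sft_logits, base_logits, tokens):
    # logits: (T, vocab); tokens: (T,) generated token ids
    sft_lp = F.log_softmax(sft_logits, -1).gather(-1, tokens[:, None]).squeeze(-1)
    base_lp = F.log_softmax(base_logits, -1).gather(-1, tokens[:, None]).squeeze(-1)
    return sft_lp - base_lp               # per-token reward, usable by REINFORCE

T, V = 5, 100                             # toy sequence length and vocab
tokens = torch.randint(0, V, (T,))
print(dense_rewards(torch.randn(T, V), torch.randn(T, V), tokens))
```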

Beyond Imitation: Recovering Dense Rewards from Demonstrations Read the article »
