YouZum

$A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement

arXiv:2507.20890v1 Announce Type: cross Abstract: Img2LaTeX is a practically significant task that involves converting mathematical expressions or tabular data from images into LaTeX code. In recent years, vision-language models (VLMs) have demonstrated strong performance across a variety of visual understanding tasks, owing to their generalization capabilities. While some studies have explored the use of VLMs for the Img2LaTeX task, their performance often falls short of expectations. Empirically, VLMs sometimes struggle with fine-grained visual elements, leading to inaccurate LaTeX predictions. To address this challenge, we propose $A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement, a framework that integrates attention localization and iterative refinement within a visual reasoning process, enabling VLMs to perform self-correction and progressively improve prediction quality. For effective evaluation, we introduce a new dataset, Img2LaTeX-Hard-1K, consisting of 1,100 carefully curated and challenging examples designed to rigorously evaluate the capabilities of VLMs within this task domain. Extensive experimental results demonstrate that: (1) $A^2R^2$ significantly improves model performance across six evaluation metrics spanning both textual and visual levels, consistently outperforming other baseline methods; (2) increasing the number of inference rounds yields notable performance gains, underscoring the potential of $A^2R^2$ in test-time scaling scenarios; (3) ablation studies and human evaluations validate the practical effectiveness of our approach, as well as the strong synergy among its core components during inference.
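
The abstract describes a loop in which the VLM drafts LaTeX, attention maps localize the fine-grained regions it may have misread, and the model then revises its own output. Below is a minimal Python sketch of how such an attention-guided refinement loop could be organized; generate, localize, and verify are hypothetical placeholders for the VLM call, the attention-localization step, and a render-and-compare check, not the authors' actual interfaces.

from typing import Callable

def attention_guided_refine(
    image,
    generate: Callable,        # hypothetical VLM call: (image, prompt, focus) -> (latex, attention_map)
    localize: Callable,        # hypothetical: attention_map -> regions the model attended to weakly
    verify: Callable,          # hypothetical: (image, latex) -> visual similarity in [0, 1]
    max_rounds: int = 3,
    accept_threshold: float = 0.95,
) -> str:
    """Iteratively refine a LaTeX prediction via attention-guided self-correction."""
    latex, attn = generate(image, prompt="Transcribe this expression into LaTeX.", focus=None)
    for _ in range(max_rounds):
        if verify(image, latex) >= accept_threshold:
            break                                  # rendered prediction already matches the image
        regions = localize(attn)                   # fine-grained patches worth a second look
        feedback = (
            f"Your previous LaTeX was: {latex}. It may misread the highlighted regions; "
            "re-examine them and output a corrected version."
        )
        latex, attn = generate(image, prompt=feedback, focus=regions)
    return latex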

SGPO: Self-Generated Preference Optimization based on Self-Improver

arXiv:2507.20181v1 Announce Type: new Abstract: Large language models (LLMs), despite their extensive pretraining on diverse datasets, require effective alignment to human preferences for practical and reliable deployment. Conventional alignment methods typically employ off-policy learning and depend on human-annotated datasets, which limits their broad applicability and introduces distribution shift issues during training. To address these challenges, we propose Self-Generated Preference Optimization based on Self-Improver (SGPO), an innovative alignment framework that leverages an on-policy self-improving mechanism. Specifically, the improver refines responses from a policy model to self-generate preference data for direct preference optimization (DPO) of the policy model. Here, the improver and policy are unified into a single model, and in order to generate higher-quality preference data, this self-improver learns to make incremental yet discernible improvements to the current responses by referencing supervised fine-tuning outputs. Experimental results on AlpacaEval 2.0 and Arena-Hard show that the proposed SGPO significantly improves performance over DPO and baseline self-improving methods without using external preference data.
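
One round of this scheme can be pictured as follows. The sketch is an outline under stated assumptions, not the authors' code: model.generate and model.improve stand for the two roles of the unified policy/improver model, sft_reference supplies the supervised fine-tuning output used as a reference, and dpo_update abstracts a standard DPO training step on the self-generated pairs.

def sgpo_round(prompts, model, sft_reference, dpo_update):
    """Self-generate preference pairs and run one DPO update on them (illustrative only)."""
    preference_data = []
    for x in prompts:
        y = model.generate(x)                      # on-policy response from the current policy
        # The improver (the same model acting in its improver role) makes an incremental but
        # discernible improvement, referencing the supervised fine-tuning output for prompt x.
        y_better = model.improve(x, y, reference=sft_reference(x))
        preference_data.append({"prompt": x, "chosen": y_better, "rejected": y})
    # Direct preference optimization on the self-generated pairs, with no external labels.
    return dpo_update(model, preference_data)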

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents

arXiv:2502.18509v2 Announce Type: replace-cross Abstract: Conversational agents are increasingly woven into individuals’ personal lives, yet users often underestimate the privacy risks associated with them. The moment users share information with these agents, such as large language models (LLMs), their private information becomes vulnerable to exposure. In this paper, we characterize the notion of contextual privacy for user interactions with LLM-based Conversational Agents (LCAs). It aims to minimize privacy risks by ensuring that users (senders) disclose only information that is both relevant and necessary for achieving their intended goals when interacting with LCAs (untrusted receivers). Through a formative design user study, we observe how even “privacy-conscious” users inadvertently reveal sensitive information through indirect disclosures. Based on insights from this study, we propose a locally deployable framework that operates between users and LCAs, identifying and reformulating out-of-context information in user prompts. Our evaluation using examples from ShareGPT shows that lightweight models can effectively implement this framework, achieving strong gains in contextual privacy while preserving the user’s intended interaction goals. Notably, about 76% of participants in our human evaluation preferred the reformulated prompts over the original ones, validating the usability and effectiveness of contextual privacy in our proposed framework. We open-source the code at https://github.com/IBM/contextual-privacy-LLM.
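
The proposed framework runs locally between the user and the LCA: it identifies information that is out of context for the user's stated goal and reformulates the prompt before it reaches the untrusted receiver. The Python sketch below illustrates that flow under stated assumptions; local_llm is a hypothetical callable for a small on-device model, and the two instructions are illustrative rather than the paper's actual prompts.

def protect_prompt(user_prompt: str, task_goal: str, local_llm) -> str:
    """Detect out-of-context details, then rewrite the prompt before it leaves the device."""
    detect_instruction = (
        "List any details in the prompt below that are NOT necessary to achieve "
        f"the goal '{task_goal}' (names, health data, finances, identifiers).\n\n"
        f"Prompt: {user_prompt}"
    )
    out_of_context = local_llm(detect_instruction)

    rewrite_instruction = (
        f"Rewrite the prompt so it still achieves the goal '{task_goal}', but with these "
        f"out-of-context details removed or generalized: {out_of_context}\n\n"
        f"Prompt: {user_prompt}"
    )
    return local_llm(rewrite_instruction)   # reformulated prompt sent on to the untrusted LCA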

Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons

Amazon researchers developed a new AI architecture that cuts inference time by 30% by selecting only task-relevant neurons, similar to how the brain uses specialized regions for specific tasks. This approach addresses one of the biggest challenges facing large AI models: the computational expense and latency of activating every neuron for every request, regardless of relevance.

The traditional deployment of large language models (LLMs) and foundational AI systems has relied on activating the full network for every input. While this guarantees versatility, it results in significant inefficiency: much of the network's activity is superfluous for any given prompt. Inspired by the brain's efficiency, which flexibly recruits only the circuits needed for a given cognitive task, Amazon's architecture activates only the neurons most relevant to the current input context.

Dynamic, Context-Aware Pruning

At the heart of this innovation is dynamic, context-aware pruning. Rather than trimming the model statically during training and locking in those changes, Amazon's solution prunes the network on the fly, during inference itself. This lets the model remain large and versatile, yet efficient and fast for any specific task. Before processing an input, the model evaluates which neurons or modules will be most useful, based on signals such as the type of task (e.g., legal writing, translation, or coding assistance), language, and other context features. It leverages a gate predictor, a lightweight neural component trained to generate a "mask" that determines which neurons are switched on for that particular sequence. The gating decisions are binary, so neurons are either fully active or completely skipped, ensuring real compute savings.

How the System Works

The architecture introduces a context-aware gating mechanism. This mechanism analyzes input features (and, for speech models, auxiliary information such as language and task tokens) to decide which modules, such as self-attention blocks, feed-forward networks, or specialized convolutions, are essential for the current step. For example, in a speech recognition task it may activate local context modules for detailed sound analysis while skipping components that only benefit other tasks.

This pruning strategy is structured and modular: instead of removing individual weights (which can lead to hardware inefficiency), it skips entire modules or layers. This preserves the model's structural integrity and ensures compatibility with GPUs and modern hardware accelerators. The gate predictor is trained with a sparsity loss to reach a target sparsity, defined as the proportion of modules skipped. Training uses techniques such as the Gumbel-Softmax estimator, which keeps the gating behavior differentiable during optimization while ultimately yielding crisp, binary neuron selection at inference.

Demonstrated Results: Speed Without Sacrificing Quality

Experiments show that dynamically skipping irrelevant modules can:

- Reduce inference time by up to 34% for multilingual speech-to-text and automatic speech recognition (ASR) tasks: where typical baseline models had 9.28 s latency, pruned models ran in as little as 5.22 s, depending on the task and the desired sparsity level.
- Decrease FLOPs (floating-point operations) by over 60% at high sparsity levels, greatly lowering cloud and hardware costs.
- Maintain output quality: pruning the decoder in particular preserves BLEU scores (for translation tasks) and word error rate (WER) for ASR up to moderate sparsity, meaning users see no drop in model performance until very aggressive pruning is applied.
- Provide interpretability: analyzing pruned module patterns reveals which parts of the model are essential for each context; local context modules dominate in ASR, while feed-forward networks are prioritized for speech translation.

Task and Language Adaptation

A core insight is that the optimal pruning strategy, that is, which modules to retain or skip, can change dramatically with the task and language:

- In ASR, the local context modules (cgMLP) are paramount, while the decoder can be sparsified heavily with little accuracy loss.
- For speech translation (ST), both the encoder and the decoder require more balanced attention, as the decoder's feed-forward layers are essential.
- In multilingual or multitask scenarios, module selection adapts but shows consistent patterns within each task type, highlighting the learned specialization within the architecture.

A PyTorch sketch of the gating mechanism follows the implications below.

Broader Implications

This dynamic, modular pruning opens the door to:

- More energy-efficient, scalable AI, which is especially vital as LLMs and multimodal models continue to grow.
- AI models that can personalize their compute pathways, not only by task but potentially by user profile, region, or device.
- Transfer to other domains, such as natural language processing and computer vision, wherever foundation models are used.

By selectively activating only task-relevant modules in real time, inspired by biological neural efficiency, Amazon's architecture points toward AI that is both powerful and practical for global, real-world use.
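
To make the mechanism concrete, here is an illustrative PyTorch sketch of a gated residual sub-module: a lightweight gate predictor scores a context embedding, Gumbel-Softmax keeps the binary keep/skip decision differentiable during training, and a simple sparsity loss pushes the average keep rate toward the target. This is a reconstruction from the description above, not Amazon's released code; names, shapes, and the loss form are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedModule(nn.Module):
    """Wrap a sub-module (e.g. a feed-forward block) with a context-aware binary gate."""

    def __init__(self, module: nn.Module, context_dim: int):
        super().__init__()
        self.module = module
        self.gate_predictor = nn.Linear(context_dim, 2)       # logits for [skip, keep]

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) hidden states; context: (batch, context_dim) task/language embedding
        logits = self.gate_predictor(context)                 # (batch, 2)
        if self.training:
            # Gumbel-Softmax keeps the 0/1 decision differentiable during optimization;
            # the wrapped module is still executed here, so compute savings appear at inference.
            gate = F.gumbel_softmax(logits, tau=1.0, hard=True)[:, 1]
        else:
            gate = (logits[:, 1] > logits[:, 0]).float()      # crisp keep/skip at inference
            if gate.max() == 0:
                return x                                      # whole batch skips: no compute spent
        self.last_gate = gate                                 # stored for the sparsity loss below
        return x + gate.view(-1, 1, 1) * self.module(x)       # residual with gated contribution

def sparsity_loss(gates: torch.Tensor, target_sparsity: float = 0.6) -> torch.Tensor:
    """Push the average keep rate toward (1 - target_sparsity), i.e. skip about 60% of modules."""
    return (gates.mean() - (1.0 - target_sparsity)).abs()

In such a setup, the sparsity loss is added to the task objective during training, while at inference skipped modules are simply never executed, which is where the latency and FLOP savings come from.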

Data Augmentation for Spoken Grammatical Error Correction

arXiv:2507.19374v1 Announce Type: new Abstract: While there exist strong benchmark datasets for grammatical error correction (GEC), high-quality annotated spoken datasets for Spoken GEC (SGEC) are still under-resourced. In this paper, we propose a fully automated method to generate audio-text pairs with grammatical errors and disfluencies. Moreover, we propose a series of objective metrics that can be used to evaluate the generated data and choose the most suitable dataset for SGEC. The goal is to generate an augmented dataset that maintains the textual and acoustic characteristics of the original data while providing new types of errors. This augmented dataset should enrich the original corpus without altering the language assessment scores of the second-language (L2) learners. We evaluate the use of the augmented corpus both for written GEC (the text part) and for SGEC (the audio-text pairs). Our experiments are conducted on the S&I Corpus, the first publicly available speech dataset with grammar error annotations.
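
One plausible shape for such an automated augmentation pipeline is sketched below, purely as an assumption-labeled illustration: inject_errors stands for a rule- or model-based grammatical-error generator, synthesize_speech for a TTS system matched to the original speakers, and the disfluency injection is a toy version of what the paper automates.

import random

FILLERS = ["um", "uh", "you know", "like"]

def inject_disfluencies(tokens, p_filler=0.05, p_repeat=0.03, rng=random):
    """Insert fillers and word repetitions to mimic spontaneous speech."""
    out = []
    for tok in tokens:
        if rng.random() < p_filler:
            out.append(rng.choice(FILLERS))
        out.append(tok)
        if rng.random() < p_repeat:
            out.append(tok)                     # simple repetition disfluency
    return out

def make_sgec_pair(correct_sentence, inject_errors, synthesize_speech):
    """Build one augmented (audio, errorful transcript, corrected target) triple."""
    errorful = inject_errors(correct_sentence)             # e.g. dropped articles, wrong tense
    spoken = " ".join(inject_disfluencies(errorful.split()))
    audio = synthesize_speech(spoken)                      # TTS in a matching voice/accent
    return {"audio": audio, "source": spoken, "target": correct_sentence}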

Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation

arXiv:2412.13666v2 Announce Type: replace Abstract: The capabilities of recent large language models (LLMs) to generate high-quality content that humans cannot distinguish from human-written text raise many concerns regarding their misuse. Previous research has shown that LLMs can be effectively misused for generating disinformation news articles following predefined narratives. Their capabilities to generate personalized (in various aspects) content have also been evaluated and mostly found usable. However, the combination of personalization and disinformation capabilities of LLMs has not been comprehensively studied yet. Such a dangerous combination should trigger the integrated safety filters of the LLMs, if any exist. This study fills this gap by evaluating the vulnerabilities of recent open and closed LLMs, and their willingness to generate personalized disinformation news articles in English. We further explore whether the LLMs can reliably meta-evaluate the personalization quality and whether personalization affects the detectability of the generated texts. Our results demonstrate the need for stronger safety filters and disclaimers, as these are not functioning properly in most of the evaluated LLMs. Additionally, our study revealed that personalization actually reduces safety-filter activations, effectively functioning as a jailbreak. Such behavior must be urgently addressed by LLM developers and service providers.

HIVMedQA: Benchmarking large language models for HIV medical decision support

arXiv:2507.18143v2 Announce Type: replace Abstract: Large language models (LLMs) are emerging as valuable tools to support clinicians in routine decision-making. HIV management is a compelling use case due to its complexity, including diverse treatment options, comorbidities, and adherence challenges. However, integrating LLMs into clinical practice raises concerns about accuracy, potential harm, and clinician acceptance. Despite their promise, AI applications in HIV care remain underexplored, and LLM benchmarking studies are scarce. This study evaluates the current capabilities of LLMs in HIV management, highlighting their strengths and limitations. We introduce HIVMedQA, a benchmark designed to assess open-ended medical question answering in HIV care. The dataset consists of curated, clinically relevant questions developed with input from an infectious disease physician. We evaluated seven general-purpose and three medically specialized LLMs, applying prompt engineering to enhance performance. Our evaluation framework incorporates both lexical similarity and an LLM-as-a-judge approach, extended to better reflect clinical relevance. We assessed performance across key dimensions: question comprehension, reasoning, knowledge recall, bias, potential harm, and factual accuracy. Results show that Gemini 2.5 Pro consistently outperformed other models across most dimensions. Notably, two of the top three models were proprietary. Performance declined as question complexity increased. Medically fine-tuned models did not always outperform general-purpose ones, and larger model size was not a reliable predictor of performance. Reasoning and comprehension were more challenging than factual recall, and cognitive biases such as recency and status quo were observed. These findings underscore the need for targeted development and evaluation to ensure safe, effective LLM integration in clinical care.
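
The evaluation framework combines a lexical-similarity measure with an LLM-as-a-judge rating along the listed dimensions. The sketch below illustrates that combination under stated assumptions: the token-overlap F1 is a simple stand-in for the paper's lexical metric, and judge_llm is a hypothetical callable that returns a 1-5 rating for one dimension.

DIMENSIONS = ["comprehension", "reasoning", "knowledge_recall", "bias", "harm", "accuracy"]

def lexical_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 as a crude lexical-similarity proxy."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = len(set(cand) & set(ref))
    if not cand or not ref or overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def score_answer(question, answer, reference, judge_llm):
    """Score one model answer with both lexical similarity and per-dimension judge ratings."""
    scores = {"lexical_f1": lexical_f1(answer, reference)}
    for dim in DIMENSIONS:
        prompt = (f"Question: {question}\nReference answer: {reference}\n"
                  f"Model answer: {answer}\nRate the model answer for {dim} on a 1-5 scale.")
        scores[dim] = judge_llm(prompt)
    return scores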

TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability

arXiv:2507.19419v1 Announce Type: new Abstract: Understanding the relationship between training data and model behavior during pretraining is crucial, but existing workflows make this process cumbersome, fragmented, and often inaccessible to researchers. We present TokenSmith, an open-source library for interactive editing, inspection, and analysis of datasets used in Megatron-style pretraining frameworks such as GPT-NeoX, Megatron, and NVIDIA NeMo. TokenSmith supports a wide range of operations including searching, viewing, ingesting, exporting, inspecting, and sampling data, all accessible through a simple user interface and a modular backend. It also enables structured editing of pretraining data without requiring changes to training code, simplifying dataset debugging, validation, and experimentation. TokenSmith is designed as a plug and play addition to existing large language model pretraining workflows, thereby democratizing access to production-grade dataset tooling. TokenSmith is hosted on GitHub, with accompanying documentation and tutorials. A demonstration video is also available on YouTube.

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

arXiv:2501.12612v3 Announce Type: replace Abstract: Text-to-image (T2I) models have rapidly advanced, enabling the generation of high-quality images from text prompts across various domains. However, these models present notable safety concerns, including the risk of generating harmful, biased, or private content. Current research on assessing T2I safety remains in its early stages. While some efforts have been made to evaluate models on specific safety dimensions, many critical risks remain unexplored. To address this gap, we introduce T2ISafety, a safety benchmark that evaluates T2I models across three key domains: toxicity, fairness, and bias. We build a detailed hierarchy of 12 tasks and 44 categories based on these three domains, and meticulously collect 70K corresponding prompts. Based on this taxonomy and prompt set, we build a large-scale T2I dataset with 68K manually annotated images and train an evaluator capable of detecting critical risks that previous work has failed to identify, including risks that even ultra-large proprietary models like GPTs cannot correctly detect. We evaluate 12 prominent diffusion models on T2ISafety and reveal several concerns, including persistent issues with racial fairness, a tendency to generate toxic content, and significant variation in privacy protection across the models, even with defense methods like concept erasing. Data and evaluator are released at https://github.com/adwardlee/t2i_safety.
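
At a high level, a benchmark of this kind is run by feeding each curated prompt to the text-to-image model under test and letting the trained evaluator label the output. The sketch below is only an illustration of that loop; t2i_model and safety_evaluator are hypothetical callables, not the released evaluator's actual interface.

from collections import Counter

def evaluate_t2i_safety(prompts_by_category, t2i_model, safety_evaluator):
    """Return, per prompt category, how often generations receive each safety label."""
    report = {}
    for category, prompts in prompts_by_category.items():
        labels = Counter()
        for prompt in prompts:
            image = t2i_model(prompt)
            labels[safety_evaluator(image, category)] += 1    # e.g. "safe" / "unsafe"
        total = sum(labels.values()) or 1
        report[category] = {label: count / total for label, count in labels.items()}
    return report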
