YouZum

Uncategorized

AI, Committee, News, Uncategorized

Completing A Systematic Review in Hours instead of Months with Interactive AI Agents

arXiv:2504.14822v2 Announce Type: replace-cross Abstract: Systematic reviews (SRs) are vital for evidence-based practice in high-stakes disciplines, such as healthcare, but are often impeded by intensive labor and lengthy processes that can take months to complete. Due to the high demand for domain expertise, existing automatic summarization methods fail to accurately identify relevant studies and generate high-quality summaries. To that end, we introduce InsightAgent, a human-centered interactive AI agent powered by large language models that revolutionizes this workflow. InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing of literature, leading to significant improvement in the quality of generated SRs. InsightAgent also provides intuitive visualizations of the corpus and agent trajectories, allowing users to effortlessly monitor the actions of the agent and provide real-time feedback based on their expertise. Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs by 27.2%, reaching 79.7% of human-written quality. At the same time, user satisfaction is improved by 34.4%. With InsightAgent, it only takes a clinician about 1.5 hours, rather than months, to complete a high-quality systematic review.

AI, Committee, News, Uncategorized

Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You

arXiv:2401.16092v4 Announce Type: replace Abstract: Text-to-image generation models have recently achieved astonishing results in image quality, flexibility, and text alignment, and are consequently employed in a fast-growing number of applications. Through improvements in multilingual abilities, a larger community now has access to this technology. However, our results show that multilingual models suffer from significant gender biases just as monolingual models do. Furthermore, the natural expectation that multilingual models will provide similar results across languages does not hold up. Instead, there are important differences between languages. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. We use MAGBIG to investigate the effect of multilingualism on gender bias in T2I models. To this end, we construct multilingual prompts requesting portraits of people with a certain occupation or trait. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages. Furthermore, we investigate prompt engineering strategies, such as indirect, neutral formulations, to mitigate these biases. Unfortunately, these approaches have limited success and result in worse text-to-image alignment. Consequently, we call for more research into diverse representations across languages in image generators, as well as into steerability to address biased model behavior.

AI, Committee, News, Uncategorized

Graph-Based Spectral Decomposition for Parameter Coordination in Language Model Fine-Tuning

arXiv:2504.19583v2 Announce Type: replace-cross Abstract: This paper proposes a parameter collaborative optimization algorithm for large language models, enhanced with graph spectral analysis. The goal is to improve both fine-tuning efficiency and structural awareness during training. In the proposed method, the parameters of a pre-trained language model are treated as nodes in a graph. A weighted graph is constructed, and Laplacian spectral decomposition is applied to enable frequency-domain modeling and structural representation of the parameter space. Based on this structure, a joint loss function is designed. It combines the task loss with a spectral regularization term to facilitate collaborative updates among parameters. In addition, a spectral filtering mechanism is introduced during the optimization phase. This mechanism adjusts gradients in a structure-aware manner, enhancing the model’s training stability and convergence behavior. The method is evaluated on multiple tasks, including traditional fine-tuning comparisons, few-shot generalization tests, and convergence speed analysis. In all settings, the proposed approach demonstrates superior performance. The experimental results confirm that the spectral collaborative optimization framework effectively reduces parameter perturbations and improves fine-tuning quality while preserving overall model performance. This work contributes significantly to the field of artificial intelligence by advancing parameter-efficient training methodologies for large-scale models, reinforcing the importance of structural signal processing in deep learning optimization, and offering a robust, generalizable framework for enhancing language model adaptability and performance.
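The abstract does not specify how the parameter graph is built, what form the spectral regularizer takes, or how gradients are filtered. As a minimal sketch under stated assumptions, the example below uses a small hand-written weight matrix, the quadratic penalty theta^T L theta as the regularization term, and a first-order polynomial filter (I - alpha * L) in place of an explicit eigendecomposition. That filter scales each Laplacian frequency lambda by 1 - alpha * lambda, so it damps high-frequency gradient components. All function names and constants here are hypothetical.

```python
def laplacian(weights):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0.0) - weights[i][j]
             for j in range(n)] for i in range(n)]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def spectral_penalty(theta, lap):
    """theta^T L theta, equal to 1/2 * sum_ij w_ij * (theta_i - theta_j)^2."""
    return sum(t * x for t, x in zip(theta, matvec(lap, theta)))

def joint_loss(task_loss, theta, lap, lam=0.1):
    """Task loss plus an assumed quadratic spectral regularization term."""
    return task_loss + lam * spectral_penalty(theta, lap)

def filter_gradient(grad, lap, alpha=0.25):
    """Apply the low-pass graph filter (I - alpha * L) to a gradient vector."""
    return [g - alpha * x for g, x in zip(grad, matvec(lap, grad))]

# Toy graph: three parameters connected in a path 0 - 1 - 2 with unit weights.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(W)
smooth_penalty = spectral_penalty([1.0, 1.0, 1.0], L)   # neighbors agree
rough_penalty = spectral_penalty([1.0, 0.0, 1.0], L)    # neighbors disagree
filtered = filter_gradient([1.0, -1.0, 1.0], L)
```

In this toy setup the penalty is zero when connected parameters take equal values and grows as they diverge, and the filter pulls an oscillating gradient toward its neighbors.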

AI, Committee, News, Uncategorized

This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal Reasoning

Multimodal large language models (MLLMs) are designed to process and generate content across various modalities, including text, images, audio, and video. These models aim to understand and integrate information from different sources, enabling applications such as visual question answering, image captioning, and multimodal dialogue systems. The development of MLLMs represents a significant step toward creating AI systems that can interpret and interact with the world in a more human-like manner.

A primary challenge in developing effective MLLMs lies in integrating diverse input types, particularly visual data, into language models while maintaining high performance across tasks. Existing models often struggle to balance strong language understanding with effective visual reasoning, especially when scaling to complex data. Further, many models require large datasets to perform well, making it difficult to adapt them to specific tasks or domains. These challenges highlight the need for more efficient and scalable approaches to multimodal learning.

Current MLLMs predominantly use autoregressive methods, predicting one token at a time in a left-to-right manner. While effective, this approach has limitations in handling complex multimodal contexts. Alternative methods, such as diffusion models, have been explored; however, they often exhibit weaker language understanding due to their restricted architectures or inadequate training strategies. These limitations suggest a gap where a purely diffusion-based model could offer competitive multimodal reasoning capabilities if designed effectively.

Researchers from the Renmin University of China and Ant Group introduced LLaDA-V, a purely diffusion-based multimodal large language model (MLLM) that integrates visual instruction tuning with masked diffusion models.
Built upon LLaDA, a large language diffusion model, LLaDA-V incorporates a vision encoder and an MLP connector to project visual features into the language embedding space, enabling effective multimodal alignment. This design represents a departure from the autoregressive paradigms dominant in current multimodal approaches, aiming to overcome existing limitations while maintaining data efficiency and scalability.

LLaDA-V employs a masked diffusion process where text responses are gradually refined through iterative prediction of masked tokens. Unlike autoregressive models that predict tokens sequentially, LLaDA-V generates outputs by reversing the masked diffusion process. The model is trained in three stages: the first stage aligns vision and language embeddings by mapping visual features from SigLIP2 into LLaDA’s language space. The second stage fine-tunes the model using 10 million single-image samples and 2 million multimodal samples from MAmmoTH-VL. The third stage focuses on reasoning, using 900K QA pairs from VisualWebInstruct and a mixed dataset strategy. Bidirectional attention improves context comprehension, enabling robust multimodal understanding.

In evaluations across 18 multimodal tasks, LLaDA-V demonstrated superior performance compared to hybrid autoregressive-diffusion and purely diffusion-based models. It outperformed LLaMA3-V on most multidisciplinary knowledge and mathematical reasoning tasks, such as MMMU, MMMU-Pro, and MMStar, achieving a score of 60.1 on MMStar, close to Qwen2-VL’s 60.7, despite LLaDA-V using the weaker LLaDA-8B language tower. LLaDA-V also excelled in data efficiency, outperforming LLaMA3-V on MMMU-Pro with 1M samples against LLaMA3-V’s 9M. Although it lagged in chart and document understanding benchmarks, such as AI2D, and in real-world scene tasks, like RealworldQA, LLaDA-V’s results highlight its promise for multimodal tasks.
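The iterative refinement of masked tokens described above can be illustrated with a toy decoding loop. This is a minimal sketch, not LLaDA-V's implementation: `toy_predict` is a hypothetical stand-in for the model that returns random guesses, and the rule of committing the most confident guesses first, with an even share of positions per step, is an assumption made for illustration.

```python
import random

MASK = None  # placeholder for a masked position (hypothetical convention)

def toy_predict(tokens, vocab, rng):
    """Stand-in for the model: a (token, confidence) guess for every position."""
    return [(rng.choice(vocab), rng.random()) for _ in tokens]

def reverse_masked_diffusion(length, vocab, steps, seed=0):
    """Start fully masked and unmask a share of positions at every step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        guesses = toy_predict(tokens, vocab, rng)
        # Commit an even share of the remaining masked positions, so the
        # final step fills whatever is still masked (ceiling division).
        quota = -(-len(masked) // (steps - step))
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:quota]:
            tokens[i] = guesses[i][0]
    return tokens

response = reverse_masked_diffusion(length=8, vocab=["a", "b", "c"], steps=4)
```

Unlike left-to-right decoding, positions are filled in whatever order the confidence scores dictate, and every prediction step sees the whole partially filled sequence.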
In summary, LLaDA-V addresses the challenges of building effective multimodal models by introducing a purely diffusion-based architecture that combines visual instruction tuning with masked diffusion. The approach offers strong multimodal reasoning capabilities while maintaining data efficiency. This work demonstrates the potential of diffusion models in multimodal AI, paving the way for further exploration of probabilistic approaches to complex AI tasks. The post This AI Paper Introduces LLaDA-V: A Purely Diffusion-Based Multimodal Large Language Model for Visual Instruction Tuning and Multimodal Reasoning appeared first on MarkTechPost.

AI, Committee, News, Uncategorized

Meta Releases Llama Prompt Ops: A Python Package that Automatically Optimizes Prompts for Llama Models

The growing adoption of open-source large language models such as Llama has introduced new integration challenges for teams previously relying on proprietary systems like OpenAI’s GPT or Anthropic’s Claude. While performance benchmarks for Llama are increasingly competitive, discrepancies in prompt formatting and system message handling often result in degraded output quality when existing prompts are reused without modification.

To address this issue, Meta has introduced Llama Prompt Ops, a Python-based toolkit designed to streamline the migration and adaptation of prompts originally constructed for closed models. Now available on GitHub, the toolkit programmatically adjusts and evaluates prompts to align with Llama’s architecture and conversational behavior, minimizing the need for manual experimentation.

Prompt engineering remains a central bottleneck in deploying LLMs effectively. Prompts tailored to the internal mechanics of GPT or Claude frequently do not transfer well to Llama, due to differences in how these models interpret system messages, handle user roles, and process context tokens. The result is often unpredictable degradation in task performance. Llama Prompt Ops addresses this mismatch with a utility that automates the transformation process. It operates on the assumption that prompt format and structure can be systematically restructured to match the operational semantics of Llama models, enabling more consistent behavior without retraining or extensive manual tuning.

Core Capabilities

The toolkit introduces a structured pipeline for prompt adaptation and evaluation, comprising the following components:
Automated Prompt Conversion: Llama Prompt Ops parses prompts designed for GPT, Claude, and Gemini, and reconstructs them using model-aware heuristics to better suit Llama’s conversational format. This includes reformatting system instructions, token prefixes, and message roles.
Template-Based Fine-Tuning: By providing a small set of labeled query-response pairs (minimum ~50 examples), users can generate task-specific prompt templates. These are optimized through lightweight heuristics and alignment strategies to preserve intent and maximize compatibility with Llama.
Quantitative Evaluation Framework: The tool generates side-by-side comparisons of original and optimized prompts, using task-level metrics to assess performance differences. This empirical approach replaces trial-and-error methods with measurable feedback.

Together, these functions reduce the cost of prompt migration and provide a consistent methodology for evaluating prompt quality across LLM platforms.

Workflow and Implementation

Llama Prompt Ops is structured for ease of use with minimal dependencies. The optimization workflow is initiated using three inputs: a YAML configuration file specifying the model and evaluation parameters; a JSON file containing prompt examples and expected completions; and a system prompt, typically designed for a closed model.

The system applies transformation rules and evaluates outcomes using a defined metric suite. The entire optimization cycle can be completed within approximately five minutes, enabling iterative refinement without the overhead of external APIs or model retraining.

Importantly, the toolkit supports reproducibility and customization, allowing users to inspect, modify, or extend transformation templates to fit specific application domains or compliance constraints.

Implications and Applications

For organizations transitioning from proprietary to open models, Llama Prompt Ops offers a practical mechanism to maintain application behavior consistency without reengineering prompts from scratch. It also supports development of cross-model prompting frameworks by standardizing prompt behavior across different architectures.
By automating a previously manual process and providing empirical feedback on prompt revisions, the toolkit contributes to a more structured approach to prompt engineering, a domain that remains under-explored relative to model training and fine-tuning.

Conclusion

Llama Prompt Ops represents a targeted effort by Meta to reduce friction in the prompt migration process and improve alignment between prompt formats and Llama’s operational semantics. Its utility lies in its simplicity, reproducibility, and focus on measurable outcomes, making it a relevant addition for teams deploying or evaluating Llama in real-world settings.

The post Meta Releases Llama Prompt Ops: A Python Package that Automatically Optimizes Prompts for Llama Models appeared first on MarkTechPost.
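The side-by-side evaluation of an original and an optimized prompt described in the article can be sketched generically. The code below does not use the Llama Prompt Ops API, which the article does not show. Every name is hypothetical, `fake_generate` is a stub standing in for a real model call, and exact match is assumed as the task-level metric.

```python
def exact_match(prediction, expected):
    """Assumed task-level metric: case-insensitive exact match."""
    return prediction.strip().lower() == expected.strip().lower()

def score_prompt(system_prompt, examples, generate):
    """Fraction of examples the model answers correctly under one system prompt."""
    hits = sum(
        exact_match(generate(system_prompt, ex["question"]), ex["answer"])
        for ex in examples
    )
    return hits / len(examples)

def compare_prompts(original, optimized, examples, generate):
    """Score both prompts on the same examples and report the difference."""
    before = score_prompt(original, examples, generate)
    after = score_prompt(optimized, examples, generate)
    return {"original": before, "optimized": after, "delta": after - before}

def fake_generate(system_prompt, question):
    """Stub model (hypothetical): answers tersely only when told to be concise."""
    return "4" if "concise" in system_prompt.lower() else "The answer is 4."

examples = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is 1 + 3?", "answer": "4"},
]
report = compare_prompts(
    "You are a helpful assistant.",
    "Be concise. Reply with the answer only.",
    examples,
    fake_generate,
)
```

Scoring both prompts on the same labeled examples with the same metric is what makes the comparison a measurement rather than trial and error; in practice `generate` would wrap a real model call.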

AI, Committee, News, Uncategorized

Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs

arXiv:2502.14830v2 Announce Type: replace Abstract: While large language models demonstrate remarkable capabilities at task-specific applications through fine-tuning, extending these benefits across diverse languages is essential for broad accessibility. However, effective cross-lingual transfer is hindered by LLM performance gaps across languages and the scarcity of fine-tuning data in many languages. Through analysis of LLM internal representations from over 1,000 language pairs, we discover that middle layers exhibit the strongest potential for cross-lingual alignment. Building on this finding, we propose a middle-layer alignment objective integrated into task-specific training. Our experiments on slot filling, machine translation, and structured text generation show consistent improvements in cross-lingual transfer, especially to lower-resource languages. The method is robust to the choice of alignment languages and generalizes to languages unseen during alignment. Furthermore, we show that separately trained alignment modules can be merged with existing task-specific modules, improving cross-lingual capabilities without full re-training. Our code is publicly available (https://github.com/dannigt/mid-align).
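The abstract does not give the exact form of the alignment objective. As a rough sketch under stated assumptions, the example below mean-pools the middle-layer token vectors of a sentence and its translation, uses one minus their cosine similarity as the alignment loss, and adds it to the task loss with a weight. The pooling, the cosine distance, and all names are illustrative choices, not the paper's method.

```python
import math

def mean_pool(hidden_states):
    """Average a list of equal-length token vectors into one sentence vector."""
    count = len(hidden_states)
    return [sum(column) / count for column in zip(*hidden_states)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def alignment_loss(src_hidden, tgt_hidden):
    """Assumed loss: 1 - cosine similarity of pooled middle-layer states."""
    return 1.0 - cosine(mean_pool(src_hidden), mean_pool(tgt_hidden))

def joint_loss(task_loss, src_hidden, tgt_hidden, weight=1.0):
    """Task loss plus the weighted alignment term."""
    return task_loss + weight * alignment_loss(src_hidden, tgt_hidden)

# Toy middle-layer states for a sentence pair (two tokens vs. one token).
aligned = alignment_loss([[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0]])
misaligned = alignment_loss([[1.0, 0.0]], [[0.0, 1.0]])
total = joint_loss(0.5, [[1.0, 0.0]], [[0.0, 1.0]], weight=0.2)
```

The alignment term vanishes when both sentences map to the same direction in the middle layer, so minimizing the joint loss pushes translations toward shared representations while the task loss is still optimized.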

AI, Committee, News, Uncategorized

Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization

arXiv:2505.24229v1 Announce Type: new Abstract: Inverse Text Normalization (ITN) is crucial for converting spoken Automatic Speech Recognition (ASR) outputs into well-formatted written text, enhancing both readability and usability. Despite its importance, the integration of streaming ITN within streaming ASR remains largely unexplored due to challenges in accuracy, efficiency, and adaptability, particularly in low-resource and limited-context scenarios. In this paper, we introduce a streaming pretrained language model for ITN, leveraging pretrained linguistic representations for improved robustness. To address streaming constraints, we propose Dynamic Context-Aware during training and inference, enabling adaptive chunk size adjustments and the integration of right-context information. Experimental results demonstrate that our method achieves accuracy comparable to non-streaming ITN and surpasses existing streaming ITN models on a Vietnamese dataset, all while maintaining low latency, ensuring seamless integration into ASR systems.
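To make the idea of variable chunk sizes with right context concrete, here is a small sketch of a streaming chunker. It is not the paper's algorithm: the schedule of chunk sizes, the fixed lookahead length, and the function name are all assumptions. Each step yields the tokens to normalize plus a few upcoming tokens that a model could read as right context without emitting output for them.

```python
def stream_chunks(tokens, chunk_sizes, right_context):
    """Yield (chunk, lookahead) pairs over a token stream.

    chunk_sizes is a hypothetical per-step schedule; once it runs out,
    the last size is reused for the rest of the stream.
    """
    position = 0
    step = 0
    while position < len(tokens):
        size = chunk_sizes[min(step, len(chunk_sizes) - 1)]
        end = min(position + size, len(tokens))
        # The lookahead is visible to the model but processed in a later chunk.
        yield tokens[position:end], tokens[end:end + right_context]
        position = end
        step += 1

spoken = "twenty five dollars and ten cents".split()
chunks = list(stream_chunks(spoken, chunk_sizes=[2, 3], right_context=1))
```

With this input, the first chunk is "twenty five" with "dollars" as lookahead, which is the kind of right context a normalizer would need in order to know that a currency amount is being spoken.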
