YouZum

AI, Committee, News, Uncategorized

Thinking with Many Minds: Using Large Language Models for Multi-Perspective Problem-Solving

arXiv:2501.02348v2 Announce Type: replace Abstract: Complex problem-solving requires cognitive flexibility: the capacity to entertain multiple perspectives while preserving their distinctiveness. This flexibility replicates the “wisdom of crowds” within a single individual, allowing them to “think with many minds.” While mental simulation enables imagined deliberation, cognitive constraints limit its effectiveness. We propose synthetic deliberation, a Large Language Model (LLM)-based method that simulates discourse between agents embodying diverse perspectives, as a solution. Using a custom GPT-based model, we showcase its benefits: concurrent processing of multiple viewpoints without cognitive degradation, parallel exploration of perspectives, and precise control over viewpoint synthesis. By externalizing the deliberative process and distributing cognitive labor between parallel search and integration, synthetic deliberation transcends mental simulation’s limitations. This approach shows promise for strategic planning, policymaking, and conflict resolution.
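To make the idea concrete, here is a minimal Python sketch of what synthetic deliberation could look like: agents primed with distinct perspectives take turns contributing to a shared transcript, and a final integration step synthesizes their views. The llm() callable is a hypothetical stand-in for any chat-completion API; the prompts and round structure are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of synthetic deliberation: several LLM "agents", each primed
# with a distinct perspective, take turns responding to a shared problem, and
# a final integration step synthesizes their viewpoints. The llm() helper is a
# hypothetical stand-in for any chat-completion API.

from typing import Callable, List

def synthetic_deliberation(
    problem: str,
    perspectives: List[str],
    llm: Callable[[str], str],
    rounds: int = 2,
) -> str:
    transcript: List[str] = []
    for _ in range(rounds):
        for name in perspectives:
            prompt = (
                f"You argue strictly from the perspective of {name}.\n"
                f"Problem: {problem}\n"
                f"Discussion so far:\n" + "\n".join(transcript[-6:]) +
                "\nGive a short contribution that preserves your distinct viewpoint."
            )
            transcript.append(f"[{name}] {llm(prompt)}")
    # Integration: synthesize the externalized deliberation into one answer.
    synthesis_prompt = (
        f"Problem: {problem}\nDeliberation transcript:\n" + "\n".join(transcript) +
        "\nIntegrate the perspectives into a single balanced recommendation."
    )
    return llm(synthesis_prompt)
```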

Thinking with Many Minds: Using Large Language Models for Multi-Perspective Problem-Solving Read the article »

AI, Committee, News, Uncategorized

DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning

arXiv:2509.09524v1 Announce Type: new Abstract: This system paper presents the DeMeVa team’s approaches to the third edition of the Learning with Disagreements shared task (LeWiDi 2025; Leonardelli et al., 2025). We explore two directions: in-context learning (ICL) with large language models, where we compare example sampling strategies; and label distribution learning (LDL) methods with RoBERTa (Liu et al., 2019b), where we evaluate several fine-tuning methods. Our contributions are twofold: (1) we show that ICL can effectively predict annotator-specific annotations (perspectivist annotations), and that aggregating these predictions into soft labels yields competitive performance; and (2) we argue that LDL methods are promising for soft label predictions and merit further exploration by the perspectivist community.
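As a concrete illustration of the soft-label aggregation step mentioned in the abstract, the short sketch below turns per-annotator (perspectivist) predictions into a soft label distribution. The annotator IDs, labels, and aggregation by simple frequency are illustrative assumptions, not the team's exact pipeline.

```python
# Minimal sketch of turning per-annotator (perspectivist) predictions into a
# soft label distribution. The annotator predictions here are hypothetical;
# any ICL or LDL model could supply them.

from collections import Counter
from typing import Dict, List

def soft_label(annotator_preds: Dict[str, str], classes: List[str]) -> Dict[str, float]:
    counts = Counter(annotator_preds.values())
    n = sum(counts.values())
    return {c: counts.get(c, 0) / n for c in classes}

# Example: three simulated annotators disagree on an "offensive?" item.
preds = {"ann_1": "offensive", "ann_2": "not_offensive", "ann_3": "offensive"}
print(soft_label(preds, ["offensive", "not_offensive"]))
# {'offensive': 0.67, 'not_offensive': 0.33} (approximately)
```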

DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning Read the article »

AI, Committee, News, Uncategorized

Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition

arXiv:2509.09196v1 Announce Type: new Abstract: Contextual biasing improves rare word recognition of ASR models by prioritizing the output of rare words during decoding. A common approach is Trie-based biasing, which gives “bonus scores” to partial hypotheses (e.g., “Bon”) that may lead to the generation of the rare word (e.g., “Bonham”). If the full word (“Bonham”) isn’t ultimately recognized, the system revokes those earlier bonuses. This revocation is limited to beam search and is computationally expensive, particularly for models with large decoders. To overcome these limitations, we propose adapting ASR models to look ahead and predict multiple steps at once. This avoids the revocation step entirely by better estimating whether a partial hypothesis will lead to the generation of the full rare word. By fine-tuning Whisper with only 10 hours of synthetic data, our method reduces the word error rate on the NSC Part 2 test set from 30.86% to 12.19%.
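The sketch below illustrates the core idea under stated assumptions: a prefix trie over rare words, and a bonus that is granted only when the model's k-step look-ahead actually completes a biased word, so no revocation is needed. Character-level matching and the bonus value are illustrative simplifications, not the paper's exact scoring.

```python
# Illustrative sketch of trie-based contextual biasing with look-ahead. A
# prefix trie over rare words grants a bonus only when the hypothesis suffix
# plus the model's predicted k-step continuation completes a biased word, so
# there is nothing to revoke later. Granularity and scores are assumptions.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word_end = False

def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word_end = True
    return root

def bias_bonus(root, hypothesis_suffix, predicted_continuation, bonus=2.0):
    """Apply a bonus only if suffix + predicted continuation completes a rare word."""
    node = root
    for ch in hypothesis_suffix + predicted_continuation:
        if ch not in node.children:
            return 0.0
        node = node.children[ch]
    return bonus if node.is_word_end else 0.0

trie = build_trie(["Bonham"])
print(bias_bonus(trie, "Bon", "ham"))   # 2.0: look-ahead completes the rare word
print(bias_bonus(trie, "Bon", "jour"))  # 0.0: bonus withheld, nothing to revoke
```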

Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition Read the article »

AI, Committee, News, Uncategorized

Optimizing Length Compression in Large Reasoning Models

arXiv:2506.14755v2 Announce Type: replace-cross Abstract: Large Reasoning Models (LRMs) have achieved remarkable success, yet they often suffer from producing unnecessary and verbose reasoning chains. We identify a core aspect of this issue as “invalid thinking”: models tend to repeatedly double-check their work after having derived the correct answer. To address this specific inefficiency, we move beyond the general principles of Efficacy and Efficiency to propose two new, fine-grained principles: Brevity, which advocates for eliminating redundancy, and Sufficiency, which ensures critical reasoning steps are preserved. Guided by these principles, we introduce LC-R1, a post-training method based on Group Relative Policy Optimization (GRPO). LC-R1 employs a novel combination of a Length Reward for overall conciseness and a Compress Reward that is specifically designed to remove the invalid portion of the thinking process. Extensive experiments on multiple reasoning benchmarks demonstrate that LC-R1 achieves a significant reduction in sequence length (~50%) with only a marginal (~2%) drop in accuracy, reaching a favorable trade-off point on the Pareto frontier that prioritizes high compression. Our analysis further validates the robustness of LC-R1 and provides valuable insights for developing more powerful yet computationally efficient LRMs. Our code is released at https://github.com/zxiangx/LC-R1.
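A schematic sketch of the two reward signals described above may help: a length reward that favors shorter traces relative to a reference, and a compress reward that penalizes tokens produced after the first correct answer. The exact reward formulations used by LC-R1 are defined in the paper; the formulas below are illustrative assumptions.

```python
# Schematic sketch of the two reward signals: a Length Reward favoring shorter
# correct traces, and a Compress Reward that penalizes "invalid thinking",
# i.e. tokens produced after the first correct answer has been derived.
# These formulas are illustrative assumptions, not LC-R1's exact rewards.

def length_reward(num_tokens: int, ref_tokens: int) -> float:
    # Shorter than the reference trace -> positive reward, longer -> negative.
    return max(-1.0, min(1.0, (ref_tokens - num_tokens) / ref_tokens))

def compress_reward(tokens: list, first_correct_idx: int) -> float:
    # Fraction of the trace spent double-checking after the answer was found.
    invalid = len(tokens) - (first_correct_idx + 1)
    return -invalid / len(tokens)

trace = ["step1", "step2", "answer=42", "recheck", "recheck", "answer=42"]
print(length_reward(len(trace), ref_tokens=12))      # 0.5
print(compress_reward(trace, first_correct_idx=2))   # -0.5
```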

Optimizing Length Compression in Large Reasoning Models Read the article »

AI, Committee, News, Uncategorized

Meet mmBERT: An Encoder-only Language Model Pretrained on 3T Tokens of Multilingual Text in over 1800 Languages and 2–4× Faster than Previous Models

Why was a new multilingual encoder needed?

XLM-RoBERTa (XLM-R) has dominated multilingual NLP for more than five years, an unusually long reign in AI research. While encoder-only models like BERT and RoBERTa were central to early progress, most research energy shifted toward decoder-based generative models. Encoders, however, remain more efficient and often outperform decoders on embedding, retrieval, and classification tasks. Despite this, multilingual encoder development stalled. A team of researchers from Johns Hopkins University proposes mmBERT, which addresses this gap by delivering a modern encoder that surpasses XLM-R and rivals recent large-scale models such as OpenAI’s o3 and Google’s Gemini 2.5 Pro.

Understanding the architecture of mmBERT

mmBERT comes in two main configurations:

Base model: 22 transformer layers, 1152 hidden dimension, ~307M parameters (110M non-embedding).
Small model: ~140M parameters (42M non-embedding).

It adopts the Gemma 2 tokenizer with a 256k vocabulary, rotary position embeddings (RoPE), and FlashAttention2 for efficiency. Sequence length is extended from 1024 to 8192 tokens, using unpadded embeddings and sliding-window attention. This allows mmBERT to process contexts nearly an order of magnitude longer than XLM-R while maintaining faster inference.

What training data and phases were used?

mmBERT was trained on 3 trillion tokens spanning 1,833 languages. Data sources include FineWeb2, Dolma, MegaWika v2, ProLong, StarCoder, and others. English makes up only ~10–34% of the corpus depending on the phase. Training was done in three stages:

Pre-training: 2.3T tokens across 60 languages and code.
Mid-training: 600B tokens across 110 languages, focused on higher-quality sources.
Decay phase: 100B tokens covering 1,833 languages, emphasizing low-resource adaptation.

What new training strategies were introduced?

Three main innovations drive mmBERT’s performance (a short schedule sketch follows the article text):

Annealed Language Learning (ALL): Languages are introduced gradually (60 → 110 → 1,833). Sampling distributions are annealed from high-resource-weighted toward uniform, so low-resource languages gain influence in later stages without overfitting their limited data.
Inverse Masking Schedule: The masking ratio starts at 30% and decays to 5%, encouraging coarse-grained learning early and fine-grained refinement later.
Model Merging Across Decay Variants: Multiple decay-phase models (English-heavy, 110-language, and 1,833-language) are combined via TIES merging, leveraging complementary strengths without retraining from scratch.

How does mmBERT perform on benchmarks?

English NLU (GLUE): mmBERT base achieves 86.3, surpassing XLM-R (83.3) and nearly matching ModernBERT (87.4), despite allocating more than 75% of training to non-English data.
Multilingual NLU (XTREME): mmBERT base scores 72.8 vs. XLM-R’s 70.4, with gains in classification and QA tasks.
Embedding tasks (MTEB v2): mmBERT base ties ModernBERT in English (53.9 vs. 53.8) and leads in multilingual (54.1 vs. 52.4 for XLM-R).
Code retrieval (CoIR): mmBERT outperforms XLM-R by ~9 points, though EuroBERT remains stronger on proprietary data.

How does mmBERT handle low-resource languages?

The annealed learning schedule ensures that low-resource languages benefit during later training. On benchmarks such as Faroese FoQA and Tigrinya TiQuAD, mmBERT significantly outperforms both o3 and Gemini 2.5 Pro. These results demonstrate that encoder models, if trained carefully, can generalize effectively even in extreme low-resource scenarios.

What efficiency gains does mmBERT achieve?

mmBERT is 2–4× faster than XLM-R and MiniLM while supporting 8192-token inputs. Notably, it remains faster at 8192 tokens than older encoders were at 512 tokens. The speed boost derives from the ModernBERT training recipe, efficient attention mechanisms, and optimized embeddings.

Summary

mmBERT arrives as the long-overdue replacement for XLM-R, redefining what a multilingual encoder can deliver. It runs 2–4× faster, handles sequences up to 8K tokens, and outperforms prior models on both high-resource benchmarks and low-resource languages that were underserved in the past. Its training recipe of 3 trillion tokens, paired with annealed language learning, inverse masking, and model merging, shows how careful design can unlock broad generalization without excessive redundancy. The result is an open, efficient, and scalable encoder that not only fills the six-year gap since XLM-R but also provides a robust foundation for the next generation of multilingual NLP systems. The paper, the model on Hugging Face, the GitHub repository, and further technical details are linked from the original article, which appeared first on MarkTechPost.
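The following sketch illustrates, under stated assumptions, the two training-schedule ideas summarized in the training-strategies section above: annealed language sampling that interpolates from size-proportional toward uniform, and an inverse masking schedule that decays the masking ratio from 30% to 5%. The interpolation form, token counts, and linear decay are illustrative, not mmBERT's exact schedules.

```python
# Illustrative sketch of (1) annealed language sampling, interpolating from
# size-proportional toward uniform sampling over languages, and (2) an inverse
# masking schedule decaying the MLM masking ratio from 30% to 5%. Both forms
# are assumptions for illustration; see the paper for the exact schedules.

def annealed_language_probs(token_counts: dict, t: float) -> dict:
    """t=0.0: proportional to corpus size (favors high-resource); t=1.0: uniform."""
    total = sum(token_counts.values())
    n = len(token_counts)
    return {
        lang: (1 - t) * (count / total) + t * (1 / n)
        for lang, count in token_counts.items()
    }

def masking_ratio(step: int, total_steps: int, start: float = 0.30, end: float = 0.05) -> float:
    """Linear decay of the masking ratio over training."""
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac

counts = {"en": 900_000, "de": 90_000, "ti": 1_000}  # hypothetical token counts
print(annealed_language_probs(counts, t=0.0))  # dominated by high-resource languages
print(annealed_language_probs(counts, t=1.0))  # uniform across languages
print(masking_ratio(0, 1000), masking_ratio(1000, 1000))  # 0.30 ... 0.05
```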

Meet mmBERT: An Encoder-only Language Model Pretrained on 3T Tokens of Multilingual Text in over 1800 Languages and 2–4× Faster than Previous Models Read the article »

AI, Committee, News, Uncategorized

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

arXiv:2502.06130v2 Announce Type: replace-cross Abstract: While recent Large Vision-Language Models (LVLMs) have shown remarkable performance in multi-modal tasks, they are prone to generating hallucinatory text responses that do not align with the given visual input, which restricts their practical applicability in real-world scenarios. In this work, inspired by the observation that the text-to-image generation process is the inverse of image-conditioned response generation in LVLMs, we explore the potential of leveraging text-to-image generative models to assist in mitigating hallucinations in LVLMs. We discover that generative models can offer valuable self-feedback for mitigating hallucinations at both the response and token levels. Building on this insight, we introduce self-correcting Decoding with Generative Feedback (DeGF), a novel training-free algorithm that incorporates feedback from text-to-image generative models into the decoding process to effectively mitigate hallucinations in LVLMs. Specifically, DeGF generates an image from the initial response produced by LVLMs, which acts as an auxiliary visual reference and provides self-feedback to verify and correct the initial response through complementary or contrastive decoding. Extensive experimental results validate the effectiveness of our approach in mitigating diverse types of hallucinations, consistently surpassing state-of-the-art methods across six benchmarks. Code is available at https://github.com/zhangce01/DeGF.
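As a rough illustration of the decoding-time feedback described above, the sketch below combines token logits conditioned on the original image with logits conditioned on the auxiliary generated image. The weighting, the sign conventions, and the value of alpha are illustrative assumptions; DeGF's actual complementary and contrastive decoding rules are given in the paper.

```python
# Rough sketch of decoding with feedback from a generated image: next-token
# logits conditioned on the original input image are combined with logits
# conditioned on an image generated from the initial response. The weighting
# and sign conventions are illustrative assumptions, not DeGF's exact rules.

import numpy as np

def complementary_logits(logits_original: np.ndarray,
                         logits_generated: np.ndarray,
                         alpha: float = 0.5) -> np.ndarray:
    # When the generated image confirms the response, reinforce shared evidence.
    return logits_original + alpha * logits_generated

def contrastive_logits(logits_original: np.ndarray,
                       logits_generated: np.ndarray,
                       alpha: float = 0.5) -> np.ndarray:
    # When the two conditionings disagree, amplify the difference between them.
    return (1 + alpha) * logits_original - alpha * logits_generated

logits_orig = np.array([2.0, 0.5, -1.0])   # hypothetical next-token logits
logits_gen = np.array([2.2, -1.5, -1.0])   # logits given the generated reference image
print(complementary_logits(logits_orig, logits_gen))
print(contrastive_logits(logits_orig, logits_gen))
```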

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models Read the article »

AI, Committee, News, Uncategorized

Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation

arXiv:2504.02438v5 Announce Type: replace Abstract: Long-form video processing fundamentally challenges vision-language models (VLMs) due to the high computational costs of handling extended temporal sequences. Existing token pruning and feature merging methods often sacrifice critical temporal dependencies or dilute semantic information. We introduce differential distillation, a principled approach that systematically preserves task-relevant information while suppressing redundancy. Based on this principle, we develop ViLAMP, a hierarchical video-language model that processes hour-long videos at “mixed precision” through two key mechanisms: (1) differential keyframe selection that maximizes query relevance while maintaining temporal distinctiveness at the frame level and (2) differential feature merging that preserves query-salient features in non-keyframes at the patch level. Hence, ViLAMP retains full information in keyframes while reducing non-keyframes to their most salient features, resembling mixed-precision training. Extensive experiments demonstrate ViLAMP’s superior performance across four video understanding benchmarks, particularly on long-form content. Notably, ViLAMP can process ultra-long videos (up to 10K frames) on a single NVIDIA A100 GPU, achieving substantial computational efficiency while maintaining state-of-the-art performance. Code and model are available at https://github.com/steven-ccq/ViLAMP.
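A minimal sketch of the differential keyframe selection idea follows: greedily pick frames that are relevant to the query while penalizing redundancy with frames already chosen, preserving temporal distinctiveness. Cosine similarity on embeddings and the trade-off weight lam are illustrative assumptions, not ViLAMP's exact scoring.

```python
# Minimal sketch of query-relevant, temporally distinct keyframe selection:
# greedily choose frames with high query relevance and low similarity to
# already-selected frames. Scoring details are illustrative assumptions.

import numpy as np

def select_keyframes(frame_embs: np.ndarray, query_emb: np.ndarray,
                     k: int, lam: float = 0.5) -> list:
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    relevance = [cos(f, query_emb) for f in frame_embs]
    selected = []
    for _ in range(k):
        best, best_score = -1, -np.inf
        for i in range(len(frame_embs)):
            if i in selected:
                continue
            # Penalize redundancy with already-chosen keyframes.
            redundancy = max((cos(frame_embs[i], frame_embs[j]) for j in selected),
                             default=0.0)
            score = relevance[i] - lam * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 16))   # 100 hypothetical frame embeddings
query = rng.normal(size=16)
print(select_keyframes(frames, query, k=5))
```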

Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation Read the article »

AI, Committee, News, Uncategorized

Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling

arXiv:2509.08753v1 Announce Type: new Abstract: We introduce Delayed Streams Modeling (DSM), a flexible formulation for streaming, multimodal sequence-to-sequence learning. Sequence-to-sequence generation is often cast in an offline manner, where the model consumes the complete input sequence before generating the first output timestep. Alternatively, streaming sequence-to-sequence models rely on learning a policy for choosing when to advance on the input stream or write to the output stream. DSM instead models already time-aligned streams with a decoder-only language model. By moving the alignment to a pre-processing step, and introducing appropriate delays between streams, DSM provides streaming inference of arbitrary output sequences, from any input combination, making it applicable to many sequence-to-sequence problems. In particular, given text and audio streams, automatic speech recognition (ASR) corresponds to the text stream being delayed, while the opposite gives a text-to-speech (TTS) model. We perform extensive experiments for these two major sequence-to-sequence tasks, showing that DSM provides state-of-the-art performance and latency while supporting arbitrarily long sequences, being even competitive with offline baselines. Code, samples and demos are available at https://github.com/kyutai-labs/delayed-streams-modeling
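The toy sketch below shows the stream-delay idea under simple assumptions: two time-aligned token streams are interleaved for a decoder-only model, and which stream is delayed determines the task (text delayed behind audio for ASR-style conditioning, audio delayed behind text for TTS). The padding token and frame granularity are illustrative, not the DSM implementation.

```python
# Toy sketch of Delayed Streams Modeling: two time-aligned streams are
# interleaved for a decoder-only model; delaying one stream by a few steps
# decides whether the setup looks like ASR or TTS. Padding token and frame
# granularity are illustrative assumptions.

from typing import List, Tuple

PAD = "<pad>"

def delay_stream(stream: List[str], delay: int) -> List[str]:
    return [PAD] * delay + stream

def interleave(stream_a: List[str], stream_b: List[str]) -> List[Tuple[str, str]]:
    n = max(len(stream_a), len(stream_b))
    def pad(s: List[str]) -> List[str]:
        return s + [PAD] * (n - len(s))
    return list(zip(pad(stream_a), pad(stream_b)))

audio = ["a0", "a1", "a2", "a3"]          # audio frames
text = ["the", "cat", "sat", "down"]      # aligned text tokens

print(interleave(audio, delay_stream(text, 2)))   # ASR: text delayed behind audio
print(interleave(delay_stream(audio, 2), text))   # TTS: audio delayed behind text
```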

Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling Read the article »

AI, Committee, News, Uncategorized

Scaling Truth: The Confidence Paradox in AI Fact-Checking

arXiv:2509.08803v1 Announce Type: cross Abstract: The rise of misinformation underscores the need for scalable and reliable fact-checking solutions. Large language models (LLMs) hold promise in automating fact verification, yet their effectiveness across global contexts remains uncertain. We systematically evaluate nine established LLMs across multiple categories (open/closed-source, multiple sizes, diverse architectures, reasoning-based) using 5,000 claims previously assessed by 174 professional fact-checking organizations across 47 languages. Our methodology tests model generalizability on claims postdating training cutoffs and four prompting strategies mirroring both citizen and professional fact-checker interactions, with over 240,000 human annotations as ground truth. Findings reveal a concerning pattern resembling the Dunning-Kruger effect: smaller, accessible models show high confidence despite lower accuracy, while larger models demonstrate higher accuracy but lower confidence. This risks systemic bias in information verification, as resource-constrained organizations typically use smaller models. Performance gaps are most pronounced for non-English languages and claims originating from the Global South, threatening to widen existing information inequalities. These results establish a multilingual benchmark for future research and provide an evidence base for policy aimed at ensuring equitable access to trustworthy, AI-assisted fact-checking.
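A small sketch of the confidence-accuracy comparison behind the reported pattern: per model, average stated confidence is compared with actual accuracy against ground truth, and a positive gap indicates overconfidence. The records below are hypothetical values, not the study's data.

```python
# Small sketch of a confidence-accuracy gap computation: per model, compare
# mean stated confidence with accuracy against ground-truth verdicts. A
# positive gap indicates overconfidence. Data values are hypothetical.

from statistics import mean

def overconfidence(records):
    """records: list of (model, correct: bool, confidence: float in [0, 1])."""
    by_model = {}
    for model, correct, conf in records:
        by_model.setdefault(model, []).append((correct, conf))
    report = {}
    for model, rows in by_model.items():
        acc = mean(1.0 if c else 0.0 for c, _ in rows)
        conf = mean(cf for _, cf in rows)
        report[model] = {"accuracy": acc, "mean_confidence": conf, "gap": conf - acc}
    return report

data = [("small-llm", False, 0.95), ("small-llm", True, 0.90),
        ("large-llm", True, 0.60), ("large-llm", True, 0.65)]
print(overconfidence(data))  # positive gap = overconfident, negative = underconfident
```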

Scaling Truth: The Confidence Paradox in AI Fact-Checking Read the article »

AI, Committee, News, Uncategorized

Automatic Detection of Inauthentic Templated Responses in English Language Assessments

arXiv:2509.08355v1 Announce Type: new Abstract: In high-stakes English Language Assessments, low-skill test takers may employ memorized materials called “templates” on essay questions to “game” or fool the automated scoring system. In this study, we introduce the automated detection of inauthentic, templated responses (AuDITR) task, describe a machine learning-based approach to this task and illustrate the importance of regularly updating these models in production.
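For illustration only, the sketch below shows one generic baseline for this detection task: measure n-gram overlap between a response and a library of known memorized templates and flag high-overlap responses for review. This is an assumed baseline, not the AuDITR method described in the paper.

```python
# Illustrative baseline for flagging templated essay responses: compute n-gram
# overlap between a response and known memorized templates, and flag responses
# with high overlap. A generic sketch, not the paper's method.

def ngrams(text: str, n: int = 4) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def template_overlap(response: str, templates: list, n: int = 4) -> float:
    resp = ngrams(response, n)
    if not resp:
        return 0.0
    return max(len(resp & ngrams(t, n)) / len(resp) for t in templates)

known_templates = ["in the modern era it is widely acknowledged that this topic "
                   "has both advantages and disadvantages which must be weighed"]
essay = ("in the modern era it is widely acknowledged that this topic has both "
         "advantages and disadvantages which must be weighed by society today")
print(template_overlap(essay, known_templates))  # high overlap: likely templated
```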

Automatic Detection of Inauthentic Templated Responses in English Language Assessments Read the article »
