YouZum

News

Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model

arXiv:2503.13575v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) possess broad capabilities that allow them to handle diverse language-related tasks. However, fine-tuning an LLM diminishes these general skills, and continual fine-tuning further causes severe degradation of accumulated knowledge. Recently, Continual Learning (CL) for LLMs has emerged, which aims to continually adapt LLMs to new tasks while maintaining previously learned knowledge and inheriting general skills. Existing techniques either replay previous data, incurring extra computational costs, or rely on a single parameter-efficient module to learn the downstream task, constraining the absorption of new knowledge through interference between different tasks. To address these issues, this paper proposes Analytic Subspace Routing (ASR). For each task, we isolate learning within a subspace of deep layers' features via low-rank adaptation, eliminating knowledge interference between different tasks. Additionally, we propose an analytic routing mechanism to properly utilize knowledge learned in different subspaces. Our approach employs Recursive Least Squares to train a multi-task router model, allowing the router to dynamically adapt to incoming data without requiring access to historical data. The router effectively assigns the current task to an appropriate subspace and has a non-forgetting property for previously learned tasks, with a solid theoretical guarantee. Experimental results demonstrate that our method achieves near-perfect retention of prior knowledge while seamlessly integrating new information, effectively overcoming the core limitations of existing methods. Our code will be released after acceptance.
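
The analytic routing component can be illustrated with a textbook recursive-least-squares (RLS) update, which refines a linear router from streaming features without revisiting historical data. The sketch below is a minimal illustration of that idea, not the authors' released code; the class, dimensions, and the argmax routing to a task subspace are assumptions made for illustration.

```python
import numpy as np

class RLSRouter:
    """Minimal recursive-least-squares router sketch (hypothetical, not the ASR release)."""

    def __init__(self, feat_dim: int, num_tasks: int, reg: float = 1.0):
        self.W = np.zeros((feat_dim, num_tasks))   # linear routing weights
        self.P = np.eye(feat_dim) / reg            # running inverse of the regularized correlation matrix

    def update(self, x: np.ndarray, y: np.ndarray) -> None:
        """One RLS step on a single (feature, one-hot task label) pair; no replay buffer needed."""
        x = x.reshape(-1, 1)                       # column vector, shape (feat_dim, 1)
        Px = self.P @ x
        k = Px / (1.0 + x.T @ Px)                  # gain vector
        self.P -= k @ Px.T                         # Sherman-Morrison update of the inverse
        err = y.reshape(1, -1) - x.T @ self.W      # residual against the one-hot task label
        self.W += k @ err                          # closed-form weight correction

    def route(self, x: np.ndarray) -> int:
        """Pick the subspace (e.g., which low-rank adapter to activate) for a new sample."""
        return int(np.argmax(x @ self.W))

# Toy usage: stream features task by task, never storing past samples.
router = RLSRouter(feat_dim=16, num_tasks=3)
rng = np.random.default_rng(0)
for task_id in range(3):
    for _ in range(200):
        x = rng.normal(size=16) + task_id          # crude task-dependent features
        router.update(x, np.eye(3)[task_id])
print(router.route(rng.normal(size=16) + 2))       # expected to route to subspace 2
```

Because each sample is folded into the running matrix P, the router adapts online without a replay buffer, which mirrors the non-forgetting property the abstract attributes to the RLS-trained router.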

Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors

arXiv:2507.05939v1 Announce Type: new Abstract: Nowadays, misinformation articles, especially multimodal ones, are widely spread on social media platforms and cause serious negative effects. To control their propagation, Multimodal Misinformation Detection (MMD) has become an active topic in the community for automatically identifying misinformation. Previous MMD methods focus on supervising detectors with collected offline data. However, in real-world scenarios, new events continually emerge, making MMD models trained on offline data consistently outdated and ineffective. To address this issue, training MMD models on online data streams is an alternative, inducing an emerging task named continual MMD. Unfortunately, it is hindered by two major challenges. First, training on new data consistently decreases detection performance on past data, a problem known as past knowledge forgetting. Second, the social environment constantly evolves over time, affecting generalization to future data. To alleviate these challenges, we propose to remember past knowledge by isolating interference between event-specific parameters with a Dirichlet process-based mixture-of-experts structure, and to anticipate future environmental distributions by learning a continuous-time dynamics model. Accordingly, we derive a new continual MMD method, DAEDCMD. Extensive experiments demonstrate that DAEDCMD consistently and significantly outperforms the compared methods, including six MMD baselines and three continual learning methods.
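
The Dirichlet process-based mixture-of-experts idea can be pictured as a Chinese-restaurant-process style assignment: each incoming event either reuses an existing expert or opens a new one, which keeps event-specific parameters isolated. The toy sketch below is a hedged illustration under assumed names and a Gaussian fit score, not the DAEDCMD implementation.

```python
import numpy as np

def assign_expert(event_feat, expert_means, expert_counts, alpha=1.0, sigma=1.0):
    """Toy CRP-style assignment: reuse an existing expert or open a new one.

    event_feat: feature vector of the incoming (multimodal) event.
    expert_means / expert_counts: running mean feature and usage count per expert.
    alpha: concentration parameter (larger values spawn new experts more often).
    """
    scores = []
    for mean, count in zip(expert_means, expert_counts):
        # prior mass from past usage times a Gaussian fit of the event under this expert
        fit = np.exp(-np.sum((event_feat - mean) ** 2) / (2 * sigma ** 2))
        scores.append(count * fit)
    scores.append(alpha)                           # mass reserved for a brand-new expert
    probs = np.array(scores) / np.sum(scores)
    return int(np.random.default_rng(0).choice(len(probs), p=probs))

# A far-away event is almost certain to open a new expert (index 2 here),
# so parameters learned for earlier events are left untouched.
choice = assign_expert(np.array([5.0, 5.0]),
                       expert_means=[np.zeros(2), np.ones(2)],
                       expert_counts=[10, 4])
print(choice)
```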

Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools

arXiv:2507.05305v1 Announce Type: cross Abstract: Frontier large language models (LLMs) like ChatGPT and Gemini can decipher cryptic compiler errors for novice programmers, but their computational scale, cost, and tendency to over-assist make them problematic for widespread pedagogical adoption. This work demonstrates that smaller, specialised language models, enhanced via Supervised Fine-Tuning (SFT), present a more viable alternative for educational tools. We utilise a new dataset of 40,000 C compiler error explanations, derived from real student-generated errors in introductory programming (CS1/2) courses, which we used to fine-tune three open-source models: Qwen3-4B, Llama-3.1-8B, and Qwen3-32B. We performed a dual evaluation, combining expert human reviews with a large-scale automated analysis of 8,000 responses using a validated LLM-as-judge ensemble. Our results show that SFT significantly boosts the pedagogical quality of smaller models, achieving performance comparable to much larger models. We analyse the trade-offs between model size and quality, confirming that fine-tuning compact, efficient models on high-quality, domain-specific data is a potent strategy for creating specialised models to drive educational tools. We provide a replicable methodology to foster broader access to generative AI capabilities in educational contexts.
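
Supervised fine-tuning of this kind usually starts from instruction/response records. The sketch below shows one plausible way to shape compiler-error data into JSONL for an SFT trainer; the field names, prompt wording, and helper function are assumptions, not the paper's actual pipeline.

```python
import json

def make_sft_record(source_code: str, compiler_error: str, explanation: str) -> dict:
    """Pair a C compiler error with a pedagogical explanation as one instruction-tuning example."""
    prompt = (
        "You are a patient CS1 tutor. Explain the following C compiler error to a novice "
        "programmer without giving away the full solution.\n\n"
        f"Code:\n{source_code}\n\nCompiler output:\n{compiler_error}"
    )
    return {"prompt": prompt, "completion": explanation}

records = [
    make_sft_record(
        source_code='int main() { printf("hi")\n    return 0; }',
        compiler_error="error: expected ';' before 'return'",
        explanation="The compiler reached 'return' while still expecting the previous statement "
                    "to end, so check the line above for a missing semicolon.",
    )
]

# JSONL is the input format most supervised fine-tuning trainers accept.
with open("compiler_errors_sft.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```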

Beyond Weaponization: NLP Security for Medium and Lower-Resourced Languages in Their Own Right

arXiv:2507.03473v1 Announce Type: new Abstract: Despite mounting evidence that multilinguality can be easily weaponized against language models (LMs), works across NLP Security remain overwhelmingly English-centric. In terms of securing LMs, the NLP norm of “English first” collides with standard procedure in cybersecurity, whereby practitioners are expected to anticipate and prepare for worst-case outcomes. To mitigate worst-case outcomes in NLP Security, researchers must be willing to engage with the weakest links in LM security: lower-resourced languages. Accordingly, this work examines the security of LMs for lower- and medium-resourced languages. We extend existing adversarial attacks for up to 70 languages to evaluate the security of monolingual and multilingual LMs for these languages. Through our analysis, we find that monolingual models are often too small in total number of parameters to ensure sound security, and that while multilinguality is helpful, it does not always guarantee improved security either. Ultimately, these findings highlight important considerations for more secure deployment of LMs, for communities of lower-resourced languages.

Towards Understanding the Cognitive Habits of Large Reasoning Models

arXiv:2506.21571v2 Announce Type: replace Abstract: Large Reasoning Models (LRMs), which autonomously produce a reasoning Chain of Thought (CoT) before producing final responses, offer a promising approach to interpreting and monitoring model behaviors. Inspired by the observation that certain CoT patterns — e.g., “Wait, did I miss anything?” — consistently emerge across tasks, we explore whether LRMs exhibit human-like cognitive habits. Building on Habits of Mind, a well-established framework of cognitive habits associated with successful human problem-solving, we introduce CogTest, a principled benchmark designed to evaluate LRMs’ cognitive habits. CogTest includes 16 cognitive habits, each instantiated with 25 diverse tasks, and employs an evidence-first extraction method to ensure reliable habit identification. With CogTest, we conduct a comprehensive evaluation of 16 widely used LLMs (13 LRMs and 3 non-reasoning ones). Our findings reveal that LRMs, unlike conventional LLMs, not only exhibit human-like habits but also adaptively deploy them according to different tasks. Finer-grained analyses further uncover patterns of similarity and difference in LRMs’ cognitive habit profiles, particularly certain inter-family similarity (e.g., Qwen-3 models and DeepSeek-R1). Extending the study to safety-related tasks, we observe that certain habits, such as Taking Responsible Risks, are strongly associated with the generation of harmful responses. These findings suggest that studying persistent behavioral patterns in LRMs’ CoTs is a valuable step toward deeper understanding of LLM misbehavior. The code is available at: https://github.com/jianshuod/CogTest.

Self-Consistency Preference Optimization

arXiv:2411.04109v3 Announce Type: replace Abstract: Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area. However, existing techniques often fail to improve complex reasoning tasks due to the difficulty of assigning correct rewards. An orthogonal approach that is known to improve correctness is self-consistency, a method applied at inference time based on multiple sampling in order to find the most consistent answer. In this work, we extend the self-consistency concept to help train models. We thus introduce self-consistency preference optimization (ScPO), which iteratively trains consistent answers to be preferred over inconsistent ones on unsupervised new problems. We show ScPO leads to large improvements over conventional reward model training on reasoning tasks such as GSM8K and MATH, closing the gap with supervised training with gold answers or preferences, and that combining ScPO with standard supervised learning improves results even further. On ZebraLogic, ScPO finetunes Llama-3 8B to be superior to Llama-3 70B, Gemma-2 27B, and Claude-3 Haiku.
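
The core recipe, preferring the most self-consistent answer over the least consistent one on unlabeled problems, can be sketched as follows. The model call is stubbed out and all names are hypothetical; this illustrates the idea rather than the paper's training code.

```python
from collections import Counter

def build_consistency_preference(problem, sample_answer, n_samples=16):
    """Sample several answers, then form a (chosen, rejected) pair from vote counts.

    sample_answer(problem) -> str stands in for one stochastic model generation
    (e.g., temperature sampling followed by final-answer extraction).
    """
    answers = [sample_answer(problem) for _ in range(n_samples)]
    votes = Counter(answers)
    chosen, chosen_votes = votes.most_common(1)[0]        # most consistent answer
    rejected, rejected_votes = votes.most_common()[-1]    # least consistent answer
    margin = (chosen_votes - rejected_votes) / n_samples  # vote margin can weight the preference loss
    return {"prompt": problem, "chosen": chosen, "rejected": rejected, "weight": margin}

# Deterministic stub in place of an LLM call, just to show the plumbing.
pair = build_consistency_preference("What is 17 * 24?", sample_answer=lambda p: "408")
print(pair["chosen"], pair["weight"])
```

Pairs built this way can then be fed to a standard preference-optimization step, which is the iterative training loop the abstract describes.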

Demystifying ChatGPT: How It Masters Genre Recognition

arXiv:2507.03875v1 Announce Type: new Abstract: The introduction of ChatGPT has garnered significant attention within the NLP community and beyond. Previous studies have demonstrated ChatGPT’s substantial advancements across various downstream NLP tasks, highlighting its adaptability and potential to revolutionize language-related applications. However, its capabilities and limitations in genre prediction remain unclear. This work analyzes three Large Language Models (LLMs) using the MovieLens-100K dataset to assess their genre prediction capabilities. Our findings show that ChatGPT, without fine-tuning, outperformed other LLMs, and fine-tuned ChatGPT performed best overall. We set up zero-shot and few-shot prompts using audio transcripts/subtitles from movie trailers in the MovieLens-100K dataset, covering 1682 movies of 18 genres, where each movie can have multiple genres. Additionally, we extended our study by extracting IMDb movie posters to utilize a Vision Language Model (VLM) with prompts for poster information. This fine-grained information was used to enhance existing LLM prompts. In conclusion, our study reveals ChatGPT’s remarkable genre prediction capabilities, surpassing other language models. The integration of VLM further enhances our findings, showcasing ChatGPT’s potential for content-related applications by incorporating visual information from movie posters.
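
A zero-shot, multi-label genre prompt of the kind described might look like the sketch below; the exact wording, the truncated genre list, and the optional poster-description field are illustrative assumptions rather than the paper's prompts.

```python
from typing import Optional

GENRES = ["Action", "Comedy", "Drama", "Horror", "Romance", "Sci-Fi"]  # subset of the 18 MovieLens genres

def build_genre_prompt(trailer_transcript: str, poster_description: Optional[str] = None) -> str:
    """Zero-shot, multi-label genre classification prompt for a chat LLM."""
    context = f"Trailer transcript:\n{trailer_transcript.strip()}"
    if poster_description:                       # optional VLM-derived poster text
        context += f"\n\nPoster description:\n{poster_description.strip()}"
    return (
        "Classify the movie into one or more of the following genres: "
        + ", ".join(GENRES)
        + ".\nAnswer with a comma-separated list of genres only.\n\n"
        + context
    )

print(build_genre_prompt("A detective races against time to stop a rogue AI."))
```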

Improving Social Determinants of Health Documentation in French EHRs Using Large Language Models

arXiv:2507.03433v1 Announce Type: new Abstract: Social determinants of health (SDoH) significantly influence health outcomes, shaping disease progression, treatment adherence, and health disparities. However, their documentation in structured electronic health records (EHRs) is often incomplete or missing. This study presents an approach based on large language models (LLMs) for extracting 13 SDoH categories from French clinical notes. We trained Flan-T5-Large on annotated social history sections from clinical notes at Nantes University Hospital, France. We evaluated the model at two levels: (i) identification of SDoH categories and associated values, and (ii) extraction of detailed SDoH with associated temporal and quantitative information. Model performance was assessed across four datasets, including two that we publicly release as open resources. The model achieved strong performance for identifying well-documented categories such as living condition, marital status, descendants, job, tobacco, and alcohol use (F1 score > 0.80). Performance was lower for categories with limited training data or highly variable expressions, such as employment status, housing, physical activity, income, and education. Our model identified 95.8% of patients with at least one SDoH, compared to 2.8% for ICD-10 codes from structured EHR data. Our error analysis showed that performance limitations were linked to annotation inconsistencies, reliance on an English-centric tokenizer, and reduced generalizability due to the model being trained on social history sections only. These results demonstrate the effectiveness of NLP in improving the completeness of real-world SDoH data in a non-English EHR system.
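
Framing SDoH extraction as sequence-to-sequence generation with Flan-T5 could look roughly like this minimal inference sketch. It loads the public google/flan-t5-large checkpoint via Hugging Face Transformers; the prompt template and the 'category: value' output format are assumptions, and the study's fine-tuned weights are not what is loaded here.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Public base checkpoint; the study fine-tunes Flan-T5-Large on annotated social-history sections.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

note = ("Vit seul, retraité, ancien fumeur (arrêt il y a 5 ans), "
        "consommation d'alcool occasionnelle.")
prompt = ("Extract the social determinants of health mentioned in this French clinical note "
          "as 'category: value' pairs:\n" + note)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```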

What Is Context Engineering in AI? Techniques, Use Cases, and Why It Matters

Introduction: What is Context Engineering?

Context engineering refers to the discipline of designing, organizing, and manipulating the context that is fed into large language models (LLMs) to optimize their performance. Rather than fine-tuning the model weights or architectures, context engineering focuses on the input: the prompts, system instructions, retrieved knowledge, formatting, and even the ordering of information.

Context engineering isn't about crafting better prompts. It's about building systems that deliver the right context, exactly when it's needed. Imagine an AI assistant asked to write a performance review.
- Poor context: it only sees the instruction. The result is vague, generic feedback that lacks insight.
- Rich context: it sees the instruction plus the employee's goals, past reviews, project outcomes, peer feedback, and manager notes. The result? A nuanced, data-backed review that feels informed and personalized, because it is.

This emerging practice is gaining traction due to the increasing reliance on prompt-based models like GPT-4, Claude, and Mistral. The performance of these models is often less about their size and more about the quality of the context they receive. In this sense, context engineering is the equivalent of prompt programming for the era of intelligent agents and retrieval-augmented generation (RAG).

Why Do We Need Context Engineering?

- Token Efficiency: With context windows expanding but still bounded (e.g., 128K in GPT-4-Turbo), efficient context management becomes crucial. Redundant or poorly structured context wastes valuable tokens.
- Precision and Relevance: LLMs are sensitive to noise. The more targeted and logically arranged the prompt, the higher the likelihood of accurate output.
- Retrieval-Augmented Generation (RAG): In RAG systems, external data is fetched in real time. Context engineering helps decide what to retrieve, how to chunk it, and how to present it.
- Agentic Workflows: When using tools like LangChain or OpenAgents, autonomous agents rely on context to maintain memory, goals, and tool usage. Bad context leads to failures in planning or hallucination.
- Domain-Specific Adaptation: Fine-tuning is expensive. Structuring better prompts or building retrieval pipelines lets models perform well on specialized tasks with zero-shot or few-shot learning.

Key Techniques in Context Engineering

Several methodologies and practices are shaping the field:

1. System Prompt Optimization
The system prompt is foundational: it defines the LLM's behavior and style. Techniques include:
- Role assignment (e.g., "You are a data science tutor")
- Instructional framing (e.g., "Think step-by-step")
- Constraint imposition (e.g., "Only output JSON")

2. Prompt Composition and Chaining
LangChain popularized the use of prompt templates and chains to modularize prompting. Chaining allows splitting tasks across prompts, for example decomposing a question, retrieving evidence, then answering.

3. Context Compression
With limited context windows, one can:
- Use summarization models to compress previous conversation
- Embed and cluster similar content to remove redundancy
- Apply structured formats (like tables) instead of verbose prose

4. Dynamic Retrieval and Routing
RAG pipelines (like those in LlamaIndex and LangChain) retrieve documents from vector stores based on user intent; a minimal retrieval-and-packing sketch follows this section. Advanced setups include:
- Query rephrasing or expansion before retrieval
- Multi-vector routing to choose different sources or retrievers
- Context re-ranking based on relevance and recency
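
A minimal sketch of the retrieval-and-packing step behind techniques 3 and 4: rank candidate chunks against the query, then fill a fixed token budget with the best ones. It uses a toy bag-of-words similarity and hypothetical helper names rather than any particular vector store or framework.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would call a sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def assemble_context(query: str, chunks: list, token_budget: int = 200) -> str:
    """Rank chunks by relevance to the query, then pack them under an approximate token budget."""
    ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())               # crude token estimate
        if used + cost > token_budget:
            continue                            # skip chunks that would blow the budget
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)

docs = [
    "Refund policy: customers may return items within 30 days.",
    "Shipping times vary between 3 and 7 business days.",
    "Our office dog is named Biscuit.",
]
print(assemble_context("How long do I have to return a purchase?", docs))
```

In practice, embed would be a real embedding model, the ranking might also weigh recency, and the budget would come from the target model's context window, but the select-and-pack logic stays the same.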

5. Memory Engineering
Short-term memory (what is in the prompt) and long-term memory (retrievable history) need alignment. Techniques include:
- Context replay (injecting past relevant interactions)
- Memory summarization
- Intent-aware memory selection

6. Tool-Augmented Context
In agent-based systems, tool usage is context-aware. Practices include:
- Tool description formatting
- Tool history summarization
- Observations passed between steps

Context Engineering vs. Prompt Engineering

While related, context engineering is broader and more system-level. Prompt engineering is typically about static, handcrafted input strings, whereas context engineering encompasses dynamic context construction using embeddings, memory, chaining, and retrieval. As Simon Willison noted, "Context engineering is what we do instead of fine-tuning."

Real-World Applications

- Customer Support Agents: feeding prior ticket summaries, customer profile data, and KB docs.
- Code Assistants: injecting repo-specific documentation, previous commits, and function usage.
- Legal Document Search: context-aware querying with case history and precedents.
- Education: personalized tutoring agents with memory of learner behavior and goals.

Challenges in Context Engineering

Despite its promise, several pain points remain:
- Latency: retrieval and formatting steps introduce overhead.
- Ranking Quality: poor retrieval hurts downstream generation.
- Token Budgeting: choosing what to include or exclude is non-trivial.
- Tool Interoperability: mixing tools (LangChain, LlamaIndex, custom retrievers) adds complexity.

Emerging Best Practices

- Combine structured (JSON, tables) and unstructured text for better parsing.
- Limit each context injection to a single logical unit (e.g., one document or conversation summary).
- Use metadata (timestamps, authorship) for better sorting and scoring.
- Log, trace, and audit context injections to improve over time.

The Future of Context Engineering

Several trends suggest that context engineering will be foundational in LLM pipelines:
- Model-Aware Context Adaptation: future models may dynamically request the type or format of context they need.
- Self-Reflective Agents: agents that audit their context, revise their own memory, and flag hallucination risk.
- Standardization: just as JSON became a universal data interchange format, context templates may become standardized for agents and tools.

As Andrej Karpathy hinted in a recent post, "Context is the new weight update." Rather than retraining models, we are now programming them via their context, making context engineering the dominant software interface in the LLM era.

Conclusion

Context engineering is no longer optional: it is central to unlocking the full capabilities of modern language models. As toolkits like LangChain and LlamaIndex mature and agentic workflows proliferate, mastering context construction becomes as important as model selection. Whether you are building a retrieval system, a coding agent, or a personalized tutor, how you structure the model's context will increasingly define its intelligence.
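
To make the preceding sections concrete, here is one simplified way the pieces can come together: a system prompt, a few selected long-term memories, retrieved documents, and tool descriptions assembled into a single structured context payload for an agent turn. All field names are illustrative; this is not the API of LangChain, LlamaIndex, or any other framework.

```python
import json
from datetime import datetime, timezone

def build_context_payload(system_prompt, memories, retrieved_docs, tools, user_message, max_memories=3):
    """Assemble one structured context payload for a single agent turn.

    memories: list of (timestamp, summary) pairs from long-term storage.
    retrieved_docs: already-ranked document chunks from the retrieval step.
    tools: mapping of tool name to a one-line description.
    """
    recent = sorted(memories, key=lambda m: m[0], reverse=True)[:max_memories]  # crude recency-based selection
    return {
        "system": system_prompt,
        "memory": [{"when": ts.isoformat(), "summary": text} for ts, text in recent],
        "documents": retrieved_docs,
        "tools": [{"name": name, "description": desc} for name, desc in tools.items()],
        "user": user_message,
        "assembled_at": datetime.now(timezone.utc).isoformat(),  # metadata for later auditing
    }

payload = build_context_payload(
    system_prompt="You are a support agent. Cite the documents you use and only output JSON.",
    memories=[(datetime(2025, 6, 1, tzinfo=timezone.utc),
               "Customer previously asked about a late delivery.")],
    retrieved_docs=["Refund policy: customers may return items within 30 days."],
    tools={"create_ticket": "Open a support ticket with a summary and a priority."},
    user_message="I want to return my order from last week.",
)
print(json.dumps(payload, indent=2))
```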

Sources:
https://x.com/tobi/status/1935533422589399127
https://x.com/karpathy/status/1937902205765607626
https://blog.langchain.com/the-rise-of-context-engineering/
https://rlancemartin.github.io/2025/06/23/context_engineering/
https://www.philschmid.de/context-engineering
https://blog.langchain.com/context-engineering-for-agents/
https://www.llamaindex.ai/blog/context-engineering-what-it-is-and-techniques-to-consider
