YouZum

Committee

AI, Committee, News, Uncategorized

A Survey of Context Engineering for Large Language Models

arXiv:2507.13334v1 Announce Type: new Abstract: The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. We present a comprehensive taxonomy decomposing Context Engineering into its foundational components and the sophisticated implementations that integrate them into intelligent systems. We first examine the foundational components: context retrieval and generation, context processing and context management. We then explore how these components are architecturally integrated to create sophisticated system implementations: retrieval-augmented generation (RAG), memory systems and tool-integrated reasoning, and multi-agent systems. Through this systematic analysis of over 1300 research papers, our survey not only establishes a technical roadmap for the field but also reveals a critical research gap: a fundamental asymmetry exists between model capabilities. While current models, augmented by advanced context engineering, demonstrate remarkable proficiency in understanding complex contexts, they exhibit pronounced limitations in generating equally sophisticated, long-form outputs. Addressing this gap is a defining priority for future research. Ultimately, this survey provides a unified framework for both researchers and engineers advancing context-aware AI.


UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

arXiv:2502.15082v2 Announce Type: replace-cross Abstract: User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or “forgetting” a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model’s other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model’s representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. Across three standard unlearning methods, UPCORE consistently achieves a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric, measuring the area-under-the-curve (AUC) across standard metrics. Our results show that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.
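
The abstract's core step, pruning outliers from the forget set so that the representations being unlearned have lower variance, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the representation vectors are taken as given, the outlier rule (drop the points farthest from the centroid) and the `keep_fraction` parameter are hypothetical choices, and all helper names are ours. The paper may use a different outlier detector.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(points: Sequence[Vector]) -> List[float]:
    """Mean vector of a non-empty set of equal-length representations."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def total_variance(points: Sequence[Vector]) -> float:
    """Sum of per-dimension (population) variances: the spread of the set."""
    c = centroid(points)
    n = len(points)
    return sum(sum((p[d] - c[d]) ** 2 for p in points) / n for d in range(len(c)))


def prune_forget_set(reps: Sequence[Vector], keep_fraction: float = 0.8) -> List[int]:
    """Return indices of forget-set points to keep (hypothetical outlier rule).

    Points are ranked by Euclidean distance to the centroid and the farthest
    ones are dropped as outliers, which typically lowers the variance of the
    remaining coreset.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    c = centroid(reps)
    dists = [math.dist(r, c) for r in reps]
    n_keep = max(1, round(keep_fraction * len(reps)))
    closest_first = sorted(range(len(reps)), key=lambda i: dists[i])
    return sorted(closest_first[:n_keep])


# Four tightly clustered representations plus one outlier.
forget_reps = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0], [0.0, -0.1], [3.0, 3.0]]
kept = prune_forget_set(forget_reps, keep_fraction=0.8)
coreset = [forget_reps[i] for i in kept]
print(kept)  # [0, 1, 2, 3]
print(total_variance(coreset) < total_variance(forget_reps))  # True
```

Because the abstract describes UPCORE as method-agnostic, the pruned coreset would then be handed to whichever unlearning method is in use in place of the full forget set.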


Automatically assessing oral narratives of Afrikaans and isiXhosa children

arXiv:2507.13205v1 Announce Type: new Abstract: Developing narrative and comprehension skills in early childhood is critical for later literacy. However, teachers in large preschool classrooms struggle to accurately identify students who require intervention. We present a system for automatically assessing oral narratives of preschool children in Afrikaans and isiXhosa. The system uses automatic speech recognition followed by a machine learning scoring model to predict narrative and comprehension scores. For scoring predicted transcripts, we compare a linear model to a large language model (LLM). The LLM-based system outperforms the linear model in most cases, but the linear system is competitive despite its simplicity. The LLM-based system is comparable to a human expert in flagging children who require intervention. We lay the foundation for automatic oral assessments in classrooms, giving teachers extra capacity to focus on personalised support for children’s learning.
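
The scoring stage of such a pipeline (transcript in, score out, flag below a cut-off) can be illustrated with a toy linear model. This sketch assumes the transcripts have already been produced by an ASR system; the two features (word count and type-token ratio), the weights, and the threshold are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def transcript_features(transcript: str) -> List[float]:
    """Hypothetical features: word count and type-token ratio (lexical variety)."""
    words = transcript.lower().split()
    if not words:
        return [0.0, 0.0]
    return [float(len(words)), len(set(words)) / len(words)]


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """A linear scoring model: bias plus weighted sum of features."""
    return bias + sum(w * f for w, f in zip(weights, features))


def flag_for_intervention(
    transcripts: Dict[str, str],
    weights: Sequence[float],
    bias: float,
    threshold: float,
) -> List[str]:
    """IDs of children whose predicted score falls below the threshold."""
    return [
        child_id
        for child_id, text in transcripts.items()
        if linear_score(transcript_features(text), weights, bias) < threshold
    ]


# Transcripts are assumed to come from an upstream ASR step.
asr_output = {
    "child_a": "the dog ran home and the cat followed the dog",
    "child_b": "dog dog",
}
print(flag_for_intervention(asr_output, weights=[0.3, 2.0], bias=0.0, threshold=3.0))  # ['child_b']
```

In the paper's comparison, an LLM-based scorer would replace `linear_score` while the flagging step stays the same.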


Exploiting Adaptive Contextual Masking for Aspect-Based Sentiment Analysis

arXiv:2402.13722v2 Announce Type: replace Abstract: Aspect-Based Sentiment Analysis (ABSA) is a fine-grained linguistic problem that entails the extraction of multifaceted aspects, opinions, and sentiments from the given text. Both standalone and compound ABSA tasks have been extensively used in the literature to examine the nuanced information present in online reviews and social media posts. Current ABSA methods often rely on static hyperparameters for attention-masking mechanisms, which can struggle with context adaptation and may overlook the unique relevance of words in varied situations. This leads to challenges in accurately analyzing complex sentences containing multiple aspects with differing sentiments. In this work, we present adaptive masking methods that remove irrelevant tokens based on context to assist in the Aspect Term Extraction and Aspect Sentiment Classification subtasks of ABSA. Our experiments show that the proposed methods outperform the baseline methods in terms of accuracy and F1 scores on four benchmark online review datasets. Further, we show that the proposed methods can be extended with multiple adaptations and present a qualitative analysis of the proposed approach using sample text for aspect term extraction.
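
The contrast between a static masking hyperparameter and context-dependent masking can be shown with a small sketch. Here each token carries a relevance score with respect to the aspect (how such scores are obtained, for example from attention weights, is an assumption), and the cut-off is derived from the sentence's own score distribution rather than fixed globally. The mean-plus-k-standard-deviations rule is a hypothetical stand-in, not the paper's method.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def adaptive_mask(
    tokens: Sequence[str], relevance: Sequence[float], k: float = 0.0
) -> Tuple[List[str], List[int]]:
    """Keep tokens whose relevance is at least mean + k * std for this sentence.

    The threshold is recomputed per sentence, so it adapts to each context
    instead of being a fixed global hyperparameter.
    """
    if len(tokens) != len(relevance) or not tokens:
        raise ValueError("need equal-length, non-empty tokens and relevance")
    threshold = mean(relevance) + k * pstdev(relevance)
    mask = [1 if r >= threshold else 0 for r in relevance]
    kept = [t for t, m in zip(tokens, mask) if m]
    return kept, mask


tokens = ["the", "battery", "lasts", "long", "but", "the", "screen", "is", "dim"]
# Hypothetical relevance of each token to the aspect "battery".
relevance = [0.05, 0.9, 0.7, 0.6, 0.1, 0.05, 0.2, 0.05, 0.1]
kept, mask = adaptive_mask(tokens, relevance)
print(kept)  # ['battery', 'lasts', 'long']
```

Raising `k` makes the mask stricter relative to the sentence's own spread; tokens about the second aspect ("screen", "dim") are removed for the "battery" aspect either way.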


Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

arXiv:2507.06261v3 Announce Type: replace Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding, and it can now process up to 3 hours of video content. Its combination of long-context, multimodal, and reasoning capabilities can unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
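
The claim that a model family spans "the full Pareto frontier of model capability vs cost" has a precise meaning: no member of the frontier is beaten on both axes by another model. A short sketch of computing that frontier follows; the model names and numbers are invented for illustration and are not Gemini benchmark or pricing figures.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost: float        # lower is better (e.g. price or latency)
    capability: float  # higher is better (e.g. a benchmark score)


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.capability >= b.capability
    strictly_better = a.cost < b.cost or a.capability > b.capability
    return no_worse and strictly_better


def pareto_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Points not dominated by any other point, ordered from cheapest up."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.cost)


# Invented numbers for illustration only.
family = [
    ModelPoint("small", cost=1.0, capability=50.0),
    ModelPoint("medium", cost=2.0, capability=60.0),
    ModelPoint("medium-old", cost=3.0, capability=55.0),
    ModelPoint("large", cost=5.0, capability=80.0),
]
print([p.name for p in pareto_frontier(family)])  # ['small', 'medium', 'large']
```

"medium-old" drops out because "medium" is both cheaper and more capable; every remaining model is the best available choice at its own cost level.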


AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

arXiv:2507.11764v1 Announce Type: new Abstract: This paper presents AI Wizards’ participation in the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles, classifying sentences as subjective/objective in monolingual, multilingual, and zero-shot settings. Training/development datasets were provided for Arabic, German, English, Italian, and Bulgarian; final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization. Our primary strategy enhanced transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations, aiming to improve upon standard fine-tuning. We explored this sentiment-augmented architecture with mDeBERTaV3-base, ModernBERT-base (English), and Llama3.2-1B. To address class imbalance, prevalent across languages, we employed decision threshold calibration optimized on the development set. Our experiments show sentiment feature integration significantly boosts performance, especially subjective F1 score. This framework led to high rankings, notably 1st for Greek (Macro F1 = 0.51).
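
Two steps in this abstract are easy to make concrete: appending sentiment scores to a sentence representation, and calibrating the decision threshold on a development set to counter class imbalance. The sketch below assumes concatenation as the integration method and macro F1 (the metric the abstract reports) as the calibration objective; the function names and the candidate-threshold search are hypothetical, not the team's code.

```python
from typing import List, Sequence, Tuple


def augment_with_sentiment(
    sentence_embedding: Sequence[float], sentiment_scores: Sequence[float]
) -> List[float]:
    """Assumed integration: concatenate sentiment scores onto the sentence vector."""
    return list(sentence_embedding) + list(sentiment_scores)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for one class, computed as 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of F1 for the subjective (1) and objective (0) classes."""
    return (f1(y_true, y_pred, 1) + f1(y_true, y_pred, 0)) / 2


def calibrate_threshold(
    dev_probs: Sequence[float], dev_labels: Sequence[int]
) -> Tuple[float, float]:
    """Choose the P(subjective) cut-off that maximises macro F1 on the dev set."""
    best_threshold, best_score = 0.5, -1.0
    for t in sorted(set(dev_probs)):
        preds = [1 if p >= t else 0 for p in dev_probs]
        score = macro_f1(dev_labels, preds)
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Hypothetical dev set: subjective is the minority class and no probability
# reaches the default 0.5 cut-off, so an uncalibrated classifier predicts none.
dev_probs = [0.05, 0.1, 0.2, 0.3, 0.35]
dev_labels = [0, 0, 0, 1, 1]
print(calibrate_threshold(dev_probs, dev_labels))  # (0.3, 1.0)
print(augment_with_sentiment([0.12, -0.4], [0.7, 0.2, 0.1]))
```

The threshold is fitted on development data only and then applied unchanged at test time.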


Multi-domain Multilingual Sentiment Analysis in Industry: Predicting Aspect-based Opinion Quadruples

arXiv:2505.10389v2 Announce Type: replace Abstract: This paper explores the design of an aspect-based sentiment analysis system using large language models (LLMs) for real-world use. We focus on quadruple opinion extraction — identifying aspect categories, sentiment polarity, targets, and opinion expressions from text data across different domains and languages. We investigate whether a single fine-tuned model can effectively handle multiple domain-specific taxonomies simultaneously. We demonstrate that a combined multi-domain model achieves performance comparable to specialized single-domain models while reducing operational complexity. We also share lessons learned for handling non-extractive predictions and evaluating various failure modes when developing LLM-based systems for structured prediction tasks.
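
A quadruple prediction and the "non-extractive" failure mode can be represented explicitly. In this sketch a prediction counts as non-extractive when its target or opinion span does not appear verbatim in the source text; that interpretation, the field names, the polarity labels, and the example taxonomy are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Optional

POLARITIES = {"positive", "negative", "neutral"}  # assumed label set


@dataclass(frozen=True)
class OpinionQuad:
    aspect_category: str     # label from a domain-specific taxonomy
    polarity: str            # one of POLARITIES
    target: Optional[str]    # aspect term as written, or None if implicit
    opinion: Optional[str]   # opinion expression as written, or None


def failure_modes(quad: OpinionQuad, text: str, taxonomy: AbstractSet[str]) -> List[str]:
    """List the problems with one predicted quadruple (empty list means valid)."""
    problems = []
    if quad.aspect_category not in taxonomy:
        problems.append("unknown_category")
    if quad.polarity not in POLARITIES:
        problems.append("invalid_polarity")
    for field_name in ("target", "opinion"):
        span = getattr(quad, field_name)
        # Non-extractive: the model produced a span that is not in the input.
        if span is not None and span not in text:
            problems.append(f"non_extractive_{field_name}")
    return problems


review = "The room was spotless but the staff were rude."
hotel_taxonomy = {"ROOM#CLEANLINESS", "SERVICE#STAFF"}  # hypothetical labels
good = OpinionQuad("SERVICE#STAFF", "negative", "staff", "rude")
paraphrased = OpinionQuad("ROOM#CLEANLINESS", "positive", "room", "very clean")
print(failure_modes(good, review, hotel_taxonomy))         # []
print(failure_modes(paraphrased, review, hotel_taxonomy))  # ['non_extractive_opinion']
```

Checking against a per-domain `taxonomy` set is one way a single multi-domain model's outputs could be validated against whichever taxonomy applies to the input.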


Partitioner Guided Modal Learning Framework

arXiv:2507.11661v1 Announce Type: new Abstract: Multimodal learning benefits from information in multiple modalities, and each learned modal representation can be divided into uni-modal features, which can be learned from uni-modal training, and paired-modal features, which can be learned from cross-modal interaction. Building on this perspective, we propose a partitioner-guided modal learning framework, PgM, which consists of the modal partitioner, uni-modal learner, paired-modal learner, and uni-paired modal decoder. The modal partitioner segments the learned modal representation into uni-modal and paired-modal features. The modal learner incorporates two dedicated components for uni-modal and paired-modal learning. The uni-paired modal decoder reconstructs the modal representation from the uni-modal and paired-modal features. PgM offers three key benefits: 1) thorough learning of uni-modal and paired-modal features, 2) flexible distribution adjustment for uni-modal and paired-modal representations to suit diverse downstream tasks, and 3) different learning rates across modalities and partitions. Extensive experiments demonstrate the effectiveness of PgM across four multimodal tasks and further highlight its transferability to existing models. Additionally, we visualize the distribution of uni-modal and paired-modal features across modalities and tasks, offering insights into their respective contributions.
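
The partition-then-reconstruct structure can be pictured with a toy, linear stand-in. In PgM the partitioner and decoder are learned components; here a fixed per-dimension gate splits a representation into uni-modal and paired-modal parts, and the decoder recombines them with weights `alpha` and `beta`, one hypothetical way to picture the distribution adjustment the abstract mentions. All names are illustrative.

```python
from typing import List, Sequence, Tuple


def partition(h: Sequence[float], gate: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a modal representation into uni-modal and paired-modal parts.

    gate[i] in [0, 1] is the share of dimension i assigned to the uni-modal
    part; the remainder goes to the paired-modal part.
    """
    if len(h) != len(gate):
        raise ValueError("h and gate must have the same length")
    if any(not 0.0 <= g <= 1.0 for g in gate):
        raise ValueError("gate values must lie in [0, 1]")
    uni = [g * x for g, x in zip(gate, h)]
    paired = [(1.0 - g) * x for g, x in zip(gate, h)]
    return uni, paired


def decode(uni: Sequence[float], paired: Sequence[float],
           alpha: float = 1.0, beta: float = 1.0) -> List[float]:
    """Recombine the parts; alpha and beta re-weight them for a downstream task."""
    return [alpha * u + beta * p for u, p in zip(uni, paired)]


def reconstruction_error(h: Sequence[float], h_hat: Sequence[float]) -> float:
    """Mean squared error between the original and reconstructed representation."""
    return sum((a - b) ** 2 for a, b in zip(h, h_hat)) / len(h)


h = [1.0, 2.0, -1.0]
gate = [1.0, 0.5, 0.0]
uni, paired = partition(h, gate)
print(reconstruction_error(h, decode(uni, paired)))  # 0.0
print(decode(uni, paired, alpha=1.0, beta=0.0))      # uni-modal view only
```

With equal weights the decoder recovers the original vector exactly; shifting `alpha` and `beta` emphasizes one part over the other without retraining the split.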


NeuralOS: A Generative Framework for Simulating Interactive Operating System Interfaces

Transforming Human-Computer Interaction with Generative Interfaces

Recent advances in generative models are transforming the way we interact with computers, making experiences more natural, adaptive, and personalized. Early interfaces, such as command-line tools and static menus, were fixed and required users to adapt to the machine. Now, with the rise of LLMs and multimodal AI, users can engage with systems using everyday language, images, and even video. Newer models can even simulate dynamic environments, such as those found in video games, in real time. These trends point toward a future where computer interfaces aren't just responsive but generative, tailoring themselves to our goals, preferences, and the evolving context around us.

Evolution of Generative Models for Simulating Environments

Recent generative modeling approaches have made significant progress in simulating interactive environments. Early models, such as World Models, used latent variables to simulate reinforcement learning tasks, while GameGAN and Genie enabled the imitation of interactive games and the creation of playable 2D worlds. Diffusion-based models have further advanced this field, with tools like GameNGen, MarioVGG, DIAMOND, and GameGen-X simulating iconic and open-world games with remarkable fidelity. Beyond gaming, models such as UniSim simulate real-world scenarios, and Pandora allows video generation controlled by natural language prompts. While these efforts excel at dynamic, visually rich simulations, simulating subtle GUI transitions and precise user input, such as cursor movement, remains a unique and complex challenge.

Introducing NeuralOS: A Diffusion-RNN Based OS Simulator

Researchers from the University of Waterloo and the National Research Council Canada have introduced NeuralOS. This neural framework simulates operating system interfaces by directly generating screen frames from user inputs, such as mouse movements, clicks, and keystrokes.
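
Generating frames directly from inputs means each frame can depend on everything the user did earlier (a click that opened a window changes every later frame), so such a simulator needs state that persists across steps. The loop below sketches that structure; `update_state` and `render` are placeholders for the learned state tracker and image generator, and the toy versions (a single "window open" flag and a tiny integer grid) are hypothetical, not NeuralOS code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")
Frame = List[List[int]]  # toy grayscale "screen"


@dataclass(frozen=True)
class InputEvent:
    x: int                      # cursor column (pixels)
    y: int                      # cursor row (pixels)
    click: bool = False
    keys: Tuple[str, ...] = ()  # keys pressed during this step


def simulate(
    events: Sequence[InputEvent],
    init_state: State,
    update_state: Callable[[State, InputEvent], State],
    render: Callable[[State, InputEvent], Frame],
) -> List[Frame]:
    """Produce one frame per input event, carrying hidden state between steps."""
    state = init_state
    frames = []
    for event in events:
        state = update_state(state, event)   # placeholder for a learned state tracker
        frames.append(render(state, event))  # placeholder for a learned renderer
    return frames


def toy_update(state: dict, event: InputEvent) -> dict:
    """Hypothetical state: one 'window open' flag that each click toggles."""
    return {"window_open": state["window_open"] != event.click}


def toy_render(state: dict, event: InputEvent, width: int = 4, height: int = 3) -> Frame:
    """Background 1 if the window is open, else 0; the cursor is drawn as 9."""
    background = 1 if state["window_open"] else 0
    frame = [[background] * width for _ in range(height)]
    frame[event.y][event.x] = 9
    return frame


events = [InputEvent(0, 0), InputEvent(1, 1, click=True), InputEvent(2, 2)]
for frame in simulate(events, {"window_open": False}, toy_update, toy_render):
    print(frame)
```

The third frame still shows the open window even though that step had no click. A renderer that saw only the current event could not know this, which is why the state has to be carried forward.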

NeuralOS combines a recurrent neural network (RNN) that tracks system state with a diffusion-based renderer that produces realistic GUI images. Trained on large-scale Ubuntu XFCE interaction data, it accurately models application launches and cursor behavior, although fine-grained keyboard input remains a challenge. NeuralOS marks a step toward adaptive, generative user interfaces that could eventually replace traditional static menus with more intuitive, AI-driven interaction.

Architectural Design and Training Pipeline of NeuralOS

NeuralOS is built on a modular design that mimics the separation of internal logic and GUI rendering found in traditional operating systems. It uses a hierarchical RNN to track user-driven state changes and a latent-space diffusion model to generate screen visuals. User inputs, such as cursor movements and key presses, are encoded and processed by the RNN, which maintains system memory over time. The renderer then uses the RNN's outputs, together with spatial cursor maps, to produce realistic frames. Training proceeds in multiple stages (RNN pretraining, joint training, scheduled sampling, and context extension) to handle long-term dependencies, reduce errors, and adapt effectively to real user interactions.

Evaluation and Accuracy of Simulated GUI Transitions

Because of the high training costs, the NeuralOS team evaluated smaller variants and ablations on a curated set of 730 examples. To assess how well the model localizes the cursor, they trained a regression model to estimate cursor positions from the generated frames. NeuralOS placed the cursor to within approximately 1.5 pixels, far outperforming models without spatial encoding. For state transitions such as opening apps, NeuralOS achieved 37.7% accuracy across 73 challenging transition types, significantly outperforming the baseline.
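
The cursor evaluation reports an error of about 1.5 pixels. A minimal sketch of such a metric, assuming mean Euclidean distance between estimated and true positions (the article does not specify the exact distance), is below, together with one plausible form of a spatial cursor map: a 2-D Gaussian peaked at the cursor. The Gaussian shape, `sigma`, and all function names are assumptions, not details confirmed by the article.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean_cursor_error(estimated: Sequence[Point], actual: Sequence[Point]) -> float:
    """Mean Euclidean distance, in pixels, between estimated and true cursor positions."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(math.dist(e, a) for e, a in zip(estimated, actual)) / len(actual)


def cursor_map(x: float, y: float, width: int, height: int,
               sigma: float = 1.0) -> List[List[float]]:
    """A height x width grid with a 2-D Gaussian peak (value 1.0) at the cursor."""
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2 * sigma ** 2))
         for col in range(width)]
        for row in range(height)
    ]


def peak_location(grid: List[List[float]]) -> Tuple[int, int]:
    """(x, y) of the largest value, i.e. the cursor position the map encodes."""
    return max(
        ((col, row) for row in range(len(grid)) for col in range(len(grid[0]))),
        key=lambda cr: grid[cr[1]][cr[0]],
    )


print(mean_cursor_error([(10, 10), (13, 14)], [(10, 10), (10, 10)]))  # 2.5
print(peak_location(cursor_map(x=5, y=2, width=8, height=4)))        # (5, 2)
```

A map like this gives the renderer the cursor location in the same spatial layout as the image it draws, which is one intuition for why spatial encoding helps localization.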

Ablation studies revealed that removing joint training resulted in blurry outputs and missing cursors, whereas skipping scheduled sampling led to a rapid decline in prediction quality over time.

Conclusion: Toward Fully Generative Operating Systems

In conclusion, NeuralOS is a framework that simulates operating system interfaces using generative models. It combines an RNN that tracks system state with a diffusion model that renders screen images based on user actions. Trained on Ubuntu desktop interactions, NeuralOS can generate realistic screen sequences and predict mouse behavior; however, handling detailed keyboard input remains challenging. While the model shows promise, it is limited by its low resolution, slow speed (1.8 fps), and inability to perform complex OS tasks, such as installing software or accessing the internet. Future work may focus on language-driven controls, better performance, and expanding functionality beyond current OS boundaries.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

The post NeuralOS: A Generative Framework for Simulating Interactive Operating System Interfaces appeared first on MarkTechPost.
