
MemAgent: A Reinforcement Learning Framework Redefining Long-Context Processing in LLMs

Handling extremely long documents remains a persistent challenge for large language models (LLMs). Even with techniques such as length extrapolation and sparse attention, models often suffer from performance degradation and high computational costs. To address this, researchers from ByteDance Seed and Tsinghua University introduce MemAgent, a reinforcement learning-based memory agent designed to enable long-context processing with linear complexity and minimal performance loss.

Limitations of Existing Approaches

Current solutions for long-context modeling fall into three main categories:

- Length extrapolation methods (e.g., NTK, PI, YaRN, DCA): extend the context window via positional embedding manipulations, but often face performance degradation and scaling issues.
- Sparse and linear attention mechanisms: reduce attention complexity to O(n), but typically require retraining from scratch and rely on fixed patterns or human-defined rules.
- Context compression: uses token-level or external memory modules to condense long inputs, but often disrupts standard generation and struggles with extrapolation.

These approaches fail to deliver all three critical attributes: arbitrary input length support, consistent accuracy, and efficient linear complexity.

MemAgent: Human-Like Memory Strategy

Inspired by how humans summarize key information while ignoring noise, MemAgent processes input as a stream of evidence. At each step, it reads a document chunk together with an internal memory and overwrites that memory with updated, compressed context. Key innovations:

- Fixed-length token-based memory: compresses essential information while maintaining model compatibility.
- Segment-wise overwrite mechanism: supports arbitrarily long text without growing the memory (a minimal sketch of this loop follows the case study below).
- Linear complexity: memory update and decoding cost remain constant per chunk.

Multi-Conv RL Training with GRPO

MemAgent treats each document-chunk interaction as an independent dialogue. It is trained with Group Relative Policy Optimization (GRPO) inside a multi-conversation RL pipeline (DAPO), enabling reward-driven memory updates. Key elements include:

- Rule-based verifier: computes outcome rewards by comparing model answers against multiple ground truths.
- Token-level RL signal: applied uniformly across all conversations stemming from a sample.

This setup encourages memory compression focused on answer-relevant information and discards distractors.

Performance Evaluation

Using the RULER benchmark and synthetic datasets built from HotpotQA and SQuAD, MemAgent was trained with an 8K context window and extrapolated up to 3.5 million tokens. Accuracy at different context lengths:

Model | 224K | 896K | 3.5M
Qwen2.5-Instruct-14B-1M | 37.5% | 0.0% | N/A
QwenLong-L1-32B | 17.2% | 11.7% | N/A
RL-MemAgent-14B | 81.3% | 77.3% | 78.1%

MemAgent maintained over 95% accuracy on RULER benchmarks (8K to 512K tokens) and consistently outperformed long-context and distillation-based baselines.

Case Study: Multi-Hop QA

Given the query "The director of the romantic comedy 'Big Stone Gap' is based in what New York city?", MemAgent progressively tracked relevant content across three chunks:

1. Recognized unrelated content but retained location information.
2. Maintained memory against irrelevant chunks.
3. Correctly updated memory upon encountering Adriana Trigiani's biography.

Final answer: Greenwich Village, New York City.
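To make the segment-wise overwrite mechanism concrete, here is a minimal sketch of the read-compress-overwrite loop described above. The llm_generate helper, the prompt templates, and the chunk/memory sizes are illustrative assumptions for exposition, not the prompts or training setup used in the paper.

# Minimal sketch of a MemAgent-style segment-wise overwrite loop. Assumptions:
# `llm_generate` is a hypothetical helper that calls any chat LLM, and the
# prompt templates below are illustrative, not the ones used in the paper.

def chunk_text(text, chunk_tokens, tokenizer):
    ids = tokenizer.encode(text)
    return [tokenizer.decode(ids[i:i + chunk_tokens])
            for i in range(0, len(ids), chunk_tokens)]

def memagent_answer(question, document, llm_generate, tokenizer,
                    chunk_tokens=4096, memory_tokens=1024):
    memory = ""  # fixed-length token memory, overwritten at every step
    for chunk in chunk_text(document, chunk_tokens, tokenizer):
        # Read the current memory plus the next chunk, then emit an updated,
        # compressed memory that replaces the old one.
        memory = llm_generate(
            f"Question: {question}\n"
            f"Current memory:\n{memory}\n"
            f"New evidence chunk:\n{chunk}\n"
            f"Rewrite the memory, keeping only facts relevant to the question.",
            max_new_tokens=memory_tokens,
        )
    # The final answer is produced from the memory alone, so the cost per chunk
    # stays constant and total compute grows linearly with document length.
    return llm_generate(
        f"Question: {question}\nMemory:\n{memory}\nAnswer the question.",
        max_new_tokens=256,
    )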
Theoretical Foundation and Complexity

MemAgent reformulates the autoregressive model using latent memory variables m₁, …, m_K, where cₖ denotes the k-th document chunk and mₖ the memory produced after reading it:

p(x₁:N) = Σ_{m₁:K} ∏ₖ p(cₖ | mₖ₋₁) · p(mₖ | cₖ, mₖ₋₁)

This yields O(N) compute cost and human-readable intermediate memory, unlike attention-based feature compression. Reinforcement learning is essential because memory updates are discrete and cannot be learned via backpropagation.

Conclusion

MemAgent offers a scalable and efficient solution to the long-context trilemma: unlimited input length, near-lossless accuracy, and linear complexity. Its RL-based overwrite memory mechanism allows LLMs to read, abstract, and generate over multi-million-token inputs without architectural modification.

FAQs

Q1: What is MemAgent? MemAgent is a reinforcement learning-based framework that equips LLMs with memory tokens to handle extremely long contexts efficiently.
Q2: How is it different from attention or extrapolation methods? Unlike attention-based scaling or extrapolation techniques, MemAgent uses token-based memory updated via reinforcement learning.
Q3: What models can MemAgent be applied to? Any Transformer-based LLM; no changes to the model architecture are required.
Q4: How does it scale with input size? It maintains linear computational complexity regardless of input length by fixing the memory size.
Q5: What are the applications of MemAgent? Long-document QA, agent memory systems, legal document review, scientific literature analysis, and real-time decision-making with large evidence bases.

Check out the Paper. All credit for this research goes to the researchers of this project.


NVIDIA AI Releases OpenReasoning-Nemotron: A Suite of Reasoning-Enhanced LLMs Distilled from DeepSeek R1 0528

NVIDIA AI has introduced OpenReasoning-Nemotron, a family of large language models (LLMs) designed to excel in complex reasoning tasks across mathematics, science, and code. The suite, comprising 1.5B, 7B, 14B, and 32B parameter versions, has been distilled from the 671B DeepSeek R1 0528 model, capturing its high-level reasoning capabilities in significantly smaller and more efficient models. The release positions NVIDIA as a leading contributor to the open-source LLM ecosystem, delivering models that push state-of-the-art (SOTA) performance while remaining commercially permissive and widely accessible via Hugging Face.

Model Overview and Architecture

Distillation from DeepSeek R1 0528 (671B): At the heart of OpenReasoning-Nemotron lies a distillation strategy that transfers reasoning ability from DeepSeek R1, a massive 671B parameter model, into smaller architectures. The process prioritizes reasoning generalization over raw token prediction, enabling compact models to perform effectively on structured, high-cognition tasks. The distillation dataset emphasizes mathematics, science, and programming, aligning model capabilities with key reasoning domains.

Model variants (all available on Hugging Face):

Model Name | Parameters | Intended Use
OpenReasoning-Nemotron-1.5B | 1.5B | Entry-level reasoning and inference
OpenReasoning-Nemotron-7B | 7B | Mid-scale reasoning, good for code/math
OpenReasoning-Nemotron-14B | 14B | Advanced reasoning capabilities
OpenReasoning-Nemotron-32B | 32B | Near frontier-model performance in logic-intensive tasks

All models use standard transformer architectures, support FP16/INT8 quantization, and are optimized for NVIDIA GPUs and the NeMo framework.

Performance Benchmarks

These models set new state-of-the-art pass@1 scores for their size class across multiple reasoning benchmarks:

Model | GPQA | MMLU-PRO | HLE | LiveCodeBench | SciCode | AIME24 | AIME25 | HMMT Feb 2025
1.5B | 31.6 | 47.5 | 5.5 | 28.6 | 2.2 | 55.5 | 45.6 | 31.5
7B | 61.1 | 71.9 | 8.3 | 63.3 | 16.2 | 84.7 | 78.2 | 63.5
14B | 71.6 | 77.5 | 10.1 | 67.8 | 23.5 | 87.8 | 82.0 | 71.2
32B | 73.1 | 80.0 | 11.9 | 70.2 | 28.5 | 89.2 | 84.0 | 73.8

All quoted scores are pass@1 without GenSelect.

GenSelect (Heavy Mode)

Using Generative Selection with 64 candidates ("GenSelect"), performance improves further, especially at 32B:

- AIME24: 89.2 → 93.3
- AIME25: 84.0 → 90.0
- HMMT: 73.8 → 96.7
- LiveCodeBench: 70.2 → 75.3

This demonstrates strong emergent reasoning performance at scale.

Training Data and Reasoning Specialization

The training corpus is a distilled, high-quality subset of data generated by DeepSeek R1 0528. Key features include:

- Heavily curated reasoning data from math, science, and CS disciplines.
- Prompt-engineered fine-tuning designed to reinforce multi-step thought chains.
- Emphasis on logical consistency, constraint satisfaction, and symbolic reasoning.

This deliberate curation ensures strong alignment with real-world reasoning problems found in both academia and applied ML domains.

Open and Ecosystem Integration

All four OpenReasoning-Nemotron models are released under an open and commercially permissive license, with model cards, evaluation scripts, and inference-ready weights available on Hugging Face (OpenReasoning-Nemotron-1.5B, 7B, 14B, and 32B). The models are designed to plug into the NVIDIA NeMo framework and support TensorRT-LLM, ONNX, and Hugging Face Transformers toolchains, facilitating rapid deployment in production and research settings.
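As an illustration of the Hugging Face Transformers toolchain support, here is a minimal inference sketch. The repository ID nvidia/OpenReasoning-Nemotron-7B and the sampling settings are assumptions inferred from the model names; verify the exact ID, chat template, and recommended generation parameters on the model card.

# Minimal inference sketch with Hugging Face Transformers. The repository ID
# below is an assumption based on the model names; check the model card for
# the exact ID and the recommended chat template / sampling settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nvidia/OpenReasoning-Nemotron-7B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))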
Key Use Cases

- Math tutors and theorem solvers
- Scientific QA agents and medical reasoning systems
- Code generation and debugging assistants
- Chain-of-thought multi-hop question answering
- Synthetic data generation for structured domains

Conclusion

NVIDIA's OpenReasoning-Nemotron models offer a pragmatic, open-source path toward scaling reasoning ability without frontier-scale compute costs. By distilling from the 671B DeepSeek R1 and targeting high-leverage reasoning domains, these models deliver a powerful balance of accuracy, efficiency, and accessibility. For developers, researchers, and enterprises working on logic-intensive AI applications, OpenReasoning-Nemotron provides a compelling foundation, free from the trade-offs that often accompany proprietary or overgeneralized models.

Frequently Asked Questions (FAQs)

Q1. What benchmarks are supported? GPQA, MMLU-PRO, HLE, LiveCodeBench, SciCode, AIME 2024/25, and HMMT Feb 2025 (pass@1).
Q2. How much data was used? A distillation corpus of 5 million reasoning examples across domains, generated by DeepSeek R1 0528.
Q3. Is reinforcement learning used? No. The models are trained purely via SFT, preserving efficiency while enabling future RL research.
Q4. Can I scale reasoning with GenSelect? Yes. Using GenSelect significantly boosts performance; the 32B model jumps from 73.8 to 96.7 on HMMT with 64 candidates.

Check out the Technical details. All credit for this research goes to the researchers of this project.


AegisLLM: Scaling LLM Security Through Adaptive Multi-Agent Systems at Inference Time

The Growing Threat Landscape for LLMs

LLMs are key targets for fast-evolving attacks, including prompt injection, jailbreaking, and sensitive data exfiltration. The fluid nature of these threats makes it necessary to adopt defense mechanisms that move beyond static safeguards. Current LLM security techniques suffer from their reliance on static, training-time interventions. Static filters and guardrails are fragile against minor adversarial tweaks, while training-time adjustments fail to generalize to unseen attacks after deployment. Machine unlearning often fails to erase knowledge completely, leaving sensitive information vulnerable to resurfacing. Current safety and security scaling mainly focuses on training-time methods, with limited exploration of test-time and system-level safety.

Why Existing LLM Security Methods Are Insufficient

RLHF and safety fine-tuning methods attempt to align models during training but show limited effectiveness against novel post-deployment attacks. System-level guardrails and red-teaming strategies provide additional protection layers, yet prove brittle against adversarial perturbations. Unlearning unsafe behaviors shows promise in specific scenarios but fails to achieve complete knowledge suppression. Multi-agent architectures are effective at distributing complex tasks, but their direct application to LLM security remains unexplored. Agentic optimization methods like TEXTGRAD and OPTO utilize structured feedback for iterative refinement, and DSPy facilitates prompt optimization for multi-stage pipelines; however, these techniques have not been applied systematically to security enhancement at inference time.

AegisLLM: An Adaptive Inference-Time Security Framework

Researchers from the University of Maryland, Lawrence Livermore National Laboratory, and Capital One have proposed AegisLLM (Adaptive Agentic Guardrails for LLM Security), a framework to improve LLM security through a cooperative, inference-time multi-agent system. It uses a structured system of LLM-powered autonomous agents that continuously monitor, analyze, and mitigate adversarial threats. The key components of AegisLLM are the Orchestrator, Deflector, Responder, and Evaluator. Through automated prompt optimization and Bayesian learning, the system refines its defense capabilities without model retraining. This architecture allows real-time adaptation to evolving attack strategies, providing scalable, inference-time security while preserving the model's utility.

Coordinated Agent Pipeline and Prompt Optimization

AegisLLM operates through a coordinated pipeline of specialized agents, each responsible for distinct functions while working in concert to ensure output safety. All agents are guided by carefully designed system prompts and the user input at test time. Each agent is governed by a system prompt that encodes its specialized role and behavior, but manually crafted prompts typically fall short of optimal performance in high-stakes security scenarios. Therefore, the system automatically optimizes each agent's system prompt through an iterative process: at each iteration, it samples a batch of queries and evaluates them using candidate prompt configurations for specific agents.
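The following is a minimal sketch of how an orchestrator/deflector/responder/evaluator pipeline of this kind could be wired together. The llm callable, the classification prompts, and the control flow are illustrative assumptions, not AegisLLM's actual prompts or its Bayesian prompt-optimization procedure.

# Minimal sketch of an orchestrator/deflector/responder/evaluator pipeline.
# `llm` is a hypothetical callable that sends a system+user prompt to any LLM;
# the prompts and control flow are illustrative, not AegisLLM's exact design.

def aegis_pipeline(query, llm):
    # Orchestrator: decide whether the query looks adversarial or restricted.
    verdict = llm(
        system="Classify the user query as SAFE or UNSAFE. Answer with one word.",
        user=query,
    ).strip().upper()

    if verdict == "UNSAFE":
        # Deflector: produce a refusal / safe redirection instead of an answer.
        return llm(
            system="Politely refuse to help with this request and explain why.",
            user=query,
        )

    # Responder: answer the (apparently safe) query.
    answer = llm(system="Answer the user's question helpfully.", user=query)

    # Evaluator: final output check; fall back to the deflector if it fails.
    check = llm(
        system="Does the following response leak restricted or harmful content? "
               "Answer YES or NO.",
        user=f"Query: {query}\nResponse: {answer}",
    ).strip().upper()
    if check.startswith("YES"):
        return llm(
            system="Politely refuse to help with this request and explain why.",
            user=query,
        )
    return answer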
Benchmarking AegisLLM: WMDP, TOFU, and Jailbreaking Defense

On the WMDP benchmark with Llama-3-8B, AegisLLM achieves the lowest accuracy on restricted topics among all methods, with WMDP-Cyber and WMDP-Bio accuracies approaching the 25% theoretical minimum. On the TOFU benchmark, it achieves near-perfect flagging accuracy across Llama-3-8B, Qwen2.5-72B, and DeepSeek-R1 models, with Qwen2.5-72B reaching almost 100% accuracy on all subsets. In jailbreaking defense, results show strong performance against attack attempts while maintaining appropriate responses to legitimate queries on StrongREJECT and PHTest. AegisLLM achieves a 0.038 StrongREJECT score, competitive with state-of-the-art methods, and an 88.5% compliance rate without requiring extensive training.

Conclusion: Reframing LLM Security as Agentic Inference-Time Coordination

In conclusion, the researchers introduced AegisLLM, a framework that reframes LLM security as a dynamic, multi-agent system operating at inference time. AegisLLM's success highlights that security should be approached as an emergent behavior of coordinated, specialized agents rather than a static model characteristic. This shift from static, training-time interventions to adaptive, inference-time defense mechanisms addresses the limitations of current methods while providing real-time adaptability against evolving threats. Frameworks like AegisLLM that enable dynamic, scalable security will become increasingly important for responsible AI deployment as language models continue to advance in capability.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


EG-CFG: Enhancing Code Generation with Real-Time Execution Feedback

LLMs have made impressive strides in generating code for various programming tasks. However, they mostly rely on recognizing patterns from static code examples rather than understanding how the code behaves during execution. This often leads to programs that look correct but fail when run. While recent methods introduce iterative refinement and self-debugging, they typically act in separate steps: generating, testing, and then revising. Unlike human programmers, who constantly run fragments of code and adjust based on real-time output, these models cannot integrate execution feedback continuously, limiting their effectiveness in producing truly functional code.

The Role of Program Synthesis and Prompting in Code Generation

Program synthesis has long been used to evaluate LLMs and automate code generation benchmarks, such as MBPP, HumanEval, and CodeContests, by testing models on various coding challenges. While prompting strategies, such as few-shot and chain-of-thought, have improved performance, newer methods now incorporate feedback loops that use tools or execution results to refine outputs. Some frameworks even assign tasks to multiple LLM agents, each tackling a different aspect of the problem. However, most approaches still rely on simple decoding methods. Newer guidance techniques, such as classifier-free guidance (CFG), offer a more dynamic approach but have not yet been widely applied with real-time execution feedback.

Introducing EG-CFG: Execution-Guided Code Generation from Tel Aviv University

Researchers at Tel Aviv University have introduced EG-CFG, a new method for code generation that actively uses execution feedback during the generation process, much as human programmers do. Instead of waiting until the end, EG-CFG evaluates partial code as it is being written, guiding the model toward correct and executable outputs. It uses beam search to generate multiple code options, runs them, and integrates runtime outcomes to influence the next steps. This real-time feedback loop significantly boosts performance across standard benchmarks such as MBPP, HumanEval, and CodeContests, even surpassing closed-source models, while also enabling efficient parallel reasoning and dynamic exploration.

How EG-CFG Works: Real-Time Feedback Meets Beam Search and AST Parsing

The EG-CFG method improves code generation by guiding language models with real-time execution feedback during inference. For a given programming task, it generates partial code solutions and explores multiple continuations using beam search. These continuations are checked for syntax using AST parsing, and only valid ones are executed on test cases to gather detailed runtime traces, including variable states and errors. This feedback is then injected into the model's prompt to inform future predictions. A guidance mechanism interpolates between the model's standard output and feedback-informed suggestions, helping the model refine its solution step by step until it passes all test cases.
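The following is a minimal sketch of an execution-guided generation loop in the spirit of EG-CFG. The generate_candidates helper (standing in for beam search over an LLM), the test-case format, and the feedback string are illustrative assumptions; the paper's actual guidance mechanism interpolates between standard and feedback-conditioned outputs rather than simply re-prompting.

# Minimal sketch of an execution-guided generation loop. `generate_candidates`
# is a hypothetical helper that returns several candidate continuations of a
# partial program (e.g., via beam search); prompts and scoring are illustrative.
import ast
import traceback

def run_tests(code, test_cases):
    """Execute candidate code against test cases, returning a feedback trace."""
    env = {}
    try:
        exec(code, env)  # untrusted code should be sandboxed in practice
        failures = []
        for expr, expected in test_cases:
            result = eval(expr, env)
            if result != expected:
                failures.append(f"{expr} -> {result!r}, expected {expected!r}")
        return ("pass", "") if not failures else ("fail", "; ".join(failures))
    except Exception:
        return ("error", traceback.format_exc(limit=1))

def execution_guided_generation(task_prompt, test_cases, generate_candidates,
                                max_steps=20):
    program, feedback = "", ""
    for _ in range(max_steps):
        candidates = generate_candidates(task_prompt, program, feedback)
        for cand in candidates:
            draft = program + cand
            try:
                ast.parse(draft)          # discard syntactically invalid drafts
            except SyntaxError:
                continue
            status, trace = run_tests(draft, test_cases)
            if status == "pass":
                return draft              # all tests pass: done
            # Inject the runtime trace so the next step is feedback-aware.
            program, feedback = draft, f"{status}: {trace}"
            break
    return program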
Benchmark Results: EG-CFG Outperforms GPT-4 and Claude on HumanEval and MBPP-ET

The EG-CFG method was tested with two DeepSeek models: a 1.3B parameter model run locally and the larger V3-0324 model accessed through an API. It was evaluated on five code benchmarks: MBPP, HumanEval, CodeContests, MBPP-ET, and HumanEval-ET. On HumanEval, EG-CFG with DeepSeek V3 solved 90.1% of the tasks correctly, outperforming GPT-4 (85.5%) and Claude 2 (83.2%). On MBPP-ET, it achieved an 81.4% accuracy rate, setting a new benchmark. Notably, the smaller 1.3B model also showed strong gains, improving from 46.3% to 61.7% on HumanEval when guided with EG-CFG. An ablation study confirmed the importance of components such as dynamic feedback and beam search in driving these results.

Conclusion: EG-CFG Simulates Human Debugging to Advance Code Generation

In conclusion, the EG-CFG method introduces a new way to generate code with language models by incorporating real-time execution feedback during generation. Unlike traditional approaches that rely on static patterns, EG-CFG simulates how human programmers test and refine code. It uses beam search to explore possible code completions, tests them with real inputs, and then guides generation based on the results. This happens line by line, ensuring feedback is both structured and actionable. The method also supports multiple agents working in parallel, boosting efficiency. EG-CFG achieves top accuracy across standard benchmarks, showing strong results even on complex coding tasks and with smaller models.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


You Don’t Need to Share Data to Train a Language Model Anymore—FlexOlmo Demonstrates How

The development of large-scale language models (LLMs) has historically required centralized access to extensive datasets, many of which are sensitive, copyrighted, or governed by usage restrictions. This constraint severely limits the participation of data-rich organizations operating in regulated or proprietary environments. FlexOlmo, introduced by researchers at the Allen Institute for AI and collaborators, proposes a modular training and inference framework that enables LLM development under data governance constraints.

Limitations of Current Training Pipelines

Current LLM training pipelines rely on aggregating all training data into a single corpus, which imposes a static inclusion decision and eliminates the possibility of opting out after training. This approach is incompatible with:

- regulatory regimes (e.g., HIPAA, GDPR, data sovereignty laws),
- license-bound datasets (e.g., non-commercial or attribution-restricted), and
- context-sensitive data (e.g., internal source code, clinical records).

FlexOlmo addresses two objectives:

- Decentralized, modular training: allow independently trained modules on disjoint, locally held datasets.
- Inference-time flexibility: enable deterministic opt-in/opt-out mechanisms for dataset contributions without retraining.

Model Architecture: Expert Modularity via Mixture-of-Experts (MoE)

FlexOlmo builds on a Mixture-of-Experts (MoE) architecture in which each expert corresponds to a feedforward network (FFN) module trained independently. A fixed public model (denoted M_pub) serves as the shared anchor. Each data owner trains an expert M_i on their private dataset D_i, while all attention layers and other non-expert parameters remain frozen. Key architectural components:

- Sparse activation: only a subset of expert modules is activated per input token.
- Expert routing: token-to-expert assignment is governed by a router matrix derived from domain-informed embeddings, eliminating the need for joint training (a toy routing sketch appears after the dataset description below).
- Bias regularization: a negative bias term is introduced to calibrate selection across independently trained experts, preventing over-selection of any single expert.

This design maintains interoperability among modules while enabling selective inclusion during inference.

Asynchronous and Isolated Optimization

Each expert M_i is trained via a constrained procedure to ensure alignment with M_pub. Specifically:

- Training is performed on a hybrid MoE instance comprising M_i and M_pub.
- The M_pub expert and shared attention layers are frozen.
- Only the FFNs corresponding to M_i and the router embedding r_i are updated.

To initialize r_i, a set of samples from D_i is embedded using a pretrained encoder, and their average forms the router embedding. Optional lightweight router tuning can further improve performance using proxy data from the public corpus.

Dataset Construction: FLEXMIX

The training corpus, FLEXMIX, is divided into:

- a public mix composed of general-purpose web data, and
- seven closed sets simulating non-shareable domains: News, Reddit, Code, Academic Text, Educational Text, Creative Writing, and Math.

Each expert is trained on a disjoint subset, with no joint data access. This setup approximates real-world usage where organizations cannot pool data due to legal, ethical, or operational constraints.
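To make the routing and opt-out mechanics concrete, here is a toy sketch of router-based expert selection. The top-k softmax routing, the uniform negative bias, and opt-out via masking a router entry are illustrative assumptions rather than FlexOlmo's exact equations.

# Toy sketch of router-based expert selection with inference-time opt-out.
# The top-k softmax routing, the negative bias term, and opt-out by masking a
# router entry are illustrative assumptions, not FlexOlmo's exact equations.
import numpy as np

def route(token_emb, router_embs, bias, active_experts, top_k=2):
    """Pick experts for one token, restricted to the opted-in experts."""
    scores = router_embs @ token_emb + bias          # (num_experts,)
    # Mask out experts whose owners have opted out: they can never be selected.
    masked = np.where(active_experts, scores, -np.inf)
    chosen = np.argsort(masked)[-top_k:]
    weights = np.exp(masked[chosen] - masked[chosen].max())
    return chosen, weights / weights.sum()

rng = np.random.default_rng(0)
d, experts = 16, 4                                   # public + 3 domain experts
router_embs = rng.normal(size=(experts, d))          # rows: averaged domain embeddings
bias = np.full(experts, -0.1)                        # negative bias regularization
token = rng.normal(size=d)

all_in = np.ones(experts, dtype=bool)
print(route(token, router_embs, bias, all_in))       # all experts available

opted_out = all_in.copy()
opted_out[2] = False                                 # data owner 2 withdraws
print(route(token, router_embs, bias, opted_out))    # expert 2 can never fire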
Evaluation and Baseline Comparisons

FlexOlmo was evaluated on 31 benchmark tasks across 10 categories, including general language understanding (e.g., MMLU, AGIEval), generative QA (e.g., GEN5), code generation (e.g., Code4), and mathematical reasoning (e.g., Math2). Baseline methods include:

- Model soup: averaging weights of individually fine-tuned models.
- Branch-Train-Merge (BTM): weighted ensembling of output probabilities.
- BTX: converting independently trained dense models into a MoE via parameter transplant.
- Prompt-based routing: using instruction-tuned classifiers to route queries to experts.

Compared to these methods, FlexOlmo achieves a 41% average relative improvement over the base public model and a 10.1% improvement over the strongest merging baseline (BTM). The gains are especially notable on tasks aligned with closed domains, confirming the utility of specialized experts.

Architectural Analysis

Several controlled experiments reveal the contribution of the architectural decisions:

- Removing expert-public coordination during training significantly degrades performance.
- Randomly initialized router embeddings reduce inter-expert separability.
- Disabling the bias term skews expert selection, particularly when merging more than two experts.

Token-level routing patterns show expert specialization at specific layers. For instance, mathematical input activates the math expert at deeper layers, while introductory tokens rely on the public model. This behavior underlines the model's expressivity compared to single-expert routing strategies.

Opt-Out and Data Governance

A key feature of FlexOlmo is its deterministic opt-out capability. Removing an expert from the router matrix fully removes its influence at inference time. Experiments show that removing the News expert reduces performance on NewsG but leaves other tasks unaffected, confirming the localized influence of each expert.

Privacy Considerations

Training-data extraction risks were evaluated using known attack methods. Results indicate a 0.1% extraction rate for a public-only model, 1.6% for a dense model trained on the math dataset, and 0.7% for FlexOlmo with the math expert included. While these rates are low, differential privacy (DP) training can be applied independently to each expert for stronger guarantees; the architecture does not preclude the use of DP or encrypted training methods.

Scalability

The FlexOlmo methodology was applied to an existing strong baseline (OLMo-2 7B) pretrained on 4T tokens. Incorporating two additional experts (Math, Code) improved average benchmark performance from 49.8 to 52.8 without retraining the core model, demonstrating scalability and compatibility with existing training pipelines.

Conclusion

FlexOlmo introduces a principled framework for building modular LLMs under data governance constraints. Its design supports distributed training on locally maintained datasets and enables inference-time inclusion or exclusion of dataset influence. Empirical results confirm its competitiveness against both monolithic and ensemble-based baselines. The architecture is particularly applicable to environments with data locality requirements, dynamic data-use policies, or regulatory compliance constraints. FlexOlmo provides a viable pathway for constructing performant language models while adhering to real-world data access boundaries.

Check out the Paper, Model on Hugging Face and Codes. All credit for this research goes to the researchers of this project.

A Survey of Context Engineering for Large Language Models

arXiv:2507.13334v1 Announce Type: new Abstract: The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. We present a comprehensive taxonomy decomposing Context Engineering into its foundational components and the sophisticated implementations that integrate them into intelligent systems. We first examine the foundational components: context retrieval and generation, context processing and context management. We then explore how these components are architecturally integrated to create sophisticated system implementations: retrieval-augmented generation (RAG), memory systems and tool-integrated reasoning, and multi-agent systems. Through this systematic analysis of over 1300 research papers, our survey not only establishes a technical roadmap for the field but also reveals a critical research gap: a fundamental asymmetry exists between model capabilities. While current models, augmented by advanced context engineering, demonstrate remarkable proficiency in understanding complex contexts, they exhibit pronounced limitations in generating equally sophisticated, long-form outputs. Addressing this gap is a defining priority for future research. Ultimately, this survey provides a unified framework for both researchers and engineers advancing context-aware AI.


UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

arXiv:2502.15082v2 Announce Type: replace-cross Abstract: User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or “forgetting” a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model’s other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model’s representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. Across three standard unlearning methods, UPCORE consistently achieves a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric, measuring the area-under-the-curve (AUC) across standard metrics. Our results show that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.
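To make the selection idea concrete, here is a toy sketch of variance-reducing coreset selection over a forget set's hidden representations. The outlier criterion (distance from the mean representation) and the keep fraction are illustrative assumptions, not necessarily the paper's exact procedure.

# Toy sketch of variance-reducing coreset selection for a forget set. The
# outlier criterion (distance from the mean hidden representation) and the
# keep fraction are illustrative assumptions, not UPCORE's exact procedure.
import numpy as np

def select_coreset(hidden_states, keep_fraction=0.8):
    """hidden_states: (n_points, d) model representations of forget-set items."""
    center = hidden_states.mean(axis=0)
    dists = np.linalg.norm(hidden_states - center, axis=1)
    n_keep = int(len(hidden_states) * keep_fraction)
    keep_idx = np.argsort(dists)[:n_keep]   # drop the highest-variance outliers
    return keep_idx

# Usage: unlearning (e.g., gradient ascent on the forget loss) is then run only
# on the retained coreset, which is intended to limit collateral damage.
reps = np.random.default_rng(0).normal(size=(100, 64))
print(select_coreset(reps)[:10])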
