YouZum

AI

AI, Committee, News, Uncategorized

GFlowPO: Generative Flow Network as a Language Model Prompt Optimizer

arXiv:2602.03358v1 Announce Type: cross Abstract: Finding effective prompts for language models (LMs) is critical yet notoriously difficult: the prompt space is combinatorially large, and rewards are sparse due to expensive target-LM evaluation. Yet, existing RL-based prompt optimizers often rely on on-policy updates and a meta-prompt sampled from a fixed distribution, leading to poor sample efficiency. We propose GFlowPO, a probabilistic prompt optimization framework that casts prompt search as a posterior inference problem over latent prompts regularized by a meta-prompted reference-LM prior. In the first step, we fine-tune a lightweight prompt-LM with an off-policy Generative Flow Network (GFlowNet) objective, using a replay-based training policy that reuses past prompt evaluations to enable sample-efficient exploration. In the second step, we introduce Dynamic Memory Update (DMU), a training-free mechanism that updates the meta-prompt by injecting both (i) diverse prompts from a replay buffer and (ii) top-performing prompts from a small priority queue, thereby progressively concentrating the search process on high-reward regions. Across few-shot text classification, instruction induction benchmarks, and question answering tasks, GFlowPO consistently outperforms recent discrete prompt optimization baselines.
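
The two memories described for DMU (a replay buffer for diversity and a small priority queue for top performers) can be illustrated with a rough sketch. This is not the authors' implementation: the class name `PromptMemory`, the uniform random sampling used to stand in for "diverse" selection, and the de-duplication step are all assumptions made for illustration.

```python
import heapq
import random
from typing import List, Tuple


class PromptMemory:
    """Hypothetical memory: a replay buffer plus a small top-k priority queue."""

    def __init__(self, top_k: int = 3) -> None:
        self.top_k = top_k
        self.replay: List[Tuple[str, float]] = []  # every evaluated prompt
        self._top: List[Tuple[float, str]] = []    # min-heap of the best top_k

    def add(self, prompt: str, reward: float) -> None:
        """Record one prompt evaluation in both memories."""
        self.replay.append((prompt, reward))
        if len(self._top) < self.top_k:
            heapq.heappush(self._top, (reward, prompt))
        else:
            # Push the new item, then drop the lowest-reward entry.
            heapq.heappushpop(self._top, (reward, prompt))

    def top_prompts(self) -> List[str]:
        """Best prompts, highest reward first."""
        return [p for _, p in sorted(self._top, reverse=True)]

    def meta_prompt_examples(self, n_diverse: int, rng: random.Random) -> List[str]:
        """Top prompts followed by a uniform sample from the replay buffer.

        Uniform sampling is an assumption standing in for 'diverse' selection.
        """
        k = min(n_diverse, len(self.replay))
        diverse = [p for p, _ in rng.sample(self.replay, k)]
        merged: List[str] = []
        for p in self.top_prompts() + diverse:
            if p not in merged:  # keep the first occurrence only
                merged.append(p)
        return merged
```

In this sketch, the list returned by `meta_prompt_examples` would be formatted into the meta-prompt before the next round of sampling; how the actual method formats or weights these examples is not stated in the abstract.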



Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning

arXiv:2512.05747v2 Announce Type: replace Abstract: Evaluating and optimising authorial style in long-form story generation remains challenging because style is often assessed with ad hoc prompting and is frequently conflated with overall writing quality. We propose a two-stage pipeline. First, we train a dedicated style-similarity judge by fine-tuning a sentence-transformer with authorship-verification supervision, and calibrate its similarity outputs into a bounded $[0,1]$ reward. Second, we use this judge as the primary reward in Group Relative Policy Optimization (GRPO) to fine-tune an 8B story generator for style-conditioned writing, avoiding the accept/reject supervision required by Direct Preference Optimization (DPO). Across four target authors (Mark Twain, Jane Austen, Charles Dickens, Thomas Hardy), the GRPO-trained 8B model achieves higher style scores than open-weight baselines, with an average style score of 0.893 across authors. These results suggest that AV-calibrated reward modelling provides a practical mechanism for controllable style transfer in long-form generation under a moderate model size and training budget.
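
As a rough illustration of why GRPO needs no accept/reject pairs, the sketch below computes group-relative advantages from scalar rewards in $[0,1]$, such as the style scores a judge might return for several stories sampled from the same prompt. It shows only the standard group normalization step; the function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions, not details from the paper.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group.

    Each completion's advantage is its reward minus the group mean, divided by
    the group standard deviation. The population standard deviation and eps
    are illustrative choices.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group with rewards `[0.9, 0.7, 0.5]`, the first completion receives a positive advantage, the second roughly zero, and the third a negative one, so the policy is pushed toward the highest-scoring sample without any pairwise preference labels.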



Proactive defense against LLM Jailbreak

arXiv:2510.05052v2 Announce Type: replace-cross Abstract: The proliferation of powerful large language models (LLMs) has necessitated robust safety alignment, yet these models remain vulnerable to evolving adversarial attacks, including multi-turn jailbreaks that iteratively search for successful queries. Current defenses, which are primarily reactive and static, often fail to handle these iterative attacks. In this paper, we introduce ProAct, a novel proactive defense framework designed to disrupt and mislead these iterative search jailbreak methods. Our core idea is to use “spurious responses” to intentionally mislead these jailbreak methods into thinking that the model has been jailbroken. These misleading responses provide false signals to the attacker’s internal optimization loop, causing the adversarial search to terminate prematurely and effectively jailbreaking the jailbreak. By conducting extensive experiments across state-of-the-art LLMs, jailbreaking frameworks, and safety benchmarks, we demonstrate that our method consistently and significantly reduces attack success rates by up to 94% without affecting utility. When combined with other defense frameworks, it further reduces the latest attack strategies’ success rate to 0%. ProAct represents an orthogonal defense strategy that serves as an additional guardrail to enhance LLM safety against the most effective jailbreaking attacks.



TurkBench: A Benchmark for Evaluating Turkish Large Language Models

arXiv:2601.07020v2 Announce Type: replace Abstract: With the recent surge in the development of large language models, the need for comprehensive and language-specific evaluation benchmarks has become critical. While significant progress has been made in evaluating English-language models, benchmarks for other languages, particularly those with unique linguistic characteristics such as Turkish, remain less developed. Our study introduces TurkBench, a comprehensive benchmark designed to assess the capabilities of generative large language models in the Turkish language. TurkBench involves 8,151 data samples across 21 distinct subtasks. These are organized under six main categories of evaluation: Knowledge, Language Understanding, Reasoning, Content Moderation, Turkish Grammar and Vocabulary, and Instruction Following. The diverse range of tasks and the culturally relevant data provide researchers and developers with a valuable tool for evaluating their models and identifying areas for improvement. We further publish our benchmark for online submissions at https://huggingface.co/turkbench



Pursuing Best Industrial Practices for Retrieval-Augmented Generation in the Medical Domain

arXiv:2602.03368v1 Announce Type: new Abstract: While retrieval augmented generation (RAG) has been swiftly adopted in industrial applications based on large language models (LLMs), there is no consensus on the best practices for building a RAG system: which components to include, how to organize them, and how to implement each one for industrial applications, especially in the medical domain. In this work, we first carefully analyze each component of the RAG system and propose practical alternatives for each component. Then, we conduct systematic evaluations on three types of tasks, revealing the best practices for improving the RAG system and how LLM-based RAG systems make trade-offs between performance and efficiency.



Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models

arXiv:2602.01698v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have recently achieved strong mathematical and code reasoning performance through Reinforcement Learning (RL) post-training. However, we show that modern reasoning post-training induces an unintended exploration collapse: temperature-based sampling no longer increases pass@$n$ accuracy. Empirically, the final-layer posterior of post-trained LRMs exhibits sharply reduced entropy, while the entropy of intermediate layers remains relatively high. Motivated by this entropy asymmetry, we propose Latent Exploration Decoding (LED), a depth-conditioned decoding strategy. LED aggregates intermediate posteriors via cumulative sum and selects depth configurations with maximal entropy as exploration candidates. Without additional training or parameters, LED consistently improves pass@1 and pass@16 accuracy by 0.61 and 1.03 percentage points across multiple reasoning benchmarks and models. Project page: https://GitHub.com/Xiaomi-Research/LED.
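
The aggregation and selection step can be pictured with a toy sketch. The abstract does not specify the layer ordering, the normalization, or what exactly a "depth configuration" is, so everything below is an assumption: each configuration is taken to be a running (cumulative) sum of per-layer next-token distributions in the order supplied by the caller, renormalized to sum to one, and the configuration with the highest Shannon entropy is returned. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def cumulative_mixtures(layer_posteriors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Renormalized cumulative sums of the per-layer distributions.

    Entry k mixes the first k + 1 distributions in the given order
    (the ordering itself is an assumption left to the caller).
    """
    running = [0.0] * len(layer_posteriors[0])
    mixtures = []
    for p in layer_posteriors:
        running = [r + x for r, x in zip(running, p)]
        total = sum(running)
        mixtures.append([r / total for r in running])
    return mixtures


def max_entropy_depth(
    layer_posteriors: Sequence[Sequence[float]],
) -> Tuple[int, List[float], float]:
    """Return (index, mixture, entropy) of the highest-entropy configuration."""
    mixtures = cumulative_mixtures(layer_posteriors)
    entropies = [entropy(m) for m in mixtures]
    best = max(range(len(mixtures)), key=entropies.__getitem__)
    return best, mixtures[best], entropies[best]
```

With a sharply peaked final-layer distribution such as `[0.98, 0.01, 0.01]` and a flatter intermediate one such as `[0.4, 0.3, 0.3]`, the mixed configuration has higher entropy than the final layer alone, which is the asymmetry the abstract describes.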



Understanding QA generation: Extracting Parametric and Contextual Knowledge with CQA for Low Resource Bangla Language

arXiv:2602.01451v1 Announce Type: new Abstract: Question-Answering (QA) models for low-resource languages like Bangla face challenges due to limited annotated data and linguistic complexity. A key issue is determining whether models rely more on pre-encoded (parametric) knowledge or contextual input during answer generation, as existing Bangla QA datasets lack the structure required for such analysis. We introduce BanglaCQA, the first Counterfactual QA dataset in Bangla, by extending a Bangla dataset while integrating counterfactual passages and answerability annotations. In addition, we propose fine-tuned pipelines for encoder-decoder language-specific and multilingual baseline models, and prompting-based pipelines for decoder-only LLMs to disentangle parametric and contextual knowledge in both factual and counterfactual scenarios. Furthermore, we apply LLM-based and human evaluation techniques that measure answer quality based on semantic similarity. We also present a detailed analysis of how models perform across different QA settings in low-resource languages, and show that Chain-of-Thought (CoT) prompting reveals a uniquely effective mechanism for extracting parametric knowledge in counterfactual scenarios, particularly in decoder-only LLMs. Our work not only introduces a novel framework for analyzing knowledge sources in Bangla QA but also uncovers critical findings that open up broader directions for counterfactual reasoning in low-resource language settings.



Geometric-disentanglement Unlearning

arXiv:2511.17100v4 Announce Type: replace-cross Abstract: Large language models (LLMs) can internalize private or harmful content, motivating unlearning that removes a forget set while preserving retained knowledge. However, forgetting updates often cause collateral degradation of retained knowledge, creating a persistent trade-off. Existing LLM unlearning methods are often heuristic, and other theoretical approaches rely on offline feature constructions that do not capture update-time forget-retain interaction in LLMs. To address this limitation, we aim to develop an LLM unlearning method that reduces the forget-retain trade-off with theoretical guarantees. We take a first-principles view by formalizing “no side effects” as local retain invariance under small parameter updates, and prove an equivalence under optimizer-induced geometry: the retain loss is locally invariant if and only if the update direction is orthogonal to the subspace spanned by retain gradients. Based on this insight, we propose Geometric-disentanglement Unlearning (GU), a lightweight and theoretically grounded projection that can be plugged into existing gradient-based unlearning methods to mitigate forget-retain side effects. Experiments on TOFU, MUSE, and WMDP-cyber show that GU strengthens forgetting while reducing retain drift. When added to SimNPO, it achieves up to 62% improved forgetting Extraction Strength (ES) and 31% higher retain ES. Our code is open-sourced at https://github.com/Lemutisme/Geometric-Unlearning.
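
The orthogonality condition can be made concrete with a small sketch: remove from a proposed update every component that lies in the span of the retain gradients. This is only an illustration under a simplifying assumption, namely plain Euclidean geometry rather than the optimizer-induced geometry the abstract refers to; the function names are hypothetical and the vectors are toy-sized lists rather than model parameters.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt basis for the span of `vectors` (Euclidean inner product)."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out_retain(update: Sequence[float],
                       retain_grads: Sequence[Sequence[float]]) -> Vector:
    """Return the part of `update` orthogonal to every retain gradient."""
    result = list(update)
    for b in orthonormal_basis(retain_grads):
        c = dot(result, b)
        result = [ri - c * bi for ri, bi in zip(result, b)]
    return result
```

To first order, a step along the projected direction leaves each retain loss unchanged, because its inner product with every retain gradient is zero. For example, with retain gradients `[1, 0, 0]` and `[1, 1, 0]`, only the third coordinate of any update survives the projection.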



SignX: Continuous Sign Recognition in Compact Pose-Rich Latent Space

arXiv:2504.16315v3 Announce Type: replace-cross Abstract: The complexity of sign language data processing brings many challenges. The current approach to recognition of ASL signs aims to translate RGB sign language videos through pose information into English-based ID Glosses, which serve to uniquely identify ASL signs. This paper proposes SignX, a novel framework for continuous sign language recognition in compact pose-rich latent space. First, we construct a unified latent representation that encodes heterogeneous pose formats (SMPLer-X, DWPose, Mediapipe, PrimeDepth, and Sapiens Segmentation) into a compact, information-dense space. Second, we train a ViT-based Video2Pose module to extract this latent representation directly from raw videos. Finally, we develop a temporal modeling and sequence refinement method that operates entirely in this latent space. This multi-stage design achieves end-to-end sign language recognition while significantly reducing computational consumption. Experimental results demonstrate that SignX achieves state-of-the-art accuracy on continuous sign language recognition.



Microbes could extract the metal needed for cleantech

In a pine forest on Michigan’s Upper Peninsula, the only active nickel mine in the US is nearing the end of its life. At a time when carmakers want the metal for electric-vehicle batteries, nickel concentration at Eagle Mine is falling and could soon drop too low to warrant digging.

But earlier this year, the mine’s owner started testing a new process that could eke out a bit more nickel. In a pair of shipping containers recently installed at the mine’s mill, a fermentation-derived broth developed by the startup Allonnia is mixed with concentrated ore to capture and remove impurities. The process allows nickel production from lower-quality ore.

Kent Sorenson, Allonnia’s chief technology officer, says this approach could help companies continue operating sites that, like Eagle Mine, have burned through their best ore. “The low-hanging fruit is to keep mining the mines that we have,” he says.

Demand for nickel, copper, and rare earth elements is rapidly increasing amid the explosive growth of metal-intensive data centers, electric cars, and renewable energy projects. But producing these metals is becoming harder and more expensive because miners have already exploited the best resources. Like the age-old technique of rolling up the end of a toothpaste tube, Allonnia’s broth is one of a number of ways that biotechnology could help miners squeeze more metal out of aging mines, mediocre ore, or piles of waste.

The mining industry has intentionally seeded copper ore with microbes for decades. At current copper bioleaching sites, miners pile crushed copper ore into heaps and add sulfuric acid. Acid-loving bacteria like Acidithiobacillus ferrooxidans colonize the mound. A chemical the organisms produce breaks the bond between sulfur and copper molecules to liberate the metal. Until now, beyond maintaining the acidity and blowing air into the heap, there wasn’t much more miners could do to encourage microbial growth.

But Elizabeth Dennett, CEO of the startup Endolith, says the decreasing cost of genetic tools is making it possible to manage the communities of microbes in a heap more actively. “The technology we’re using now didn’t exist a few years ago,” she says. Endolith analyzes bits of DNA and RNA in the copper-rich liquid that flows out of an ore heap to characterize the microbes living inside. Combined with a suite of chemical analyses, the information helps the company determine which microbes to sprinkle on a heap to optimize extraction.

[Image caption: Endolith scientists use columns filled with copper ore to test the firm’s method of actively managing microbes in the ore to increase metal extraction. Credit: Endolith]

In lab tests on ore from the mining firm BHP, Endolith’s active techniques outperformed passive bioleaching approaches. In November, the company raised $16.5 million to move from its Denver lab to heaps in active mines.

Despite these promising early results, Corale Brierley, an engineer who has worked on metal bioleaching systems since the 1970s, questions whether companies like Endolith that add additional microbes to ore will successfully translate their processes to commercial scales. “What guarantees are you going to give the company that those organisms will actually grow?” Brierley asks.

Big mining firms that have already optimized every hose, nut, and bolt in their process won’t be easy to convince either, says Diana Rasner, an analyst covering mining technology for the research firm Cleantech Group. “They are acutely aware of what it takes to scale these technologies because they know the industry,” she says. “They’ll be your biggest supporters, but they’re going to be your biggest critics.”

In addition to technical challenges, Rasner points out that venture-capital-backed biotechnology startups will struggle to deliver the quick returns their investors seek. Mining companies want lots of data before adopting a new process, which could take years of testing to compile. “This is not software,” Rasner says.

Nuton, a subsidiary of the mining giant Rio Tinto, is a good example. The company has been working for decades on a copper bioleaching process that uses a blend of archaea and bacteria strains, plus some chemical additives. But it started demonstrating the technology only late last year, at a mine in Arizona.

[Image caption: Nuton is testing an improved bioleaching process at Gunnison Copper’s Johnson Camp mine in Arizona. Credit: Nuton]

While Endolith and Nuton use naturally occurring microbes, the startup 1849 is hoping to achieve a bigger performance boost by genetically engineering microbes. “You can do what mining companies have traditionally done,” says CEO Jai Padmakumar. “Or you can try to take the moonshot bet and engineer them. If you get that, you have a huge win.”

Genetic engineering would allow 1849 to tailor its microbes to the specific challenges facing a customer. But engineering organisms can also make them harder to grow, warns Buz Barstow, a Cornell University microbiologist who studies applications for biotechnology in mining.

Other companies are trying to avoid that trade-off by applying the products of microbial fermentation, rather than live organisms. Alta Resource Technologies, which closed a $28 million investment round in December, is engineering microbes that make proteins capable of extracting and separating rare earth elements. Similarly, the startup REEgen, based in Ithaca, New York, relies on the organic acids produced by an engineered strain of Gluconobacter oxydans to extract rare earth elements from ore and from waste materials like metal recycling slag, coal ash, or old electronics. “The microbes are the manufacturing,” says CEO Alexa Schmitz, an alumna of Barstow’s lab.

To make a dent in the growing demand for metal, this new wave of biotechnologies will have to go beyond copper and gold, says Barstow. In 2024, he started a project to map out genes that could be useful for extracting and separating a wider range of metals.

Even with the challenges ahead, he says, biotechnology has the potential to transform mining the way fracking changed natural gas. “Biomining is one of these areas where the need … is big enough,” he says.

The challenge will be moving fast enough to keep up with growing demand.
