AI, Committee, News, Uncategorized

It Takes Two: Your GRPO Is Secretly DPO

arXiv:2510.00977v1 Announce Type: cross Abstract: Group Relative Policy Optimization (GRPO) is a prominent reinforcement learning algorithm for post-training Large Language Models (LLMs). It is commonly believed that GRPO necessitates a large group size to ensure stable training via precise statistical estimation, which incurs substantial computational overhead. In this work, we challenge this assumption by reframing GRPO as a form of contrastive learning, which reveals a fundamental connection to Direct Preference Optimization (DPO). Motivated by DPO’s empirical success, we investigate the minimal two-rollout case (2-GRPO), a configuration previously deemed infeasible. We provide a rigorous theoretical analysis to validate 2-GRPO and demonstrate empirically that it achieves performance on par with 16-GRPO, despite using only 1/8 of the rollouts and reducing training time by over 70%.
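To see why the two-rollout case collapses to a pairwise contrast, consider a minimal numerical sketch (our own illustration, not code from the paper): with a group of two rewards, mean-centering and std-normalizing always yields symmetric advantages of +1 and -1, so the update pushes the higher-reward rollout up and the lower one down, which is structurally the chosen/rejected preference signal DPO optimizes.

import numpy as np

def group_advantages(rewards):
    """GRPO-style group-normalized advantages: (r - mean) / std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# With a larger group, advantages form a graded spectrum.
print(group_advantages([0.1, 0.4, 0.9, 0.3]))

# With 2 rollouts, advantages are always +1 and -1 (whenever rewards differ):
# a pure winner-vs-loser contrast, the same pairwise signal DPO uses.
print(group_advantages([0.9, 0.2]))  # -> [ 1. -1.]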

It Takes Two: Your GRPO Is Secretly DPO Read the article »

AI, Committee, News, Uncategorized

ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits Frontier-Level Performance on a Single-GPU Budget

ServiceNow AI Research Lab has released Apriel-1.5-15B-Thinker, a 15-billion-parameter open-weights multimodal reasoning model trained with a data-centric mid-training recipe—continual pretraining followed by supervised fine-tuning—without reinforcement learning or preference optimization. The model attains an Artificial Analysis Intelligence Index score of 52 with 8x cost savings compared to SOTA. The checkpoint ships under an MIT license on Hugging Face.

So, what's new in it for me?

Frontier-level composite score at small scale. The model reports an Artificial Analysis Intelligence Index (AAI) of 52, matching DeepSeek-R1-0528 on that combined metric while being dramatically smaller. AAI aggregates 10 third-party evaluations (MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, τ²-Bench Telecom).

Single-GPU deployability. The model card states the 15B checkpoint "fits on a single GPU," targeting on-premises and air-gapped deployments with fixed memory and latency budgets.

Open weights and reproducible pipeline. Weights, training recipe, and evaluation protocol are public for independent verification.

https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker

OK, I got it, but what is its training mechanism?

Base and upscaling. Apriel-1.5-15B-Thinker starts from Mistral's Pixtral-12B-Base-2409 multimodal decoder-vision stack. The research team applies depth upscaling—increasing decoder layers from 40 to 48—then projection-network realignment to align the vision encoder with the enlarged decoder. This avoids pretraining from scratch while preserving single-GPU deployability.

CPT (continual pretraining). Two stages: (1) mixed text+image data to build foundational reasoning and document/diagram understanding; (2) targeted synthetic visual tasks (reconstruction, matching, detection, counting) to sharpen spatial and compositional reasoning. Sequence lengths extend to 32k and 16k tokens respectively, with selective loss placement on response tokens for instruction-formatted samples.

SFT (supervised fine-tuning). High-quality, reasoning-trace instruction data for math, coding, science, tool use, and instruction following; two additional SFT runs (a stratified subset and a longer-context run) are weight-merged to form the final checkpoint (see the merging sketch below). No RL (reinforcement learning) or RLAIF (reinforcement learning from AI feedback) is used.

Data note. ~25% of the depth-upscaling text mix derives from NVIDIA's Nemotron collection.

Wow! Tell me about its results, then.

Key text benchmarks (pass@1 / accuracy):

AIME 2025 (American Invitational Mathematics Examination 2025): 87.5–88%
GPQA Diamond (Graduate-Level Google-Proof Question Answering, Diamond split): ≈71%
IFBench (Instruction-Following Benchmark): ~62
τ²-Bench (Tau-squared Bench) Telecom: ~68
LiveCodeBench (functional code correctness): ~72.8

Using VLMEvalKit for reproducibility, Apriel scores competitively across MMMU / MMMU-Pro (Massive Multi-discipline Multimodal Understanding), LogicVista, MathVision, MathVista, MathVerse, MMStar, CharXiv, AI2D, BLINK, with stronger results on documents/diagrams and text-dominant math imagery.
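The weight-merging step mentioned in the SFT stage can be pictured with a minimal sketch (our own illustration of generic checkpoint averaging in the "model soup" style; the equal weighting and file names are assumptions, not ServiceNow's released code):

import torch

def merge_checkpoints(state_dicts, weights=None):
    """Average several fine-tuned checkpoints parameter by parameter.

    Generic checkpoint averaging; the equal weighting below is an
    assumption, not the exact recipe reported for Apriel.
    """
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage: merge the stratified-subset and longer-context SFT runs.
# sft_a = torch.load("sft_stratified.pt"); sft_b = torch.load("sft_longctx.pt")
# final = merge_checkpoints([sft_a, sft_b])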
https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker/blob/main/Apriel-1.5-Thinker.pdf

Let's summarize everything

Apriel-1.5-15B-Thinker demonstrates that careful mid-training (continual pretraining + supervised fine-tuning, no reinforcement learning) can deliver a 52 on the Artificial Analysis Intelligence Index (AAI) while remaining deployable on a single graphics processing unit. Reported task-level scores (for example, AIME 2025 ≈88, GPQA Diamond ≈71, IFBench ≈62, Tau-squared Bench Telecom ≈68) align with the model card and place the 15-billion-parameter checkpoint in the most cost-efficient band of current open-weights reasoners. For enterprises, that combination—open weights, reproducible recipe, and single-GPU latency—makes Apriel a practical baseline to evaluate before considering larger closed systems.

The post ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits Frontier-Level Performance on a Single-GPU Budget appeared first on MarkTechPost.

ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits Frontier-Level Performance on a Single-GPU Budget Read the article »

AI, Committee, News, Uncategorized

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

arXiv:2509.26313v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) is the predominant method for adapting large language models (LLMs), yet it often struggles with generalization compared to reinforcement learning (RL). In this work, we posit that this performance disparity stems not just from the loss function, but from a more fundamental difference: SFT learns from a fixed, pre-collected dataset, whereas RL utilizes on-policy data sampled from the current policy. Building on this hypothesis, we introduce one-token rollout (OTR), a novel fine-tuning algorithm that guides SFT with the policy gradient method. OTR reframes the autoregressive learning process by treating each token generation as a single-step reinforcement learning trajectory. At each step, it performs a Monte Carlo “rollout” by sampling multiple candidate tokens from the current policy’s distribution. The ground-truth token from the supervised data is then used to provide a reward signal to these samples. Guided by policy gradient, our algorithm repurposes static, off-policy supervised data into a dynamic, on-policy signal at the token level, capturing the generalization benefits of on-policy learning while bypassing the costly overhead of full sentence generation. Through extensive experiments on a diverse suite of challenging benchmarks spanning mathematical reasoning, code generation, and general domain reasoning, we demonstrate that OTR consistently outperforms standard SFT. Our findings establish OTR as a powerful and practical alternative for fine-tuning LLMs and provide compelling evidence that the on-policy nature of data is a critical driver of generalization, offering a promising new direction for fine-tuning LLMs.
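The token-level rollout idea lends itself to a short sketch (a minimal illustration under our own assumptions; the paper's exact sampling count, reward shaping, and baseline are not reproduced here):

import torch
import torch.nn.functional as F

def otr_loss(logits, target_ids, k=8):
    """One-token-rollout-style loss sketch.

    logits: (seq_len, vocab) next-token logits from the current policy.
    target_ids: (seq_len,) ground-truth tokens from the SFT dataset.
    At each position we sample k candidate tokens from the policy, reward a
    candidate 1 if it matches the ground truth and 0 otherwise, and apply a
    REINFORCE-style policy-gradient update with a mean-reward baseline.
    """
    log_probs = F.log_softmax(logits, dim=-1)                          # (seq, vocab)
    samples = torch.multinomial(log_probs.exp(), k, replacement=True)  # (seq, k)
    rewards = (samples == target_ids.unsqueeze(-1)).float()            # (seq, k)
    advantages = rewards - rewards.mean(dim=-1, keepdim=True)          # baseline
    sample_logp = log_probs.gather(-1, samples)                        # (seq, k)
    return -(advantages.detach() * sample_logp).mean()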

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient Read the article »

AI, Committee, News, Uncategorized

How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval?

In this tutorial, we walk through the implementation of an Agentic Retrieval-Augmented Generation (RAG) system. We design it so that the agent does more than just retrieve documents; it actively decides when retrieval is needed, selects the best retrieval strategy, and synthesizes responses with contextual awareness. By combining embeddings, FAISS indexing, and a mock LLM, we create a practical demonstration of how agentic decision-making can elevate the standard RAG pipeline into something more adaptive and intelligent. Check out the FULL CODES here.

import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import json
import re
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum


class MockLLM:
    """Simulates an LLM by routing on keywords found in the prompt."""

    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        prompt_lower = prompt.lower()
        if "decide whether to retrieve" in prompt_lower:
            # Retrieve when the query hints at facts, entities, or dates.
            if any(word in prompt_lower for word in ["specific", "recent", "data", "facts", "when", "who", "what"]):
                return "RETRIEVE: The query requires specific factual information that needs to be retrieved."
            else:
                return "NO_RETRIEVE: This is a general question that can be answered with existing knowledge."
        elif "choose retrieval strategy" in prompt_lower:
            if "comparison" in prompt_lower or "versus" in prompt_lower:
                return "STRATEGY: multi_query - Need to retrieve information about multiple entities for comparison."
            elif "recent" in prompt_lower or "latest" in prompt_lower:
                return "STRATEGY: temporal - Focus on recent information."
            else:
                return "STRATEGY: semantic - Standard semantic similarity search."
        elif "synthesize" in prompt_lower and "context:" in prompt_lower:
            return "Based on the retrieved information, here's a comprehensive answer that combines multiple sources and provides specific details with proper context."
        return "This is a mock response. In practice, use a real LLM like OpenAI's GPT or similar."


class RetrievalStrategy(Enum):
    SEMANTIC = "semantic"
    MULTI_QUERY = "multi_query"
    TEMPORAL = "temporal"
    HYBRID = "hybrid"


@dataclass
class Document:
    id: str
    content: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = None

We set up the foundation of our Agentic RAG system. We define a mock LLM to simulate decision-making, create a retrieval strategy enum, and design a Document dataclass so we can structure and manage our knowledge base efficiently. Check out the FULL CODES here.

class AgenticRAGSystem:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.llm = MockLLM()
        self.documents: List[Document] = []
        self.index: Optional[faiss.Index] = None

    def add_documents(self, documents: List[Dict[str, Any]]) -> None:
        print(f"Processing {len(documents)} documents...")
        for i, doc in enumerate(documents):
            doc_obj = Document(
                id=doc.get('id', str(i)),
                content=doc['content'],
                metadata=doc.get('metadata', {})
            )
            self.documents.append(doc_obj)
        contents = [doc.content for doc in self.documents]
        embeddings = self.encoder.encode(contents, show_progress_bar=True)
        for doc, embedding in zip(self.documents, embeddings):
            doc.embedding = embedding
        # Normalized inner product over float32 vectors = cosine similarity.
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatIP(dimension)
        faiss.normalize_L2(embeddings)
        self.index.add(embeddings.astype('float32'))
        print(f"Knowledge base built with {len(self.documents)} documents")
We build the core of our Agentic RAG system. We initialize the embedding model, set up the FAISS index, and add documents by encoding their contents into vectors, enabling fast and accurate semantic retrieval from our knowledge base. Check out the FULL CODES here.

    def decide_retrieval(self, query: str) -> bool:
        # Instruction wording deliberately avoids the mock's trigger keywords,
        # so only the user query drives the retrieve/no-retrieve decision.
        decision_prompt = f"""
Analyze the following query and decide whether to retrieve information:

Query: "{query}"

Consider whether it can be answered from general knowledge or needs a lookup in the knowledge base.
Respond with either:
RETRIEVE: [reason]
or
NO_RETRIEVE: [reason]
"""
        response = self.llm.generate(decision_prompt)
        should_retrieve = response.startswith("RETRIEVE:")
        print(f"Agent Decision: {'Retrieve' if should_retrieve else 'Direct Answer'}")
        print(f"Reasoning: {response.split(':', 1)[1].strip() if ':' in response else response}")
        return should_retrieve

    def choose_strategy(self, query: str) -> RetrievalStrategy:
        # Strategy descriptions likewise avoid the mock's trigger keywords.
        strategy_prompt = f"""
Choose the best strategy for this query:

Query: "{query}"

Available strategies:
- semantic: Standard similarity search
- multi_query: Issue several related queries
- temporal: Prioritize newer documents
- hybrid: Combination approach

Choose retrieval strategy and explain why.
Respond with:
STRATEGY: [strategy_name] - [reasoning]
"""
        response = self.llm.generate(strategy_prompt)
        if "multi_query" in response.lower():
            strategy = RetrievalStrategy.MULTI_QUERY
        elif "temporal" in response.lower():
            strategy = RetrievalStrategy.TEMPORAL
        elif "hybrid" in response.lower():
            strategy = RetrievalStrategy.HYBRID
        else:
            strategy = RetrievalStrategy.SEMANTIC
        print(f"Retrieval Strategy: {strategy.value}")
        print(f"Reasoning: {response.split('-', 1)[1].strip() if '-' in response else response}")
        return strategy

We give our agent the ability to think before it fetches. We first determine whether a query truly requires retrieval, then we select the most suitable strategy: semantic, multi-query, temporal, or hybrid. This allows us to target the correct context with clear, printed reasoning for each step. Check out the FULL CODES here.
    def retrieve_documents(self, query: str, strategy: RetrievalStrategy, k: int = 3) -> List[Document]:
        if not self.index:
            print("No knowledge base available")
            return []
        if strategy == RetrievalStrategy.MULTI_QUERY:
            queries = [query, f"advantages of {query}", f"disadvantages of {query}"]
            all_docs = []
            for q in queries:
                docs = self._semantic_search(q, k=2)
                all_docs.extend(docs)
            # Deduplicate by document id while preserving order.
            seen_ids = set()
            unique_docs = []
            for doc in all_docs:
                if doc.id not in seen_ids:
                    unique_docs.append(doc)
                    seen_ids.add(doc.id)
            return unique_docs[:k]
        elif strategy == RetrievalStrategy.TEMPORAL:
            # Over-retrieve, then re-rank by the date stored in metadata.
            docs = self._semantic_search(query, k=k*2)
            docs_with_dates = [(doc, doc.metadata.get('date', '1900-01-01')) for doc in docs]
            docs_with_dates.sort(key=lambda x: x[1], reverse=True)
            return [doc for doc, _ in docs_with_dates[:k]]
        else:
            return self._semantic_search(query, k=k)

    def _semantic_search(self, query: str, k: int) -> List[Document]:
        query_embedding = self.encoder.encode([query])
        faiss.normalize_L2(query_embedding)
        scores, indices = self.index.search(query_embedding.astype('float32'), k)
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if 0 <= idx < len(self.documents):  # FAISS returns -1 for empty slots
                results.append(self.documents[idx])
        return results

    def synthesize_response(self, query: str, retrieved_docs: List[Document]) -> str:
        if not retrieved_docs:
            return self.llm.generate(f"Answer this query: {query}")
        context = "\n\n".join([f"Document {i+1}: {doc.content}" for i, doc in enumerate(retrieved_docs)])
        synthesis_prompt = f"""
Query: {query}

Context:
{context}

Synthesize a comprehensive answer using the provided context.
Be specific and reference the information sources when relevant.
"""
        return self.llm.generate(synthesis_prompt, max_tokens=200)

We implement how we actually fetch and use knowledge. We perform semantic search, branch into multi-query or temporal re-ranking when needed, deduplicate results, and then synthesize a focused answer from the retrieved context. In doing so, we keep retrieval efficient, transparent, and tightly aligned with the query. Check out the FULL CODES here.

    def query(self, query: str) -> str:
        print(f"\nProcessing Query: '{query}'")
        print("=" * 50)
        if not self.decide_retrieval(query):
            print("\nGenerating direct response...")
            return self.llm.generate(f"Answer this query: {query}")
        strategy = self.choose_strategy(query)
        print(f"\nRetrieving documents using {strategy.value} strategy...")
        retrieved_docs = self.retrieve_documents(query, strategy)
        print(f"Retrieved {len(retrieved_docs)} documents")
        # Complete the loop: synthesize the final answer from the retrieved context.
        print("\nSynthesizing response...")
        return self.synthesize_response(query, retrieved_docs)
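To exercise the full pipeline end to end, here is a minimal usage sketch (the sample documents and queries are our own additions for illustration, not from the original post):

if __name__ == "__main__":
    rag = AgenticRAGSystem()
    rag.add_documents([
        {"id": "1",
         "content": "FAISS is a library for efficient similarity search over dense vectors.",
         "metadata": {"date": "2023-05-01"}},
        {"id": "2",
         "content": "Sentence-transformers encode text into embeddings suitable for semantic search.",
         "metadata": {"date": "2024-02-10"}},
    ])
    # A factual question should trigger retrieval; a greeting should not.
    print(rag.query("What is FAISS used for?"))
    print(rag.query("Hello there!"))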

How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval? Read the article »

AI, Committee, News, Uncategorized

End-to-End Aspect-Guided Review Summarization at Scale

arXiv:2509.26103v1 Announce Type: new Abstract: We present a scalable large language model (LLM)-based system that combines aspect-based sentiment analysis (ABSA) with guided summarization to generate concise and interpretable product review summaries for the Wayfair platform. Our approach first extracts and consolidates aspect-sentiment pairs from individual reviews, selects the most frequent aspects for each product, and samples representative reviews accordingly. These are used to construct structured prompts that guide the LLM to produce summaries grounded in actual customer feedback. We demonstrate the real-world effectiveness of our system through a large-scale online A/B test. Furthermore, we describe our real-time deployment strategy and release a dataset of 11.8 million anonymized customer reviews covering 92,000 products, including extracted aspects and generated summaries, to support future research in aspect-guided review summarization.
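The prompt-construction stage described above can be sketched roughly as follows (our own simplified illustration; the aspect format, sampling rule, and prompt wording are assumptions, not Wayfair's production code):

from collections import Counter
import random

def build_summary_prompt(reviews, top_k_aspects=3, reviews_per_aspect=2):
    """reviews: list of dicts like {"text": ..., "aspects": [("comfort", "positive"), ...]}."""
    # 1. Consolidate aspect-sentiment pairs and keep the most frequent aspects.
    counts = Counter(a for r in reviews for a, _ in r["aspects"])
    top_aspects = [a for a, _ in counts.most_common(top_k_aspects)]
    # 2. Sample representative reviews for each selected aspect.
    lines = []
    for aspect in top_aspects:
        matching = [r["text"] for r in reviews if any(a == aspect for a, _ in r["aspects"])]
        for text in random.sample(matching, min(reviews_per_aspect, len(matching))):
            lines.append(f"[{aspect}] {text}")
    # 3. Build a structured prompt that grounds the LLM in actual customer feedback.
    return ("Summarize customer feedback on the following aspects: "
            + ", ".join(top_aspects) + "\n\n" + "\n".join(lines))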

End-to-End Aspect-Guided Review Summarization at Scale Read the article »

AI, Committee, News, Uncategorized

Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

arXiv:2509.23946v2 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) and its variants have markedly advanced the reasoning abilities of Large Language Models (LLMs), yet their monolithic and auto-regressive architecture inherently conflates high-level strategic planning with low-level step-by-step execution, leading to computational inefficiency, limited exploration of reasoning paths, and reduced interpretability. To overcome these issues, we propose the Explore-Execute Chain ($E^2C$), a structured reasoning framework that decouples reasoning into two distinct phases: an exploratory phase that stochastically generates succinct high-level plans, followed by an execution phase that deterministically carries out the chosen plan. Our approach incorporates a two-stage training methodology, which combines Supervised Fine-Tuning (SFT) – augmented by a novel data generation algorithm enforcing strict plan adherence – with a subsequent Reinforcement Learning (RL) stage that capitalizes on the informativeness of exploration and reinforces the determinism of execution. This decomposition enables an efficient test-time scaling strategy: on AIME’2024, $E^2C$ Test Time Scaling reaches 58.1% accuracy using
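The decoupling of stochastic planning from deterministic execution can be sketched in a few lines (a minimal illustration; generate stands in for any LLM sampling API, and the temperatures, plan selector, and prompt format are our assumptions, not the paper's configuration):

def explore_execute(generate, problem, n_plans=4):
    """Decouple stochastic planning from deterministic execution.

    generate(prompt, temperature) -> str is a placeholder for an LLM call.
    """
    # Explore: stochastically sample several succinct high-level plans.
    plans = [generate(f"Problem: {problem}\nOutline a brief solution plan:", temperature=1.0)
             for _ in range(n_plans)]
    # Pick one plan (here the shortest, as a trivial stand-in for a real scorer).
    plan = min(plans, key=len)
    # Execute: deterministically carry out the chosen plan (greedy decoding).
    return generate(f"Problem: {problem}\nPlan: {plan}\nExecute the plan step by step:",
                    temperature=0.0)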

Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm Read the article »

AI, Committee, News, Uncategorized

Vocabulary Customization for Efficient Domain-Specific LLM Deployment

arXiv:2509.26124v1 Announce Type: new Abstract: When using an LLM to process text outside the training domain(s), an often overlooked factor is vocabulary mismatch, where the general-domain tokenizer fails to capture frequent domain-specific terms, leading to higher token fertility and thus a decrease in processing speed due to suboptimal sub-word splits. We address this limitation by augmenting the pretrained vocabulary with a set of domain-specific tokens. To this end, we design an algorithm that extends an existing tokenizer while guaranteeing it never decreases tokenization efficiency: every input sequence is segmented into at most the same number of tokens as before. Evaluated on real-world e-Commerce use-cases, the augmented tokenizer significantly shortens input sequences by up to 20% and reduces inference latency on downstream tasks while preserving predictive quality. We further analyze secondary effects, such as the impact on forward pass speed and the rate at which the model adopts the newly introduced tokens, to illustrate the broader benefits of vocabulary adaptation.
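The basic mechanics of vocabulary augmentation can be sketched with Hugging Face transformers (a minimal sketch; note that add_tokens alone does not provide the paper's guarantee that tokenization never gets worse, which requires their custom extension algorithm, and the model name and domain terms below are placeholders):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

domain_terms = ["SKU", "add-to-cart", "free-shipping-threshold"]  # hypothetical e-Commerce terms
tokenizer.add_tokens(domain_terms)
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix to match

# Compare sequence length before and after augmentation.
before = AutoTokenizer.from_pretrained("gpt2")
text = "Apply the free-shipping-threshold to this SKU"
print(len(before.tokenize(text)), "->", len(tokenizer.tokenize(text)))  # fewer tokens expected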

Vocabulary Customization for Efficient Domain-Specific LLM Deployment Read the article »

AI, Committee, News, Uncategorized

Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation

arXiv:2505.21956v3 Announce Type: replace-cross Abstract: Text-to-image generation increasingly demands access to domain-specific, fine-grained, and rapidly evolving knowledge that pretrained models cannot fully capture, necessitating the integration of retrieval methods. Existing Retrieval-Augmented Generation (RAG) methods attempt to address this by retrieving globally relevant images, but they fail when no single image contains all desired elements from a complex user query. We propose Cross-modal RAG, a novel framework that decomposes both queries and images into sub-dimensional components, enabling subquery-aware retrieval and generation. Our method introduces a hybrid retrieval strategy – combining a sub-dimensional sparse retriever with a dense retriever – to identify a Pareto-optimal set of images, each contributing complementary aspects of the query. During generation, a multimodal large language model is guided to selectively condition on relevant visual features aligned to specific subqueries, ensuring subquery-aware image synthesis. Extensive experiments on MS-COCO, Flickr30K, WikiArt, CUB, and ImageNet-LT demonstrate that Cross-modal RAG significantly outperforms existing baselines in retrieval and further contributes to generation quality, while maintaining high efficiency.
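The Pareto-optimal selection over per-subquery relevance scores can be sketched like this (our simplified illustration; the paper's retrievers and scoring are considerably more involved):

def pareto_front(candidates):
    """candidates: list of (image_id, scores), where scores holds one relevance
    value per subquery. Keep images that no other image dominates on every subquery."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [(img, s) for img, s in candidates
            if not any(dominates(t, s) for _, t in candidates)]

# Hypothetical scores for two subqueries ("a red barn", "snowy mountains"):
images = [("img_a", (0.9, 0.1)), ("img_b", (0.2, 0.8)),
          ("img_c", (0.5, 0.5)), ("img_d", (0.1, 0.1))]
print(pareto_front(images))  # img_a, img_b, img_c survive; img_d is dominated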

Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation Read the article »

AI, Committee, News, Uncategorized

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding

arXiv:2509.15235v4 Announce Type: replace-cross Abstract: Speculative decoding is a widely adopted technique for accelerating inference in large language models (LLMs), yet its application to vision-language models (VLMs) remains underexplored, with existing methods achieving only modest speedups (
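For readers new to the underlying technique, here is a generic draft-and-verify sketch of speculative decoding (this illustrates the base method only, not ViSpec's vision-aware extensions, which the truncated abstract does not detail; draft_model and target_model are placeholder callables):

def speculative_step(draft_model, target_model, prefix, gamma=4):
    """One round of speculative decoding: the small draft model proposes gamma
    tokens; the large target model checks them left to right and keeps the
    longest agreeing run (greedy-acceptance variant, shown for clarity; real
    implementations verify all gamma positions in one parallel forward pass)."""
    proposal = list(prefix)
    for _ in range(gamma):
        proposal.append(draft_model(proposal))       # cheap autoregressive draft
    accepted = list(prefix)
    for tok in proposal[len(prefix):]:
        if target_model(accepted) == tok:            # target agrees with the draft
            accepted.append(tok)
        else:
            accepted.append(target_model(accepted))  # correct and end this round
            break
    return accepted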

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding Read the article »

AI, Committee, News, Uncategorized

Can Large Language Models Express Uncertainty Like Human?

arXiv:2509.24202v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in high-stakes settings, where overconfident responses can mislead users. Reliable confidence estimation has been shown to enhance trust and task accuracy. Yet existing methods face practical barriers: logits are often hidden, multi-sampling is computationally expensive, and verbalized numerical uncertainty (e.g., giving a 0-100 score) deviates from natural communication. We revisit linguistic confidence (LC), where models express uncertainty through hedging language (e.g., probably, might), offering a lightweight and human-centered alternative. To advance this direction, we (1) release the first diverse, large-scale dataset of hedging expressions with human-annotated confidence scores, and (2) propose a lightweight mapper that converts hedges into confidence scores at near-zero cost. Building on these resources, we (3) conduct the first systematic study of LC across modern LLMs and QA benchmarks, revealing that while most LLMs underperform in expressing reliable LC, carefully designed prompting achieves competitive calibration and discriminability. Finally, we (4) introduce a fine-tuning framework that further improves LC reliability. Taken together, our work positions linguistic confidence as a scalable, efficient, and human-aligned approach to LLM uncertainty estimation, and calls for deeper exploration of this promising yet underexplored direction.
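A minimal hedge-to-confidence mapper could look like this (the lexicon values are invented for illustration; the paper's mapper is learned from its human-annotated dataset):

import re

# Hypothetical hedge lexicon; the released dataset maps many more expressions
# to human-annotated confidence scores.
HEDGE_SCORES = {
    "definitely": 0.95, "certainly": 0.95, "probably": 0.70,
    "likely": 0.70, "might": 0.40, "possibly": 0.35, "unlikely": 0.20,
}

def linguistic_confidence(answer: str) -> float:
    """Map hedging language in a model answer to a scalar confidence score."""
    found = [s for w, s in HEDGE_SCORES.items()
             if re.search(rf"\b{w}\b", answer, flags=re.IGNORECASE)]
    return min(found) if found else 0.5  # no hedge -> neutral default

print(linguistic_confidence("It is probably Paris."))  # 0.7
print(linguistic_confidence("It might be Paris."))     # 0.4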

Can Large Language Models Express Uncertainty Like Human? Read the article »
