
Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning

Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally in real time on CPUs. The Hugging Face model card lists 748M parameters (Qwen2 architecture) and ships in GGUF quantizations (Q4/Q8), enabling inference through llama.cpp/llama-cpp-python without cloud dependencies. It is licensed under Apache-2.0 and includes a runnable demo and examples.

So, what is new?

NeuTTS Air couples a 0.5B-class Qwen backbone with Neuphonic's NeuCodec audio codec. Neuphonic positions the system as a "super-realistic, on-device" TTS LM that clones a voice from ~3 seconds of reference audio and synthesizes speech in that style, targeting voice agents and privacy-sensitive applications. The model card and repository explicitly emphasize real-time CPU generation and small-footprint deployment.

Key Features

Realism at sub-1B scale: Human-like prosody and timbre preservation for a ~0.7B (Qwen2-class) text-to-speech LM.
On-device deployment: Distributed in GGUF (Q4/Q8) with CPU-first paths; suitable for laptops, phones, and Raspberry Pi-class boards.
Instant speaker cloning: Style transfer from ~3 seconds of reference audio (reference WAV + transcript).
Compact LM+codec stack: Qwen 0.5B backbone paired with NeuCodec (0.8 kbps / 24 kHz) to balance latency, footprint, and output quality.

Model architecture and runtime path

Backbone: Qwen 0.5B used as a lightweight LM to condition speech generation; the hosted artifact is reported as 748M params under the qwen2 architecture on Hugging Face.
Codec: NeuCodec provides low-bitrate acoustic tokenization/decoding; it targets 0.8 kbps with 24 kHz output, enabling compact representations for efficient on-device use.
Quantization & format: Prebuilt GGUF backbones (Q4/Q8) are available; the repo includes instructions for llama-cpp-python and an optional ONNX decoder path.
Dependencies: Uses espeak for phonemization; examples and a Jupyter notebook are provided for end-to-end synthesis.

On-device performance focus

NeuTTS Air showcases "real-time generation on mid-range devices" and offers CPU-first defaults; GGUF quantization is intended for laptops and single-board computers. While no real-time-factor (RTF) numbers are published on the card, the distribution targets local inference without a GPU and demonstrates a working flow through the provided examples and Space.

Voice cloning workflow

NeuTTS Air requires (1) a reference WAV and (2) the transcript text for that reference. It encodes the reference to style tokens and then synthesizes arbitrary text in the reference speaker's timbre. The Neuphonic team recommends 3–15 s of clean, mono audio and provides pre-encoded samples. (A hedged sketch of this flow appears below, after the comparison section.)

Privacy, responsibility, and watermarking

Neuphonic frames the model for on-device privacy (no audio or text leaves the machine without the user's approval) and notes that all generated audio includes a Perth (Perceptual Threshold) watermarker to support responsible use and provenance.

How it compares

Open, local TTS systems exist (e.g., GGUF-based pipelines), but NeuTTS Air is notable for packaging a small LM + neural codec with instant cloning, CPU-first quantizations, and watermarking under a permissive license. The "world's first super-realistic, on-device speech LM" phrasing is the vendor's claim; the verifiable facts are the size, formats, cloning procedure, license, and provided runtimes.
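For orientation, here is a minimal sketch of the reference-plus-transcript cloning flow described above. The import path, class name, constructor arguments, and method names are assumptions made for illustration, not the confirmed API; consult the repository's examples for the exact calls.

# Hypothetical sketch of the NeuTTS Air cloning flow; the names used here
# (NeuTTSAir, encode_reference, infer) are assumptions, not the verified API.
import soundfile as sf
from neuttsair.neutts import NeuTTSAir  # assumed import path

tts = NeuTTSAir(
    backbone_repo="neuphonic/neutts-air-q4-gguf",  # assumed GGUF backbone id
    codec_repo="neuphonic/neucodec",               # assumed codec id
)

ref_wav = "samples/reference.wav"                  # ~3-15 s of clean, mono speech
ref_text = "Transcript of the reference clip."     # transcript of ref_wav

ref_codes = tts.encode_reference(ref_wav)          # style tokens from the reference
audio = tts.infer("Hello from an on-device voice clone.", ref_codes, ref_text)
sf.write("output.wav", audio, 24_000)              # NeuCodec decodes at 24 kHz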
Our Comments

The focus is on system trade-offs: a ~0.7B Qwen-class backbone with GGUF quantization, paired with NeuCodec at 0.8 kbps/24 kHz, is a pragmatic recipe for real-time, CPU-only TTS that preserves timbre from ~3–15 s style references while keeping latency and memory predictable. The Apache-2.0 licensing and built-in watermarking are deployment-friendly, but publishing RTF/latency on commodity CPUs and cloning-quality vs. reference-length curves would enable rigorous benchmarking against existing local pipelines. Operationally, an offline path with minimal dependencies (eSpeak, llama.cpp/ONNX) lowers privacy/compliance risk for edge agents without sacrificing intelligibility.

Check out the Model Card on Hugging Face and the GitHub Page.

The post Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning appeared first on MarkTechPost.


Thinking Machines Launches Tinker: A Low-Level Training API that Abstracts Distributed LLM Fine-Tuning without Hiding the Knobs

Thinking Machines has released Tinker, a Python API that lets researchers and engineers write training loops locally while the platform executes them on managed distributed GPU clusters. The pitch is narrow and technical: keep full control of data, objectives, and optimization steps; hand off scheduling, fault tolerance, and multi-node orchestration. The service is in private beta with a waitlist and starts free, moving to usage-based pricing "in the coming weeks."

Alright, but tell me what it is

Tinker exposes low-level primitives, not high-level "train()" wrappers. Core calls include forward_backward, optim_step, save_state, and sample, giving users direct control over gradient computation, optimizer stepping, checkpointing, and evaluation/inference inside custom loops. A typical workflow: instantiate a LoRA training client against a base model (e.g., Llama-3.2-1B), iterate forward_backward/optim_step, persist state, then obtain a sampling client to evaluate or export weights. (A hedged sketch of such a loop appears at the end of this section.)

https://thinkingmachines.ai/tinker/

Key Features

Open-weights model coverage. Fine-tune families such as Llama and Qwen, including large mixture-of-experts variants (e.g., Qwen3-235B-A22B).
LoRA-based post-training. Tinker implements Low-Rank Adaptation (LoRA) rather than full fine-tuning; their technical note ("LoRA Without Regret") argues LoRA can match full fine-tuning for many practical workloads, especially RL, under the right setup.
Portable artifacts. Download trained adapter weights for use outside Tinker (e.g., with your preferred inference stack/provider).

What runs on it?

The Thinking Machines team positions Tinker as a managed post-training platform for open-weights models, from small LLMs up to large mixture-of-experts systems; Qwen3-235B-A22B is cited as a supported example. Switching models is intentionally minimal: change a string identifier and rerun. Under the hood, runs are scheduled on Thinking Machines' internal clusters; the LoRA approach enables shared compute pools and lower utilization overhead.

https://thinkingmachines.ai/tinker/

Tinker Cookbook: Reference Training Loops and Post-Training Recipes

To reduce boilerplate while keeping the core API lean, the team published the Tinker Cookbook (Apache-2.0). It contains ready-to-use reference loops for supervised learning and reinforcement learning, plus worked examples for RLHF (three-stage SFT → reward modeling → policy RL), math-reasoning rewards, tool-use / retrieval-augmented tasks, prompt distillation, and multi-agent setups. The repo also ships utilities for LoRA hyperparameter calculation and integrations for evaluation (e.g., InspectAI).

Who's already using it?

Early users include groups at Princeton (Gödel prover team), Stanford (Rotskoff Chemistry), UC Berkeley (SkyRL, async off-policy multi-agent/tool-use RL), and Redwood Research (RL on Qwen3-32B for control tasks). Tinker is in private beta with waitlist sign-up. The service is free to start, with usage-based pricing planned shortly; organizations are asked to contact the team directly for onboarding.
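As promised above, here is an illustrative training-loop sketch. Only the primitive names (forward_backward, optim_step, save_state, sample) come from the article; the client constructors, argument names, and the helper for obtaining a sampling client are assumptions, so treat this as the shape of the loop rather than working SDK code.

# Hedged sketch of a Tinker-style LoRA post-training loop.
# ServiceClient, create_lora_training_client, and all argument names are
# assumptions; only forward_backward/optim_step/save_state/sample are cited.
import tinker  # assumed package name

service = tinker.ServiceClient()
trainer = service.create_lora_training_client(base_model="meta-llama/Llama-3.2-1B")

for step, batch in enumerate(load_my_batches()):              # user-supplied data iterator
    trainer.forward_backward(batch, loss_fn="cross_entropy")  # gradients computed on the cluster
    trainer.optim_step()                                      # optimizer update
    if step % 100 == 0:
        trainer.save_state(name=f"ckpt-{step}")               # checkpoint for resumption

sampler = trainer.save_weights_and_get_sampling_client(name="final")  # assumed helper
print(sampler.sample(prompt="The capital of France is", max_tokens=8))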
My thoughts / comments

I like that Tinker exposes low-level primitives (forward_backward, optim_step, save_state, sample) instead of a monolithic train(): it keeps objective design, reward shaping, and evaluation in my control while offloading multi-node orchestration to their managed clusters. The LoRA-first posture is pragmatic for cost and turnaround, and their own analysis argues LoRA can match full fine-tuning when configured correctly, but I'd still want transparent logs, deterministic seeds, and per-step telemetry to verify reproducibility and drift. The Cookbook's RLHF and SL reference loops are useful starting points, yet I'll judge the platform on throughput stability, checkpoint portability, and guardrails for data governance (PII handling, audit trails) during real workloads. Overall I prefer Tinker's open, flexible API: it lets me customize open-weight LLMs via explicit training-loop primitives while the service handles distributed execution. Compared with closed systems, this preserves algorithmic control (losses, RLHF workflows, data handling) and lowers the barrier for new practitioners to experiment and iterate.

Check out the technical details and sign up for the waitlist. If you're a university or organization looking for wide-scale access, contact tinker@thinkingmachines.ai.

The post Thinking Machines Launches Tinker: A Low-Level Training API that Abstracts Distributed LLM Fine-Tuning without Hiding the Knobs appeared first on MarkTechPost.


Phantom: General Backdoor Attacks on Retrieval Augmented Language Generation

arXiv:2405.20485v3 Announce Type: replace-cross Abstract: Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs), by anchoring, adapting, and personalizing their responses to the most relevant knowledge sources. It is particularly useful in chatbot applications, allowing developers to customize LLM output without expensive retraining. Despite their significant utility in various applications, RAG systems present new security risks. In this work, we propose a novel attack that allows an adversary to inject a single malicious document into a RAG system’s knowledge base, and mount a backdoor poisoning attack. We design Phantom, a general two-stage optimization framework against RAG systems, that crafts a malicious poisoned document leading to an integrity violation in the model’s output. First, the document is constructed to be retrieved only when a specific naturally occurring trigger sequence of tokens appears in the victim’s queries. Second, the document is further optimized with crafted adversarial text that induces various adversarial objectives on the LLM output, including refusal to answer, reputation damage, privacy violations, and harmful behaviors. We demonstrate our attacks on multiple open-source LLM architectures, including Gemma, Vicuna, and Llama, and show that they transfer to closed-source models such as GPT-3.5 Turbo and GPT-4. Finally, we successfully demonstrate our attack on an end-to-end black-box production RAG system: NVIDIA’s “Chat with RTX”.


Research on the Integration of Embodied Intelligence and Reinforcement Learning in Textual Domains

arXiv:2510.01076v1 Announce Type: new Abstract: This article addresses embodied intelligence and reinforcement learning integration in the field of text processing, aiming to enhance text handling with more intelligence on the basis of embodied intelligence’s perception and action superiority and reinforcement learning’s decision optimization capability. Through detailed theoretical explanation and experimental exploration, a novel integration model is introduced. This model has been demonstrated to be very effective in a wide range of text processing tasks, validating its applicative potential.


ALARB: An Arabic Legal Argument Reasoning Benchmark

arXiv:2510.00694v1 Announce Type: new Abstract: We introduce ALARB, a dataset and suite of tasks designed to evaluate the reasoning capabilities of large language models (LLMs) within the Arabic legal domain. While existing Arabic benchmarks cover some knowledge-intensive tasks such as retrieval and understanding, substantial datasets focusing specifically on multistep reasoning for Arabic LLMs, especially in open-ended contexts, are lacking. The dataset comprises over 13K commercial court cases from Saudi Arabia, with each case including the facts presented, the reasoning of the court, the verdict, as well as the cited clauses extracted from the regulatory documents. We define a set of challenging tasks leveraging this dataset and reflecting the complexity of real-world legal reasoning, including verdict prediction, completion of reasoning chains in multistep legal arguments, and identification of relevant regulations based on case facts. We benchmark a representative selection of current open and closed Arabic LLMs on these tasks and demonstrate the dataset’s utility for instruction tuning. Notably, we show that instruction-tuning a modest 12B parameter model using ALARB significantly enhances its performance in verdict prediction and Arabic verdict generation, reaching a level comparable to that of GPT-4o.


It Takes Two: Your GRPO Is Secretly DPO

arXiv:2510.00977v1 Announce Type: cross Abstract: Group Relative Policy Optimization (GRPO) is a prominent reinforcement learning algorithm for post-training Large Language Models (LLMs). It is commonly believed that GRPO necessitates a large group size to ensure stable training via precise statistical estimation, which incurs substantial computational overhead. In this work, we challenge this assumption by reframing GRPO as a form of contrastive learning, which reveals a fundamental connection to Direct Preference Optimization (DPO). Motivated by DPO’s empirical success, we investigate the minimal two-rollout case (2-GRPO), a configuration previously deemed infeasible. We provide a rigorous theoretical analysis to validate 2-GRPO and demonstrate empirically that it achieves performance on par with 16-GRPO, despite using only 1/8 of the rollouts and reducing training time by over 70%.
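To see the intuition numerically: the standard group-normalized advantage (reward minus the group mean, divided by the group standard deviation) collapses to +1 and -1 when the group has only two rollouts, so the update acts as a contrastive push between a preferred and a dispreferred sample, which is the DPO-like behavior the abstract alludes to. The snippet below is an illustrative check of that arithmetic, not the paper's implementation.

# Illustrative check of group-relative advantages A_i = (r_i - mean(r)) / std(r).
import numpy as np

def grpo_advantages(rewards):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

print(grpo_advantages([0.3, 0.9]))            # 2 rollouts -> [-1., 1.], a preference pair
print(grpo_advantages([0.1, 0.5, 0.9, 0.2]))  # larger groups -> graded weights per rollout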


ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits Frontier-Level Performance on a Single-GPU Budget

ServiceNow AI Research Lab has released Apriel-1.5-15B-Thinker, a 15-billion-parameter open-weights multimodal reasoning model trained with a data-centric mid-training recipe (continual pretraining followed by supervised fine-tuning) without reinforcement learning or preference optimization. The model attains an Artificial Analysis Intelligence Index score of 52, with a reported 8x cost saving compared to SOTA systems. The checkpoint ships under an MIT license on Hugging Face.

So, what's new in it for me?

Frontier-level composite score at small scale. The model reports Artificial Analysis Intelligence Index (AAI) = 52, matching DeepSeek-R1-0528 on that combined metric while being dramatically smaller. AAI aggregates 10 third-party evaluations (MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, τ²-Bench Telecom).
Single-GPU deployability. The model card states the 15B checkpoint "fits on a single GPU," targeting on-premises and air-gapped deployments with fixed memory and latency budgets.
Open weights and reproducible pipeline. Weights, training recipe, and evaluation protocol are public for independent verification.

https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker

OK, but what is its training mechanism?

Base and upscaling. Apriel-1.5-15B-Thinker starts from Mistral's Pixtral-12B-Base-2409 multimodal decoder-vision stack. The research team applies depth upscaling (increasing decoder layers from 40 to 48), then projection-network realignment to align the vision encoder with the enlarged decoder. This avoids pretraining from scratch while preserving single-GPU deployability.
CPT (Continual Pretraining). Two stages: (1) mixed text+image data to build foundational reasoning and document/diagram understanding; (2) targeted synthetic visual tasks (reconstruction, matching, detection, counting) to sharpen spatial and compositional reasoning. Sequence lengths extend to 32k and 16k tokens respectively, with selective loss placement on response tokens for instruction-formatted samples.
SFT (Supervised Fine-Tuning). High-quality, reasoning-trace instruction data for math, coding, science, tool use, and instruction following; two additional SFT runs (stratified subset; longer-context) are weight-merged to form the final checkpoint. No RL (reinforcement learning) or RLAIF (reinforcement learning from AI feedback) is used.
Data note. ~25% of the depth-upscaling text mix derives from NVIDIA's Nemotron collection.

And its results?

Key text benchmarks (pass@1 / accuracy):

AIME 2025 (American Invitational Mathematics Examination 2025): 87.5–88%
GPQA Diamond (Graduate-Level Google-Proof Question Answering, Diamond split): ≈71%
IFBench (Instruction-Following Benchmark): ~62
τ²-Bench (Tau-squared Bench) Telecom: ~68
LiveCodeBench (functional code correctness): ~72.8

Using VLMEvalKit for reproducibility, Apriel scores competitively across MMMU / MMMU-Pro (Massive Multi-discipline Multimodal Understanding), LogicVista, MathVision, MathVista, MathVerse, MMStar, CharXiv, AI2D, and BLINK, with stronger results on documents/diagrams and text-dominant math imagery.
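For readers who want to try the open weights locally, a minimal loading sketch follows. The Hugging Face Auto classes and chat-template call shown here are assumptions about how this Pixtral-derived checkpoint is served; verify the recommended classes and prompt format on the model card.

# Hedged sketch: loading the open-weights checkpoint from Hugging Face.
# The Auto classes and chat-template call are assumptions; confirm against
# the model card's usage snippet before relying on them.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "ServiceNow-AI/Apriel-1.5-15b-Thinker"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": [{"type": "text", "text": "What is 12 * 17?"}]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))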
https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker/blob/main/Apriel-1.5-Thinker.pdf

Let's summarize everything

Apriel-1.5-15B-Thinker demonstrates that careful mid-training (continual pretraining plus supervised fine-tuning, no reinforcement learning) can deliver a 52 on the Artificial Analysis Intelligence Index (AAI) while remaining deployable on a single graphics processing unit. Reported task-level scores (for example, AIME 2025 ≈88, GPQA Diamond ≈71, IFBench ≈62, Tau-squared Bench Telecom ≈68) align with the model card and place the 15-billion-parameter checkpoint in the most cost-efficient band of current open-weights reasoners. For enterprises, that combination of open weights, a reproducible recipe, and single-GPU latency makes Apriel a practical baseline to evaluate before considering larger closed systems.

The post ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits Frontier-Level Performance on a Single-GPU Budget appeared first on MarkTechPost.


How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval?

In this tutorial, we walk through the implementation of an Agentic Retrieval-Augmented Generation (RAG) system. We design it so that the agent does more than just retrieve documents; it actively decides when retrieval is needed, selects the best retrieval strategy, and synthesizes responses with contextual awareness. By combining embeddings, FAISS indexing, and a mock LLM, we create a practical demonstration of how agentic decision-making can elevate the standard RAG pipeline into something more adaptive and intelligent. Check out the FULL CODES here.

import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import json
import re
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum

class MockLLM:
    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        prompt_lower = prompt.lower()
        if "decide whether to retrieve" in prompt_lower:
            if any(word in prompt_lower for word in ["specific", "recent", "data", "facts", "when", "who", "what"]):
                return "RETRIEVE: The query requires specific factual information that needs to be retrieved."
            else:
                return "NO_RETRIEVE: This is a general question that can be answered with existing knowledge."
        elif "choose retrieval strategy" in prompt_lower:
            if "comparison" in prompt_lower or "versus" in prompt_lower:
                return "STRATEGY: multi_query - Need to retrieve information about multiple entities for comparison."
            elif "recent" in prompt_lower or "latest" in prompt_lower:
                return "STRATEGY: temporal - Focus on recent information."
            else:
                return "STRATEGY: semantic - Standard semantic similarity search."
        elif "synthesize" in prompt_lower and "context:" in prompt_lower:
            return "Based on the retrieved information, here's a comprehensive answer that combines multiple sources and provides specific details with proper context."
        return "This is a mock response. In practice, use a real LLM like OpenAI's GPT or similar."

class RetrievalStrategy(Enum):
    SEMANTIC = "semantic"
    MULTI_QUERY = "multi_query"
    TEMPORAL = "temporal"
    HYBRID = "hybrid"

@dataclass
class Document:
    id: str
    content: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = None

We set up the foundation of our Agentic RAG system. We define a mock LLM to simulate decision-making, create a retrieval strategy enum, and design a Document dataclass so we can structure and manage our knowledge base efficiently. Check out the FULL CODES here.

class AgenticRAGSystem:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.llm = MockLLM()
        self.documents: List[Document] = []
        self.index: Optional[faiss.Index] = None

    def add_documents(self, documents: List[Dict[str, Any]]) -> None:
        print(f"Processing {len(documents)} documents...")
        for i, doc in enumerate(documents):
            doc_obj = Document(
                id=doc.get('id', str(i)),
                content=doc['content'],
                metadata=doc.get('metadata', {})
            )
            self.documents.append(doc_obj)
        contents = [doc.content for doc in self.documents]
        embeddings = self.encoder.encode(contents, show_progress_bar=True)
        for doc, embedding in zip(self.documents, embeddings):
            doc.embedding = embedding
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatIP(dimension)
        faiss.normalize_L2(embeddings)
        self.index.add(embeddings.astype('float32'))
        print(f"Knowledge base built with {len(self.documents)} documents")

We build the core of our Agentic RAG system.
We initialize the embedding model, set up the FAISS index, and add documents by encoding their contents into vectors, enabling fast and accurate semantic retrieval from our knowledge base. Check out the FULL CODES here.

    def decide_retrieval(self, query: str) -> bool:
        decision_prompt = f"""
        Analyze the following query and decide whether to retrieve information:

        Query: "{query}"

        Decide whether to retrieve information from the knowledge base.
        Consider if this needs specific facts, recent data, or can be answered generally.

        Respond with either:
        RETRIEVE: [reason]
        or
        NO_RETRIEVE: [reason]
        """
        response = self.llm.generate(decision_prompt)
        should_retrieve = response.startswith("RETRIEVE:")
        print(f" Agent Decision: {'Retrieve' if should_retrieve else 'Direct Answer'}")
        print(f" Reasoning: {response.split(':', 1)[1].strip() if ':' in response else response}")
        return should_retrieve

    def choose_strategy(self, query: str) -> RetrievalStrategy:
        strategy_prompt = f"""
        Choose the best retrieval strategy for this query:

        Query: "{query}"

        Available strategies:
        - semantic: Standard similarity search
        - multi_query: Multiple related queries (for comparisons)
        - temporal: Focus on recent information
        - hybrid: Combination approach

        Choose retrieval strategy and explain why.
        Respond with: STRATEGY: [strategy_name] - [reasoning]
        """
        response = self.llm.generate(strategy_prompt)
        if "multi_query" in response.lower():
            strategy = RetrievalStrategy.MULTI_QUERY
        elif "temporal" in response.lower():
            strategy = RetrievalStrategy.TEMPORAL
        elif "hybrid" in response.lower():
            strategy = RetrievalStrategy.HYBRID
        else:
            strategy = RetrievalStrategy.SEMANTIC
        print(f" Retrieval Strategy: {strategy.value}")
        print(f" Reasoning: {response.split('-', 1)[1].strip() if '-' in response else response}")
        return strategy

We give our agent the ability to think before it fetches. We first determine if a query truly requires retrieval, then we select the most suitable strategy: semantic, multi-query, temporal, or hybrid. This allows us to target the correct context with clear, printed reasoning for each step. Check out the FULL CODES here.
    def retrieve_documents(self, query: str, strategy: RetrievalStrategy, k: int = 3) -> List[Document]:
        if not self.index:
            print(" No knowledge base available")
            return []
        if strategy == RetrievalStrategy.MULTI_QUERY:
            queries = [query, f"advantages of {query}", f"disadvantages of {query}"]
            all_docs = []
            for q in queries:
                docs = self._semantic_search(q, k=2)
                all_docs.extend(docs)
            seen_ids = set()
            unique_docs = []
            for doc in all_docs:
                if doc.id not in seen_ids:
                    unique_docs.append(doc)
                    seen_ids.add(doc.id)
            return unique_docs[:k]
        elif strategy == RetrievalStrategy.TEMPORAL:
            docs = self._semantic_search(query, k=k*2)
            docs_with_dates = [(doc, doc.metadata.get('date', '1900-01-01')) for doc in docs]
            docs_with_dates.sort(key=lambda x: x[1], reverse=True)
            return [doc for doc, _ in docs_with_dates[:k]]
        else:
            return self._semantic_search(query, k=k)

    def _semantic_search(self, query: str, k: int) -> List[Document]:
        query_embedding = self.encoder.encode([query])
        faiss.normalize_L2(query_embedding)
        scores, indices = self.index.search(query_embedding.astype('float32'), k)
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx < len(self.documents):
                results.append(self.documents[idx])
        return results

    def synthesize_response(self, query: str, retrieved_docs: List[Document]) -> str:
        if not retrieved_docs:
            return self.llm.generate(f"Answer this query: {query}")
        context = "\n\n".join([f"Document {i+1}: {doc.content}" for i, doc in enumerate(retrieved_docs)])
        synthesis_prompt = f"""
        Query: {query}

        Context:
        {context}

        Synthesize a comprehensive answer using the provided context.
        Be specific and reference the information sources when relevant.
        """
        return self.llm.generate(synthesis_prompt, max_tokens=200)

We implement how we actually fetch and use knowledge. We perform semantic search, branch into multi-query or temporal re-ranking when needed, deduplicate results, and then synthesize a focused answer from the retrieved context. In doing so, we maintain efficient, transparent, and tightly aligned retrieval. Check out the FULL CODES here.

    def query(self, query: str) -> str:
        print(f"\n Processing Query: '{query}'")
        print("=" * 50)
        if not self.decide_retrieval(query):
            print("\n Generating direct response...")
            return self.llm.generate(f"Answer this query: {query}")
        strategy = self.choose_strategy(query)
        print(f"\n Retrieving documents using {strategy.value} strategy...")
        retrieved_docs = self.retrieve_documents(query, strategy)
        print(f" Retrieved {len(retrieved_docs)} documents")
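The tutorial text is cut off inside the query method at this point. A plausible completion, together with a tiny usage run, is sketched below under the assumption that the method simply synthesizes an answer from the retrieved documents using the helpers defined above; the final step and the sample documents are illustrative, not the author's original ending.

        # Plausible completion (assumption): synthesize the final answer from
        # the retrieved documents and return it to the caller.
        print("\n Synthesizing response...")
        return self.synthesize_response(query, retrieved_docs)


# Illustrative usage run with two toy documents.
if __name__ == "__main__":
    rag = AgenticRAGSystem()
    rag.add_documents([
        {"id": "1", "content": "FAISS is a library for efficient similarity search over dense vectors.",
         "metadata": {"date": "2023-05-01"}},
        {"id": "2", "content": "Sentence-Transformers produces dense embeddings for sentences and paragraphs.",
         "metadata": {"date": "2024-01-15"}},
    ])
    print(rag.query("What specific library handles similarity search here?"))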


End-to-End Aspect-Guided Review Summarization at Scale

arXiv:2509.26103v1 Announce Type: new Abstract: We present a scalable large language model (LLM)-based system that combines aspect-based sentiment analysis (ABSA) with guided summarization to generate concise and interpretable product review summaries for the Wayfair platform. Our approach first extracts and consolidates aspect-sentiment pairs from individual reviews, selects the most frequent aspects for each product, and samples representative reviews accordingly. These are used to construct structured prompts that guide the LLM to produce summaries grounded in actual customer feedback. We demonstrate the real-world effectiveness of our system through a large-scale online A/B test. Furthermore, we describe our real-time deployment strategy and release a dataset of 11.8 million anonymized customer reviews covering 92,000 products, including extracted aspects and generated summaries, to support future research in aspect-guided review summarization.


Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

arXiv:2509.23946v2 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) and its variants have markedly advanced the reasoning abilities of Large Language Models (LLMs), yet their monolithic and auto-regressive architecture inherently conflates high-level strategic planning with low-level step-by-step execution, leading to computational inefficiency, limited exploration of reasoning paths, and reduced interpretability. To overcome these issues, we propose the Explore-Execute Chain ($E^2C$), a structured reasoning framework that decouples reasoning into two distinct phases: an exploratory phase that stochastically generates succinct high-level plans, followed by an execution phase that deterministically carries out the chosen plan. Our approach incorporates a two-stage training methodology, which combines Supervised Fine-Tuning (SFT) – augmented by a novel data generation algorithm enforcing strict plan adherence – with a subsequent Reinforcement Learning (RL) stage that capitalizes on the informativeness of exploration and reinforces the determinism of execution. This decomposition enables an efficient test-time scaling strategy: on AIME’2024, $E^2C$ Test Time Scaling reaches 58.1% accuracy using
