MoE Architecture Comparison: Qwen3 30B-A3B vs. GPT-OSS 20B

This article provides a technical comparison between two recently released Mixture-of-Experts (MoE) transformer models: Alibaba’s Qwen3 30B-A3B (released April 2025) and OpenAI’s GPT-OSS 20B (released August 2025). Both models represent distinct approaches to MoE architecture design, balancing computational efficiency with performance across different deployment scenarios.

Model Overview

| Feature | Qwen3 30B-A3B | GPT-OSS 20B |
| Total Parameters | 30.5B | 21B |
| Active Parameters | 3.3B | 3.6B |
| Number of Layers | 48 | 24 |
| MoE Experts | 128 (8 active) | 32 (4 active) |
| Attention Architecture | Grouped Query Attention | Grouped Multi-Query Attention |
| Query / Key-Value Heads | 32Q / 4KV | 64Q / 8KV |
| Context Window | 32,768 (extendable to 262,144) | 128,000 |
| Vocabulary Size | 151,936 | o200k_harmony (~200k) |
| Quantization | Standard precision | Native MXFP4 |
| Release Date | April 2025 | August 2025 |

Sources: Qwen3 Official Documentation, OpenAI GPT-OSS Documentation

Qwen3 30B-A3B Technical Specifications

Architecture Details
Qwen3 30B-A3B employs a deep transformer architecture with 48 layers, each containing a Mixture-of-Experts configuration with 128 experts per layer. The model activates 8 experts per token during inference, achieving a balance between specialization and computational efficiency.

Attention Mechanism
The model utilizes Grouped Query Attention (GQA) with 32 query heads and 4 key-value heads³. This design optimizes memory usage while maintaining attention quality, which is particularly beneficial for long-context processing.

Context and Multilingual Support
- Native context length: 32,768 tokens
- Extended context: up to 262,144 tokens (latest variants)
- Multilingual support: 119 languages and dialects
- Vocabulary: 151,936 tokens using BPE tokenization

Unique Features
Qwen3 incorporates a hybrid reasoning system supporting both “thinking” and “non-thinking” modes, allowing users to control computational overhead based on task complexity.

GPT-OSS 20B Technical Specifications

Architecture Details
GPT-OSS 20B features a 24-layer transformer with 32 MoE experts per layer⁸. The model activates 4 experts per token, emphasizing wider expert capacity over fine-grained specialization.

Attention Mechanism
The model implements Grouped Multi-Query Attention with 64 query heads and 8 key-value heads arranged in groups of 8¹⁰. This configuration supports efficient inference while maintaining attention quality across the wider architecture.

Context and Optimization
- Native context length: 128,000 tokens
- Quantization: native MXFP4 (4.25-bit precision) for MoE weights
- Memory efficiency: runs in 16GB of memory with quantization
- Tokenizer: o200k_harmony (a superset of the GPT-4o tokenizer)

Performance Characteristics
GPT-OSS 20B uses alternating dense and locally banded sparse attention patterns similar to GPT-3, with Rotary Positional Embedding (RoPE) for positional encoding¹⁵.

Architectural Philosophy Comparison

Depth vs. Width Strategy

Qwen3 30B-A3B emphasizes depth and expert diversity:
- 48 layers enable multi-stage reasoning and hierarchical abstraction
- 128 experts per layer provide fine-grained specialization
- Suitable for complex reasoning tasks requiring deep processing

GPT-OSS 20B prioritizes width and computational density:
- 24 layers with larger experts maximize per-layer representational capacity
- Fewer but more powerful experts (32 vs. 128) increase individual expert capability
- Optimized for efficient single-pass inference

MoE Routing Strategies

Qwen3: routes each token through 8 of 128 experts, encouraging diverse, context-sensitive processing paths and modular decision-making.
GPT-OSS: routes each token through 4 of 32 experts, maximizing per-expert computational power and delivering concentrated processing per inference step.
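
To make the routing contrast concrete, the following is a minimal, illustrative sketch of top-k expert routing in PyTorch-style Python. It is not the actual routing code of either model: the toy layer sizes, the softmax over the selected experts' logits, and the absence of load-balancing losses and capacity limits are all simplifying assumptions.

    import torch
    import torch.nn.functional as F

    def moe_forward(x, router, experts, top_k):
        """Route each token to its top_k experts and mix their outputs.

        x:       (num_tokens, d_model) token representations
        router:  linear layer producing one logit per expert
        experts: list of per-expert feed-forward modules
        top_k:   experts activated per token (8 for Qwen3 30B-A3B, 4 for GPT-OSS 20B)
        """
        logits = router(x)                                    # (num_tokens, num_experts)
        weights, indices = torch.topk(logits, top_k, dim=-1)  # pick top_k experts per token
        weights = F.softmax(weights, dim=-1)                  # normalize over the selected experts

        out = torch.zeros_like(x)
        for slot in range(top_k):
            for e, expert in enumerate(experts):
                mask = indices[:, slot] == e                  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

    # Toy instantiation mirroring the two configurations (hidden sizes are illustrative):
    # a Qwen3 30B-A3B-style layer uses 128 experts with 8 active; a GPT-OSS 20B-style layer uses 32 with 4 active.
    d_model, num_experts, top_k = 64, 128, 8
    router = torch.nn.Linear(d_model, num_experts)
    experts = [torch.nn.Sequential(torch.nn.Linear(d_model, 4 * d_model),
                                   torch.nn.GELU(),
                                   torch.nn.Linear(4 * d_model, d_model))
               for _ in range(num_experts)]
    tokens = torch.randn(10, d_model)
    mixed = moe_forward(tokens, router, experts, top_k)       # (10, 64)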

Memory and Deployment Considerations

Qwen3 30B-A3B
- Memory requirements: variable, depending on precision and context length
- Deployment: optimized for cloud and edge deployment with flexible context extension
- Quantization: supports various post-training quantization schemes

GPT-OSS 20B
- Memory requirements: 16GB with native MXFP4 quantization, ~48GB in bfloat16
- Deployment: designed for consumer hardware compatibility
- Quantization: native MXFP4 training enables efficient inference without quality degradation

Performance Characteristics

Qwen3 30B-A3B
- Excels in mathematical reasoning, coding, and complex logical tasks
- Strong performance in multilingual scenarios across 119 languages
- Thinking mode provides enhanced reasoning capabilities for complex problems

GPT-OSS 20B
- Achieves performance comparable to OpenAI o3-mini on standard benchmarks
- Optimized for tool use, web browsing, and function calling
- Strong chain-of-thought reasoning with adjustable reasoning-effort levels

Use Case Recommendations

Choose Qwen3 30B-A3B for:
- Complex reasoning tasks requiring multi-stage processing
- Multilingual applications across diverse languages
- Scenarios requiring flexible context length extension
- Applications where thinking/reasoning transparency is valued

Choose GPT-OSS 20B for:
- Resource-constrained deployments requiring efficiency
- Tool-calling and agentic applications
- Rapid inference with consistent performance
- Edge deployment scenarios with limited memory

Conclusion

Qwen3 30B-A3B and GPT-OSS 20B represent complementary approaches to MoE architecture design. Qwen3 emphasizes depth, expert diversity, and multilingual capability, making it suitable for complex reasoning applications. GPT-OSS 20B prioritizes efficiency, tool integration, and deployment flexibility, positioning it for practical production environments with resource constraints. Both models demonstrate the evolution of MoE architectures beyond simple parameter scaling, incorporating sophisticated design choices that align architectural decisions with intended use cases and deployment scenarios.

Note: This article is inspired by the Reddit post and diagram shared by Sebastian Raschka.

Sources
1. Qwen3 30B-A3B Model Card – Hugging Face
2. Qwen3 Technical Blog
3. Qwen3 30B-A3B Base Specifications
4. Qwen3 30B-A3B Instruct 2507
5. Qwen3 Official Documentation
6. Qwen Tokenizer Documentation
7. Qwen3 Model Features
8. OpenAI GPT-OSS Introduction
9. GPT-OSS GitHub Repository
10. GPT-OSS 20B – Groq Documentation
11. OpenAI GPT-OSS Technical Details
12. Hugging Face GPT-OSS Blog
13. OpenAI GPT-OSS 20B Model Card
14. OpenAI GPT-OSS Introduction
15. NVIDIA GPT-OSS Technical Blog
16. Hugging Face GPT-OSS Blog
17. Qwen3 Performance Analysis
18. OpenAI GPT-OSS Model Card
19. GPT-OSS 20B Capabilities

The post MoE Architecture Comparison: Qwen3 30B-A3B vs. GPT-OSS 20B appeared first on MarkTechPost.


Evaluating LLMs on Real-World Forecasting Against Expert Forecasters

arXiv:2507.04562v3 Announce Type: replace-cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but their ability to forecast future events remains understudied. A year ago, large language models struggled to come close to the accuracy of a human crowd. I evaluate state-of-the-art LLMs on 464 forecasting questions from Metaculus, comparing their performance against top forecasters. Frontier models achieve Brier scores that ostensibly surpass the human crowd but still significantly underperform a group of experts.
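
Since forecasting accuracy here is reported as Brier scores, a short sketch of how that metric is computed may be useful; the probabilities and outcomes below are made-up illustrative values, not data from the paper.

    def brier_score(forecasts, outcomes):
        """Mean squared error between forecast probabilities and binary outcomes.

        Lower is better: 0.0 is a perfect forecaster, 0.25 matches always guessing 50%.
        """
        return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

    # Hypothetical resolved yes/no questions: forecast probability vs. actual outcome (1 = yes).
    model_probs = [0.72, 0.10, 0.55, 0.90]
    resolved    = [1,    0,    0,    1]
    print(brier_score(model_probs, resolved))  # ~0.100 on these illustrative values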


PyLate: Flexible Training and Retrieval for Late Interaction Models

arXiv:2508.03555v1 Announce Type: cross Abstract: Neural ranking has become a cornerstone of modern information retrieval. While single vector search remains the dominant paradigm, it suffers from the shortcoming of compressing all the information into a single vector. This compression leads to notable performance degradation in out-of-domain, long-context, and reasoning-intensive retrieval tasks. Multi-vector approaches pioneered by ColBERT aim to address these limitations by preserving individual token embeddings and computing similarity via the MaxSim operator. This architecture has demonstrated superior empirical advantages, including enhanced out-of-domain generalization, long-context handling, and performance in complex retrieval scenarios. Despite these compelling empirical results and clear theoretical advantages, the practical adoption and public availability of late interaction models remain low compared to their single-vector counterparts, primarily due to a lack of accessible and modular tools for training and experimenting with such models. To bridge this gap, we introduce PyLate, a streamlined library built on top of Sentence Transformers to support multi-vector architectures natively, inheriting its efficient training, advanced logging, and automated model card generation while requiring minimal code changes to code templates users are already familiar with. By offering multi-vector-specific features such as efficient indexes, PyLate aims to accelerate research and real-world application of late interaction models, thereby unlocking their full potential in modern IR systems. Finally, PyLate has already enabled the development of state-of-the-art models, including GTE-ModernColBERT and Reason-ModernColBERT, demonstrating its practical utility for both research and production environments.
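
For readers unfamiliar with the MaxSim operator mentioned above, the following is a minimal NumPy sketch of ColBERT-style late-interaction scoring; the toy embedding shapes and random vectors are illustrative assumptions and do not reflect PyLate's actual API.

    import numpy as np

    def maxsim_score(query_embs, doc_embs):
        """ColBERT-style late interaction: for every query token, take its best-matching
        document token (max cosine similarity), then sum over query tokens."""
        q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
        d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
        sim = q @ d.T                    # (num_query_tokens, num_doc_tokens)
        return sim.max(axis=1).sum()     # best match per query token, summed

    # Toy example: 4 query token vectors vs. 12 document token vectors, dimension 128.
    rng = np.random.default_rng(0)
    query = rng.normal(size=(4, 128))
    doc = rng.normal(size=(12, 128))
    print(maxsim_score(query, doc))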


Pre-trained Transformer-Based Approach for Arabic Question Answering: A Comparative Study

arXiv:2111.05671v2 Announce Type: replace Abstract: Question answering (QA) is one of the most challenging yet widely investigated problems in Natural Language Processing (NLP). QA systems try to produce answers for given questions, which can be generated from unstructured or structured text. Hence, QA is considered an important research area that can be used to evaluate text understanding systems. A large volume of QA research has been devoted to the English language, investigating the most advanced techniques and achieving state-of-the-art results. However, Arabic question answering progresses at a considerably slower pace due to the scarcity of research efforts and the lack of large benchmark datasets. Recently, many pre-trained language models have provided high performance on many Arabic NLP problems. In this work, we evaluate state-of-the-art pre-trained transformer models for Arabic QA using four reading comprehension datasets: Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP. We fine-tuned and compared the performance of the AraBERTv2-base, AraBERTv0.2-large, and AraELECTRA models. Finally, we provide an analysis to understand and interpret the low-performance results obtained by some models.
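
As a rough illustration of the extractive QA setup being compared, the following is a minimal Hugging Face Transformers sketch; the checkpoint identifier and the question/context strings are illustrative assumptions, and a model must first be fine-tuned on one of the listed datasets before its QA head produces meaningful answers.

    from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

    # Checkpoint name is an assumption for illustration; the paper fine-tunes AraBERTv2-base,
    # AraBERTv0.2-large, and AraELECTRA on Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP.
    checkpoint = "aubmindlab/bert-base-arabertv2"

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)  # QA head stays untrained until fine-tuned

    qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
    pred = qa(
        question="متى تأسست جامعة القاهرة؟",          # "When was Cairo University founded?"
        context="تأسست جامعة القاهرة في عام 1908.",   # "Cairo University was founded in 1908."
    )
    print(pred["answer"], pred["score"])  # predicted answer span and confidence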


CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

arXiv:2508.03686v1 Announce Type: new Abstract: Answer verification is crucial not only for evaluating large language models (LLMs) by matching their unstructured outputs against standard answers, but also serves as the reward model to guide LLM optimization. Most evaluation frameworks rely on regex-based matching or employ general LLMs for answer verification, which demands extensive, repetitive customization of regex rules or evaluation prompts. Two fundamental limitations persist in current methodologies: 1) the absence of comprehensive benchmarks that systematically evaluate verification capabilities across different LLMs; and 2) the nascent stage of verifier development, where existing approaches lack both the robustness to handle complex edge cases and the generalizability across different domains. In this work, we develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward. It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types, including multi-subproblems, formulas, and sequence answers, while effectively identifying abnormal/invalid responses. We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources and augmented through manual analysis of meta-error patterns to enhance CompassVerifier. We anticipate that CompassVerifier and VerifierBench will facilitate answer verification, evaluation protocols, and reinforcement learning research. Code and dataset are available at https://github.com/open-compass/CompassVerifier.
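
To ground what answer verification involves, the following is a minimal sketch of the kind of brittle rule-based matching that learned verifiers such as CompassVerifier aim to replace; the normalization rules and test strings are illustrative assumptions, not CompassVerifier's method.

    import re

    def naive_verify(model_output: str, gold_answer: str) -> bool:
        """Rule-based matching: extract the last number-like token from the model's
        free-form output and compare it to the gold answer after light normalization.
        Brittle by design: formulas, multi-part answers, and abnormal responses are
        exactly the edge cases a learned verifier is meant to handle."""
        numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output.replace(",", ""))
        if not numbers:
            return model_output.strip().lower() == gold_answer.strip().lower()
        return numbers[-1] == gold_answer.replace(",", "").strip()

    print(naive_verify("The total cost is therefore $1,250.", "1250"))  # True
    print(naive_verify("Answer: x = 1/2", "0.5"))                       # False: misses the equivalent form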


Cross-lingual Opinions and Emotions Mining in Comparable Documents

arXiv:2508.03112v1 Announce Type: new Abstract: Comparable texts are topic-aligned documents in multiple languages that are not direct translations. They are valuable for understanding how a topic is discussed across languages. This research studies differences in sentiments and emotions across English-Arabic comparable documents. First, texts are annotated with sentiment and emotion labels. We apply a cross-lingual method to label documents with opinion classes (subjective/objective), avoiding reliance on machine translation. To annotate with emotions (anger, disgust, fear, joy, sadness, surprise), we manually translate the English WordNet-Affect (WNA) lexicon into Arabic, creating bilingual emotion lexicons used to label the comparable corpora. We then apply a statistical measure to assess the agreement of sentiments and emotions in each source-target document pair. This comparison is especially relevant when the documents originate from different sources. To our knowledge, this aspect has not been explored in prior literature. Our study includes English-Arabic document pairs from Euronews, BBC, and Al-Jazeera (JSC). Results show that sentiment and emotion annotations align when articles come from the same news agency and diverge when they come from different ones. The proposed method is language-independent and generalizable to other language pairs.
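
The following is a minimal sketch of the pipeline described here, lexicon-based emotion labeling of comparable documents followed by an agreement statistic; the toy lexicon entries and the use of Cohen's kappa as the agreement measure are illustrative assumptions rather than the paper's exact lexicons or statistic.

    from collections import Counter
    from sklearn.metrics import cohen_kappa_score

    # Toy bilingual emotion lexicons (illustrative entries, not the translated WordNet-Affect lexicon).
    emotion_lexicon_en = {"delighted": "joy", "furious": "anger", "terrified": "fear"}
    emotion_lexicon_ar = {"سعيد": "joy", "غاضب": "anger", "خائف": "fear"}

    def label_emotions(tokens, lexicon):
        """Count lexicon hits per emotion and return the dominant label (or 'none')."""
        counts = Counter(lexicon[t] for t in tokens if t in lexicon)
        return counts.most_common(1)[0][0] if counts else "none"

    print(label_emotions("the fans were delighted simply delighted".split(), emotion_lexicon_en))  # joy
    print(label_emotions(["كان", "الجمهور", "سعيد", "جدا"], emotion_lexicon_ar))                    # joy

    # One dominant label per document on each side of the comparable pair, then agreement across the corpus.
    en_labels = ["joy", "anger", "fear", "joy", "none"]
    ar_labels = ["joy", "anger", "joy", "joy", "none"]
    print(cohen_kappa_score(en_labels, ar_labels))  # agreement on these toy labels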


Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents

In today’s data-driven world, valuable insights are often buried in unstructured text—be it clinical notes, lengthy legal contracts, or customer feedback threads. Extracting meaningful, traceable information from these documents is both a technical and practical challenge. Google AI’s new open-source Python library, LangExtract, is designed to address this gap directly, using LLMs like Gemini to deliver powerful, automated extraction with traceability and transparency at its core.

Key Innovations of LangExtract

1. Declarative and Traceable Extraction
LangExtract lets users define custom extraction tasks using natural language instructions and high-quality “few-shot” examples. This empowers developers and analysts to specify exactly which entities, relationships, or facts to extract, and in what structure. Crucially, every extracted piece of information is tied directly back to its source text—enabling validation, auditing, and end-to-end traceability.

2. Domain Versatility
The library works not just in tech demos but in critical real-world domains—including health (clinical notes, medical reports), finance (summaries, risk documents), law (contracts), research literature, and even the arts (analyzing Shakespeare). Original use cases include automatic extraction of medications, dosages, and administration details from clinical documents, as well as relationships and emotions from plays or literature.

3. Schema Enforcement with LLMs
Powered by Gemini and compatible with other LLMs, LangExtract enables enforcement of custom output schemas (like JSON), so results aren’t just accurate—they’re immediately usable in downstream databases, analytics, or AI pipelines. It solves traditional LLM weaknesses around hallucination and schema drift by grounding outputs to both user instructions and actual source text.

4. Scalability and Visualization
- Handles large volumes: LangExtract efficiently processes long documents by chunking, parallelizing, and aggregating results.
- Interactive visualization: Developers can generate interactive HTML reports, viewing each extracted entity with context by highlighting its location in the original document—making auditing and error analysis seamless.
- Smooth integration: Works in Google Colab, Jupyter, or as standalone HTML files, supporting a rapid feedback loop for developers and researchers.

5. Installation and Usage

Install easily with pip:

    pip install langextract

Example workflow (extracting character info from Shakespeare):

    import langextract as lx
    import textwrap

    # 1. Define your prompt
    prompt = textwrap.dedent("""
        Extract characters, emotions, and relationships in order of appearance.
        Use exact text for extractions. Do not paraphrase or overlap entities.
        Provide meaningful attributes for each entity to add context.
        """)

    # 2. Give a high-quality example
    examples = [
        lx.data.ExampleData(
            text="ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.",
            extractions=[
                lx.data.Extraction(extraction_class="character", extraction_text="ROMEO",
                                   attributes={"emotional_state": "wonder"}),
                lx.data.Extraction(extraction_class="emotion", extraction_text="But soft!",
                                   attributes={"feeling": "gentle awe"}),
                lx.data.Extraction(extraction_class="relationship", extraction_text="Juliet is the sun",
                                   attributes={"type": "metaphor"}),
            ],
        )
    ]

    # 3. Extract from new text
    input_text = "Lady Juliet gazed longingly at the stars, her heart aching for Romeo"
    result = lx.extract(
        text_or_documents=input_text,
        prompt_description=prompt,
        examples=examples,
        model_id="gemini-2.5-pro",
    )

    # 4. Save and visualize results
    lx.io.save_annotated_documents([result], output_name="extraction_results.jsonl")
    html_content = lx.visualize("extraction_results.jsonl")
    with open("visualization.html", "w") as f:
        f.write(html_content)

This results in structured, source-anchored JSON outputs, plus an interactive HTML visualization for easy review and demonstration.

Specialized & Real-World Applications
- Medicine: Extracts medications, dosages, and timing, and links them back to source sentences. Building on research into accelerating medical information extraction, LangExtract’s approach is directly applicable to structuring clinical and radiology reports—improving clarity and supporting interoperability.
- Finance & Law: Automatically pulls relevant clauses, terms, or risks from dense legal or financial text, ensuring every output can be traced back to its context.
- Research & Data Mining: Streamlines high-throughput extraction from thousands of scientific papers. The team even provides a demonstration called RadExtract for structuring radiology reports—highlighting not just what was extracted, but exactly where the information appeared in the original input.

How LangExtract Compares

| Feature | Traditional Approaches | LangExtract Approach |
| Schema Consistency | Often manual/error-prone | Enforced via instructions & few-shot examples |
| Result Traceability | Minimal | All output linked to input text |
| Scaling to Long Texts | Windowed, lossy | Chunked + parallel extraction, then aggregation |
| Visualization | Custom, usually absent | Built-in, interactive HTML reports |
| Deployment | Rigid, model-specific | Gemini-first, open to other LLMs & on-premises |

In Summary

LangExtract presents a new era for extracting structured, actionable data from text—delivering:
- Declarative, explainable extraction
- Traceable results backed by source context
- Instant visualization for rapid iteration
- Easy integration into any Python workflow

Check out the GitHub Page and Technical Blog.

The post Google AI Releases LangExtract: An Open Source Python Library that Extracts Structured Data from Unstructured Text Documents appeared first on MarkTechPost.


CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions

arXiv:2508.01674v1 Announce Type: new Abstract: Personalization of Large Language Models (LLMs) often assumes users hold static preferences that reflect globally in all tasks. In reality, humans hold dynamic preferences that change depending on the context. As users interact with an LLM in various contexts, they naturally reveal their contextual preferences, which a model must infer and apply in future contexts to ensure alignment. To assess this, we introduce CUPID, a benchmark of 756 human-curated interaction session histories between users and LLM-based chat assistants. In each interaction session, the user provides a request in a specific context and expresses their preference through multi-turn feedback. Given a new user request and prior interaction sessions, our benchmark assesses whether LLMs can infer the preference relevant to this request and generate a response that satisfies this preference. With CUPID, we evaluated 10 open and proprietary LLMs, revealing that state-of-the-art LLMs struggle to infer preferences from multi-turn interactions and fail to discern what previous context is relevant to a new request — under 50% precision and 65% recall. Our work highlights the need to advance LLM capabilities for more contextually personalized interactions and proposes CUPID as a resource to drive these improvements.


FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning

arXiv:2506.16123v2 Announce Type: replace Abstract: This paper presents FinCoT, a structured chain-of-thought (CoT) prompting framework that embeds domain-specific expert financial reasoning blueprints to guide large language models’ behavior. We identify three main prompting styles in financial NLP (FinNLP): (1) standard prompting (zero-shot), (2) unstructured CoT (free-form reasoning), and (3) structured CoT (with explicitly structured reasoning steps). Prior work has mainly focused on the first two, while structured CoT remains underexplored and lacks domain-expertise incorporation. Therefore, we evaluate all three prompting approaches across ten CFA-style financial domains and introduce FinCoT as the first structured finance-specific prompting approach incorporating blueprints from domain experts. FinCoT improves the accuracy of a general-purpose model, Qwen3-8B-Base, from 63.2% to 80.5%, and boosts Fin-R1 (7B), a finance-specific model, from 65.7% to 75.7%, while reducing output length by up to 8.9x and 1.16x, respectively, compared to structured CoT methods. We find that FinCoT proves most effective for models lacking financial post-training. Our findings show that FinCoT not only improves performance and reduces inference costs but also yields more interpretable and expert-aligned reasoning traces.
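
To illustrate the three prompting styles being contrasted, the following is a minimal sketch of standard, unstructured CoT, and structured CoT prompts for a toy finance question; the blueprint steps and wording are illustrative assumptions, not FinCoT's actual expert templates.

    # Illustrative prompt templates for the three styles contrasted in the paper.
    QUESTION = "A bond pays a 5% annual coupon, matures in 3 years, and yields 4%. Is it priced above or below par?"

    standard_prompt = f"{QUESTION}\nAnswer:"

    unstructured_cot_prompt = f"{QUESTION}\nLet's think step by step."

    structured_cot_prompt = f"""{QUESTION}
    Follow this expert blueprint and label each step:
    1. Identify the instrument and the given inputs (coupon, maturity, yield).
    2. State the relevant principle (price relative to par depends on coupon vs. yield).
    3. Apply the principle to the inputs.
    4. Sanity-check the conclusion and give the final answer on one line starting with 'Answer:'.
    """

    for name, p in [("standard", standard_prompt),
                    ("unstructured CoT", unstructured_cot_prompt),
                    ("structured CoT", structured_cot_prompt)]:
        print(f"--- {name} ---\n{p}\n")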


One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models

arXiv:2505.07167v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have been extensively used across diverse domains, including virtual assistants, automated code generation, and scientific research. However, they remain vulnerable to jailbreak attacks, which manipulate the models into generating harmful responses despite safety alignment. Recent studies have shown that current safety-aligned LLMs often exhibit shallow safety alignment, where the first few tokens largely determine whether the response will be harmful. Through comprehensive observations, we find that safety-aligned LLMs and various defense strategies generate highly similar initial tokens in their refusal responses, which we define as safety trigger tokens. Building on this insight, we propose D-STT, a simple yet effective defense algorithm that identifies and explicitly decodes the safety trigger tokens of a given safety-aligned LLM to trigger the model’s learned safety patterns. In this process, the safety trigger is constrained to a single token, which effectively preserves model usability by introducing minimal intervention into the decoding process. Extensive experiments across diverse jailbreak attacks and benign prompts demonstrate that our method significantly reduces output harmfulness while preserving model usability and incurring negligible response-time overhead, outperforming ten baseline methods.
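
The following is a minimal sketch of the core idea, explicitly decoding an identified safety trigger token before letting the model continue generation, written against the Hugging Face Transformers API; the placeholder model name and the choice of trigger token are assumptions, and this simplified forcing is not the paper's exact D-STT algorithm.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "path-or-id-of-a-safety-aligned-chat-model"  # placeholder, not a real checkpoint name
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "How do I pick a lock?"  # example of a potentially unsafe request
    trigger = "I"                     # assumed safety trigger token, e.g. the start of "I cannot help with that."

    # Force the response to begin with the trigger token, then let the model continue normally.
    inputs = tok(prompt + "\n" + trigger, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    continuation = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(trigger + continuation)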

