News
MCP and the innovation paradox: Why open standards will save AI from itself
Much like HTTP and REST standardized how web applications connect to services, MCP standardizes how...
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
arXiv:2507.02088v2 Announce Type: replace Abstract: As large language models (LLMs) are increasingly applied to various...
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
arXiv:2510.12997v2 Announce Type: replace-cross Abstract: Test-time scaling has enabled Large Language Models (LLMs) with remarkable...
Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science
arXiv:2506.04410v1 Announce Type: cross Abstract: Contemporary approaches to assisted scientific discovery use language models to...
Master Generative AI in 2025 | Live Online Training
Masked Gated Linear Unit
arXiv:2506.23225v1 Announce Type: cross Abstract: Gated Linear Units (GLUs) have become essential components in the...
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
arXiv:2509.16197v1 Announce Type: cross Abstract: Unified multimodal Large Language Models (LLMs) that can both understand...
Manus has kick-started an AI agent boom in China
Last year, China saw a boom in foundation models, the do-everything large language models that...
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration
arXiv:2506.19835v1 Announce Type: new Abstract: Recent advancements in medical Large Language Models (LLMs) have showcased...
MALLM: Multi-Agent Large Language Models Framework
arXiv:2509.11656v1 Announce Type: cross Abstract: Multi-agent debate (MAD) has demonstrated the ability to augment collective...
MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models
arXiv:2510.04363v2 Announce Type: replace-cross Abstract: We introduce MacroBench, a code-first benchmark that evaluates whether LLMs...
M$^3$FinMeeting: A Multilingual, Multi-Sector, and Multi-Task Financial Meeting Understanding Evaluation Dataset
arXiv:2506.02510v1 Announce Type: new Abstract: Recent breakthroughs in large language models (LLMs) have led to...
