News
Meet mmBERT: An Encoder-only Language Model Pretrained on 3T Tokens of Multilingual Text in over 1800 Languages and 2–4× Faster than Previous Models
Table of contents: Why was a new multilingual encoder needed?; Understanding the architecture of mmBERT...
Meet Elysia: A New Open-Source Python Framework Redefining Agentic RAG Systems with Decision Trees and Smarter Data Handling
If you’ve ever tried to build an agentic RAG system that actually works well, you...
Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing
dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical...
Meet Cathy Tie, Bride of “China’s Frankenstein”
Since the Chinese biophysicist He Jiankui was released from prison in 2022, he has sought...
Meet BioReason: The World’s First Reasoning Model in Biology that Enables AI to Reason about Genomics like a Biology Expert
A major hurdle in using AI for genomics is the lack of interpretable, step-by-step reasoning...
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering
arXiv:2505.24040v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on various...
Measuring Reasoning Utility in LLMs via Conditional Entropy Reduction
arXiv:2508.20395v1 Announce Type: new Abstract: Recent advancements in large language models (LLMs) often rely on...
MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark
arXiv:2509.22461v1 Announce Type: cross Abstract: The ability to reason from audio, including speech, paralinguistic cues...
MCP-Universe benchmark shows GPT-5 fails more than half of real-world orchestration tasks
A new benchmark from Salesforce research evaluates model and agentic performance on real-life enterprise tasks.
MCP and the innovation paradox: Why open standards will save AI from itself
Much like HTTP and REST standardized how web applications connect to services, MCP standardizes how...
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
arXiv:2507.02088v2 Announce Type: replace Abstract: As large language models (LLMs) are increasingly applied to various...
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
arXiv:2510.12997v2 Announce Type: replace-cross Abstract: Test-time scaling has enabled Large Language Models (LLMs) with remarkable...






