News
MemAgent: A Reinforcement Learning Framework Redefining Long-Context Processing in LLMs
Handling extremely long documents remains a persistent challenge for large language models (LLMs). Even with...
Meet Trackio: The Free, Local-First, Open-Source Experiment Tracker Python Library that Simplifies and Enhances Machine Learning Workflows
Experiment tracking is an essential part of modern machine learning workflows. Whether you’re tweaking hyperparameters...
Meet NovelSeek: A Unified Multi-Agent Framework for Autonomous Scientific Research from Hypothesis Generation to Experimental Validation
Scientific research across fields like chemistry, biology, and artificial intelligence has long relied on human...
Meet Elysia: A New Open-Source Python Framework Redefining Agentic RAG Systems with Decision Trees and Smarter Data Handling
If you’ve ever tried to build an agentic RAG system that actually works well, you...
Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing
dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical...
Meet Cathy Tie, Bride of “China’s Frankenstein”
Since the Chinese biophysicist He Jiankui was released from prison in 2022, he has sought...
Meet BioReason: The World’s First Reasoning Model in Biology that Enables AI to Reason about Genomics like a Biology Expert
A major hurdle in using AI for genomics is the lack of interpretable, step-by-step reasoning...
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering
arXiv:2505.24040v1. Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on various...
Measuring Reasoning Utility in LLMs via Conditional Entropy Reduction
arXiv:2508.20395v1. Abstract: Recent advancements in large language models (LLMs) often rely on...
MCP-Universe benchmark shows GPT-5 fails more than half of real-world orchestration tasks
A new benchmark from Salesforce research evaluates model and agentic performance on real-life enterprise tasks.
MCP and the innovation paradox: Why open standards will save AI from itself
Much like HTTP and REST standardized how web applications connect to services, MCP standardizes how...
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
arXiv:2507.02088v2. Abstract: As large language models (LLMs) are increasingly applied to various...