News
Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models
arXiv:2602.21262v2 Announce Type: replace Abstract: With increasing integration of Large Language Models (LLMs) into areas...
Uncertainty-Aware Budget Allocation for Adaptive Test-Time Reasoning
arXiv:2605.26849v1 Announce Type: new Abstract: Sampling multiple responses improves language model reasoning, but uniform compute...
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
arXiv:2504.19254v4 Announce Type: replace Abstract: Hallucinations are a persistent problem with Large Language Models (LLMs)...
UltraCUA: A Foundation Computer-Use Agents Model that Bridges the Gap between General-Purpose GUI Agents and Specialized API-based Agents
Computer-use agents have been limited to primitives. They click, they type, they scroll. Long action...
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation
arXiv:2510.25536v1 Announce Type: new Abstract: Large Language Models (LLMs) are exhibiting emergent human-like abilities and...
Tutorial: Exploring SHAP-IQ Visualizations
In this tutorial, we’ll explore a range of SHAP-IQ visualizations that provide insights into how...
Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions
arXiv:2501.01872v5 Announce Type: replace Abstract: Large language models, despite extensive alignment with human values and...
TurkBench: A Benchmark for Evaluating Turkish Large Language Models
arXiv:2601.07020v2 Announce Type: replace Abstract: With the recent surge in the development of large language...
TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers
arXiv:2505.08402v1 Announce Type: new Abstract: Recently, large language models (LLMs) have played an increasingly important role...
TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs
arXiv:2506.23423v1 Announce Type: new Abstract: Past work has studied the effects of fine-tuning on large...
TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings
arXiv:2603.04772v1 Announce Type: new Abstract: Despite the exceptional reasoning capabilities of Multimodal Large Language Models...
Trustworthy Data-driven Chronological Age Estimation from Panoramic Dental Images
arXiv:2601.12960v1 Announce Type: new Abstract: Integrating deep learning into healthcare enables personalized care but raises...

