News
Fast, Slow, and Tool-augmented Thinking for LLMs: A Review
arXiv:2508.12265v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable progress in reasoning...
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
arXiv:2505.08054v1 Announce Type: new Abstract: Safety alignment approaches in large language models (LLMs) often lead...
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
arXiv:2509.03888v1 Announce Type: new Abstract: Large Language Models (LLMs) can comply with harmful instructions, raising...
Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
arXiv:2507.00322v1 Announce Type: new Abstract: Despite remarkable advances in coding capabilities, language models (LMs) still...
Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop
arXiv:2405.17998v2 Announce Type: replace-cross Abstract: Recommender systems are essential for information access, allowing users to...
Exploring Procedural Data Generation for Automatic Acoustic Guitar Fingerpicking Transcription
arXiv:2508.07987v1 Announce Type: cross Abstract: Automatic transcription of acoustic guitar fingerpicking performances remains a challenging...
Exploring LLM Autoscoring Reliability in Large-Scale Writing Assessments Using Generalizability Theory
arXiv:2507.19980v1 Announce Type: new Abstract: This study investigates the estimation of reliability for large language...
Exploration of Plan-Guided Summarization for Narrative Texts: the Case of Small Language Models
arXiv:2504.09071v2 Announce Type: replace Abstract: Plan-guided summarization attempts to reduce hallucinations in small language models...
Exploiting Adaptive Contextual Masking for Aspect-Based Sentiment Analysis
arXiv:2402.13722v2 Announce Type: replace Abstract: Aspect-Based Sentiment Analysis (ABSA) is a fine-grained linguistics problem that...
Explaining Length Bias in LLM-Based Preference Evaluations
arXiv:2407.01085v4 Announce Type: replace-cross Abstract: The use of large language models (LLMs) as judges, particularly...
ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation
arXiv:2507.14201v2 Announce Type: replace-cross Abstract: We present ExCyTIn-Bench, the first benchmark to Evaluate an LLM...
Everything you need to know about estimating AI’s energy and emissions burden
When we set out to write a story on the best available estimates for AI’s...