News
Evaluating Evidence Grounding Under User Pressure in Instruction-Tuned Language Models
arXiv:2603.20162v1 Announce Type: new Abstract: In contested domains, instruction-tuned language models must balance user-alignment pressures...
Evaluating Creative Short Story Generation in Humans and Large Language Models
arXiv:2411.02316v5 Announce Type: replace Abstract: Story-writing is a fundamental aspect of human imagination, relying heavily...
Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing
arXiv:2511.12784v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently emerged as powerful tools...
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
arXiv:2506.11111v1 Announce Type: new Abstract: Large Language Models (LLMs) have gained enormous attention in recent...
Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG
arXiv:2406.13069v4 Announce Type: replace Abstract: How novel are texts generated by language models (LMs) relative...
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
arXiv:2503.08893v2 Announce Type: replace Abstract: An ideal model evaluation should achieve two goals: identifying where...
EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming
arXiv:2505.12185v5 Announce Type: replace-cross Abstract: Evaluating the programming robustness of large language models (LLMs) is...
Estranged Predictions: Measuring Semantic Category Disruption with Masked Language Modelling
arXiv:2511.08109v1 Announce Type: new Abstract: This paper examines how science fiction destabilises ontological categories by...
Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models
arXiv:2410.03026v3 Announce Type: replace Abstract: Language models (LMs) rely on their parametric knowledge augmented with...
Estimating LLM Uncertainty with Logits
arXiv:2502.00290v4 Announce Type: replace Abstract: Over the past few years, Large Language Models (LLMs) have...
Erasing Conceptual Knowledge from Language Models
arXiv:2410.02760v3 Announce Type: replace Abstract: In this work, we introduce Erasure of Language Memory (ELM)...
Epistemic Diversity and Knowledge Collapse in Large Language Models
arXiv:2510.04226v4 Announce Type: replace Abstract: Large language models (LLMs) tend to generate lexically, semantically, and...