YouZum

News

Evaluating LLMs on Real-World Forecasting Against Expert Forecasters

arXiv:2507.04562v3 Announce Type: replace-cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse...

Evaluating Large Language Models for Anxiety, Depression, and Stress Detection: Insights into Prompting Strategies and Synthetic Data

arXiv:2511.07044v2 Announce Type: replace Abstract: Mental health disorders affect over one-fifth of adults globally, yet...

Evaluating Evidence Grounding Under User Pressure in Instruction-Tuned Language Models

arXiv:2603.20162v1 Announce Type: new Abstract: In contested domains, instruction-tuned language models must balance user-alignment pressures...

Evaluating Creative Short Story Generation in Humans and Large Language Models

arXiv:2411.02316v5 Announce Type: replace Abstract: Story-writing is a fundamental aspect of human imagination, relying heavily...

Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing

arXiv:2511.12784v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently emerged as powerful tools...

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

arXiv:2506.11111v1 Announce Type: new Abstract: Large Language Models (LLMs) have gained enormous attention in recent...

Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG

arXiv:2406.13069v4 Announce Type: replace Abstract: How novel are texts generated by language models (LMs) relative...

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

arXiv:2503.08893v2 Announce Type: replace Abstract: An ideal model evaluation should achieve two goals: identifying where...

EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming

arXiv:2505.12185v5 Announce Type: replace-cross Abstract: Evaluating the programming robustness of large language models (LLMs) is...

Estranged Predictions: Measuring Semantic Category Disruption with Masked Language Modelling

arXiv:2511.08109v1 Announce Type: new Abstract: This paper examines how science fiction destabilises ontological categories by...

Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models

arXiv:2410.03026v3 Announce Type: replace Abstract: Language models (LMs) rely on their parametric knowledge augmented with...

Estimating LLM Uncertainty with Logits

arXiv:2502.00290v4 Announce Type: replace Abstract: Over the past few years, Large Language Models (LLMs) have...
