YouZum

News

LLM-as-a-Judge: Can Language Models Be Trusted to Evaluate Other Models?

Exploring the promise, pitfalls, and practical applications of using LLMs to automate AI evaluation — from synthetic...

LLM-as-a-Judge for Reference-less Automatic Code Validation and Refinement for Natural Language to Bash in IT Automation

arXiv:2506.11237v1 Announce Type: cross Abstract: In an effort to automatically evaluate and select the best...

LLM one-shot style transfer for Authorship Attribution and Verification

arXiv:2510.13302v1 Announce Type: new Abstract: Computational stylometry analyzes writing style through quantitative patterns in text...

LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning

arXiv:2507.08496v1 Announce Type: new Abstract: While large language models (LLMs) have advanced procedural planning for...

Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs

arXiv:2505.09338v1 Announce Type: new Abstract: We observe a novel phenomenon, contextual entrainment, across a wide...

Liquid AI’s LFM2-VL-3B Brings a 3B Parameter Vision Language Model (VLM) to Edge-Class Devices

Liquid AI released LFM2-VL-3B, a 3B parameter vision language model for image text to text...

Like humans, AI is forcing institutions to rethink their purpose

Like people undergoing cognitive migration, institutions must reassess what they were made for in this...

Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability

arXiv:2505.23703v4 Announce Type: replace-cross Abstract: Enhancing the mathematical reasoning capabilities of LLMs has garnered significant...

LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text

arXiv:2505.24826v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly used in legal...

Learning to Interpret Weight Differences in Language Models

arXiv:2510.05092v3 Announce Type: replace-cross Abstract: Finetuning (pretrained) language models is a standard approach for updating...
