News
Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models
arXiv:2505.19121v2 Announce Type: replace Abstract: Despite the recent strides in large language models, studies have...
DelvePO: Direction-Guided Self-Evolving Framework for Flexible Prompt Optimization
arXiv:2510.18257v1 Announce Type: new Abstract: Prompt Optimization has emerged as a crucial approach due to...
DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments
arXiv:2506.00739v3 Announce Type: replace Abstract: Large language model (LLM) agents have shown impressive capabilities in...
DeepSeek R1-0528 arrives in powerful open source challenge to OpenAI o3 and Google Gemini 2.5 Pro
Additionally, the model’s hallucination rate has been reduced, contributing to more reliable and consistent output.
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
arXiv:2506.11763v1 Announce Type: new Abstract: Deep Research Agents are a prominent category of LLM-based agents...
DeepReinforce Team Introduces CUDA-L1: An Automated Reinforcement Learning (RL) Framework for CUDA Optimization Unlocking 3x More Power from GPUs
Estimated reading time: 6 minutes. Table of contents: The Breakthrough: Contrastive Reinforcement Learning (Contrastive-RL); How...
Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code
arXiv:2506.10617v1 Announce Type: cross Abstract: This paper addresses the persistent challenge of accurately digitizing paper-based...
Decoding Neural Emotion Patterns through Natural Language Processing Embeddings
arXiv:2508.09337v1 Announce Type: new Abstract: Understanding how emotional expression in language relates to brain function...
DecMetrics: Structured Claim Decomposition Scoring for Factually Consistent LLM Outputs
arXiv:2509.04483v1 Announce Type: new Abstract: Claim decomposition plays a crucial role in the fact-checking process...
Decision-Oriented Text Evaluation
arXiv:2507.01923v2 Announce Type: replace Abstract: Natural language generation (NLG) is increasingly deployed in high-stakes domains...
Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine
arXiv:2506.20876v1 Announce Type: new Abstract: Technological progress has led to concrete advancements in tasks that...
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
arXiv:2508.17536v1 Announce Type: new Abstract: Multi-Agent Debate (MAD) has emerged as a promising paradigm for improving...