News
Everyone’s looking to get in on vibe coding — and Google is no different with Stitch, its follow-up to Jules
Google is looking to compete in vibe coding with Stitch, which designs user interfaces (UIs)...
EventHunter: Dynamic Clustering and Ranking of Security Events from Hacker Forum Discussions
arXiv:2507.09762v1 Announce Type: cross Abstract: Hacker forums provide critical early warning signals for emerging cybersecurity...
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
arXiv:2412.13666v2 Announce Type: replace Abstract: The capabilities of recent large language models (LLMs) to generate...
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
arXiv:2412.09645v3 Announce Type: replace-cross Abstract: Recent advancements in visual generative models have enabled high-quality image...
Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
arXiv:2507.16835v1 Announce Type: cross Abstract: Voice-based conversational AI systems increasingly rely on cascaded architectures combining...
Evaluating Rare Disease Diagnostic Performance in Symptom Checkers: A Synthetic Vignette Simulation Approach
arXiv:2506.19750v2 Announce Type: replace Abstract: Symptom Checkers (SCs) provide users with personalized medical information. To...
Evaluating LLMs on Real-World Forecasting Against Expert Forecasters
arXiv:2507.04562v3 Announce Type: replace-cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse...
Evaluating Creative Short Story Generation in Humans and Large Language Models
arXiv:2411.02316v5 Announce Type: replace Abstract: Story-writing is a fundamental aspect of human imagination, relying heavily...
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
arXiv:2506.11111v1 Announce Type: new Abstract: Large Language Models (LLMs) have gained enormous attention in recent...
Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG
arXiv:2406.13069v4 Announce Type: replace Abstract: How novel are texts generated by language models (LMs) relative...
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
arXiv:2503.08893v2 Announce Type: replace Abstract: An ideal model evaluation should achieve two goals: identifying where...
Estimating LLM Uncertainty with Logits
arXiv:2502.00290v4 Announce Type: replace Abstract: Over the past few years, Large Language Models (LLMs) have...