News
Scientists’ First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
arXiv:2506.10521v2 Announce Type: replace-cross Abstract: Scientific discoveries increasingly rely on complex multimodal reasoning based on...
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
arXiv:2504.02438v5 Announce Type: replace Abstract: Long-form video processing fundamentally challenges vision-language models (VLMs) due to...
Scaling Truth: The Confidence Paradox in AI Fact-Checking
arXiv:2509.08803v1 Announce Type: cross Abstract: The rise of misinformation underscores the need for scalable and...
Scaling Multimodal Search and Recommendation with Small Language Models via Upside-Down Reinforcement Learning
arXiv:2502.09854v2 Announce Type: replace Abstract: In this work, we investigate how small language models (SLMs)...
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
arXiv:2505.13227v2 Announce Type: replace-cross Abstract: Graphical user interface (GUI) grounding, the ability to map natural...
ScaleFormer: Span Representation Cumulation for Long-Context Transformer
arXiv:2511.10029v1 Announce Type: new Abstract: The quadratic complexity of standard self-attention severely limits the application...
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
arXiv:2506.19848v1 Announce Type: cross Abstract: This paper presents ScaleCap, an inference-time scalable image captioning strategy...
SAS: Simulated Attention Score
arXiv:2507.07694v1 Announce Type: new Abstract: The attention mechanism is a core component of the Transformer...
Samsung Researchers Introduced ANSE (Active Noise Selection for Generation): A Model-Aware Framework for Improving Text-to-Video Diffusion Models through Attention-Based Uncertainty Estimation
Video generation models have become a core technology for creating dynamic content by transforming text...
Sampling Without Data is Now Scalable: Meta AI Releases Adjoint Sampling for Reward-Driven Generative Modeling
Data Scarcity in Generative Modeling: Generative models traditionally rely on large, high-quality datasets to produce...
Sam Altman calls for ‘AI privilege’ as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions
Should talking to an AI chatbot be protected and privileged information, like talking to a...
Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x
In the world of voice AI, the difference between a helpful assistant and an awkward...



