arXiv:2509.15235v4 Announce Type: replace-cross
Abstract: Speculative decoding is a widely adopted technique for accelerating inference in large language models (LLMs), yet its application to vision-language models (VLMs) remains underexplored, with existing methods achieving only modest speedups (