Context-level Language Modeling by Learning Predictive Context Embeddings

arXiv:2510.20280v3 Announce Type: replace
Abstract: We propose ContextLM, a framework that implicitly learns multi-token prediction by augmenting standard pretraining with an intrinsic next-context prediction objective. ContextLM builds a language model on top of context embeddings that span multiple tokens, enabling better next-token prediction by predicting the next context. Our model is fully compatible with standard autoregressive, token-by-token evaluation paradigms (e.g., perplexity). Extensive experiments with GPT-2 and Pythia backbones (up to 1.5B parameters and 300B training tokens) show that ContextLM shifts the Pareto frontier of scaling laws, exhibiting superior efficiency in parameters, training tokens, and FLOPs. Our results show that ContextLM can match baseline perplexity with 39% fewer parameters and delivers robust generalization improvements across a broad set of downstream tasks at equivalent parameter counts.
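The abstract only sketches the idea of augmenting next-token training with a next-context objective, so the snippet below is a minimal, hypothetical illustration of what such an auxiliary loss could look like, not the paper's actual recipe. The pooling scheme (non-overlapping mean-pooled windows), the linear prediction head, the cosine regression loss, and the weight `alpha` are all assumptions introduced here for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextContextAuxLoss(nn.Module):
    """Illustrative auxiliary next-context objective layered on top of the
    usual next-token cross-entropy. Every design choice here (mean pooling,
    a linear head, a cosine regression loss, the mixing weight alpha) is an
    assumption for the sketch, not ContextLM's published formulation."""

    def __init__(self, d_model: int, ctx_len: int = 4, alpha: float = 0.5):
        super().__init__()
        self.ctx_len = ctx_len
        self.alpha = alpha
        # Hypothetical head that maps a context embedding to a prediction
        # of the next context embedding.
        self.head = nn.Linear(d_model, d_model)

    def forward(self, hidden: torch.Tensor, logits: torch.Tensor,
                targets: torch.Tensor) -> torch.Tensor:
        # hidden: (B, T, D) final hidden states; logits: (B, T, V); targets: (B, T)
        # Standard token-level next-token cross-entropy.
        token_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )

        # Mean-pool hidden states into non-overlapping context embeddings
        # spanning ctx_len tokens each.
        B, T, D = hidden.shape
        n_ctx = T // self.ctx_len
        ctx = hidden[:, : n_ctx * self.ctx_len]
        ctx = ctx.reshape(B, n_ctx, self.ctx_len, D).mean(dim=2)  # (B, n_ctx, D)

        # Regress each context embedding onto the following one,
        # with a stop-gradient on the targets.
        pred = self.head(ctx[:, :-1])           # (B, n_ctx - 1, D)
        target = ctx[:, 1:].detach()            # next-context targets
        ctx_loss = 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()

        return token_loss + self.alpha * ctx_loss
```

In this sketch the auxiliary term only adds a small prediction head and a pooled-embedding regression on top of an unchanged backbone, which is consistent with the abstract's claim that the model remains fully compatible with standard token-by-token evaluation.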
