The Dual-Route Model of Induction

arXiv:2504.03022v2 Announce Type: replace
Abstract: Prior work on in-context copying has shown the existence of induction heads, which attend to and promote individual tokens during copying. In this work we discover a new type of induction head: concept-level induction heads, which copy entire lexical units instead of individual tokens. Concept induction heads learn to attend to the ends of multi-token words throughout training, working in parallel with token-level induction heads to copy meaningful text. We show that these heads are responsible for semantic tasks like word-level translation, whereas token induction heads are vital for tasks that can only be done verbatim (like copying nonsense tokens). These two “routes” operate independently: we show that ablation of token induction heads causes models to paraphrase where they would otherwise copy verbatim. By patching concept induction head outputs, we find that they contain language-independent word representations that mediate natural language translation, suggesting that LLMs represent abstract word meanings independent of language or form.
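The head-ablation experiments the abstract describes can be reproduced at small scale with forward hooks. Below is a minimal sketch, assuming the Hugging Face `transformers` GPT-2 implementation; the `LAYER` and `HEAD` indices are hypothetical placeholders for illustration, not the specific induction heads identified in the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_NAME = "gpt2"
LAYER, HEAD = 5, 1  # hypothetical indices, not the paper's identified heads

tok = GPT2Tokenizer.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME).eval()

n_heads = model.config.n_head
head_dim = model.config.n_embd // n_heads

def zero_head(module, inputs):
    # The input to attn.c_proj is the concatenation of all head outputs,
    # shape (batch, seq, n_heads * head_dim), so zeroing one slice here
    # removes exactly one head's contribution to the residual stream.
    hidden = inputs[0]
    b, s, _ = hidden.shape
    hidden = hidden.view(b, s, n_heads, head_dim).clone()
    hidden[:, :, HEAD, :] = 0.0
    return (hidden.view(b, s, -1),)

handle = model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(zero_head)

# A repeated-sequence prompt of the kind used to elicit induction behavior.
prompt = "colorless green ideas sleep furiously colorless green ideas sleep"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits
print(tok.decode(logits[0, -1].argmax()))  # next-token prediction with the head ablated

handle.remove()  # restore the unmodified model
```

Comparing the next-token prediction with and without the hook installed indicates how much the ablated head contributes to copying the repeated sequence; the paper's own experiments ablate the identified token-level and concept-level induction heads rather than arbitrary indices like these.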
