Surgical Repair of Collapsed Attention Heads in ALiBi Transformers

Posted by palmerschallon 2 hours ago


Comments

Comment by palmerschallon 2 hours ago

Found that ALiBi positional encoding causes 31-44% of attention heads in BLOOM-family models to collapse: they attend almost entirely to token 0 rather than to meaningful context. The paper identifies the pathology and proposes a targeted repair. Happy to answer questions.
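
If you want to probe for this yourself, here is a minimal diagnostic sketch (not the paper's code): it loads a BLOOM checkpoint via Hugging Face transformers and flags heads whose attention mass concentrates on token 0. The checkpoint name, the 0.9 collapse threshold, and the averaging scheme are illustrative assumptions, not the paper's criterion.

```python
# Sketch: flag attention heads that dump most of their mass on token 0.
# Assumptions (not from the paper): bloom-560m checkpoint, 0.9 threshold,
# mean over query positions as the collapse statistic.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # any BLOOM-family checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    output_attentions=True,
    attn_implementation="eager",  # eager attention exposes the weights
)
model.eval()

inputs = tok("The quick brown fox jumps over the lazy dog.",
             return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

THRESHOLD = 0.9  # assumed cutoff: fraction of mass on token 0 to call a head collapsed
for layer_idx, attn in enumerate(out.attentions):
    # attn: (batch, heads, query_len, key_len). Average attention paid to
    # key position 0 over query positions 1..end (query 0 can only attend
    # to token 0 under the causal mask, so it is skipped).
    mass_on_tok0 = attn[0, :, 1:, 0].mean(dim=-1)  # (heads,)
    for head_idx, m in enumerate(mass_on_tok0):
        if m > THRESHOLD:
            print(f"layer {layer_idx} head {head_idx}: "
                  f"{m:.2f} of attention mass on token 0")
```

Running something like this over a handful of prompts and averaging should give a rough per-head collapse rate to compare against the 31-44% figure.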