Surgical Repair of Collapsed Attention Heads in ALiBi Transformers
Posted by palmerschallon 2 hours ago
Comments
Comment by palmerschallon 2 hours ago
Found that ALiBi positional encoding causes 31-44% of attention heads in BLOOM-family models to collapse — attending almost entirely to token 0 rather than meaningful context. The paper identifies the pathology and a targeted repair. Happy to answer questions.
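For readers who want to check their own models: here is a minimal sketch of how one might flag collapsed heads from per-head attention weights. This is my own illustration, not the paper's actual diagnostic; the `collapsed_heads` function and the 0.9 threshold are assumptions for the example.

```python
import numpy as np

def collapsed_heads(attn, threshold=0.9):
    """Flag heads whose average attention mass on token 0 exceeds `threshold`.

    attn: array of shape (n_heads, seq_len, seq_len) of softmax attention
          weights (queries x keys), so each row sums to 1.
    Returns a list of (head_index, mean_mass_on_token_0) for flagged heads.
    """
    # For each head, average over query positions the weight put on key 0.
    mass_on_first = attn[:, :, 0].mean(axis=1)
    return [(h, float(m)) for h, m in enumerate(mass_on_first) if m > threshold]

# Synthetic example: head 0 fully collapsed onto token 0, head 1 uniform.
seq = 8
collapsed = np.zeros((seq, seq))
collapsed[:, 0] = 1.0
uniform = np.full((seq, seq), 1.0 / seq)
attn = np.stack([collapsed, uniform])
print(collapsed_heads(attn))  # → [(0, 1.0)]
```

In a real BLOOM-family model you would pull `attn` from the per-layer attention outputs (e.g. `output_attentions=True` in Hugging Face Transformers) and run this per layer; a head is "collapsed" in the paper's sense when nearly all its mass lands on token 0 regardless of context.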