The MAMBA model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
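To make the idea of tied weights concrete, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not the library's implementation: the names `embedding`, `embed`, and `lm_head` and the tiny 3-token, 2-dimensional table are hypothetical and chosen only for illustration.

```python
# Toy sketch of weight tying (hypothetical names and sizes, not a real library API).
# One matrix serves both as the input embedding table and as the output projection.
embedding = [
    [1.0, 0.0],  # vector for token 0
    [0.0, 1.0],  # vector for token 1
    [1.0, 1.0],  # vector for token 2
]


def embed(token_id):
    """Look up the input vector for a token id."""
    return embedding[token_id]


def lm_head(hidden):
    """Score every token by the dot product of `hidden` with its embedding row.

    Reusing `embedding` here is what "tied weights" means: the head has no
    separate parameter matrix of its own.
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding]


# Feed the embedding of token 2 straight into the head (a real model would
# run its blocks in between).
logits = lm_head(embed(2))
```

Because the head shares the embedding matrix, any update to `embedding` changes both the input lookup and the output scores at once.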
If passed along, the model uses the previous state in all the blocks.
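The effect of passing a previous state along can be illustrated with a toy scalar recurrence. This is only a sketch: the function `scan`, its `decay` parameter, and the update rule are assumptions made for illustration and do not reproduce the model's actual state-space computation.

```python
# Toy sketch of carrying a recurrent state between calls (hypothetical helper,
# not the model's real update rule).
def scan(inputs, state=0.0, decay=0.5):
    """Run a simple linear recurrence: state = decay * state + x.

    Returns the per-step outputs and the final state, which a caller can
    pass back in to continue where the previous call stopped.
    """
    outputs = []
    for x in inputs:
        state = decay * state + x
        outputs.append(state)
    return outputs, state


# Process the whole sequence in one call.
full_out, _ = scan([1.0, 2.0, 3.0, 4.0])

# Process it in two chunks, handing the cached state to the second call.
first_out, cached = scan([1.0, 2.0])
second_out, _ = scan([3.0, 4.0], state=cached)
```

In this sketch the chunked run with the carried state yields the same outputs as the single full run, which is the behavior one would expect from reusing a cached state instead of reprocessing earlier tokens.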