Recent advances in linear-attention models, such as RWKV, have opened up new possibilities for efficient sequence processing by replacing the quadratic cost of standard Transformer self-attention with computation that scales linearly in sequence length.
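To make the complexity difference concrete, the following is a minimal sketch (not RWKV's exact formulation, which additionally uses learned time-decay) of a generic kernelized causal linear attention: instead of materializing a T x T attention matrix, it carries a running d x d state, so the cost per sequence is O(T d^2) rather than O(T^2 d). All names here (`phi`, the toy shapes) are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes a T x T score matrix, O(T^2) time/memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized causal linear attention: accumulate a running state S (d x d_v)
    # and normalizer z (d,), giving O(T * d * d_v) cost with O(d * d_v) memory.
    T, d = Q.shape
    S = np.zeros((d, V.shape[-1]))
    z = np.zeros(d)
    out = np.zeros_like(V)
    for t in range(T):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)            # accumulate key-value outer products
        z += k                         # accumulate the attention normalizer
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out

# Toy usage on a random sequence of length 8 with head dimension 4.
rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = rng.normal(size=(3, T, d))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```

Because the state `S` summarizes the entire prefix, inference can proceed token by token with constant memory, which is the property such models exploit for efficient sequence processing.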