I did a little mini research project into trying out KANs in practice for some toy problems, also back in 2024 when they were the hot new thing: https://cprimozic.net/blog/trying-out-kans/
TL;DR KANs are trickier to train than traditional neural networks, but they largely reach similar loss values given equivalent parameter counts.
Part of this may be due to the fact that most of the optimizers and other components of the training stack have been tuned over decades for MLPs, and there may well be ways out there to get training to work even better for KANs.
I don't personally find a lot of appeal in KANs for big, deep models like LLMs or anything close to that scale. KANs and their B-Splines are much less hardware-friendly than matrix multiplication. However, they are interesting to me from an interpretability perspective, and there may be some unique possibilities there for smaller cases.
I’m no expert, but it looks like you can represent Chebyshev polynomials as the determinant of a square matrix, and if the matrices are all the same size then multiplying the polynomials should be equivalent to multiplying the matrices and taking the determinant afterwards. Given that the matrices follow a very predictable form, this should also be pretty hardware-performant, I think.
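A minimal sketch of that idea, assuming the standard tridiagonal determinant representation of T_n(x) (x in the top-left entry, 2x on the rest of the diagonal, 1 on the off-diagonals) and an identity-padding trick of my own to make matrices of different degrees the same size:

    import numpy as np

    def chebyshev_T_matrix(n: int, x: float) -> np.ndarray:
        """n x n tridiagonal matrix whose determinant is T_n(x),
        the degree-n Chebyshev polynomial of the first kind (n >= 1)."""
        M = 2 * x * np.eye(n)
        M[0, 0] = x                   # first diagonal entry is x rather than 2x
        i = np.arange(n - 1)
        M[i, i + 1] = 1.0             # superdiagonal
        M[i + 1, i] = 1.0             # subdiagonal
        return M

    def pad_to(M: np.ndarray, size: int) -> np.ndarray:
        """Embed M in the top-left of a size x size identity matrix;
        this leaves the determinant unchanged."""
        P = np.eye(size)
        P[: M.shape[0], : M.shape[1]] = M
        return P

    x, m, n = 0.3, 3, 5
    N = max(m, n)

    A = pad_to(chebyshev_T_matrix(m, x), N)   # det(A) == T_m(x)
    B = pad_to(chebyshev_T_matrix(n, x), N)   # det(B) == T_n(x)

    # Because A and B are square matrices of the same size,
    # det(A @ B) == det(A) * det(B), so the product T_m(x) * T_n(x)
    # falls out of one matrix multiply plus a single determinant.
    product_via_matrices = np.linalg.det(A @ B)

    # Closed form T_k(x) = cos(k * arccos(x)) on [-1, 1], used only as a check.
    product_direct = np.cos(m * np.arccos(x)) * np.cos(n * np.arccos(x))

    print(product_via_matrices, product_direct)  # agree up to floating-point error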