Thursday, April 3, 2025

Over-parameterization: a pre-print on a geometric approach to estimating the number of parameters for Large Language and Image Processing Models

I learned from the book "Understanding Deep Learning" by Simon Prince that current Large Language and Image Processing models (object detection and classification, generation, and others) can benefit from over-parameterization, although at the time nobody knew how this happens. I pondered the question and, being an abstract mathematician by training, thought about something that is usually so useless in Data Science that people as a rule ignore it completely, namely an exact solution. It turned out that in this particular case, searching for an exact solution might be a reasonable approach. Here is my pre-print about it: http://dx.doi.org/10.13140/RG.2.2.18776.61442