Square Root of Dimensions
1/19/2025 · 1 min read


The softmax attention equation is poetic, to say the least. Let's meditate on it, as pasted below.

Equation 1: Softmax attention, from the seminal Attention Is All You Need paper

At first glance, the equation seems simple enough. Yet at second glance, it reminds me of the vector projection equation below.

Equation 2: Scalar projection of a onto b
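Written out, the two equations are:

```latex
% Equation 1: scaled dot-product attention (Attention Is All You Need)
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V

% Equation 2: scalar projection of a onto b
\mathrm{comp}_{\vec{b}}\,\vec{a} = \frac{\vec{a}\cdot\vec{b}}{\lVert\vec{b}\rVert}
```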
This is where things get interesting. Comparing Equation 1 and Equation 2, we find that the scaling factor √d_k in the attention equation sits exactly where the norm ‖b‖ sits in the projection equation: each query–key dot product is being divided by a length. In other words, the square root of the number of dimensions acts as the characteristic length, the metric, of the space.
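A quick numerical sketch makes the analogy concrete. For vectors whose components have zero mean and unit variance, the expected norm is about √d, and raw query–key dot products have standard deviation about √d as well, so dividing by √d is precisely the norm-style rescaling in Equation 2. The dimension and sample count below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512      # key/query dimension (illustrative choice)
n = 10_000   # number of sampled vector pairs

# Sample query/key components with zero mean and unit variance.
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))

# The average norm of a random d-dimensional key is close to sqrt(d),
# matching the ||b|| in the scalar-projection formula.
mean_key_norm = np.linalg.norm(k, axis=1).mean()

# Raw dot products: variance grows linearly with d, so std is about sqrt(d).
raw = np.einsum("nd,nd->n", q, k)

# Dividing by sqrt(d) brings the scores back to unit scale,
# which is what the softmax in Equation 1 relies on.
scaled = raw / np.sqrt(d)

print(mean_key_norm)  # ≈ sqrt(512) ≈ 22.6
print(raw.std())      # ≈ sqrt(512) ≈ 22.6
print(scaled.std())   # ≈ 1.0
```

So √d_k really does behave like the typical length of a key vector, which is why treating it as the "size" of the space is more than a pun.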
Are there theories out there that use this approximation?

Possibly. It reminds me a bit of string theory, where the square root of 26, the critical dimension of the bosonic string, shows up quite often.

Could we posit that the functioning of machine learning systems is a distributed verification of string theory?