Square Root of Dimensions
1/19/2025 · 1 min read


The softmax attention equation is poetic, to say the least. Let's meditate on it, as pasted below.

Equation 1: Softmax attention from the seminal Attention Is All You Need paper

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V

At first glance, the equation seems simple enough. Yet at second glance, it reminds me of the vector projection equation below.

Equation 2: Scalar projection of a onto b

proj_b(a) = (a · b) / |b|
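To make Equation 1 concrete, here is a minimal NumPy sketch of scaled dot-product attention. The shapes (4 queries, d_k = 8) are illustrative assumptions, not anything from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Equation 1: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query
```

Each row of the softmaxed score matrix sums to 1, so every output row is a convex combination of the value vectors.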
This is where things get interesting. Comparing Equation 1 and Equation 2, we find that sqrt(d_k) plays the role of |b|: the scaled dot product QK^T / sqrt(d_k) looks like a scalar projection in which the key vector's length is taken to be sqrt(d_k). And that is exactly the expected length of a d_k-dimensional vector whose components have zero mean and unit variance. In other words, the square root of the number of dimensions is the metric of the space.
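This identification can be checked numerically: the length of a random vector with unit-variance components concentrates around sqrt(d_k). A quick sketch, with d_k = 64 chosen purely for illustration:

```python
import numpy as np

# Sample many d_k-dimensional vectors with unit-variance components
# and compare their average length to sqrt(d_k).
rng = np.random.default_rng(0)
d_k = 64
norms = np.linalg.norm(rng.standard_normal((10_000, d_k)), axis=1)
print(norms.mean())   # close to sqrt(64) = 8
print(np.sqrt(d_k))
```

So dividing a dot product by sqrt(d_k) is, on average, dividing by the norm of the key vector, which is what turns Equation 1's numerator into the scalar projection of Equation 2.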
Are there theories out there that use this approximation? Possibly yes; it reminds me a bit of string theory, where the square root of 26 (26 being the critical dimension of bosonic string theory) shows up often.
Could we posit that the practical feasibility of modern AI systems is a distributed verification of string theory?
