“Structured random matrices and cyclic cumulants: A free probability approach” by Bernard and Hruza

The evolution of some papers of Bernard and Hruza in the context of the “Quantum Symmetric Simple Exclusion Process” (QSSEP) has been addressed before in some posts, here and here. Now there is a new version of the latest preprint, with the more precise title “Structured random matrices and cyclic cumulants: A free probability approach”. I think the connection of the considered class of random matrices with our free probability tools (in particular, the operator-valued ones) is getting nicer and nicer. One of the new additions in this version is the proof that applying non-linear functions entrywise to the matrices (as is usually done in the machine learning context) does not lead out of this class, and that one can actually describe the effect of such an application.

I consider all this a very interesting new development, which connects to many things, and I will try to describe it a bit more concretely in the following.

For the usual unitarily invariant matrix ensembles we know that the main information about the entries which contributes, in the limit, to the calculation of the matrix distribution is given by the cyclic classical cumulants (or “loop expectation values”). Those are cumulants of the entries with cyclic index structure c(x_{i_1,i_2},x_{i_2,i_3},\dots,x_{i_n,i_1}) – and, for the unitarily invariant ensembles, the asymptotics of those cumulants do not depend on the chosen indices i_1,i_2,\dots,i_n. Actually, in the limit, with the right scaling, those give the free cumulants \kappa_n of our limiting matrix distribution (see the displayed formula after this paragraph for the normalization I have in mind). The idea of Bernard and Hruza is now to extend this to an inhomogeneous setting, where the indices matter and where the limiting information is thus given by “local” free cumulants \kappa_n(t_1,\dots,t_n), where the t_k are the limits of i_k/N. For the operator-valued aficionado this has the flavor of operator-valued cumulants (over the diagonal matrices), and this is indeed the case, as shown in the paper. For the case of inhomogeneous Gaussian matrices, where the limits are given by operator-valued semicircular elements, such results are not new, going back to the work of Dima Shlyakhtenko on band matrices, and they constitute one of our most beloved indications of the power of operator-valued free probability. The paper of Bernard and Hruza goes far beyond the semicircular case and opens a very interesting direction. The fact that this is motivated by problems in statistical physics makes it even more exciting. Of course, there are many questions arising from this, on which we should follow up. In particular: is there a good version of this not only over diagonal matrices; and is there a relation with the notion of R-cyclic distributions, which I introduced with Andu and Dima a while ago in this paper?
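In formulas (my paraphrase of the standard unitarily invariant statement, not a quote from the paper): for distinct indices i_1,\dots,i_n one has

\kappa_n = \lim_{N\to\infty} N^{n-1}\, c(x_{i_1,i_2},x_{i_2,i_3},\dots,x_{i_n,i_1}),

and in the inhomogeneous setting the same rescaled limit is taken along indices with i_k/N\to t_k, which then defines the local free cumulant \kappa_n(t_1,\dots,t_n).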

As I mentioned at the beginning, I am also intrigued by the effect of applying non-linear functions to the entries of our matrices. This is something we usually don’t do in “classical” random matrix theory, but which has become quite popular in recent years in the machine learning context. There are statements which go under the name of Gaussian equivalence principle, saying that the effect of the non-linearity is in many cases the same as adding independent Gaussian random noise. Usually this is proved for special random matrices, like products of Wishart matrices. However, it seems to be true more generally; I was convinced that this is valid for unitarily invariant matrices, but the paper of Bernard and Hruza shows that even in their more general setting one has results of this type.

In order to give a feeling for what I am talking about, let me give a concrete numerical example for the unitarily invariant case. Actually, I prefer here the real setting, i.e., orthogonally invariant matrices. For me, canonical examples of those are given by polynomials in independent GOE matrices, so let us consider here X_N=A_N^2+A_N+B_NA_N+A_NB_N+B_N, where A_N and B_N are independent GOE matrices. The following plot shows the histogram of the eigenvalues of one realization of X_N for N=5000.
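For readers who want to reproduce this kind of picture, here is a minimal numpy sketch of the experiment (my own illustration, not code from the paper); the GOE matrices are normalized so that each of them has a standard semicircular eigenvalue distribution in the limit.

import numpy as np
import matplotlib.pyplot as plt

def goe(N, rng):
    # real symmetric Gaussian matrix, normalized so that its eigenvalue
    # distribution converges to the standard semicircle on [-2, 2]
    G = rng.standard_normal((N, N))
    return (G + G.T) / np.sqrt(2 * N)

N = 5000                     # as in the plot; smaller N gives a quicker run
rng = np.random.default_rng(0)
A = goe(N, rng)
B = goe(N, rng)
X = A @ A + A + B @ A + A @ B + B

plt.hist(np.linalg.eigvalsh(X), bins=100, density=True)
plt.title("eigenvalues of X_N")
plt.show()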

We now apply a non-linear function to the entries of X_N. Actually, one has to be a bit careful about what this means; in particular, one needs a good scaling and should not act naively on the diagonal, as its contribution would otherwise dominate everything. Here is the precise definition of the entries of the new matrix Y_N:

y_{ij}=\begin{cases} \frac 1{\sqrt{N}} f(\sqrt{N} x_{ij}), & i\not=j\\ 0,& i=j.\end{cases}

For the function f, let’s take the machine learning favorite, namely the ReLU function, i.e., f(x)=\text{ReLU}(x):=\max(0,x). Here is the histogram of the eigenvalues of the corresponding matrix Y_N.
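Continuing the numpy sketch from above (again my own illustration; in particular, dropping the single large outlier eigenvalue which comes from the non-zero mean of ReLU is my choice of how to display the bulk):

f = lambda x: np.maximum(x, 0.0)          # ReLU
# entrywise map: y_ij = f(sqrt(N) x_ij) / sqrt(N) off the diagonal, 0 on the diagonal
Y = f(np.sqrt(N) * X) / np.sqrt(N)
np.fill_diagonal(Y, 0.0)

eigs_Y = np.sort(np.linalg.eigvalsh(Y))
# the non-zero mean of ReLU produces one outlier eigenvalue of order sqrt(N);
# drop it and plot only the bulk
plt.hist(eigs_Y[:-1], bins=100, density=True)
plt.title("eigenvalues of Y_N (bulk)")
plt.show()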

But now the Gaussian equivalence principle says that this Y_N should asymptotically have the same distribution as the original matrix perturbed by an independent Gaussian noise, i.e., as \theta_1 X_N+\theta_2 Z_N, where Z_N is a GOE matrix, independent from X_N, and where in our case \theta_1=1/2 and \theta_2=\sqrt{5(1-2/\pi)}/2. The following plot superimposes the eigenvalue distribution of \theta_1 X_N+\theta_2 Z_N on the preceding plot for Y_N. Looks convincing, doesn’t it?
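Where do the constants \theta_1 and \theta_2 come from? Here is a back-of-the-envelope way to guess them (my heuristic with my normalizations; the paper contains the precise statements). The rescaled entry \sqrt{N}x_{ij} is asymptotically a centered Gaussian whose variance is the variance of the limiting distribution of X_N, which for the above polynomial in free standard semicircular elements works out to 5. Writing g for a standard Gaussian and splitting f(\sqrt{5}\,g) into its best linear approximation in g plus uncorrelated noise gives

\theta_1=E[f'(\sqrt{5}\,g)]=P(g>0)=\frac 12,\qquad \theta_2^2=\operatorname{Var}\bigl(f(\sqrt{5}\,g)\bigr)-5\,\theta_1^2=\frac 52-\frac 5{2\pi}-\frac 54=\frac 54\Bigl(1-\frac 2\pi\Bigr),

so that indeed \theta_2=\sqrt{5(1-2/\pi)}/2. In the numpy sketch from above, the comparison is then just a few more lines:

theta1 = 0.5
theta2 = np.sqrt(5 * (1 - 2 / np.pi)) / 2
Z = goe(N, rng)                           # GOE noise, independent of X
plt.hist(eigs_Y[:-1], bins=100, density=True, histtype="step", label="Y_N (bulk)")
plt.hist(np.linalg.eigvalsh(theta1 * X + theta2 * Z), bins=100, density=True,
         histtype="step", label="theta1*X_N + theta2*Z_N")
plt.legend()
plt.show()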

One should note that the ReLU function does not fall into the class of (polynomial or analytic) functions for which such results are usually proved, but ReLU is explicit enough that one could probably also handle this case more directly. Anyhow, all this should just be an appetizer; there is still quite a bit to formulate and prove in this direction, but I am looking forward to a formidable feast in the end.
