“Structured random matrices and cyclic cumulants: A free probability approach” by Bernard and Hruza

The evolution of some papers of Bernard and Hruza in the context of the “Quantum Symmetric Simple Exclusion Process” (QSSEP) has been addressed before in some posts, here and here. Now there is a new version of the latest preprint, with the more precise title “Structured random matrices and cyclic cumulants: A free probability approach”. I think the connection of the considered class of random matrices with our free probability tools (in particular, the operator-valued ones) is getting nicer and nicer. One of the new additions in this version is the proof that applying non-linear functions entrywise to the matrices (as is usually done in the machine learning context) does not lead out of this class, and that one can actually describe the effect of such an application.

I consider all this a very interesting new development, which connects to many things, and I will try to describe it a bit more concretely in the following.

For the usual unitarily invariant matrix ensembles we know that the main information about the entries which contributes, in the limit, to the calculation of the matrix distribution is given by the cyclic classical cumulants (or “loop expectation values”). Those are cumulants of the entries with cyclic index structure c(x_{i_1,i_2},x_{i_2,i_3},\dots,x_{i_n,i_1}) – and, for the unitarily invariant ensembles, the asymptotics of those cumulants do not depend on the chosen indices i_1,i_2,\dots,i_n. Actually, in the limit, with the right scaling, those give the free cumulants \kappa_n of our limiting matrix distribution (see the displayed formula after this paragraph for the normalization I have in mind). The idea of Bernard and Hruza is now to extend this to an inhomogeneous setting, where the indices matter and where the limiting information is thus given by “local” free cumulants \kappa_n(t_1,\dots,t_n), where the t_k are the limits of i_k/N. For the operator-valued aficionado this has the flavor of operator-valued cumulants (over the diagonal matrices), and this is indeed the case, as shown in the paper. For the case of inhomogeneous Gaussian matrices, where the limits are given by operator-valued semicircular elements, such results are not new, going back to the work of Dima Shlyakhtenko on band matrices, and they constitute one of our most beloved indications of the power of operator-valued free probability. The paper of Bernard and Hruza goes far beyond the semicircular case and opens a very interesting direction. The fact that this is motivated by problems in statistical physics makes it even more exciting. Of course, there are many questions arising from this, on which we should follow up. In particular: is there a good version of this not only over diagonal matrices; and is there a relation with the notion of R-cyclic distributions, which I introduced with Andu and Dima a while ago in this paper?
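In formulas (my paraphrase of the standard unitarily invariant statement, not a quote from the paper): for distinct indices i_1,\dots,i_n one has

\kappa_n = \lim_{N\to\infty} N^{n-1}\, c(x_{i_1,i_2},x_{i_2,i_3},\dots,x_{i_n,i_1}),

and in the inhomogeneous setting the same rescaled limit is taken along indices with i_k/N\to t_k, which then defines the local free cumulant \kappa_n(t_1,\dots,t_n).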

As I mentioned at the beginning, I am also intrigued by the effect of applying non-linear functions to the entries of our matrices. This is something we usually don’t do in “classical” random matrix theory, but which has become quite popular in recent years in the machine learning context. There are statements which go under the name of Gaussian equivalence principle, saying that the effect of the non-linearity is in many cases the same as adding independent Gaussian random noise. Usually this is proved for special random matrices, like products of Wishart matrices. However, it seems to be true more generally; I was convinced that this is valid for unitarily invariant matrices, but the paper of Bernard and Hruza shows that even in their more general setting one has results of this type.

In order to give a feeling for what I am talking about, let me give a concrete numerical example for the unitarily invariant case. Actually, I prefer here the real setting, i.e., orthogonally invariant matrices. For me, canonical examples of those are given by polynomials in independent GOE matrices, so let us consider here X_N=A_N^2+A_N+B_NA_N+A_NB_N+B_N, where A_N and B_N are independent GOE matrices. The following plot shows the histogram of the eigenvalues of one realization of X_N for N=5000.
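For readers who want to reproduce this kind of picture, here is a minimal numpy sketch of the experiment (my own illustration, not code from the paper); the GOE matrices are normalized so that each of them has a standard semicircular eigenvalue distribution in the limit.

import numpy as np
import matplotlib.pyplot as plt

def goe(N, rng):
    # real symmetric Gaussian matrix, normalized so that its eigenvalue
    # distribution converges to the standard semicircle on [-2, 2]
    G = rng.standard_normal((N, N))
    return (G + G.T) / np.sqrt(2 * N)

N = 5000                     # as in the plot; smaller N gives a quicker run
rng = np.random.default_rng(0)
A = goe(N, rng)
B = goe(N, rng)
X = A @ A + A + B @ A + A @ B + B

plt.hist(np.linalg.eigvalsh(X), bins=100, density=True)
plt.title("eigenvalues of X_N")
plt.show()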

We now apply a non-linear function to the entries of X_N. Actually, one has to be a bit careful about what this means; in particular, one needs a good scaling and should not act naively on the diagonal, as its contribution would otherwise dominate everything. Here is the precise definition of the entries of the new matrix Y_N:

y_{ij}=\begin{cases} \frac 1{\sqrt{N}} f(\sqrt{N} x_{ij}), & i\not=j\\ 0,& i=j.\end{cases}

For the function f, let’s take the machine learning favorite, namely the ReLU function, i.e., f(x)=\text{ReLU}(x):=\max(0,x). Here is the histogram of the eigenvalues of the corresponding matrix Y_N.
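Continuing the numpy sketch from above (again my own illustration; in particular, dropping the single large outlier eigenvalue which comes from the non-zero mean of ReLU is my choice of how to display the bulk):

f = lambda x: np.maximum(x, 0.0)          # ReLU
# entrywise map: y_ij = f(sqrt(N) x_ij) / sqrt(N) off the diagonal, 0 on the diagonal
Y = f(np.sqrt(N) * X) / np.sqrt(N)
np.fill_diagonal(Y, 0.0)

eigs_Y = np.sort(np.linalg.eigvalsh(Y))
# the non-zero mean of ReLU produces one outlier eigenvalue of order sqrt(N);
# drop it and plot only the bulk
plt.hist(eigs_Y[:-1], bins=100, density=True)
plt.title("eigenvalues of Y_N (bulk)")
plt.show()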

But now the Gaussian equivalence principle says that this Y_N should asymptotically have the same distribution as the original matrix perturbed by an independent Gaussian noise, i.e., as \theta_1 X_N+\theta_2 Z_N, where Z_N is a GOE matrix, independent from X_N, and where in our case \theta_1=1/2 and \theta_2=\sqrt{5(1-2/\pi)}/2. The following plot superimposes the eigenvalue distribution of \theta_1 X_N+\theta_2 Z_N on the preceding plot for Y_N. Looks convincing, doesn’t it?
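Where do the constants \theta_1 and \theta_2 come from? Here is a back-of-the-envelope way to guess them (my heuristic with my normalizations; the paper contains the precise statements). The rescaled entry \sqrt{N}x_{ij} is asymptotically a centered Gaussian whose variance is the variance of the limiting distribution of X_N, which for the above polynomial in free standard semicircular elements works out to 5. Writing g for a standard Gaussian and splitting f(\sqrt{5}\,g) into its best linear approximation in g plus uncorrelated noise gives

\theta_1=E[f'(\sqrt{5}\,g)]=P(g>0)=\frac 12,\qquad \theta_2^2=\operatorname{Var}\bigl(f(\sqrt{5}\,g)\bigr)-5\,\theta_1^2=\frac 52-\frac 5{2\pi}-\frac 54=\frac 54\Bigl(1-\frac 2\pi\Bigr),

so that indeed \theta_2=\sqrt{5(1-2/\pi)}/2. In the numpy sketch from above, the comparison is then just a few more lines:

theta1 = 0.5
theta2 = np.sqrt(5 * (1 - 2 / np.pi)) / 2
Z = goe(N, rng)                           # GOE noise, independent of X
plt.hist(eigs_Y[:-1], bins=100, density=True, histtype="step", label="Y_N (bulk)")
plt.hist(np.linalg.eigvalsh(theta1 * X + theta2 * Z), bins=100, density=True,
         histtype="step", label="theta1*X_N + theta2*Z_N")
plt.legend()
plt.show()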

One should note that the ReLU function does not fall into the class of (polynomial or analytic) functions for which such results are usually proved, but ReLU is explicit enough that one could probably also handle this case more directly. Anyhow, all this should just be an appetizer; there is still quite a bit to formulate and prove in this direction, but I am looking forward to a formidable feast in the end.
