Category Archives: Machine Learning

“Structured random matrices and cyclic cumulants: A free probability approach” by Bernard and Hruza

The evolution of some papers of Bernard and Hruza in the context of the “Quantum Symmetric Simple Exclusion Process” (QSSEP) has been addressed before in some posts, here and here. Now there is a new version of the latest preprint, with the more precise title “Structured random matrices and cyclic cumulants: A free probability approach”. I think the connection of the considered class of random matrices with our free probability tools (in particular, the operator-valued ones) is getting nicer and nicer. One of the new additions in this version is a proof that applying non-linear functions entrywise to the matrices (as is usually done in the machine learning context) does not lead out of this class, and that one can actually describe the effect of such an application.

I consider all this to be a very interesting new development which connects to many things, and I will try to describe this a bit more concretely in the following.

For the usual unitarily invariant matrix ensembles we know that the main information about the entries which contributes, in the limit, to the matrix distribution is given by the cyclic classical cumulants (or “loop expectation values”). Those are cumulants of the entries with cyclic index structure c(x_{i_1,i_2},x_{i_2,i_3},\dots,x_{i_n,i_1}) – and, for the unitarily invariant ensembles, the asymptotics of those cumulants does not depend on the chosen indices i_1,i_2,\dots,i_n. Actually, in the limit, with the right scaling, those give the free cumulants \kappa_n of our limiting matrix distribution. The idea of Bernard and Hruza is now to extend this to an inhomogeneous setting, where the indices matter and where the limiting information is thus given by “local” free cumulants \kappa_n(t_1,\dots,t_n), where the t_k are the limits of i_k/N. For the operator-valued aficionado this has the flavor of operator-valued cumulants (over the diagonal matrices), and this is indeed the case, as shown in the paper. For inhomogeneous Gaussian matrices, where the limits are given by operator-valued semicircular elements, such results are not new; they go back to the work of Dima Shlyakhtenko on band matrices and constitute one of our most beloved indications of the power of operator-valued free probability. The paper of Bernard and Hruza goes far beyond the semicircular case and opens a very interesting direction. The fact that this is motivated by problems in statistical physics makes this even more exciting. Of course, there are many questions arising from this, on which we should follow up. In particular: is there a good version of this not only over the diagonal matrices? Or is there a relation with the notion of R-cyclic distributions, which I introduced with Andu and Dima a while ago in this paper?
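To make the scaling explicit (in the normalization I have in mind, and for distinct indices i_1,\dots,i_n): for unitarily invariant ensembles the free cumulants should arise as

\kappa_n=\lim_{N\to\infty} N^{n-1}\, c(x_{i_1,i_2},x_{i_2,i_3},\dots,x_{i_n,i_1}),

and in the inhomogeneous setting one keeps track of the positions of the indices, i.e., the same limit, taken along i_k/N\to t_k, gives the local free cumulants \kappa_n(t_1,\dots,t_n).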

As I mentioned at the beginning, I am also intrigued by the effect of applying non-linear functions to the entries of our matrices. This is something we usually don’t do in “classical” random matrix theory, but which has become quite popular in recent years in the machine learning context. There are statements, going under the name of Gaussian equivalence principle, which say that the effect of the non-linearity is in many cases the same as adding an independent Gaussian noise. Usually this is proved for special random matrices, like products of Wishart matrices. However, it seems to be true more generally; I was convinced that it is valid for unitarily invariant matrices, and the paper of Bernard and Hruza shows that even in their more general setting one has results of this type.
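To give at least one concrete version of such a statement (in the normalization which fits the numerical example below, and without any claim of optimal generality): if the rescaled off-diagonal entries \sqrt{N}x_{ij} behave asymptotically like a Gaussian variable g of variance \sigma^2, then applying f entrywise should, on the level of the limiting eigenvalue distribution, have the same effect as replacing X_N by \theta_1 X_N+\theta_2 Z_N, with Z_N an independent GOE matrix and

\theta_1=\frac{\mathbb{E}[g\,f(g)]}{\sigma^2},\qquad \theta_2^2=\operatorname{Var}(f(g))-\theta_1^2\sigma^2.

For f=\text{ReLU} one gets \theta_1=1/2, independently of \sigma^2, and \theta_2^2=\frac{\sigma^2}{4}\left(1-\frac{2}{\pi}\right); these are the values used below, with \sigma^2=5.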

In order to give a concrete feeling for what I am talking about here, let me give a numerical example for the unitarily invariant case. Actually, I prefer here the real setting, i.e., orthogonally invariant matrices. For me, canonical examples of those are given by polynomials in independent GOE matrices, so let us consider here X_N=A_N^2+A_N+B_NA_N+A_NB_N+B_N, where A_N and B_N are independent GOE. The following plot shows the histogram of the eigenvalues of one realization of X_N for N=5000.
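For those who want to reproduce this, here is a minimal numpy sketch of how one might generate such a realization; the names and the GOE normalization (off-diagonal entries of variance 1/N, so that a single GOE matrix has the semicircle on [-2,2] as its limit) are my choices and not taken from the paper.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)

def goe(N):
    # GOE matrix, normalized so that its spectrum converges
    # to the semicircle distribution on [-2, 2]
    G = rng.standard_normal((N, N))
    return (G + G.T) / np.sqrt(2 * N)

N = 5000  # reduce this if the eigenvalue computations take too long
A, B = goe(N), goe(N)
X = A @ A + A + B @ A + A @ B + B  # X_N = A^2 + A + BA + AB + B, symmetric

plt.hist(np.linalg.eigvalsh(X), bins=100, density=True)
plt.title("eigenvalues of one realization of X_N")
plt.show()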

We now apply a non-linear function to the entries of X_N. Actually, one has to be a bit careful about what this means; in particular, one needs a good scaling and should not act naively on the diagonal, as the diagonal would otherwise dominate everything. Here is the precise definition of the entries of the new matrix Y_N:

y_{ij}=\begin{cases} \frac 1{\sqrt{N}} f(\sqrt{N} x_{ij}), & i\not=j\\ 0,& i=j.\end{cases}

For the function f, let’s take the machine learning favorite, namely the ReLU function, i.e., f(x)=\text{ReLU}(x):=\max(0,x). Here is the histogram of the eigenvalues of the corresponding matrix Y_N.

But now the Gaussian equivalence principle says that this Y_N should asymptotically have the same distribution as the original matrix perturbed by an independent Gaussian noise, i.e., as \theta_1 X_N+\theta_2 Z_N, where Z_N is a GOE matrix, independent from X_N, and where in our case \theta_1=1/2 and \theta_2=\sqrt{5(1-2/\pi)}/2. The following plot superimposes the eigenvalue distribution of \theta_1 X_N+\theta_2 Z_N on the preceding plot for Y_N. Looks convincing, doesn’t it?
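Continuing the sketch from above, here is how one might produce this comparison. The value \sigma^2=5 is the variance of the limiting eigenvalue distribution of X_N, which one can compute, using freeness of the limits of A_N and B_N, as \tau(x^2)-\tau(x)^2=6-1=5. Two remarks on the code: since ReLU is positively homogeneous, the \sqrt{N} factors actually cancel in this particular case (they matter for a general f); and since ReLU(g) has a positive mean, Y_N carries one large outlier eigenvalue of order \sqrt{N}, coming from an approximately rank-one perturbation, which I simply discard, as it does not affect the bulk.

# continuing from the sketch above (X, N, goe, rng are already defined)

def relu(t):
    return np.maximum(t, 0.0)

# entrywise non-linearity, with the scaling from above and zero diagonal;
# for f = ReLU the sqrt(N) factors cancel, but they matter in general
Y = relu(np.sqrt(N) * X) / np.sqrt(N)
np.fill_diagonal(Y, 0.0)

# Gaussian equivalence parameters for f = ReLU and sigma^2 = 5
theta1 = 0.5
theta2 = np.sqrt(5 * (1 - 2 / np.pi)) / 2

Z = goe(N)                    # fresh GOE matrix, independent of X
W = theta1 * X + theta2 * Z   # the Gaussian-equivalent model

ev_Y = np.linalg.eigvalsh(Y)[:-1]  # drop the single outlier of order sqrt(N)
ev_W = np.linalg.eigvalsh(W)

bins = np.linspace(min(ev_Y.min(), ev_W.min()),
                   max(ev_Y.max(), ev_W.max()), 100)
plt.hist(ev_Y, bins=bins, density=True, alpha=0.5, label="Y_N")
plt.hist(ev_W, bins=bins, density=True, alpha=0.5,
         label="theta_1 X_N + theta_2 Z_N")
plt.legend()
plt.show()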

One should note that the ReLU function does not fall into the class of (polynomial or analytic) functions for which such results are usually proved, but ReLU is explicit enough that one could probably also treat this case directly. Anyhow, all this should just be an appetizer; there is still quite a bit to formulate and prove in this direction, but I am looking forward to a formidable feast in the end.

Free probability, between math and physics (and also machine learning) – some updates

In a recent post with a similar title I mentioned some papers which relate physics problems (the eigenstate thermalization hypothesis or the Open Quantum SSEP) with free probability. Let me point out that the title of the preprint by Hruza and Bernard has been changed to “Coherent Fluctuations in Noisy Mesoscopic Systems, the Open Quantum SSEP and Free Probability” and that there are some new and follow-up preprints in this direction, namely “Spectrum of subblocks of structured random matrices: A free probability approach”, by Bernard and Hruza, and also “Designs via free probability”, by Fava, Kurchan, and Pappalardi. In all of them free cumulants and their relations to random matrices play an important role. Not too surprisingly, I find this very interesting in general, but also in particular because, during my voyage in the machine learning world, I became a bit obsessed with the fact that free cumulants are given by the leading order of classical cumulants of the entries of unitarily invariant matrix ensembles (with a “cyclic” or “loop” structure of the indices). This seems to be highly relevant – though, at the moment, I am not actually sure for what exactly.

Anyhow, if anybody is interested in this: in the last lecture of my machine learning course I give a very high-level survey of these relations, and in the video on the Gaussian equivalence principle in the same course I talk about a more concrete model of this in the random features context.

Free Probability, Random Matrices and Machine Learning

Update (August 14): The lecture notes of the course on “High-Dimensional Analysis: Random Matrices and Machine Learning” are now available.

My activity on this blog has been quite low for a while, so it might be good to give a sign of life and let you know that I still intend to keep the blog running, and hope to be more active again in the future.

Lately, I have become interested in machine learning, mostly from the point of view of random matrices and free probability. I am trying to get some idea of what is going on there on a mathematical and conceptual level – mostly from the perspective that neural networks should give some inspiration for interesting questions on, and extensions of, random matrix and free probability theory.

The best way to learn a subject is to teach it, and that’s what I did: I just finished here in Saarbrücken a lecture series on “High-Dimensional Analysis: Random Matrices and Machine Learning”. I have created a blog page for this course; the lectures were recorded and will be uploaded bit by bit to the corresponding YouTube playlist.

I hope to have more to say here on these topics in the (hopefully not too distant) future. I know that quite a few people from our community also have some interest (maybe even already some work) in this direction. Guest blogs on such topics are highly welcome!