Tag Archives: random matrices

Understanding Non-Linearity in Random Matrices

Update: I will give an online talk on my work with Alex in Voiculescu’s Probabilistic Operator Algebra Seminar on Monday, Feb 3, at 9am Pacific time (which is 6pm German time). In order to get the online link for the talk you should write to jgarzav at caltech dot edu.

In the post on “Structured random matrices and cyclic cumulants: A free probability approach” by Bernard and Hruza I mentioned the problem of the effect of non-linear functions on random matrices and showed some plots from my ongoing joint work with Alexander Wendel. A few days ago we finally uploaded the preprint to the arXiv.

What is this about? Consider a random matrix ensemble X_N which has a limiting eigenvalue distribution as N goes to infinity. In the machine learning context, applying non-linear functions to the entries of such matrices plays a prominent role, and the question arises: what is the asymptotic effect of this operation? There are a couple of results in this direction; see, in particular, the paper by Pennington and Worah and the one by Péché. However, these deal with quite special choices of X_N, like a product of two independent Wishart matrices. In those cases the asymptotic effect of applying the non-linearity is the same as that of taking a linear combination of X_N with an independent Gaussian matrix, where the coefficients in this linear combination depend only on a few quantities calculated from the non-linear function. Such statements are often known as a Gaussian equivalence principle.

Our main contribution is that this kind of result is also true much more generally, namely for the class of rotationally invariant matrices. The rotational invariance is kind of the underlying reason for the specific form of the result. Roughly, the effect of the non-linearity is a small deformation of the rotational invariance, so that the resulting random matrix ensemble still exhibits the main features of rotational invariance. This provides precise enough information to control the limiting eigenvalue distribution.

We consider the real symmetric case, that is, symmetric orthogonally invariant random matrices. Similar results hold for selfadjoint unitarily invariant random matrices, or also for corresponding random matrices without selfadjointness conditions.

In addition to what I already showed in the above-mentioned previous post, we have now extended the results to a multi-variate setting. This means we can also take, again entrywise, a non-linear function of several jointly orthogonally invariant random matrices. The asymptotic eigenvalue distribution after applying such a function is the same as for a linear combination of the involved random matrices and an additional independent GOE. As an example, take the two random matrices X_N=A_N^2-B_N and Y_N=A_N^4+C_NA_N+A_NC_N, where A_N, B_N, C_N are independent GOEs. Note that, by the appearance of A_N in both of them, they are not independent, but correlated. Nevertheless, they are jointly orthogonally invariant; that is, the joint distribution of all their entries does not change if we conjugate both of them by the same orthogonal matrix. We now consider the non-linear random matrix \max(X_N,Y_N), where we take entrywise the maximum of the corresponding entries of X_N and Y_N. Our results say that asymptotically this should have the same eigenvalue distribution as the linear model aX_N+bY_N+cZ_N, where Z_N is an additional GOE, independent from A_N, B_N, C_N, and where a, b, c are explicitly known numbers. The next plot superimposes the two eigenvalue distributions.
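One can play with this example numerically. Here is a minimal sketch in Python (my own illustration, not code from the paper); the coefficients a, b, c of the linear model are given explicitly in the preprint and are therefore left out here, so the sketch only builds the matrices of the example and the spectrum of the entrywise maximum:

```python
import numpy as np

rng = np.random.default_rng(0)

def goe(N, rng):
    # GOE normalized so that the eigenvalue distribution
    # converges to the semicircle on [-2, 2]
    G = rng.standard_normal((N, N))
    return (G + G.T) / np.sqrt(2 * N)

N = 500
A, B, C = goe(N, rng), goe(N, rng), goe(N, rng)

# correlated, but jointly orthogonally invariant, matrices of the example
X = A @ A - B
Y = np.linalg.matrix_power(A, 4) + C @ A + A @ C

# entrywise maximum; note that this is again a symmetric matrix
M = np.maximum(X, Y)
eigs_nonlin = np.linalg.eigvalsh(M)

# with the coefficients a, b, c from the preprint, the histogram of
# eigs_nonlin should match that of a*X + b*Y + c*goe(N, rng)
```

For moderate N one can then superimpose the histogram of `eigs_nonlin` with that of the linear model to see the agreement predicted by the theorem.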

If you want to know more, in particular, why this is true and how the coefficients in the linear model are determined in terms of the non-linear function, see our preprint Entrywise application of non-linear functions on orthogonally invariant random matrices.

What do determinants tell us about the eigenvalues of integer-valued matrices? — And what can we say about Fuglede-Kadison determinants of operator-valued semicircular elements?

Consider a matrix with integer entries, like

B=\begin{pmatrix} 1&2&1&3\\ 2&2&1&1\\ 1&1&2&3\\ 3&1&3&1 \end{pmatrix}

How can we convince ourselves that there are no two eigenvalues close to zero? Say, you should convince me that two eigenvalues, both with absolute value smaller than 0.1, are not possible. Of course, we should not calculate the eigenvalues, but decide this by arguments as soft as possible.

It is easy to get upper bounds for eigenvalues by the operator norm or the normalized trace of the matrix – those are quantities which are usually easy to control.

Lower bounds are less clear, but it is actually the determinant which is of good use here. Namely, since the determinant is the product of the eigenvalues, a lower bound for the absolute value of the determinant, together with the easy upper bounds mentioned above, gives lower bounds for the absolute values of the non-zero eigenvalues. In this example, we can calculate the determinant as -1 and bound the operator norm by the Frobenius norm, which is \sqrt{60}. From this we can derive that two eigenvalues of absolute value smaller than 0.1 are not possible, because otherwise we would have the estimate

\vert \text{det}(B)\vert \leq 0.1\times 0.1\times \sqrt{60}\times\sqrt{60}=0.6

which contradicts the fact that \vert \text{det}(B)\vert=1.
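These numbers are easy to verify; here is a quick sanity check in Python (my own sketch):

```python
import numpy as np

B = np.array([[1, 2, 1, 3],
              [2, 2, 1, 1],
              [1, 1, 2, 3],
              [3, 1, 3, 1]], dtype=float)

det = np.linalg.det(B)           # equals -1; an integer, as it must be
frob = np.linalg.norm(B, 'fro')  # sqrt(60), an upper bound for the operator norm
eigs = np.sort(np.abs(np.linalg.eigvalsh(B)))  # B happens to be symmetric

# if two eigenvalues had absolute value < 0.1, then
# |det(B)| <= 0.1 * 0.1 * frob * frob = 0.6 < 1 would follow;
# so at most one eigenvalue can be that small
```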

Maybe you don’t like so much the idea of having to calculate the determinant, as this does not feel so soft. Assume, however, that I told you, as an oracle, that the matrix is invertible. Then you can forget about calculating the determinant and just argue that the determinant of a matrix with integer entries must itself be an integer; since it cannot be zero, its absolute value must be at least 1, and you can repeat the above estimate.

Those observations are of course not restricted to the above concrete example; hopefully I have convinced you that no invertible matrix with integer entries can have too many eigenvalues close to zero, and that the precise estimate depends on the matrix only through its operator norm.

This kind of reasoning can actually be extended to “infinite-dimensional matrices”, which for us means operators in a finite von Neumann algebra. There we are usually interested in the distribution of selfadjoint operators with respect to the trace, and in recent work with Johannes Hoffmann and Tobias Mai it became important to be able to decide that such distributions do not accumulate too much mass in a neighborhood of zero. Finite von Neumann algebras also carry a nice version of the determinant, named after Fuglede and Kadison, and the main question is whether the above reasoning survives in such a setting. Indeed it does, for the operators in which we are interested, but for reasons which are still nice, yet quite a bit more tricky than for ordinary matrices.

The operators we are interested in are operator-valued semicircular elements of the form S=\sum a_i\otimes s_i, where the s_i are free semicircular elements and the a_i are ordinary m\times m matrices, with the only restriction that their entries are integers. We assume that such an S is invertible (as an unbounded operator); one can show that then its Fuglede-Kadison determinant \Delta(S) is not equal to zero. The main question is whether we have a uniform lower estimate for \Delta(S) away from zero. As there is now no formula for the determinant in terms of the entries of the a_i, the integrality of those entries is of no apparent use. Still, we are able to prove the astonishing fact that for all such invertible operator-valued semicircular elements the Fuglede-Kadison determinant \Delta(S) is always \geq e^{-1/2}.

For all this and much more you should have a look at my newest paper with Tobias, Fuglede-Kadison determinants of matrix-valued semicircular elements and capacity estimates. If you need more of an appetizer, here is the abstract of this paper: We calculate the Fuglede-Kadison determinant of arbitrary matrix-valued semicircular operators in terms of the capacity of the corresponding covariance mapping. We also improve a lower bound by Garg, Gurvits, Oliveira, and Wigderson on this capacity, by making it dimension-independent.

If this is still not enough to get you interested, maybe a concrete challenge will do. Here is a conjecture from our work on the limiting behavior of determinants of random matrices.

Conjecture: Let m and n be natural numbers and, for each i=1,\dots,n, let an m\times m matrix a_i with integer entries be given. Denote by \eta the corresponding completely positive map

\eta:M_m(\mathbb{C})\to M_m(\mathbb{C}), \quad b\mapsto \eta(b):=\sum_{i=1}^n a_i b a_i^*

Assume that \eta is rank non-decreasing. Then consider n independent GUE random matrices X^{(N)}_1,\dots,X^{(N)}_n and let A_N denote the Gaussian block random matrix

A_N=\sum_{i=1}^n a_i\otimes X^{(N)}_i

of size mN\times mN. We claim that A_N is invertible for sufficiently large N and that

\lim_{N\to\infty}\text{det}(A_NA_N^*)^{1/N}=\text{cap}(\eta)e^{-m},

where the capacity \text{cap}(\eta) is defined as the infimum of \text{det}(\eta(b)) over all b\in M_m(\mathbb{C}) for which \text{det}(b)=1.
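Here is a small numerical sketch of the conjectured quantity (my own illustration; the coefficient matrices a_1, a_2 below are an arbitrary choice with integer entries, not taken from the paper, and for a single finite N one of course only sees an approximation of the limit):

```python
import numpy as np

rng = np.random.default_rng(0)

def gue(N, rng):
    # GUE normalized so that the spectrum converges to the semicircle on [-2, 2]
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2 * N)

# illustrative integer coefficient matrices a_1, a_2 (here m = 2)
a1 = np.array([[1, 0], [0, 1]])
a2 = np.array([[0, 1], [1, 0]])

m, N = 2, 150
A_N = np.kron(a1, gue(N, rng)) + np.kron(a2, gue(N, rng))  # size mN x mN

# det(A_N A_N^*)^(1/N), computed stably via the log-determinant
sign, logdet = np.linalg.slogdet(A_N @ A_N.conj().T)
value = np.exp(logdet / N)  # to be compared with cap(eta) * exp(-m)
```

Averaging `value` over several realizations and increasing N should then give an idea of whether the limit \text{cap}(\eta)e^{-m} is plausible for a given choice of the a_i.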

Free probability, between math and physics (and also machine learning) – some updates

In the recent post of a similar title I mentioned some papers which relate physics problems (the eigenstate thermalization hypothesis or the open quantum SSEP) to free probability. Let me point out that the title of the preprint by Hruza and Bernard has been changed to “Coherent Fluctuations in Noisy Mesoscopic Systems, the Open Quantum SSEP and Free Probability” and that there are some new and follow-up preprints in this direction, namely “Spectrum of subblocks of structured random matrices: A free probability approach“, by Bernard and Hruza, and also “Designs via free probability“, by Fava, Kurchan, and Pappalardi. In all of them free cumulants and their relations to random matrices play an important role. Not too surprisingly, I find this very interesting in general, but also in particular: during my voyage into the machine learning world I became a bit obsessed with the fact that free cumulants are given by the leading order of classical cumulants of the entries of unitarily invariant matrix ensembles (with a “cyclic” or “loop” structure of the indices). This seems to be highly relevant – though at the moment I am not actually sure for what exactly.

Anyhow, if anybody is interested in this: in the last lecture of my machine learning course I give a very high-level survey of these relations, and in the video on the Gaussian equivalence principle in the same course I talk about a more concrete model of this in the context of the random features model.

Summer school on Free Probability, Random Matrices, and Applications from June 6th to June 10th 2022 at the University of Wyoming

Zhuang Niu and Ping Zhong are organizing a summer school on Free Probability, Random Matrices, and Applications from June 6th to June 10th 2022 at the University of Wyoming. This event is planned to be held in person, but all lectures will have hybrid components for remote participation. 

The summer school will bring together leading experts, young researchers, and students working in free probability, operator algebras, random matrices, and related fields with the aim of fostering new communications and collaborations. Four mini-courses will be given by leading researchers in the field. In addition, there will be some survey talks or research talks. The mini-courses are designed for graduate students and young researchers. However, everyone is very welcome to participate in the meeting.  

Mini-courses: 

  • Hari Bercovici (Indiana), Complex analysis in free probability. 
  • Benoit Collins (Kyoto), Around the operator norm convergence of random matrices in free probability. 
  • Ken Dykema (College Station), On spectral distributions and decompositions of operators in finite von Neumann algebras. 
  • Alexandru Nica (Waterloo), Recent developments related to the use of cumulants in free probability. 

The conference website and registration form can be found at https://sites.google.com/view/rmmc2022. Please indicate in the registration form if you would like to contribute a research talk.  

Financial support is available, and priority will be given to graduate students, junior researchers, and other participants without travel grants. The registration deadline for full consideration for financial aid is April 30th, 2022.  

Please share this information with anyone who might be interested in this meeting. 

Queen’s Seminar on Free Probability and Random Matrices

One of the weekly highlights during my time at Queen’s was the joint seminar with Jamie Mingo on free probability and random matrices. This seminar of course went on after I left Queen’s, and I also installed different versions of it in Saarbrücken, but at least for me it did not feel the same anymore. So I am happy that we have now decided to join forces again and revive our joint seminar in online form. For now it is running on Thursdays at 10 am Eastern time, which is 4 pm German time. You can find the program on the seminar page. If you are interested, write to Jamie to be put on the mailing list.

Proof of Harer-Zagier now online

This blog actually started a couple of years ago as a (not very successful) discussion forum for the recordings of my lectures on Free Probability, Non-Commutative Distributions, Random Matrices, and Mathematical Aspects of Quantum Mechanics. During the last year I have still been producing quite a few videos, but those are on mathematics for engineers (and also in German), so not of much interest in this context. But the Christmas break gave me some motivation and energy to do something more elaborate. So I came back to the random matrix class, where one lecture was, for technical reasons, not recorded during the original class two years ago. I did a retake of this missing class, and so the playlist for the lecture series on random matrices is now finally complete. The topic of this new recording is the proof of the theorem of Harer and Zagier. It gives more or less the original proof from the paper by Harer and Zagier, which I still find fun and amazing – consisting of a mixture of analysis, combinatorics, and generatingfunctionology, which I am most fond of. If you want to have a look, see here, here, here, and here. For videos without an audience (as is now common during corona times) I have become used to splitting the whole lecture into smaller units – which also has the advantage that I can clean the blackboard in between.

Maybe I will also find the time and energy in a (probably far, far away) future to do a reshoot of the whole free probability course. This is of course closest to my heart, but as this was my first attempt at recording lectures, I needed some time to become aware of the importance of audio quality – which is so bad in those videos that they did not make it to YouTube …

PhD position for a project on “Free Probability Aspects of Neural Networks” at Saarland University

Update: the position has been filled!

I have funding from the German Science Foundation DFG for a PhD position on the interrelations between free probability, random matrices, and neural networks. Below are more details. See also here for the announcement as a pdf.

Project: Neural networks are, roughly speaking, functions of many parameters in a high-dimensional space, and the training of such a network consists in finding the parameters such that the function does what it is supposed to do on the “training inputs”, but also generalizes in a meaningful way to “real test inputs”. Random matrix and free probability theory are mathematical theories which deal with the typical behaviour of functions (which have an underlying matrix structure) in high dimensions and in the limit of large matrix size. Thus it is not surprising that those theories should have something to offer for describing and dealing with neural networks. Accordingly, there have been approaches relying on random matrix and/or free probability theory to investigate questions around deep learning. This interaction between free probability and neural networks is hoped to be bi-directional in the long run. However, the project will not address practical purposes of deep learning; instead, we want to take the deep learning challenges as new questions around random matrices and free probability, and we aim to develop those theories further on a mathematical level.

Prerequisites: Applicants should have an equivalent of a Master’s degree and a background in at least one of the subjects

  • free probability
  • random matrices
  • neural networks

and an interest in learning the remaining ones and, in particular, in working on their interrelations.

Application: Inquiries and applications should be addressed to Roland Speicher. Your application, in German or in English, should arrive before June 20, 2021. It should contain your curriculum vitae and an abstract of your Master’s thesis. Arrange also for at least one recommendation letter to be sent directly to Roland Speicher, preferably by email. State in your application the names of those whom you have asked for such a letter.

Contact:
Prof. Dr. Roland Speicher
Saarland University
Department of Mathematics
Postfach 15 11 50
66041 Saarbrücken
Germany
speicher@math.uni-sb.de
https://www.uni-saarland.de/lehrstuhl/speicher/

Correction on my lecture notes on random matrices (weak convergence versus convergence of moments)

I just noticed that I have a stupid mistake in my random matrix lecture notes (and also in the recording of the corresponding lecture). I am replacing the notes with a new version which corrects this.

In Theorem 4.16, I claimed that weak convergence is equivalent to convergence of moments, in a setting where all moments exist and the limit is determined by its moments. Of course, this is too optimistic a statement. What is true is that convergence of moments implies weak convergence; that’s the important direction. The other direction would be more relevant for combinatorial aficionados like me, as it would allow me to claim that the combinatorial and the analytical approach are equivalent in such a setting. However, the other direction is clearly wrong without some additional assumptions; thus there are nice-looking situations where one cannot prove weak convergence by dealing with moments.

Of course this is not a new insight. In the context of proving convergence to the semicircle for Wigner matrices with non-zero mean for the entries, we know that we cannot do this with moments (see, for example, Remark 11 in Chapter 4 of my book with Jamie).
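A minimal numerical illustration of this phenomenon (my own sketch, not taken from the book): shifting the entries of a Wigner matrix to a non-zero mean \mu produces one outlier eigenvalue of size about \mu\sqrt{N}. The outlier carries negligible weight for the weak limit, which is still the semicircle, but it makes the fourth moment grow like \mu^4 N, so no moment method can detect the weak convergence.

```python
import numpy as np

rng = np.random.default_rng(0)

def shifted_wigner(N, mu, rng):
    # symmetric Gaussian Wigner matrix whose entries are shifted
    # to mean mu, with the usual 1/sqrt(N) scaling
    G = rng.standard_normal((N, N))
    W = (G + G.T) / np.sqrt(2)
    return (W + mu) / np.sqrt(N)

def fourth_moment(N, mu, rng):
    # normalized trace of X^4, i.e. the fourth moment of the
    # empirical eigenvalue distribution
    eigs = np.linalg.eigvalsh(shifted_wigner(N, mu, rng))
    return np.mean(eigs ** 4)

# the bulk stays close to the semicircle (whose fourth moment is 2),
# but the single outlier makes the empirical fourth moment blow up with N
m4_small = fourth_moment(100, 1.0, rng)
m4_large = fourth_moment(400, 1.0, rng)
```

A histogram of the eigenvalues (with the outlier removed) still matches the semicircle density, which is exactly the weak convergence that the diverging moments cannot see.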

To get a kind of positive spin out of this annoying mistake, I started to think about what kind of convergence we actually want in our theorems in free probability. Usually our convergence is in distribution, i.e., we are looking at moments – which seems to be the natural thing to do in the multivariate case of several non-commuting operators. However, we can also project things down to the classical world of one variable by taking functions of our operators and asking for the convergence of all such functions. And then there might be a difference between asking for weak convergence and asking for convergence in distribution (i.e., convergence of all moments).

This might become relevant in the context of rational functions. Sheng Yin showed in Non-commutative rational functions in strong convergent random variables that convergence in distribution carries over from polynomials to rational functions (in the case where the rational function in the limit is a bounded operator), provided one assumes strong convergence on the level of polynomials (i.e., also convergence of the operator norms). Without the assumption of strong convergence it is easy to see that there are examples (see page 12 of Sheng’s paper) where one has convergence in distribution for the polynomials, but not for the rational functions. However, though one does not have convergence of the moments of the rational function, it is still true in this example that one has weak convergence of the (selfadjoint) rational function. So maybe it could still be the case that, even without strong convergence assumptions, convergence in distribution for polynomials (or maybe weak convergence for polynomials) implies weak convergence for rational functions. At least, at the moment we do not know a counterexample to this.

The saga ends …

I have now finished my class on random matrices. The last lecture motivated the notion of (asymptotic) freeness from the point of view of looking at independent GUE random matrices. So you might think that continuations on free probability and the like should be coming soon. But actually this part of the story has already been written and recorded, and if you don’t want to spoil the tension you should watch the series not in its historical but in its logical order:

  1. Random Matrices (videos, homepage of class)
  2. Free Probability Theory (videos, homepage of class)
  3. Non-commutative Distributions (and Operator-Valued free Probability Theory) (videos, homepage of class)

More information, in particular the underlying script (sometimes in a handwritten version, sometimes in a more polished texed version), can be found on the corresponding home page of the lecture series.

May freeness be with you …