A timely post, for once: this paper is going to be presented next Tuesday at NIPS. This has the usual disclaimer that all my scanned notes have, but this time it’s because there’s probably too much detail, instead of too little. These notes are written in enough detail for someone with no background in statistical learning theory and not much background in statistics (like myself), but I don’t know if anyone with no background in SLT will want to read this paper.
This is a scanned set of notes for a paper from the learning theory reading group, which I’m posting here with some keywords so I can easily search for them. I doubt they’ll be useful for anyone else, but who knows?
My naive and uninformed view is that this is cool because it lets you find convergence rates if you know something about the structure of the data, without having to mess around with operator theory. These notes cover the basic results (plus a lot of the prerequisites for understanding them) and lasso estimates for sparse models; the other examples aren't covered in full, though the notes do go into some detail on low-rank matrices.
The estimation of high-dimensional parametric models requires imposing some structure on the models, for instance that they be sparse, or that matrix structured parameters have low rank. A general approach for such structured parametric model estimation is to use regularized M-estimation procedures, which regularize a loss function that measures goodness of fit of the parameters to the data with some regularization function that encourages the assumed structure. In this paper, we aim to provide a unified analysis of such regularized M-estimation procedures. In particular, we report the convergence rates of such estimators in any metric norm. Using just our main theorem, we are able to rederive some of the many existing results, but also obtain a wide range of novel convergence rates results. Our analysis also identifies key properties of loss and regularization functions such as restricted strong convexity, and decomposability, that ensure the corresponding regularized M-estimators have good convergence rates.
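For anyone who hasn't seen the setup before, here is roughly what a "regularized M-estimator" looks like when written out, with the lasso as the sparse special case. The notation is my own choice for illustration and may not match the paper's: $\mathcal{L}$ is the loss, $r$ the regularizer, $\lambda_n$ the regularization weight, $Z_1^n$ the $n$ observations, and $(y, X)$ the response vector and design matrix in the linear regression case.

```latex
% General regularized M-estimator: loss (fit to data) plus a penalty
% that encourages the assumed structure.
\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p}
  \Big\{ \mathcal{L}(\theta; Z_1^n) + \lambda_n \, r(\theta) \Big\}

% Lasso as one instance: squared-error loss with an \ell_1 penalty,
% which encourages sparse \theta.
\hat{\theta}_{\text{lasso}} \in \arg\min_{\theta \in \mathbb{R}^p}
  \Big\{ \frac{1}{2n} \lVert y - X\theta \rVert_2^2
         + \lambda_n \lVert \theta \rVert_1 \Big\}
```

The low-rank matrix case has the same shape, with the parameter being a matrix and the penalty typically being the nuclear norm (the sum of singular values) instead of the $\ell_1$ norm.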