The dangers of breadth-first search (or Tom, here’s that entry you wanted)

Hey Tom,

You said you’d write for this blog if I penned an entry asking you to write; well, here it is: I’m calling you out.

Joel and I decided to start an epistolary blog a few years back as a way to continue sharing random ideas and facts with each other after I moved away from Madison. Joel came up with the title and URL — breadth-first, after breadth-first search, because we both tend to skim from topic to topic, preferring to learn nothing about everything to learning everything about nothing. That works pretty well for math-y topics, but it’s dangerous when it comes to arts-y things, or even the social sciences, especially for someone coming from a hard science background.

We humans are good at recognizing patterns; in math, when you explore what seems like it might be a tenuous connection between two things which may or may not be related, you’ll often find a deep fundamental connection. Unfortunately, superficial connections are easy to find, too, and the signal-to-noise gets higher the farther you move from pure math: witness all the people who think their system will let them predict the stock market, horse races, or even future events. Closer to home, look at Paul Graham’s Hackers and Painters essay. It’s a slick, well-written essay that’s convincing to hackers, but the moment he posted it, a number of painters responded, saying he didn’t know anything about painting, that there’s no connection at all between hacking and painting, and that the fortune-teller-vague connection he was drawing could be made to anything. More recently, the same thing happened when Joel Spolsky blogged about technical jobs vs. the star system Hollywood: within hours, people who were actually familiar with the star system responded, saying Joel has no idea what he’s talking about.

So why start a blog with a broken premise? Freewheeling conversations where you draw connections between disparate topics are fun, even if they’re ultimately flawed, and if someone comes along and completely refutes what we wrote, so much the better. Learning something new is worthwhile. But now that my co-blogger has gone silent, I feel a bit ridiculous writing on random topics I know nothing about. With the division of labor in the blogosphere, there are experts in the field writing on any particular topic, so who am I to write anything about anything?

I was just watching the Marx Brothers, which made me think of Aristophanes. Don’t you think Horse Feathers : The Clouds :: Duck Soup : Lysistrata :: A Night at the Opera : The Frogs? That’s the sort of thing I’d bring up in casual conversation, or write about in an epistolary-style blog. But, no doubt there’s some classics Ph.D. out there who’s a rabid Marx Brothers fan, someone who’s watched their films countless times, teasing out the connections between every theme, doing an Empsonesque close reading of every shot, every take, every line, so what makes me think I’m qualified write about it? Now that this has become a solitary blog, it feels narcissistic to write hundreds of words on something I know so little about.

Here are some other things I’ve thought about this weekend but haven’t written up, for the same reason: the connection between Zecharias Frankel, Solomon Schechter, and Reform Judiasm, and prescriptivism vs descriptivism in linguistics; doing philosophy in CS terms (Rawls = worst-case analysis, the philosophy 101 caricature of utilitarianism is average-case analysis, so what does Spielman’s smoothed analysis looked like in philosophical terms?); and applying Elinor Ostrom’s work to software engineering. If you want to start writing, perhaps we could turn the blog back that way.

Another direction would be to write things that are actually informative. You have decades of experience writing assembly and microcode. Surely you’ve got some curmudgeonly advice for us whippersnappers who think you can do everything in a language like Scala, perhaps using macros or some other code generation framework when we really need to dip into assembly.

If that’s not explicit enough for you, how about writing up reviews of the Teaching Company courses you listen to during your commute? You’ve never steered me wrong, but things have been hit-or-miss when I’ve selected courses at random. The reviews wouldn’t have to be very long, even short blurbs, like these, would be useful. I just got through listening to Erickson’s Philosophy as a Guide to Living, which was a total waste of time. The man speaks so slowly that I had to crank my player up to 4x speed to avoid falling alseep, and even then, the information throughput was still zero. 4 x 0 = 0. He has three lectures on Kant, but never goes deeper than saying things like “Kant says if you assume X, you get a contradiction, and if you assume not X, you get a contradiction, so there’s a fundamental problem”, without ever explaining where the contradictions arise; it’s just a laundry list of facts. He summarizes Plato by saying that Plato was concerned with getting out of the cave. That’s it. If you’re already familiar with Plato, you’ll know what he’s referring to and learn nothing new, and if not, the reference to the cave will be meaningless. Why even bring it up? My experience has been that about half of all TC courses are as pedagogically worthless as this one. I’ve gotten bits and pieces of reviews from you scattered across perhaps five different emails, plus another twenty or thirty recommendations and anti-recommendations that you’ve mentioned in person (that I’ve completely forgotten). I would love it if you could write-up one review of all the great courses that are worth listening to on my commute.

P.S. If you want an alternative to Erickson’s TC course, Adrian Moore does a better job of explaining Kant in a fifteen minute podcast than Erickson does in his three half-hour lectures.

The dangers of breadth-first search (or Tom, here’s that entry you wanted)

Probabilistic Inference Using Markov Chain Monte Carlo Methods, Parts 3 & 4

These notes are based on the gnofai tutorial that covers Radford Neal’s MCMC review (pdf). Sanmi’s writeup is here. This post is mainly to index my scanned notes with keywords, so I can search for them.

We started off with a review of the motivation behind this tutorial series. What do we want? E[f]. Why? P(y|x) can be written as E[y|x]

E[f] = \int f(x)p(x)dx, but we don’t know our distribution. So take some z ~ p(x). We want \frac{1}{n} \sum f(z) \rightarrow E[f] as n \rightarrow \infty.

This is where markov chains come in: we get some p_n(x) \rightarrow p(x); if our MC is ergodic, we’ll converge to p(x) regardless of our starting state, p_0., i.e., we want to reach p(x) = p(x)T(x).

The practical concerns Sanmi mentioned where work per transition, time to converge, and the number of steps we need to get IID draws from our distribution once we’ve converged.

Then there was a brief review of Gibbs sampling, to motivate the Metropolis algorithm. Recall that in Gibbs sampling, our local transitions, B_k, the marginals, p(x_k | X_{\backslash k}). With Gibbs, we fix all but one variable, sample from / change that variable, and iterate, but this requires getting the partials. Sometimes we can’t do that, and that’s when we want to use Metropolis, which is what today’s notes are about.

Unfortunately, the explosive decompression of a pen destroyed my notes for part 4, but as always, Sanmi has a nice writeup

Probabilistic Inference Using Markov Chain Monte Carlo Methods, Parts 3 & 4

Guitar Hero for Conductors?

I just got back from the Dance Repository Theatre’s preview of CanciĆ³n del Cuerpo, which was intriguing enough that I’ll want to see it when it’s finished.

One interesting gimmick William Meadow’s use of Wiimotes to control the score. I can imagine a fun version of Guitar Hero or Rock Band where you conduct with Wiimotes, but I’m not sure I see the value in a real performance (other than the novelty). If you’re going to rely on digital recordings, having a human at the helm is like having a human umpire call balls and strikes: the human element can only add mistakes.

I can understand the appeal of a live orchestra: even my uncultured non-audiophile ears appreciate the difference between a real orchestra and a pair of speakers. But once you have the speakers, what’s the point of having a live conductor? Why wouldn’t you just use the best recording?

Guitar Hero for Conductors?

Notes on S. Negahban, P. Ravikumar, M. J. Wainwright and B. Yu, A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers

A timely post, for once: this paper is going to be presented next Tuesday at NIPS. This has the usual disclaimer that all my scanned notes have, but this time it’s because there’s probably too much detail, instead of too little. These notes are written in enough detail for someone with no background in statistical learning theory and not much background in statistics (like myself), but I don’t know if anyone with no background in SLT will want to read this paper.

This is a scanned set of notes for a paper from the learning theory reading group, which I’m posting here with some keywords so I can easily search for them. I doubt they’ll be useful for anyone else, but who knows?

My naive and uninformed view is that this is cool because it lets find convergence rates if you know something about the structure of the data without having to mess around with operator theory. These notes cover the basic results (plus a lot of the pre-requisites for understanding the basic results) and lasso estimates for sparse models; the other examples aren’t covered in full detail, but they do go into some detail on low rank matrices.

Continue reading “Notes on S. Negahban, P. Ravikumar, M. J. Wainwright and B. Yu, A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers”

Notes on S. Negahban, P. Ravikumar, M. J. Wainwright and B. Yu, A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers