Mistake Bound Model and Learning Decision Lists (Computational and Statistical Learning Theory, Lecture 1)

Scribing and writing up summaries of the gnofai MCMC tutorial made me realize how useful it is to force myself to review my notes by writing up summaries, so I’m going to write up searchable summaries of my handwritten notes for the Computational/Statistical Learning Theory class Adam Klivans and Pradeep Ravikumar ran last semester. I missed the first month of class due to a scheduling conflict, but I’ll write up summaries of those lectures anyway.

Apropos of nothing, I’m still shocked by how friendly and accommodating people are here at UT. Adam was nice enough to move the class to a time that would work for me even though I was just another nameless first-semester grad student from another department.

WARNING: these writeups aren’t detailed to be sufficient for learning or even making sense of the material; they’re just detailed enough to force myself to re-work all the theorems as exercises (and detailed enough that I can Google up my handwritten notes).

Mistake Bound (MB) model: the learner wants to learn some $f$. The learner gets some unlabeled example, $x \in \{0,1\}^n$, makes a prediction based on $h$, and then finds out the truth, $f(x)$. The learner updates if it made a mistake (which is equivalent to always being able to update for obvious reasons).

Definition: A concept class, $C$, over $\{0,1\}^n$ is efficiently learnable in MB if there exists an algorithm $A$ such that for any $c \in C$, A has an MB $M(c) = poly(n,size(c))$ and its running time in each trial is also $poly(n,size(c))$, where $size(c)$ is the description length of c under some encoding scheme. If the MB is $poly(f)$ instead of $poly(n)$, i.e., if it’s poly in the “size” of the function being learned, as opposed to poly in the number of features in the domain (which we care about in the high-dimensional case), then $A$ is attribute efficient.

Learning Decision Lists: We can learn a DL ($\{0,1\}^n \rightarrow \{0,1\}$) of length k with MB $O(nk)$. Start with everything at the top of the list, and push it down every time we make a mistake, a.k.a., use the bag algorithm. This gives us $2nk$. Note that this doesn’t actually produce a DL, but the hyp we get will be consistent with $c$. Of course we can get a DL from our list of bags by picking arbitrarily from each bag.

’05 scribe notes (which seem to be broken at the moment)
’06 scribe notes