Disclaimer: these writeups aren’t detailed to be sufficient for learning or even making sense of the material; they’re just detailed enough to force myself to re-work all the theorems from scratch (and littered with enough keywords so that I can Google up my more detailed handwritten notes). I don’t have handwritten notes for the first few lectures, but I’m writing them up anyway, just to make myself prove all the theorems.
Learning Disjunctions (or Conjunctions): Set the initial hypothesis to everything (2n). Cross off at least one variable for each mistake: MB = . We basically do the same thing for conjunctions.
Decision Tree: See notes for definition. Size = number of nodes = .
How powerful are these compared to decision lists? Just consider parity! We need to remove the restriction that each block is a single boolean variable to express an arbitrary DT as a DL.
How large a DL do we need then? Call an n-decision list one with blocks of size n. We can obviously convert a DT to a DL by traversing the tree, and putting each traversal in a block. How can we do better? Consider the concept of rank: rank of a leaf = 0. Otherwise, rank = max(rank(,) when and rank otherwise.
What’s the rank of a tree with leaves? At most : by induction, number of leaves is at most . But any rank DT is a k-DL: Induct on the depth of the tree. Let $x_i$ be the variable at some level. WLOG, has rank . So our DL is the DL of with $x_i$, and the DL of .
So we can learn a DT in .
What’s the next step up in complexity from a DT? DNF (Sum of products for us EE folks). Why is it a step up? We can take any DT and turn it into DNF by traversing the tree and setting that traversal to one of the conjunctions.
Augmented Decision Tree: this is just a DT with DNF formulas at the leaves. If this is a t-DT, any poly sized DNF with terms over variables has a -DT. Consider all terms with length > t: take the most common variable, and put it at the root. Repeat.
Now we can extend our DL to have DNFs at the outputs instead of 0s and 1s. Say we have an augmented DT with rank R and term length t. We can turn an R-DT into a extended R-DL, and convert that into an (R+t)-DL by pushing variables from the DNFs into the decision blocks. This means we can turn an term variable DNF into a -DL. This is minimum when , so our MB is