# Notes on Gopalan and Radhakrishnan, Finding duplicates in a data stream

As with all my handwritten notes, this has the usual disclaimer: these posts are just so I can use nice indexed search to find my notes, which are sufficient for me to recall talks and papers but probably not much use to anyone else. Slides here. Paper here

Given a data stream of length $n$ over an alphabet $[m]$ where $n > m$, we consider the problem of finding a duplicate in a single pass. We give a randomized algorithm for this problem that uses $O((\log m)^3)$ space. This answers a question of Muthukrishnan and Tarui, who asked if this problem could be solved using sub-linear space and one pass over the input. Our main tool is an Isolation Lemma that reduces this problem to the task of detecting and identifying a Dictatorial variable in a Boolean halfspace.

This paper is actually from SODA09, but I’m tagging it SODA10 so it’s grouped with similar work; I’ll tag it SODA09 if I go through more SODA09 papers.