As with all my handwritten notes, this has the usual disclaimer: these posts are just so I can use nice indexed search to find my notes, which are sufficient for me to recall talks and papers but probably not much use to anyone else. Paper here. Related slides here
Bug-fixes are widely used for predicting bugs or finding risky parts of software. However, a bug-fix does not contain information about the change that initially introduced a bug. Such bug-introducing changes can help identify important properties of software bugs such as correlated factors or causalities. For example, they reveal which developers or what kinds of source code changes introduce more bugs. In contrast to bug-fixes that are relatively easy to obtain, the extraction of bugintroducing changes is challenging.
In this paper, we present algorithms to automatically and accurately identify bug-introducing changes. We remove false positives and false negatives by using annotation graphs, by ignoring non-semantic source code changes, and outlier fixes. Additionally, we validated that the fixes we used are true fixes by a manual inspection. Altogether, our algorithms can remove about 38%~51% of false positives and 14%~15% of false negatives compared to the previous algorithm. Finally, we show applications of bug-introducing changes that demonstrate their value for research.