Notes on some of Facebook's dev tools work

The scale of big tech companies means it makes sense to put a lot of work into developer tools, even if they provide only marginal improvements to any individual workflow. Some of the projects described on Facebook’s engineering blog are particularly cool – and sometimes their utility matches that coolness. I’ll summarize a few of them here:

Anomaly detection

This tool finds statistically significant features in crash reports to aid in debugging. A large volume of crash reports, each full of (mostly irrelevant) information, is difficult to interpret. There’s a huge upside if you can group reports into similar types of crashes, and also determine which features of similar crash reports led to the crash. This article describes a way to automate the latter step.

Crash reports can be distilled to a group of key-value pairs like { country: US, version: 4, ... }. The article calls these contrast sets. The goal is to find contrast sets that are particularly relevant for some group.

For discrete data (like country: US), we can run statistical tests to determine relevance (a sketch of one such test follows the list below). The null hypothesis is that the contrast set is equally likely in every group; a contrast set is interesting when:

  • There exists some pair of groups G_i, G_j where the likelihood of seeing the contrast set differs significantly between them
  • That difference is at least d for some pair of groups G_i, G_j
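
As a concrete illustration of both checks, here is a small sketch using a chi-squared test of independence. The article doesn’t specify the exact test machinery, so the groups, counts, significance level, and threshold d below are all made up for the example.

    # Sketch: is the contrast set { country: US, version: 4 } over-represented
    # in some group of crash reports? (Illustrative numbers, not real data.)
    from scipy.stats import chi2_contingency

    # Rows = groups of crash reports, columns = [reports matching the contrast
    # set, reports not matching it].
    observed = [
        [120, 880],   # group G_0
        [300, 700],   # group G_1
        [110, 890],   # group G_2
    ]

    # Null hypothesis: the contrast set is equally likely in every group.
    chi2, p_value, dof, expected = chi2_contingency(observed)
    if p_value < 0.05:
        print(f"contrast set is group-dependent (p = {p_value:.4g})")
    else:
        print(f"no significant difference across groups (p = {p_value:.4g})")

    # Second check: the largest support difference across groups should also
    # exceed some minimum effect size d (here an arbitrary 10 percentage points).
    supports = [match / (match + nomatch) for match, nomatch in observed]
    if max(supports) - min(supports) >= 0.10:
        print("support difference exceeds d")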

We can also use continuous data, like TF-IDF scores for a given page transition (treated as a bigram, e.g., feed → photos). These show how common this interaction was in a crash report, relative to all the others.

The statistical test here is a similar idea: the null hypothesis is that the average of the variable is the same across groups.
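
A quick sketch of that test, using a one-way ANOVA on made-up TF-IDF scores for the feed → photos transition. The article doesn’t name the specific test, so treat this as one plausible choice rather than the actual implementation.

    # Sketch: do groups of crash reports differ in how strongly the
    # "feed -> photos" transition shows up? (Made-up TF-IDF scores.)
    from scipy.stats import f_oneway

    group_a = [0.02, 0.05, 0.03, 0.04, 0.02]   # e.g. crashes on screen X
    group_b = [0.21, 0.18, 0.25, 0.19, 0.22]   # e.g. crashes on screen Y
    group_c = [0.03, 0.06, 0.04, 0.05, 0.03]

    # Null hypothesis: the mean TF-IDF score is the same across groups.
    f_stat, p_value = f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4g}")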

Autofix suggestions

Static analysis can help you identify problematic code, but this tool, Getafix, goes a step further – it also suggests a patch to fix the issue.

This is a tool that’s similar to publicly available services like https://lgtm.com/, but it’s cool to see the methodology explained here.

Getafix learns from past bug fixes to suggest new bug fixes. It’s a linter that also suggests intelligent changes, based on code that other developers have used to fix similar issues.

It works by training on diffs of the AST, doing anti-unification to group examples that differ only by variables, then building a model that takes the surrounding code into account.

In the pattern mining stage, it unifies edits that have some element in common that can be swapped out. E.g., if (dog == null) and if (cat == null) are both represented by the common pattern if (h0 == null), where h0 is a "hole" standing in for the part that varies.

It also supports multiple layers of context: if (dog == null) return; dog.drink(); and if (list == null) return; do { list.pop } have their first statement in common, but not their second.
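
To make the anti-unification step concrete, here is a minimal sketch, using my own toy representation rather than Getafix's implementation: ASTs are nested tuples, and any subtrees that differ between the two examples become numbered holes. The second example loosely mirrors the two-statement edits above.

    # Minimal anti-unification sketch (illustrative only): generalize two ASTs,
    # written as nested tuples, into a pattern where differing subtrees become
    # numbered "holes" h0, h1, ...

    def anti_unify(a, b, holes=None):
        """Return the most specific pattern that matches both a and b."""
        if holes is None:
            holes = {}
        if a == b:
            return a
        if (isinstance(a, tuple) and isinstance(b, tuple)
                and len(a) == len(b) and a and a[0] == b[0]):
            # Same node type and arity: keep the node, generalize the children.
            return (a[0],) + tuple(anti_unify(x, y, holes) for x, y in zip(a[1:], b[1:]))
        # Different subtrees: replace with a hole (reusing holes for repeated pairs).
        key = (a, b)
        if key not in holes:
            holes[key] = f"h{len(holes)}"
        return holes[key]

    # if (dog == null)  vs  if (cat == null)
    t1 = ("if", ("==", "dog", "null"))
    t2 = ("if", ("==", "cat", "null"))
    print(anti_unify(t1, t2))   # ('if', ('==', 'h0', 'null'))

    # Two-statement edits: the differing variable becomes a shared hole h0 and
    # the differing method name becomes h1, keeping a layer of surrounding context.
    s1 = ("seq", ("if", ("==", "dog", "null"), "return"), ("call", "dog", "drink"))
    s2 = ("seq", ("if", ("==", "list", "null"), "return"), ("call", "list", "pop"))
    print(anti_unify(s1, s2))
    # ('seq', ('if', ('==', 'h0', 'null'), 'return'), ('call', 'h0', 'h1'))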

The context restricts the number of places where the fix can be applied, even if some of that context is left unmodified by the fix. It can also help decide between competing fixes:

  • h0.h1() → h0 != null && h0.h1()
  • h0.h1() → if (h0 == null) return; h0.h1()

The second is specific to function calls and therefore more likely to be applied, if it matches.
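
Continuing the toy AST representation from the previous sketch, here is a hedged illustration of how a mined before → after pattern might be applied: the pattern's concrete parts must match the code exactly, holes bind to whatever subtree they cover, and the bindings are substituted into the replacement. This is a simplification; Getafix's real matcher and ranking are more involved.

    # Sketch: apply a mined before -> after pattern by binding holes against a
    # candidate AST and substituting the bindings into the replacement.

    def match(pattern, tree, bindings):
        """Try to bind the pattern's holes so it equals tree; return success."""
        if isinstance(pattern, str) and pattern.startswith("h"):
            # Holes are named h0, h1, ... by convention in this sketch.
            if pattern in bindings:
                return bindings[pattern] == tree   # a repeated hole must bind consistently
            bindings[pattern] = tree
            return True
        if isinstance(pattern, tuple) and isinstance(tree, tuple) and len(pattern) == len(tree):
            return all(match(p, t, bindings) for p, t in zip(pattern, tree))
        return pattern == tree                     # concrete context must match exactly

    def substitute(pattern, bindings):
        """Fill the after-pattern's holes with whatever the match bound them to."""
        if isinstance(pattern, str):
            return bindings.get(pattern, pattern)
        return tuple(substitute(p, bindings) for p in pattern)

    # Mined edit: h0.h1()  ->  if (h0 == null) return; h0.h1()
    before = ("call", "h0", "h1")
    after = ("seq", ("if", ("==", "h0", "null"), "return"), ("call", "h0", "h1"))

    code = ("call", "user", "getName")   # candidate call site: user.getName()
    bindings = {}
    if match(before, code, bindings):
        print(substitute(after, bindings))
        # ('seq', ('if', ('==', 'user', 'null'), 'return'), ('call', 'user', 'getName'))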

Automated performance regression testing

Facebook’s CI pipeline analyzes every code change for performance regressions. But when doing end-to-end performance regression testing, especially on a site as complex as Facebook, there’s a lot of run-to-run variation in performance that makes these sorts of tests difficult to interpret. Here are some of the areas where the engineers had to work to minimize variance between test runs, and their solutions:

  • Data loading - there’s a proxy to ensure that data servers serve a cached version for both the control and treatment versions of the experiment.
  • Code nondeterminism - e.g., slow logging scripts that are run 1-in-10 times. The fix replaces random generators on the client side with seeded versions. This is necessary on the server side as well, since presumably A/B testing and the like means that the components returned are not the same every time. Here, each call site gets a uniquely seeded RNG, since multithreading means that the same execution order is not guaranteed every time (see the sketch after this list).
  • Environment differences - e.g. browser cookies and cache. Here, it’s just necessary to have a clean browser setup for each run of the tests.
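
A rough sketch of the per-call-site seeding idea. The article doesn’t describe Facebook’s actual implementation, so the function name, the call-site identifier, and the seeding scheme below are all illustrative: each call site derives its seed from its own location, so its stream of "random" numbers is reproducible regardless of which thread reaches it first.

    # Sketch: deterministic "random" numbers, seeded per call site, so that
    # thread scheduling can't change which values each call site sees.
    import hashlib
    import inspect
    import random
    import threading

    _rngs = {}
    _lock = threading.Lock()

    def seeded_random():
        """Roughly a stand-in for random.random(), seeded per call site."""
        caller = inspect.stack()[1]
        call_site = f"{caller.filename}:{caller.lineno}"   # stable id for the call site
        with _lock:
            if call_site not in _rngs:
                # Derive a stable seed from the call site's location.
                seed = int.from_bytes(hashlib.sha256(call_site.encode()).digest()[:8], "big")
                _rngs[call_site] = random.Random(seed)
            return _rngs[call_site].random()

    # Every test run sees the same value at this call site, even if other
    # threads hit seeded_random() from other places in between.
    should_log = seeded_random() < 0.1   # e.g. the 1-in-10 logging script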