The blog post describes how Netflix uses sequential, anytime-valid hypothesis tests to release software safely. It stresses that differences between data streams must be identified quickly during canary testing, which requires statistical procedures that detect changes in whole distributions. The post explains the limitation of fixed-n tests, whose error guarantees hold only at a single pre-specified sample size, and shows how sequential tests can flag regressions early while still controlling false positives. It also introduces sequential confidence bands and sequential p-values for monitoring distributional changes continuously. The post then discusses the impact at Netflix: the approach catches bugs and performance regressions before full rollout, which allows more frequent releases at lower risk. A case study illustrates how the system detected a shift in play-delay quantiles during a canary test. The post closes by previewing future work on sequential testing procedures for count data.
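The limitation of fixed-n tests mentioned above can be illustrated with a small simulation. The following is a minimal sketch, not the procedure used at Netflix: it assumes Gaussian observations with known unit variance and no true difference, and the function names, sample sizes, and peeking schedule are all hypothetical choices made for illustration. It compares the false positive rate of a z-test checked once at the planned sample size with the same test checked repeatedly as data arrives.

```python
import math
import random


def two_sided_z_pvalue(sample_sum, n):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    z = sample_sum / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_max, peek_every, alpha, n_sims, seed):
    """Fraction of null simulations in which any peek rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            # Data are drawn under the null: there is no real effect.
            total += rng.gauss(0.0, 1.0)
            # Test only at the scheduled peeks; stop at the first rejection.
            if n % peek_every == 0 and two_sided_z_pvalue(total, n) < alpha:
                rejections += 1
                break
    return rejections / n_sims


# One look at the planned sample size (the fixed-n design).
single_look = false_positive_rate(
    n_max=500, peek_every=500, alpha=0.05, n_sims=2000, seed=0
)
# Fifty looks with the same fixed-n test (peeking every 10 observations).
many_looks = false_positive_rate(
    n_max=500, peek_every=10, alpha=0.05, n_sims=2000, seed=0
)

print(f"single look: {single_look:.3f}")
print(f"many looks:  {many_looks:.3f}")
```

With a single look, the rejection rate should stay near the nominal 5%. With repeated looks, it rises well above that, which is the failure that anytime-valid sequential procedures are designed to avoid.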