The earliest use of statistical hypothesis testing is generally credited to the question of whether male and female births are equally likely (the null hypothesis), which was addressed in the 1700s by John Arbuthnot (1710) and later by Pierre-Simon Laplace (1770s). Arbuthnot examined birth records in London for each of the 82 years from 1629 to 1710 and applied the sign test, a simple non-parametric test. In every year, the number of males born in London exceeded the number of females. If an excess of male births and an excess of female births are considered equally likely in any given year, the probability of the observed outcome is 0.5^82, or about 1 in 4,836,000,000,000,000,000,000,000; in modern terms, this is the p-value. Arbuthnot concluded that this is too small to be due to chance and must instead be due to divine providence: "From whence it follows, that it is Art, not Chance, that governs." In modern terms, he rejected the null hypothesis of equally likely male and female births at the p = 1/2^82 significance level.

Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded, by calculation of a p-value, that the excess was a real but unexplained effect.

In a famous example of hypothesis testing, known as the Lady tasting tea, Dr. Muriel Bristol, a female colleague of Fisher, claimed to be able to tell whether the tea or the milk was added first to a cup. Fisher proposed to give her eight cups, four of each variety, in random order. One could then ask what the probability was for her getting the number...
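
Arbuthnot's figure for the 82 years of birth records can be checked with exact arithmetic. The following is a minimal Python sketch of a one-sided sign test, assuming that each year is independent and equally likely to show a male or a female excess; the function name `sign_test_p_value` is illustrative and does not come from any source.

```python
from fractions import Fraction
from math import comb


def sign_test_p_value(successes: int, trials: int) -> Fraction:
    """One-sided sign test: P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    # Count the equally likely sign patterns with at least `successes` positives.
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return Fraction(tail, 2 ** trials)


# Arbuthnot's data: male births exceeded female births in all 82 years.
p = sign_test_p_value(82, 82)
print(p)  # 1/4835703278458516698824704
```

The denominator, 2^82, is about 4.836 × 10^24, in line with the "1 in 4,836,000,000,000,000,000,000,000" quoted for Arbuthnot's result.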