Tuatara: Volume 2, Issue 1, March 1949
Some Basic Ideas in Statistical Method
Although one may not subscribe to Kelvin's belief that our knowledge of a subject is meagre and unsatisfactory until one can measure what one is speaking about and express it in numbers, I believe it would be generally agreed that the introduction of quantitative methods into biology has led to significant advances.
Foremost among these methods has undoubtedly been statistical method; indeed, it was the needs of biology which gave the stimulus for a large part of the recent advances in statistical theory. In this connection the names of Francis Galton, Karl Pearson, William Gosset (who wrote under the pseudonym of Student) and R. A. Fisher should be familiar to all biologists.
Statistical method provides procedures for handling numerical data which are subject to uncontrolled variation. Examples of such data are to be found in any scientific investigation where measurements are made. The use of a collection of measurements on selected characteristics, e.g., beak, wing, leg and tail lengths of birds from different areas, as an aid to classification of the group into sub-species or races, is a case of some interest. If one takes the measurements of the birds of one species in a restricted area, the results will differ from bird to bird and there will certainly be a greater range of variation in the observations obtained from the examination of birds from a larger area, or from widely separated localities. The results will differ, although they may overlap to a certain extent, but due to basic variability there will be uncertainty whether the differences are due merely to the differences to be expected in sampling from one species, or whether they indicate a real difference between the groups examined.
An excellent example of a study of this problem for an oceanic bird is set out in “The Life Cycle of Wilson's Petrel,” by Brian Roberts (1940). His analysis of birds collected from widely dispersed regions of the Southern Ocean has shown that there are four distinct subspecific races separable by measurements. The importance of the result obtained by this method is stressed because many past separations into races, in birds and in many other animal and plant groups, have been made arbitrarily or from insufficient data, and can be shown to be invalid when subjected to statistical analysis.
Uncontrolled variability is present even in the best controlled and most precise experimentation, as for example in the physical determination of the velocity of light, although it is smaller there than one would find, say, in the weights of grain obtained in harvesting a field of wheat by small sections. The difference is one of degree rather than one of kind. As a concrete example, consider the following results. An experimenter (ref. Edwards, T. I., Plant Physiology 9, 8 (1934)) planted 20 soya beans on each of 183 agar plates; the numbers germinating per plate are summarised in the table:
|No. germinating on each plate|6|7|8|9|10|11|12|13|14|15|16|Total|
|No. of plates|5|9|8|19|26|34|26|22|21|10|3|183|
Such an array is called a frequency distribution. It records the number of times each value of the observation occurs.
The type of variation shown here appears very frequently in biological observations. The most common observation (11 in the table) is somewhere near the middle of the whole range and the number of observations decreases steadily on each side of this.
Let us consider a few of the problems which arise in the handling of material of this nature. First there arises the question of further reducing the results and summarizing them in a few numbers which convey the salient features of the distribution. We require first a typical measure of the number of germinations per plate. The usual quantity employed for this is the arithmetic mean of the observations. For the table above the mean number of beans germinating on a plate is 11.2. The mean is an abstraction—there is no plate having this number of germinations—but it does give a measure of the general level of germination for this batch of material.
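The arithmetic just described can be sketched in a few lines of Python. This is an illustration only: the frequencies are those of the table above, and the names `germinated_to_plates` and `weighted_mean` are chosen here for convenience.

```python
# Frequency distribution from the table:
# number of beans germinating on a plate -> number of plates showing that count.
germinated_to_plates = {6: 5, 7: 9, 8: 8, 9: 19, 10: 26, 11: 34,
                        12: 26, 13: 22, 14: 21, 15: 10, 16: 3}


def weighted_mean(freq):
    """Arithmetic mean of a frequency distribution given as {value: count}."""
    total_count = sum(freq.values())
    total_value = sum(value * count for value, count in freq.items())
    return total_value / total_count


n_plates = sum(germinated_to_plates.values())
mean_germinated = weighted_mean(germinated_to_plates)
print(n_plates, round(mean_germinated, 1))  # 183 11.2
```

Each count is weighted by the number of plates on which it occurred, which is the same as averaging the 183 individual plate counts.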
We next require some measure of the variation shown by the results. The simplest one is that known as the range, and this is the difference between the largest and smallest observations, here 16 minus 6 equals 10. One grave weakness of this is that it is based on only a limited number of the observations and so it does not take account of the manner in which the numbers are distributed between the extremes. The most commonly used measure, and one derived from all the observations, is called the standard deviation. (This name should be regarded as a technical term, for there is nothing particularly standard in the ordinary sense about it.) To calculate it we determine the difference between each observation and the mean, square this difference, sum all the squares so formed and then divide by the total number of observations, thus obtaining the mean square. Finally, in order to obtain a measure in the same units as the quantity observed, viz., number of beans, we take the square root of this mean square. The result is the standard deviation (s.d.). For the data it is 1.7. In a general way one can see that the number so obtained is a kind of average of the deviations of the observations from their mean and so it gives us a measure of the amount of variation in the material. It is found that these two numbers, the mean and the standard deviation, contain a great deal of information about the original table. Thus, providing the frequency distribution falls away symmetrically on each side of the mean, it will be found that approximately two-thirds of the observations lie within the interval mean plus or minus 1 s.d., approximately nineteen-twentieths within mean plus or minus 2 s.d., and practically all within the interval mean plus or minus 3 s.d. A proper appreciation of the relation of these two quantities to the original data can be obtained only by working with them on statistical material with which one is familiar.
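The steps of this calculation can be followed in a short Python sketch, applied to the frequencies as printed in the table above (the names are illustrative). Worked from the printed frequencies, the standard deviation comes out at about 2.3 rather than the 1.7 quoted, so one or other of the printed figures may have suffered in transcription. With the computed value, the shares of plates lying within 1, 2 and 3 s.d. of the mean are about 0.69, 0.96 and 1.0, in fair agreement with the rule just stated.

```python
import math

# Frequencies as printed in the table: count germinating -> number of plates.
freq = {6: 5, 7: 9, 8: 8, 9: 19, 10: 26, 11: 34,
        12: 26, 13: 22, 14: 21, 15: 10, 16: 3}

n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n

# Range: largest observation minus smallest.
value_range = max(freq) - min(freq)

# Mean square deviation about the mean, dividing by n as the text describes,
# then the square root to return to the original units.
mean_square = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
sd = math.sqrt(mean_square)


def share_within(k):
    """Fraction of plates whose count lies within mean +/- k standard deviations."""
    inside = sum(f for x, f in freq.items() if abs(x - mean) <= k * sd)
    return inside / n


print(value_range, round(sd, 1))  # 10 2.3
for k in (1, 2, 3):
    print(k, round(share_within(k), 2))
```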
These two quantities are the minimum necessary to characterize a distribution. They are also the basic quantities employed in the further statistical analysis of observational material.
The next problem requiring consideration is the implication of the inherent variability for experimental investigations with the material. Suppose from the 183 sets of 20 beans we choose, by some random procedure, 25 sets. We then obtain the following germination counts: 10, 14, 6, 11, 7, 11, 11, 8, 7, 14, 14, 10, 10, 10, 9, 9, 10, 12, 12, 8, 13, 8, 10, 12, 14, giving a mean of 10.4 and an s.d. of 2.3. A second set is 14, 10, 9, 9, 14, 12, 10, 7, 11, 12, 10, 12, 14, 10, 8, 11, 11, 10, 13, 7, 11, 13, 14, 8, 7, having a mean of 10.7 and an s.d. of 2.2. We see that these two samples have means and s.d.'s different, but not greatly so, from the original 183 sets; that is, the inherent variability of the material involves differences in the numerical characteristics of samples from it.
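The figures for the two samples can be checked with the standard `statistics` module of Python. The sketch assumes, as the description of the standard deviation implies, that the sum of squares is divided by the number of observations (`pstdev`) and not by one less.

```python
import statistics

sample_1 = [10, 14, 6, 11, 7, 11, 11, 8, 7, 14, 14, 10, 10,
            10, 9, 9, 10, 12, 12, 8, 13, 8, 10, 12, 14]
sample_2 = [14, 10, 9, 9, 14, 12, 10, 7, 11, 12, 10, 12, 14,
            10, 8, 11, 11, 10, 13, 7, 11, 13, 14, 8, 7]


def summarise(sample):
    """Return (size, mean, s.d.) with the s.d. computed using divisor n."""
    return len(sample), statistics.fmean(sample), statistics.pstdev(sample)


for sample in (sample_1, sample_2):
    size, mean, sd = summarise(sample)
    print(size, round(mean, 1), round(sd, 1))
# 25 10.4 2.3
# 25 10.7 2.2
```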
The original group of 183 sets of 20 is itself a sample from the whole bulk (population is the term commonly used) of beans from which they were chosen. Its numerical characteristics reflect those of the bulk but undoubtedly differ from it to a greater or lesser extent. In carrying out the above experiment the experimenter presumably was interested in the behaviour of the whole bulk. How good an estimate of the germination rate of the bulk is the 11.2? If we find that it is not accurate enough for the purpose in hand, how can we improve it? One way that suggests itself intuitively is to take more observations. Can we determine how many so that we can achieve a prescribed degree of accuracy? Statistical method gives us answers to queries like these, admittedly under definite assumptions about our material and the way that we handle it, but assumptions that are adequately fulfilled in many cases of practical interest.
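One common way of answering these queries, not spelled out in this article, is through the standard error of the mean, which is the s.d. divided by the square root of the number of observations. The sketch below rests on assumptions: that the plates are an independent random sample from the bulk, that twice the standard error serves as a rough nineteen-in-twenty margin, and that the s.d. is the one computed from the table as printed. The function names and the margin of 0.2 beans are hypothetical choices made for illustration.

```python
import math

# Frequencies as printed in the table: count germinating -> number of plates.
freq = {6: 5, 7: 9, 8: 8, 9: 19, 10: 26, 11: 34,
        12: 26, 13: 22, 14: 21, 15: 10, 16: 3}

n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in freq.items()) / n)


def standard_error(sd, n):
    """Standard error of the mean of n independent observations."""
    return sd / math.sqrt(n)


def plates_needed(sd, margin, multiplier=2.0):
    """Smallest n for which multiplier * sd / sqrt(n) does not exceed margin."""
    return math.ceil((multiplier * sd / margin) ** 2)


se = standard_error(sd, n)
needed = plates_needed(sd, margin=0.2)  # hypothetical target: mean known to +/- 0.2
print(round(se, 2), needed)  # 0.17 532
```

On these assumptions the mean of 11.2 is uncertain by roughly 0.3 either way, and halving that margin requires about four times as many plates, since the standard error falls only with the square root of the number of observations.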
Suppose now that we wished to compare the germination rate of beans from two different sources of supply. We could take one set of 20 from each and determine the germination rate. Let us assume the results are 10 and 15. Judging by the variation shown in the distribution given earlier these could be from the same bulk. But they could equally well be from bulks having different germination rates, the 10 from one with a mean of 11, say, and the 15 from one with a mean of 14. How are we to decide? If we have only one result from each source we cannot. But if we submit several sets from each source to experiment (i.e., introduce replication into the experiment), then we can obtain, in addition to the mean rate corresponding to each source, measures of the variability for each source. This we do by calculating the s.d. from the observations within each group. With this information it is possible to arrive at a decision whether the observed difference is a real one and not just a difference due to sampling from variable material.
In carrying out such a procedure we are performing what is called a test of statistical significance. Briefly the argument involved in all such tests is as follows. We make the hypothesis that the two lots of experimental material have the same germination rate (the “null hypothesis”), i.e., we regard our results as coming from the same bulk. The calculations of the test then give us a measure of the chance of obtaining, on this uniformity hypothesis, a difference as large as the one actually observed. If this chance is moderate, e.g., 1 in 5, then we would assert that there is no difference in the rates for the two sources since the observed difference is one that could arise frequently (viz., 1 in 5 times) from uniform material. But if on the other hand the chance is small, e.g., 1 in 50, then we would probably rule that the uniformity hypothesis is false, in other words, that the sources do differ in germination rate.
It may be noted that the application of tests of significance is greatly facilitated in most of the cases that arise by the fact that tables are available which give the value of the relevant chances, after quite simple arithmetical calculations.
Since the outcome of the test of significance is expressed in the form of a chance it should be clear that there is no hard and fast dividing line between situations where we decide to accept or reject the null hypothesis. This decision requires considerations of a non-statistical nature, such as for example the economic consequences of taking one course of action or the other. It is, however, the convention in most biological work to adopt a chance of 1 in 20 (the 5 per cent. level of significance) as the critical point. This implies that we are prepared to take the risk once in 20 times of making a wrong decision regarding our hypothesis.
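As a sketch of such a test, the two random samples of 25 plates quoted earlier may be treated as if they had come from two sources. The article does not name a particular test; Student's two-sample t-test with a pooled variance is assumed here, and the critical value of 2.01 is the approximate tabled 5 per cent. point (two-sided) for 48 degrees of freedom. The function name `pooled_t` is illustrative.

```python
import math
import statistics

source_a = [10, 14, 6, 11, 7, 11, 11, 8, 7, 14, 14, 10, 10,
            10, 9, 9, 10, 12, 12, 8, 13, 8, 10, 12, 14]
source_b = [14, 10, 9, 9, 14, 12, 10, 7, 11, 12, 10, 12, 14,
            10, 8, 11, 11, 10, 13, 7, 11, 13, 14, 8, 7]


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance uses divisor n - 1, as the t-test requires.
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se_diff, df


# Assumed value: approximate 5 per cent. two-sided point of t for 48 d.f.
T_CRITICAL_5_PERCENT = 2.01

t, df = pooled_t(source_a, source_b)
significant = abs(t) > T_CRITICAL_5_PERCENT
print(df, round(t, 2), significant)  # 48 -0.43 False
```

The difference between the two means is well within what sampling from uniform material would produce, as it should be, since both samples were in fact drawn from the same 183 plates.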
In addition to the replication requirement introduced above, there is a further one implied in the mathematical theory on which the test of significance is based. This is that the sample from each bulk be truly representative of that bulk. One way of achieving this is to take a random sample. Most people have an intuitive notion of what is meant by randomness, although it is a difficult idea to define. It is disconcerting to find to what extent unconscious personal biases enter when one attempts to produce a random sequence. Experience in the drawing of samples from any given bulk has made abundantly clear the impossibility of relying on individual judgment in this and so statistical method provides objective procedures for attaining this end.
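One objective procedure of this kind is to number the units and let a chance mechanism pick them. The sketch below uses Python's pseudo-random generator as a stand-in for a table of random numbers; that substitution, the function name and the seed are assumptions made for illustration.

```python
import random


def draw_random_plates(n_plates, sample_size, seed=None):
    """Pick sample_size distinct plate numbers from 1..n_plates without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, n_plates + 1), sample_size))


# Choose 25 of the 183 plates; every plate has the same chance of selection.
chosen = draw_random_plates(183, 25, seed=1949)
print(chosen)
```

Recording the seed plays the part of recording the page and starting point used in a random number table: the selection is removed from personal judgment and yet can be reproduced.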
A very useful development of statistical method is that which provides for the analysis of data in which each observation consists of more than one number. An illustration is afforded by investigations arising from herd-testing work among dairy cattle. For a group of cows we can obtain for each cow her butter-fat production as a two-year-old and her “maturity equivalent” production, this being defined as the average of the 4, 5, 6 and 7 year productions. Thus we have two production figures for each animal. We find, as might be expected, that a knowledge of the 2-year production gives some indication of the maturity equivalent. For a certain group of 702 Jersey cows the maturity equivalent production ranged from 141 to 560 lb. butterfat, but for those with a 2-year production in the interval 261-280 lb., the maturity equivalent production lay between 231 and 470 lb. There was a similar narrowing of the range over all the 2-year groups.
Such pairs of observations are said to be correlated and there is a large part of statistical method concerned with the reduction and analysis of this type of data. One very important concept used is the idea of the regression line, which originated in the statistical work of Francis Galton. If we calculate for the above data the mean maturity equivalent corresponding to the successive 2-year production values and plot the results on graph paper, we find that the series of points suggests a definite curve, in this case a straight line. A mathematical expression can be set up for this line and this expression is called the regression line of the maturity equivalent production on the 2-year production. The idea of the regression line is of very wide applicability. One use to which it can be put in the present case is to give us a means of estimating what, on the average, any one cow's maturity equivalent production will be, given her 2-year production. Providing our sample is representative we can use the formula to give us such an estimate for new batches of 2-year-olds.
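The fitting of such a line can be sketched with the usual method of least squares, which is assumed here as the fitting procedure. The production figures in the example are hypothetical, invented purely to make the sketch runnable; they are not the herd-testing data discussed above, and the function names are illustrative.

```python
def fit_regression_line(xs, ys):
    """Least-squares regression of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x  # the line passes through the two means
    return slope, intercept


def predict(slope, intercept, x):
    """Estimated average y for a given x, read off the fitted line."""
    return intercept + slope * x


# HYPOTHETICAL figures, for illustration only (not the data in the article):
# 2-year production and the mean maturity equivalent for each 2-year group.
two_year = [200, 220, 240, 260, 280, 300, 320]
maturity = [290, 310, 318, 345, 352, 380, 392]

slope, intercept = fit_regression_line(two_year, maturity)
estimate = predict(slope, intercept, 270)  # a new animal with 2-year production 270
print(round(slope, 2), round(intercept, 1), round(estimate, 1))
```

The estimate applies to the average of many animals with the given 2-year production; an individual cow may still fall some way on either side of the line.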
The ideas and methods worked out for data involving pairs of correlated observations have been extended, in the theory of multiple correlation and regression, to cover situations where we have three or more observations made on each individual.
In conclusion I wish to make some general remarks on the use and place of statistical method. It is but one of the aids available in the analysis of biological data and must not be allowed to obscure the fact that the primary interest of the investigation is biological. In order to obtain worthwhile results from the use of statistical method it is very necessary to have thought out clearly for what purpose one is collecting the material and to what uses the results of the investigation may be put. This presupposes some knowledge of the field of investigation concerned, and the lack of this may call for a preliminary survey before detailed studies are made. Statistical method is a powerful aid to clear thinking when it is used appropriately, but it cannot legitimately be made a substitute for such thought.
TIPPETT, L. H. C.—Statistics. Home Univ. Lib. 1944. A most readable non technical survey.
MATHER, K.—Statistical analysis in biology. Methuen 1946. This covers most of the statistical techniques required in biological work. The numerical calculations involved are well illustrated.
MATHER, K.—The measurement of linkage in heredity. Methuen 1938. A good account of statistical procedures in the analysis of genetical data.
FISHER, R. A.—The design of experiments. Oliver & Boyd. A unique book. The earlier chapters require no detailed statistical knowledge.
FINNEY, D. J.—Probit analysis. C.U.P. 1947. An account of statistical procedures as used in the biological assay of insecticides, drugs, vitamins, etc.
FISHER, R. A. & YATES, F.—Statistical tables for biological, agricultural, and medical research. Oliver & Boyd 1938. The introduction gives a variety of examples illustrating the use of the tables.
YULE, U. & KENDALL, M. G.—Introduction to the theory of statistics. Griffin. This is a classic. It views the subject broadly but makes no mathematical demands beyond quite elementary algebra.
There are numerous American textbooks written for biologists, of very varying quality. Two good ones are Statistical methods by G. W. Snedecor and Methods of statistical analysis by C. H. Goulden.