APPENDIX J The Logic of Statistical Inference
To the statistical laity, a sample that is not deliberately picked for the characteristics that will be studied is a "random" sample. The statistician's criteria of randomness are somewhat more complicated, and, from this point of view, the samples with which we are dealing in this study are not random ones. The point is germane because the logic of statistical inference and the mathematical formulas that are used in statistical inference are predicated on random sampling in the technical sense of the term. Such a state of affairs (i.e., nonrandom sampling) is not uncommon in behavioral science research (and, in fact, in all research involving human beings), and the use of inferential statistics is predicated on a fiction—that the samples involved are "quasi-random" samples of populations which are not precisely identified but which are hopefully not very different from the populations one is trying to sample. The effects of the fiction are that one cannot be certain of the sample characteristics with respect to which generalizations are relevant and, hence, that generalizations must be regarded with caution.
In the comparison of groups, in such an event, statistical inference offers, at best, a rough guide as to which differences to take seriously, i.e., as of sufficient magnitude to make it unlikely that they would arise by chance in sampling. The risk, however, that such significant differences are related to unidentified characteristics of the populations of which the comparison groups are samples, rather than to the characteristics with which one thinks one is dealing, cannot be assessed by any statistical method. The considerations of mathematical-statistical logic aside, we took the differences reported in Chapter V seriously because they were internally consistent (e.g., except when otherwise noted, they held up for each of the ethnic groups), coherent (i.e., when taken together, they told an intelligible story), and externally consistent (e.g., the differences between delinquents and nondelinquents were similar to those reported in other studies, even when drug use was held constant—a factor not considered in other studies comparing delinquents and nondelinquents). In relation to the third point (external consistency), we have repeatedly been told by people with experience with drug-users that we seem to have hit the nail on the head; and, with one exception not relevant to Chapter V (the point is discussed when it arises), no expert has ever suggested to us that we were mistaken in this.1
There is one point that may be troublesome to those who have not studied the logic of statistical inference. This is that differences of the same magnitude are sometimes described as statistically significant and sometimes as not. One circumstance in which this can happen is easily explained. Assuming random sampling, the larger the number of cases on which a statistic is based, the less likely it is to differ greatly from the true population figure. Hence, a difference of, say, 15 per cent between small samples is not so dependable as the same difference found between large samples.
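The effect of sample size can be illustrated with a minimal sketch. The figures are hypothetical and are not taken from the study: two groups showing 50 per cent and 35 per cent on some characteristic, first with 20 cases in each group and then with 200. The function name and the choice of a pooled two-proportion z-test are assumptions made for illustration, and the calculation presumes random sampling.

```python
import math


def two_proportion_p_value(p1, p2, n1, n2):
    """Two-sided p-value for the difference between two sample proportions.

    Uses the pooled two-proportion z-test (a normal approximation),
    which assumes random sampling from each population.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Probability of a |z| at least this large under the standard normal curve.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical figures: the same 15-point difference at two sample sizes.
    for n in (20, 200):
        p = two_proportion_p_value(0.50, 0.35, n, n)
        print(f"n = {n:>3} per group: p = {p:.4f}")
```

Under these assumptions, the difference between the small samples falls well short of the conventional .05 level, while the identical difference between the large samples passes it comfortably.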
A second circumstance is a bit more complicated. A statistic obtained by sampling a relatively homogeneous population is more dependable than one obtained by sampling a less homogeneous population; one is less likely to get a sample of atypical cases in the first instance than in the second. Hence, the dependability of a statistic depends, not only on the size of the sample, but also on the heterogeneity or variability of the population from which the sample was taken. Exactly how one takes into account the probable heterogeneity of the population is a matter too complicated to explain here. Suffice it to say, however, that this factor makes it possible to get differences of the same size between two samples which are sometimes dependable (statistically significant with a given risk of error) and, in other instances, not dependable, even though the numbers of cases involved are identical. The issue is further complicated by the number of groups being compared at the same time. We have followed the convention that, when more than two groups are being compared, no difference between any pair of groups is accepted as significant unless an over-all test indicates that there are significant differences among the groups in the set. This, too, is a matter too complicated to discuss here.
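The role of heterogeneity can be shown in the same spirit. The sketch below is not the procedure used in the study; it takes a hypothetical difference of 5 points between two group averages, with 30 cases in each group, and varies only the spread (standard deviation) of the scores. The function name and all figures are assumptions, and a normal approximation stands in for the more exact t distribution.

```python
import math


def mean_difference_p_value(difference, sd1, sd2, n1, n2):
    """Approximate two-sided p-value for a difference between two sample means.

    The standard error grows with the variability of each group, so the
    same difference is less dependable when the scores are more spread out.
    A normal approximation is used here; a t distribution would be slightly
    more conservative for samples this small.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = difference / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical figures: identical difference and sample sizes,
    # first with homogeneous scores and then with heterogeneous scores.
    for sd in (5, 20):
        p = mean_difference_p_value(5, sd, sd, 30, 30)
        print(f"standard deviation {sd:>2}: p = {p:.4f}")
```

With the homogeneous scores the difference is significant at the .05 level; with the heterogeneous scores the same difference, based on the same numbers of cases, is not.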
To say that a difference is not statistically significant (i.e., that it is not large enough to justify the belief that it did not arise through the process of sampling and, hence, that it is not safe to generalize the fact of the difference to the populations involved) is not to say that the populations are actually alike in terms of the comparison. The statement concerns the evidence and not the fact. "We have no evidence that the difference would hold up if the entire populations could be compared" is not to be interpreted as saying, "If the entire populations were compared, there would not be any difference." The weight of the evidence is always on the side of what is actually found, but that evidence can be quite convincing or it can be quite weak. On the other hand, to say that a difference is significant is not to say that it certainly reflects a true difference between the populations. It is in the nature of probability that the improbable can happen. "Statistically significant" means, in a convention of the behavioral sciences, that a difference at least as large as the one observed would arise by chance in sampling no more than one time in twenty if the populations did not actually differ. This leaves one chance in twenty of obtaining such a difference when no true difference exists, and, when many comparisons are based on the same samples, the odds that at least some of the "statistically significant" differences did arise by chance are considerably greater.
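How quickly the one-in-twenty risk accumulates over many comparisons can be sketched with simple arithmetic. The calculation assumes that the comparisons are independent of one another and that no true differences exist; comparisons based on the same samples are not strictly independent, so the figures are a rough guide only. The function name is an illustrative assumption.

```python
def chance_of_false_positive(n_comparisons, alpha=0.05):
    """Probability that at least one of several comparisons appears
    significant by chance alone, assuming independent comparisons and
    no true differences between the populations."""
    return 1 - (1 - alpha) ** n_comparisons


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        risk = chance_of_false_positive(k)
        print(f"{k:>2} comparisons: {risk:.2f}")
```

Under these assumptions, the risk of at least one spurious "significant" difference is about 40 per cent for ten comparisons and about 64 per cent for twenty.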
It is also possible, on the basis of statistical evidence, to assert positively that two or more groups are alike. With samples of the size with which we were dealing, however, the risks of making such statements erroneously are so great as to make them pointless.
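One way to see why small samples cannot support a positive assertion of likeness is to look at the range of population differences that remains plausible when the samples show no difference at all. The sketch below uses a conventional 95 per cent interval for a difference between proportions; the sample sizes of 30 and 3,000 are hypothetical and are not the sizes used in the study, and the function name is an assumption.

```python
import math


def difference_interval(p1, p2, n1, n2, z=1.96):
    """Approximate 95 per cent confidence interval for the difference
    between two population proportions (normal approximation,
    assuming random sampling)."""
    difference = p1 - p2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return difference - z * standard_error, difference + z * standard_error


if __name__ == "__main__":
    # Hypothetical figures: both samples show exactly 50 per cent.
    for n in (30, 3000):
        low, high = difference_interval(0.50, 0.50, n, n)
        print(f"n = {n:>4} per group: {low:+.3f} to {high:+.3f}")
```

With 30 cases per group, identical sample figures remain compatible with a population difference of some 25 percentage points in either direction; only with very large samples does the interval narrow enough to make a claim of likeness meaningful.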
1 For a more complete and nontechnical discussion of the logic of sampling, see Isidor Chein, "An Introduction to Sampling," in C. Selltiz, M. Jahoda, M. Deutsch, and S. W. Cook, Research Methods in Human Relations (New York: Holt, 1959).