APPENDIX A Statistical Appendix
Written by Bruce D Johnson
Thursday, 14 February 2013 00:00
In attempting to analyze sociological data, the researcher has a wide variety of statistical techniques available such as the correlation (Pearsonian r), multiple and partial correlation, and various measures of association such as gamma, tau, and chi square. Such statistics are very convenient in summarizing a great deal of data with a few numbers. Despite this advantage, the present study depends upon what sociologists call cross-tabulation or multivariate analysis techniques, which utilize percentages almost exclusively. Percentages have one intrinsic advantage: they can be easily understood by the layman who is not acquainted with more advanced statistical techniques. In addition, cross-tabulation permits close examination of the behavior of specific subgroups. Furthermore, such data can be presented easily in graph form to help the reader visualize the important findings. The major disadvantages of multivariate analysis are that many different percentages are required and may be somewhat confusing at first and that certain theoretically crucial subgroups in a population may not contain a sufficient number of respondents (cases) to compute a percentage stable enough to be compared to another percentage. But the author feels that the advantages of clarity and understandability outweigh the disadvantages, especially since the sample obtained is quite large.
DETERMINING STATISTICAL SIGNIFICANCE
Nevertheless, it is sometimes important to know whether a difference is significant or not. Information about significance can help to support or reject a hypothesis, especially where a relationship is relatively weak. Since our sample size is large and the true variation of a biased sample is unknown, it is important that all estimates of significance be conservative; it is dangerous to accept a relationship as significant when in fact it is not. Therefore, in Table 14, instead of interpolating the percent difference needed to achieve significance, it is better to use information given for the immediately smaller number of cases. For example, if a percentage is based upon 950 cases, it is better to look in the row or column labeled "750" than the one labeled "1000"; this raises the percent difference required and so increases the probability that a research finding judged significant really is statistically significant.
To utilize the information in Table 14, it is necessary to understand how the data are presented in this appendix and in the graphs in the text. The percentages are one- or two-digit numbers that appear immediately above two- to four-digit numbers enclosed in parentheses; the latter is the number of cases upon which the percentage is based. In the graphs, the height of the bar is the percentage, with the number of cases presented immediately below the bar.
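The conservative lookup rule described above can be sketched in Python. This is an illustration only. Table 14 is not reproduced here, so the list of case counts contains just the labels named in the text (300, 750, and 1000). The threshold formula is the usual large-sample approximation for a difference between two proportions at the 5% level; that is an assumption about how such a table is typically built, not the author's stated method.

```python
import math

# Case-count labels mentioned in the text; the real Table 14 may list more.
TABLE_SIZES = (300, 750, 1000)


def conservative_size(n, sizes=TABLE_SIZES):
    """Return the largest table label that does not exceed n (round down, never interpolate)."""
    eligible = [s for s in sizes if s <= n]
    if not eligible:
        raise ValueError(f"{n} cases is below the smallest table label")
    return max(eligible)


def min_significant_diff(p, n1, n2, z=1.96):
    """Approximate percent difference needed for significance (assumed formula).

    p is the proportion the two percentages cluster around (e.g. 0.10);
    n1 and n2 are the numbers of cases behind the two percentages.
    """
    return 100 * z * math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


print(conservative_size(950))  # 750, as in the example above

# Fewer cases demand a larger difference, which is why rounding down is conservative.
print(min_significant_diff(0.10, 750, 750) > min_significant_diff(0.10, 1000, 1000))  # True
```

Under these assumptions, percentages around 10% based on 1000 and 300 cases require a difference of roughly 3.9 points, which rounds to 4.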
To illustrate how to use Table 14 and to show some of the possible weaknesses of significance tests, we will utilize one of the book's main findings, which casts doubt upon the assumption of the Bureau of Narcotics that marihuana use leads to heroin use. In Table 22 (also see Graph 6.3), in the first row (persons with no heroin-using friends), we find ".3" over "(1422)" for noncannabis users and "5" over "(318)" for weekly or more cannabis users. This indicates that among those without heroin-using friends, 0.3% of the 1422 (base number of cases) noncannabis users versus 5% of the 318 weekly cannabis users have themselves tried heroin. Thus, we may ask, among those lacking heroin-using friends, are weekly cannabis users significantly more likely than noncannabis users to use heroin? Is this 4.7% (5.0%-0.3%) difference statistically significant or does it happen by chance? Since there are 1422 noncannabis users and 318 weekly or more cannabis users, we look in Table 14 for the numbers immediately smaller than these numbers of cases, that is, 1000 and 300. Then reading down the column headed "1000" and across the row labeled "300" in the section "for percentages around 10% or 90%," we find the number 4. This indicates that the 4.7% difference is probably statistically significant; there are fewer than 5 chances in 100 that this finding occurred because of sampling error or other random factors. We have several objections to the following conclusion, which might be drawn from this finding: "Even among those without heroin-using friends, there is a slight, but significant, relationship between marihuana use and heroin use." First, the 5% figure among weekly cannabis users is very suspect, since our research design oversampled heavy drug users, a few of whom could have ended up in that cell.
Second, among those with no heroin-using friends, the 4% difference in heroin use between weekly-or-more versus weekly-or-less cannabis users is not significant according to the table: it needs to be 5%. But, third, is this statistically significant 4.7% difference socially significant? Can we assume that some pharmacological factor in marihuana use causes heroin use even among those with no heroin-using friends? Can public policy be based on such a flimsy finding? We suggest that the answer is no, at least until other factors are held constant. By isolating such a finding for discussion, we ignore the socially significant finding: persons without heroin-using friends are very unlikely to use heroin when compared with those having intimate friends who use heroin. Furthermore, focusing on this one finding detracts from the pattern of findings for all hard drugs: those without friends using a hard drug are unlikely themselves to try it, even though regular cannabis users are somewhat more likely to try each drug than noncannabis users or less regular users. Thus, the reader is urged to utilize these materials on statistical significance as an aid or guide to forming judgments about the interpretations proposed in this book. Significance tests should not be substituted for theoretical thinking upon which the empirical evidence is brought to bear in a meaningful way.
READING AND INTERPRETING THREE-WAY (N-WAY) TABLES
In large samples, where small differences are frequently statistically significant, it is important to investigate what happens to a significant relationship between an independent and dependent variable when another variable is held constant. Formally, the methodology utilized in this book is called multivariate analysis.
This mode of analysis is outlined in Lazarsfeld and Rosenberg, and Hyman;4 with further elaboration and a multitude of examples of the technique provided by Rosenberg and Zeisel.5 In an attempt to make our reasonably complex data easier to understand, we have developed a relatively standard format for presenting the tables in this appendix. (We have discussed this methodology as it applies to graphs in the footnote of Graph 4.3.) When multivariate analysis and the format of the tables are understood, the data in this appendix can be read quickly and meaningfully. In addition, these tables provide data that may be examined in considerably greater detail than that provided in the text or graphs. Central to understanding the tables in this appendix and the book is to ask several important questions. If asked in the correct order, these questions will help test the validity of the hypothesis in question. We will list these questions and then provide an example of how to answer them.
(1) How many variables does the author present in a given table?
With these questions in mind, the following table and explanation should be carefully examined. Table 15 contains the data upon which the top chart of Graph 8.1 is based. Answers to the nine questions raised above can determine whether marihuana use is a fundamental cause of political militancy, the hypothesis discussed in Chapter 8. Each question is answered in turn.
Questions 1-2
The heading of Table 15 states that there are three variables involved in the table: "Percent High on Political Militancy" indicates that the dependent variable is the Political Militancy Index (developed in Chapter 8). Furthermore, the table title indicates that each cell contains information about only one category of the dependent variable: those high on the variable; excluded are those with "none" and "some" involvement in militancy. For example, excluding the percent difference (% diff.) row and column, each cell in the table should be read as in the following example: in the "buy cannabis" column and "irregular" row, "14" appears over "(425)." This means that there are 425 students who are irregular (less than weekly) cannabis users and have only purchased cannabis; of these 425, 14% are high on the Political Militancy Index, and by implication, 86% (100%-14%) of the 425 are not high on the Militancy Index. Another example: in the "total" column and "regular" row, there are 642 persons using cannabis regularly (weekly or more); 24% of these regular users are high on the Political Militancy Index (without reference to drug buying or selling).
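The cell-reading convention just described can be expressed as a small sketch. The `Cell` class and its method names are hypothetical, introduced only for illustration; the two cells use the figures quoted above from Table 15.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    percent: float  # percent high on the dependent variable (number above the parentheses)
    base: int       # number of cases the percentage is based on (number in parentheses)

    def complement(self):
        """Percent of the base that is NOT in the reported category."""
        return 100 - self.percent

    def approx_count(self):
        """Approximate number of cases in the reported category.

        Only approximate, because the published percentages are rounded.
        """
        return self.percent * self.base / 100


irregular_buyers = Cell(percent=14, base=425)  # "14" over "(425)"
regular_total = Cell(percent=24, base=642)     # "24" over "(642)"

print(irregular_buyers.complement())    # 86
print(irregular_buyers.approx_count())  # 59.5, i.e. roughly 60 students
```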
Question 3
Question 4
Question 5
However, the present study utilizes what is probably the simplest measure of the strength of an association: the percent difference. The percent difference is abbreviated "% diff." in the tables that follow and %Y (where Y is a symbol for the dependent variable) in the graphs in the text (see footnote of Graph 4.3). With linear variables, the percent difference is usually computed for one category of the dependent variable (high on militancy) by subtracting extreme categories of the independent variable. Thus, the militancy of regular minus noncannabis users provides a 21% (24%-3%) difference. This difference indicates that regular cannabis users are 21% more likely to be militant than noncannabis users. However, most graphs and tables for Chapters 7-10 are somewhat unusual because of the test factor, the Illicit Marketing Index. The general purpose of these tables is to demonstrate that drug buying and selling is more important than marihuana use in determining militancy and other dependent variables. But nonmarihuana users cannot be compared with users, because almost all buyers and sellers use cannabis. There were only nineteen noncannabis users who bought or sold drugs. Thus, we must compare the militancy of regular versus irregular cannabis users, and exclude from consideration all noncannabis users (eliminate the first row). The data show that regular users are 10% (% diff. = 24%-14%) more likely than irregular cannabis users to be high on political militancy. This 10% difference is entered in the "% diff." row, "total" column (the 10 before the slash). This percent difference summarizes the strength of the two-way relationship between the independent and dependent variable.
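As a minimal sketch, the two percent differences quoted above can be computed as follows. The function and dictionary names are illustrative; the percentages are those cited in the text for the "total" column of Table 15.

```python
def percent_diff(high_category, low_category):
    """Percent difference: one category of the independent variable minus another."""
    return high_category - low_category


# Percent high on political militancy, "total" column, as quoted in the text
total_column = {"noncannabis": 3, "irregular": 14, "regular": 24}

# Extreme categories of the independent variable
print(percent_diff(total_column["regular"], total_column["noncannabis"]))  # 21

# Comparison actually used once noncannabis users are excluded
print(percent_diff(total_column["regular"], total_column["irregular"]))    # 10
```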
Question 6
To show that a third factor, and not marihuana use, is responsible for militancy, we hold constant the Illicit Marketing Index. To hold constant this test variable, we separate our cannabis users into five subgroups: those who have neither bought cannabis nor sold drugs ("none"), those who have only bought cannabis ("buy cannabis"), those who have sold cannabis ("sell cannabis"), those who have sold one or two hard drugs ("1-2"), and those who have sold three or more hard drugs ("3+"). Within each subgroup, the militancy of regular versus irregular cannabis users is compared. In the first subgroup, nonbuyers and nonsellers, we find that regular cannabis users are 3% (10%-7%) more likely than irregular users to be militant. Among cannabis buyers the difference between regular and irregular users is 3% (17%-14%); among cannabis sellers, a 2% (20%-18%) difference; among sellers of one or two hard drugs, a -1% difference (29%-30%); and among sellers of three or more drugs, a 2% difference (34%-32%). Each of these percent differences is placed in the "% diff." row and the appropriate column for each subgroup. Another observation from these data shows that regular marihuana users who are not involved in drug selling are considerably less likely to be militant (10%, 17% vs. 30%, 33%) than irregular cannabis users who sell hard drugs.
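Holding the test factor constant amounts to repeating the regular-minus-irregular subtraction inside each subgroup. A brief sketch follows, using the subgroup percentages quoted in the parenthetical subtractions above; the variable names are illustrative.

```python
# (regular %, irregular %) high on militancy within each Illicit Marketing Index subgroup,
# taken from the subtractions quoted in the text
subgroups = {
    "none": (10, 7),
    "buy cannabis": (17, 14),
    "sell cannabis": (20, 18),
    "1-2": (29, 30),
    "3+": (34, 32),
}

# One percent difference per subgroup: the entries of the "% diff." row
diffs = {name: regular - irregular for name, (regular, irregular) in subgroups.items()}

for name, diff in diffs.items():
    print(f"{name}: {diff}%")
```

None of the within-subgroup differences exceeds 3 points, compared with the 10-point difference in the "total" column.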
There is a simple statistic to summarize these percent differences for each of the five subgroups; this statistic is called the average percent difference (APD). This is computed by adding the percent differences from each of the subgroups and dividing by the number of subgroups. In this example, we add the five subgroup percent differences in the "% diff." row and divide by 5. Thus, APD = (3% + 3% + 2% - 1% + 2%) / 5 = 9% / 5 = 1.8%, or about 2%. Another way to understand the meaning of the APD is that the independent variable (marihuana use) has an independent or direct effect of 2% upon the dependent variable (militancy) when the test factor (Illicit Marketing Index) is held constant. Hence, marihuana use has a very minor direct effect upon militancy. Generally, a reduction of half to two-thirds of the original relationship is sufficient to conclude that the effect of an independent variable upon a dependent variable is greatly diminished when the test factor is held constant.
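The APD and the size of the reduction can be checked with a few lines of Python. This is a sketch under two assumptions: the five subgroup differences are those quoted for Table 15, and the "reduction" is measured as a share of the original 10% two-way difference, which is one plausible reading of the rule of thumb above.

```python
def average_percent_difference(diffs):
    """Mean of the within-subgroup percent differences."""
    return sum(diffs) / len(diffs)


subgroup_diffs = [3, 3, 2, -1, 2]  # "% diff." row: none, buy, sell, 1-2, 3+
original_diff = 10                 # two-way difference, regular vs. irregular users

apd = average_percent_difference(subgroup_diffs)
reduction = (original_diff - apd) / original_diff

print(apd)                 # 1.8
print(f"{reduction:.0%}")  # 82%
print(reduction >= 0.5)    # True: exceeds the "half to two-thirds" guideline
```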
Question 8
Question 9
Once results such as this have been obtained, a researcher must attempt to understand theoretically why the Illicit Marketing Index is so strongly related to political militancy. A theory of increasing drug-subculture participation is the explanation presented in the text for this intriguing result. Indeed, without this explanation the link between drug selling and a wide variety of factors appears to be almost nonsensical.
PRESENTATION OF TABLES IN THIS APPENDIX
The above example of multivariate analysis is typical of the majority of tables in this appendix; however, many tables upon which graphs in the text are based have not been presented. There are three basic reasons for not including these tables in this appendix. First, the data presented in the text may be virtually complete, as is true of two-way relationships presented in Graphs 2.1, 3.1-2, and 4.1 and tables presented in the text. Second, in Graphs 4.2, 4.4, and 7.1 our argument does not try to demonstrate that one variable is more important than another, so detailed tables have not been presented. But most importantly, many three-way tables are not presented because they can be easily constructed from data given in the graphs combined with information given in the text (as with Graph 4.3) or with data given in Table 24 (for Chapters 7-10). The following discussion demonstrates how the interested reader can develop such three-way tables from Table 24 and graphs in Chapters 8-10. Suppose one wished to develop a three-way table examining why persons were contacted by police for drug violations: because of their marihuana use or actual involvement in selling drugs. Information in Graph 10.2 (top chart) would provide most of the information needed. The following steps will provide sufficient information to construct Table 16. In the table each of these steps is identified in small letters.
(a) Devise a title for the table that states clearly the dependent variable, independent variable, and test factor(s).
(d) From Graph 10.2 take the percentage and base number of cases for each combination of the independent variable and test factor and place it in the appropriate columns of the table. For example, the two rightmost bars in Graph 10.2 should go in the fifth column, second and third rows of Table 16.
Following the steps outlined in this appendix will permit the reader to utilize
REFERENCES