APPENDIX A Statistical Appendix

Written by Bruce D Johnson

Thursday, 14 February 2013 00:00

In attempting to analyze sociological data, the researcher has a wide variety of statistical techniques available such as the correlation (Pearsonian r), multiple and partial correlation, and various measures of association such as gamma, tau, and chi square. Such statistics are very convenient in summarizing a great deal of data with a few numbers. Despite this advantage, the present study depends upon what sociologists call cross-tabulation or multivariate analysis techniques, which utilize percentages almost exclusively.

Percentages have one intrinsic advantage: they can be easily understood by the layman who is not acquainted with more advanced statistical techniques. In addition, cross-tabulation permits close examination of the behavior of specific subgroups. Furthermore, such data can be presented easily in graph form to help the reader visualize the important findings. The major disadvantages of multivariate analysis are that many different percentages are required and may be somewhat confusing at first and that certain theoretically crucial subgroups in a population may not contain a sufficient number of respondents (cases) to compute a percentage stable enough to be compared to another percentage. But the author feels that the advantages of clarity and understandability outweigh the disadvantages, especially since the sample obtained is quite large.

DETERMINING STATISTICAL SIGNIFICANCE
In the text, we have presented many tables and graphs without reference to statistical significance. We have generally ignored the question of significance for four basic reasons: (1) Our sample was not selected on a random basis (an important assumption of significance tests). Schools were selected for inclusion on the basis of their characteristics and cooperation (see Appendix B and Chapter 2). (2) Several groups that contained a large proportion of heavy drug users were oversampled. Yet the information from each respondent was weighted equally. (3) All too frequently, tests of significance are used as a substitute for thought and analysis of significant hypotheses.1 Rather than analyze the "why" and "how" questions, researchers sometimes count the number of significant differences and assume that this explains something.2 (4) The major reason for lack of concern about statistical significance lies in the fact that with a large sample, most differences between two subgroups are significant. As a rule of thumb, in the present survey, almost any two-way relationship with a 6% or greater difference is statistically significant at the .05 level.

Nevertheless, it is sometimes important to know whether a difference is significant or not. Information about significance can help to support or reject a hypothesis, especially where a relationship is relatively weak. Since our sample size is large and the true variation of a biased sample unknown, it is important that all estimates of significance be conservative; it is dangerous to accept a relationship as significant when in fact it is not. Therefore, in Table 14, instead of interpolating the percent difference needed to achieve significance, it is better to use information given for the immediately smaller number of cases. For example, if a percentage is based upon 950 cases, it is better to look in the row or column labeled "750" than the one labeled "1000"; this will increase the probability that a research finding judged significant is in fact statistically significant.
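The conservative lookup rule can be sketched in a few lines of Python. This is only an illustration: Table 14 is not reproduced here, so the list of tabled sample sizes below is hypothetical apart from the values 300, 750, and 1000 mentioned in the text, and the function name is our own.

```python
from bisect import bisect_right

# Hypothetical tabled sample sizes. Only 300, 750, and 1000 are named in
# the text; the other values are placeholders for illustration.
TABLED_SIZES = [100, 200, 300, 500, 750, 1000]

def conservative_size(n_cases, tabled_sizes=TABLED_SIZES):
    """Return the largest tabled size that does not exceed n_cases."""
    idx = bisect_right(tabled_sizes, n_cases)
    if idx == 0:
        raise ValueError("fewer cases than the smallest tabled size")
    return tabled_sizes[idx - 1]

print(conservative_size(950))  # 750, not 1000
```

Rounding down rather than interpolating means the required percent difference is never understated.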

To utilize the information in Table 14, it is necessary to understand how the data are presented in this appendix and graphs in the text. The percentages are one- or two-digit numbers that appear immediately above two- to four-digit numbers enclosed in parentheses; the latter is the number of cases upon which the percentage is based. In the graphs, the height of the bar is the percentage, with the number of cases presented immediately below the bar.

To illustrate how to use Table 14 and to show some of the possible weaknesses of significance tests, we will utilize one of the book's main findings, which casts doubt upon the assumption of the Bureau of Narcotics that marihuana use leads to heroin use. In Table 22 (also see Graph 6.3), the first row (persons with no heroin-using friends), we find ".3" over "(1422)" for noncannabis users and "5" over "(318)" for weekly or more cannabis users. This indicates that among those without heroin-using friends, 0.3% of 1422 (base number of cases) noncannabis users versus 5% of the 318 weekly cannabis users have themselves tried heroin. Thus, we may ask, among those lacking heroin-using friends, are weekly cannabis users significantly more likely than noncannabis users to use heroin? Is this 4.7% (5.0%-0.3%) difference statistically significant or does it happen by chance? Since there are 1422 noncannabis users and 318 weekly or more cannabis users, we look in Table 14 for the numbers immediately smaller than these numbers of cases, that is, 1000 and 300. Then reading down the column headed "1000" and across the row labeled "300" in the section "for percentages around 10% or 90%," we find the number 4. This indicates that the 4.7% difference is probably statistically significant; there are fewer than 5 chances in 100 that this finding occurred because of sampling error or other random factors.
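For readers who prefer to compute rather than look up, the same comparison can be approximated with a pooled two-proportion z-test. This is a sketch under stated assumptions, not the procedure behind Table 14 (which is not described here): it assumes simple random sampling, which the text has already said does not hold for this survey, and with only about four heroin triers among the 1422 noncannabis users the normal approximation is rough. The function name is our own.

```python
from math import sqrt, erfc

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail area
    return z, p_value

# 0.3% of 1422 noncannabis users vs. 5% of 318 weekly cannabis users
z, p = two_proportion_z(0.003, 1422, 0.05, 318)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

A z value above 1.96 corresponds to significance at the .05 level, which agrees with the table lookup for this pair of percentages.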

We have several objections to the following conclusion, which might be drawn from this finding: "Even among those without heroin-using friends, there is a slight, but significant, relationship between marihuana use and heroin use."

First, the 5% figure among weekly cannabis users is very suspect, since our research design oversampled heavy drug users, a few of whom could have ended up in that cell. Second, among those with no heroin-using friends, the 4% difference in heroin use between weekly-or-more versus weekly-or-less cannabis users is not significant according to the table: it needs to be 5%.

But, third, is this statistically significant 4.7% difference socially significant?3 Can we assume that some pharmacological factor in marihuana use causes heroin use even among those with no heroin-using friends? Can public policy be based on such a flimsy finding? We suggest that the answer is no, at least until other factors are held constant. By isolating such a finding for discussion, we ignore the socially significant finding: persons without heroin-using friends are very unlikely to use heroin when compared with those having intimate friends who use heroin. Furthermore, focusing on this one finding detracts from the pattern of findings for all hard drugs: those without friends using a hard drug are unlikely themselves to try it, even though regular cannabis users are somewhat more likely to try each drug than noncannabis users or less regular users. Thus, the reader is urged to utilize these materials on statistical significance as an aid or guide to forming judgments about the interpretations proposed in this book. Significance tests should not be substituted for theoretical thinking upon which the empirical evidence is brought to bear in a meaningful way.

READING AND INTERPRETING THREE-WAY (N-WAY) TABLES

In large samples, where small differences are frequently statistically significant, it is important to investigate what happens to a significant relationship between an independent and dependent variable when another variable is held constant. Formally, the methodology utilized in this book is called multivariate analysis. This mode of analysis is outlined in Lazarsfeld and Rosenberg, and Hyman;4 with further elaboration and a multitude of examples of the technique provided by Rosenberg and Zeisel.5

In an attempt to make our reasonably complex data easier to understand, we have developed a relatively standard format for presenting the tables in this appendix. (We have discussed this methodology as it applies to graphs in the footnote of Graph 4.3.) When multivariate analysis and the format of tables are understood, the data in this appendix can be read quickly and meaningfully. In addition, these tables provide data that may be examined in considerably greater detail than that provided in the text or graphs.

Central to understanding the tables in this appendix and the book is to ask several important questions. If asked in the correct order, these questions will help test the validity of the hypothesis in question. We will list these questions and then provide an example of how to answer them.

(1) How many variables does the author present in a given table?
(2) What is the dependent variable? What is the author trying to explain or understand?
(3) What is the independent variable? What factor has the author hypothesized might be the cause of the dependent variable? In most cases this hypothesis is discussed in the text and is a summary of the position of the Bureau of Narcotics.
(4) What other variable(s), called a test factor, does the author use to explain the relationship between the independent and dependent variable?
The answers to these questions should be clear from reading the title of the table and from discussion in the text of the book. Then the reader should locate the three variables in the column and row headings as well as the categories of each variable. Then the following questions should be answered from the numbers in the table.
(5) What is the relationship between the independent and dependent variable? (This is called the original, or two-way, relationship.) How strong, as measured by the percent difference, is the two-way relationship?
(6) What happens to the relationship between the independent and dependent variables when a third variable (test factor) is held constant? Does the percent difference due to the independent variable upon the dependent variable decrease in each or in only some of the categories of the test factor? Compute an average percent difference (APD).
(7) Does a comparison of the two-way relationship and the average percent difference with the test factor(s) held constant show that the APD has been reduced by one-half to two-thirds of the strength of the original relationship? If so, the independent variable is probably misleading or wrong as a cause of the dependent variable because of the confounding effect of the test factor.
(8) What is the relationship (measured by the percent difference) between the test factor (now examined as the independent variable) and the dependent variable? How strong is the relationship? Is the relationship decreased or unchanged when the original independent variable (now treated as a test factor) is held constant?
(9) Which variable, the independent variable or the test factor, has the greatest direct effect upon the dependent variable? Does a comparison of the APD's show that the independent variable is more or less important than the test factor?

With these questions in mind, the following table and explanation should be carefully examined. Table 15 contains the data upon which the top chart of Graph 8.1 is based.

Answers to the nine questions raised above can determine whether marihuana use is a fundamental cause of political militancy, the hypothesis discussed in Chapter 8. Each question is answered in turn.

Questions 1-2

The heading of Table 15 states that there are three variables involved in the table: "Percent High on Political Militancy" indicates that the dependent variable is the Political Militancy Index (developed in Chapter 8). Furthermore, the table title indicates that each cell contains information about only one category of the dependent variable: those high on the variable; excluded are those with "none" and "some" involvement in militancy. For example, excluding the percent difference (% diff.) row and column, each cell in the table should be read as in the following example: in the "buy cannabis" column and "irregular" row, "14" appears over "(425)." This means that there are 425 students who are irregular (less than weekly) cannabis users and have only purchased cannabis; of these 425, 14% are high on the Political Militancy Index, and by implication, 86% (100%-14%) of the 425 are not high on the Militancy Index. Another example: in the "total" column and "regular" row, there are 642 persons using cannabis regularly (weekly or more); 24% of these regular users are high on the Political Militancy Index (without reference to drug buying or selling).

Question 3
The title of Table 15 also identifies the independent variable with the phrase "by frequency of cannabis use." Essentially, the author is hypothesizing that something about marihuana use is associated with political militancy. Although sociologists hedge greatly and say that an independent variable is "significantly related to," "correlated with," or "associated with" the dependent variable, there is also an implicit assumption that the independent variable (marihuana use) is at least a partial "cause" of the dependent variable (political militancy). Given that a statistically significant association between two variables exists, the researcher must ask, "Why?" and "What other factors might possibly explain the association?"

Question 4
In the attempt to explain the association between an independent and a dependent variable, the researcher should attempt to hold constant, or take into account, other variables that might account for the link between marihuana and militancy. The phrase "holding constant the Illicit Marketing Index" identifies another variable with which the author expects to better understand the relationship between cannabis and militancy. Thus, the Illicit Marketing Index is what sociologists call the test factor or control variable.

Question 5
Having identified the appropriate variables, data in the table can be utilized to verify the hypothesis that there is an association between marihuana use and militancy. Cross-tabulating the frequency of cannabis use by the Political Militancy Index would quickly verify that there is a relationship that is statistically significant at the < .001 level (less than one chance in a thousand that it happened by chance). In Table 15 the original, or two-way, relationship between the independent and dependent variable is found in the "Total" column: 3% of the noncannabis users, 14% of the irregular users, and 24% of the regular cannabis users are high on the Political Militancy Index. More important than the level of significance is the strength of the relationship, which could be measured by such statistics as chi square or gamma.

However, the present study utilizes what is probably the simplest measure of the strength of an association: the percent difference. The percent difference is abbreviated "% diff." in the tables that follow and %Y (where Y is a symbol for the dependent variable) in the graphs in the text (see footnote of Graph 4.3).

With linear variables, the percent difference is usually computed for one category of the dependent variable (high on militancy) by subtracting extreme categories of the independent variable. Thus, the militancy of regular minus noncannabis users provides a 21% (24%-3%) difference. This difference indicates that regular cannabis users are 21% more likely to be militant than noncannabis users. However, most graphs and tables for Chapters 7-10 are somewhat unusual because of the test factor, the Illicit Marketing Index. The general purpose of these tables is to demonstrate that drug buying and selling is more important than marihuana use in determining militancy and other dependent variables. But nonmarihuana users cannot be compared with users, because almost all buyers and sellers use cannabis. There were only nineteen noncannabis users who bought or sold drugs. Thus, we must compare the militancy of regular versus irregular cannabis users, and exclude from consideration all noncannabis users (eliminate the first row).

The data show that regular users are 10% (% diff. = 24%-14%) more likely than irregular cannabis users to be high on political militancy. This 10% difference is entered in the "% diff." row, "total" column (the 10 before the slash). This percent difference summarizes the strength of the two-way relationship between the independent and dependent variable.

Question 6
It is important to ask why marihuana is linked to militancy. Is it because of some inherent quality of marihuana or some other factor? Normally a researcher would hold constant several other factors such as sex, race, socioeconomic status, and political beliefs in an attempt to disprove the relationship, with only those factors having the greatest effect being reported. Having held constant such factors, we find that drug buying or selling is an important factor in the marihuana-militancy association.

To show that a third factor, and not marihuana use, is responsible for militancy, we hold constant the Illicit Marketing Index. To hold constant this test variable, we separate our cannabis users into five subgroups: those who have neither bought cannabis nor sold drugs ("none"), those who have only bought cannabis ("buy cannabis"), those who have sold cannabis ("sell cannabis"), those who have sold one or two hard drugs ("1-2"), and those who have sold three or more hard drugs ("3+"). Within each subgroup, the militancy of regular versus irregular cannabis users is compared. In the first subgroup, nonbuyers and nonsellers, we find that regular cannabis users are 3% (10%-7%) more likely than irregular users to be militant. Among cannabis buyers the difference between regular and irregular users is 3% (17%-14%); among cannabis sellers, a 2% (20%-18%) difference; among sellers of one or two hard drugs, a -1% difference (29%-30%); and among sellers of three or more drugs, a 2% difference (34%-32%). Each of these percent differences is placed in the "% diff." row and the appropriate column for each subgroup. Another observation from these data shows that regular marihuana users who are not involved in drug selling are considerably less likely to be militant (10%, 17% vs. 30%, 33%) than irregular cannabis users who sell hard drugs.

There is a simple statistic to summarize these percent differences for each of the five subgroups; this statistic is called the average percent difference (APD). This is computed by adding the percent differences from each of the subgroups and dividing by the number of subgroups. In this example, we add the five subgroup percent differences in the "% diff." row and divide by 5. Thus, APD = (3 + 3 + 2 - 1 + 2)/5 = 9/5 = 1.8, or about 2%.

Question 7
This APD means that the frequency of marihuana use has an effect of 2% upon militancy when holding constant (symbolized by a slash) the Illicit Marketing Index. The APD is entered after a slash to the right of the original relationship. The cell 10/2 facilitates comparison between the two-way relationship and the same relationship when a third factor is held constant. Thus, the original relationship found that regular users were 10% more likely than irregular cannabis users to be militant; but this marihuana-militancy relationship is reduced to an average of 2% when drug buying and selling are held constant. In short, the effect of cannabis use upon militancy is reduced by 80% [(10 - 2)/10 = 8/10] from its original strength, when drug buying and selling are taken into account.
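The arithmetic behind the 10/2 cell can be sketched in Python. The percentages are those quoted in the text for Table 15; the variable names are our own, and rounding the APD to a whole percent is an assumption based on the way the text reports it.

```python
from math import floor

# Percent high on militancy as (irregular, regular) cannabis users,
# within each category of the Illicit Marketing Index (values from the text).
cells = {
    "none": (7, 10),
    "buy cannabis": (14, 17),
    "sell cannabis": (18, 20),
    "1-2": (30, 29),
    "3+": (32, 34),
}
original_diff = 24 - 14  # two-way relationship from the "total" column

# Percent difference (regular minus irregular) within each subgroup
diffs = [regular - irregular for irregular, regular in cells.values()]
apd = sum(diffs) / len(diffs)
apd_rounded = floor(apd + 0.5)  # assumed: reported as a whole percent
reduction = (original_diff - apd_rounded) / original_diff

print(diffs)  # [3, 3, 2, -1, 2]
print(apd)    # 1.8
print(f"{original_diff}/{apd_rounded}, reduced by {reduction:.0%}")  # 10/2, reduced by 80%
```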

Another way to understand the meaning of the APD is that the independent variable (marihuana use) has an independent or direct effect of 2% upon the dependent variable (militancy) when the test factor (Illicit Marketing Index) is held constant. Hence, marihuana use has a very minor direct effect upon militancy. Generally, a reduction of half to two-thirds of the original relationship is sufficient to conclude that the effect of an independent variable upon a dependent variable is greatly diminished when the test factor is held constant.

Question 8
If marihuana use is not an important factor in political militancy, why is it strongly associated with militancy? The answer to this question is clear: there is a strong relationship between the Illicit Marketing Index and militancy. First, we change around the independent variable and test factor so that the Illicit Marketing Index becomes the independent variable and cannabis use becomes the test factor. In the "Total" row, we find the two-way relationship between the Illicit Marketing Index and political militancy. Militancy increases with each level of drug buying or selling; students selling three or more hard drugs are 26% (% diff. = 33%-7%) more likely than nonbuyers and nonsellers to be high on political militancy. This 26% difference is entered in the "% diff." column, "total" row to the left of the slash. Next we hold marihuana use constant in an attempt to affect this relationship. We divide our cannabis users into regular and irregular using groups. In the "irregular" row, we find that sellers of three or more drugs are 25% (32%-7%) more likely than nonbuyers and nonsellers to be militant; in the "regular" cannabis use row, we find that three-or-more hard-drug sellers are 24% (34%-10%) more likely than nonbuyers and nonsellers to be militant. These two percent differences are entered in the "% diff." column and the APD computed: APD = (25 + 24)/2 = 24.5, or about 25.

This APD is entered to the right of the slash. The 26/25 cell shows that the original relationship (26% difference) between illicit marketing and militancy is not affected (APD = 25) when marihuana use is held constant. Hence, the major factor affecting political militancy is drug buying or selling and not marihuana use. In addition, the reason that marihuana use is highly correlated with militancy is that cannabis use and drug selling are correlated.

Question 9
The above conclusion becomes even more clear when the direct effect of cannabis use and drug selling is compared. If the APD's are considered as pure numbers that have been computed in a similar manner, the independent effect of the Illicit Marketing Index upon militancy is about twelve times greater than the independent effect of marihuana use upon militancy (APD's of 25 versus 2).
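The reversed comparison, with the Illicit Marketing Index as the independent variable and cannabis use as the test factor, can be sketched the same way. The percentages are those quoted in the text for Table 15; the helper function is our own, and the half-up rounding is an assumption chosen to match the reported APD of 25.

```python
from math import floor

# Percent high on militancy for the extreme categories of the Illicit
# Marketing Index, as ("none", "3+"), by cannabis use (values from the text).
rows = {"irregular": (7, 32), "regular": (10, 34)}
total_row = (7, 33)

def round_half_up(x):
    """Round to the nearest whole percent, with halves rounded up."""
    return floor(x + 0.5)

original = total_row[1] - total_row[0]              # two-way relationship
diffs = [high - low for low, high in rows.values()]  # within each cannabis group
apd_marketing = round_half_up(sum(diffs) / len(diffs))
apd_cannabis = 2  # APD for cannabis use, taken from Question 7

print(diffs)                          # [25, 24]
print(f"{original}/{apd_marketing}")  # 26/25
print(apd_marketing / apd_cannabis)   # 12.5
```

Python's built-in round() would turn 24.5 into 24 because it rounds halves to the nearest even number, so the sketch uses an explicit half-up helper to reproduce the figure in the text.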

Once results such as this have been obtained, a researcher must attempt to understand theoretically why the Illicit Marketing Index is so strongly related to political militancy. A theory of increasing drug-subculture participation is the explanation presented in the text for this intriguing result. Indeed, without this explanation the link between drug selling and a wide variety of factors appears to be almost nonsensical.

PRESENTATION OF TABLES IN THIS APPENDIX

The above example of multivariate analysis is typical of the majority of tables in this appendix; however, many tables upon which graphs in the text are based have not been presented. There are three basic reasons for not including these tables in this appendix. First, the data presented in the text may be virtually complete, as is true of two-way relationships presented in Graphs 2.1, 3.1-2, and 4.1 and tables presented in the text. Second, in Graphs 4.2, 4.4, and 7.1 our argument does not try to demonstrate that one variable is more important than another, so detailed tables have not been presented.

But most importantly, many three-way tables are not presented because they can be easily constructed from data given in the graphs combined with information given in the text (as with Graph 4.3) or with data given in Table 24 (for Chapters 7-10). The following discussion demonstrates how the interested reader can develop such three-way tables from Table 24 and graphs in Chapters 8-10. Suppose one wished to develop a three-way table examining why persons were contacted by police for drug violations: because of their marihuana use or actual involvement in selling drugs. Information in Graph 10.2 (top chart) would provide most of the information needed.

The following steps will provide sufficient information to construct Table 16. In the table each of these steps is identified in small letters.

(a) Devise a title for the table that states clearly the dependent variable, independent variable, and test factor(s).
(b) Construct the "shell" of the table; devise the row and column headings for the table and appropriate categories for the independent and test variables.
(c) In this example, from Graph 10.2 take information from the "original relationship" and place it in the "total" column of the table. In the graph, the height of each bar is the percent to be entered in the appropriate cell of the table, while the number at the bottom of the bar should be entered in parentheses underneath the percentage; this is the number of cases upon which the percentage is based.

(d) From Graph 10.2 take the percentage and base number of cases for each combination of the independent variable and test factor and place it in the appropriate columns of the table. For example, the two rightmost bars in Graph 10.2 should go in the fifth column, second and third rows of Table 16.
(e) Then, from Table 24 enter the two-way relationship between the Illicit Marketing Index and the dependent variable (police contact for drug violation); the number of cases comes from the top row of Table 24. This information provides data that can be utilized to compute percentage differences and average percentage differences.
(f) Next, the percent difference for the two-way relationship between marihuana use and police contact is computed (15%-4% = 11%) and entered in the "total" column, "% diff." row. (Remember that the noncannabis users have been excluded because they do not buy or sell.)
(g) Then compute the percent differences due to the frequency of cannabis use among each of the subgroups of buyers and sellers. For example, among those selling one or two hard drugs, regular users are 13% (22%-9%) more likely than irregular cannabis users to have police contact for drugs. Enter 13 in the "% diff." row.
(h) Compute the average percent difference (the sum of the subgroup percent differences divided by the number of subgroups) and place it to the right of a slash next to the two-way relationship so that a direct comparison of the original and controlled relationship can be made.
(i) Next repeat steps f-h while treating the Illicit Marketing Index as the independent variable and marihuana use as the test factor. (Subtract numbers in the first column from those in the fifth column for each row except the noncannabis users.) Finally, interpret the results as discussed above.

Following the steps outlined in this appendix will permit the reader to utilize more fully the data presented in the following tables.

REFERENCES
1. Gerhard Lenski, The Religious Factor, Garden City, N.Y.: Anchor, 1961, pp. 367-376.
2. Richard Blum and Associates, The Dream Sellers, San Francisco: Jossey-Bass, 1972, pp. 123, 133, 149, count the number of significant differences but do not explain the theoretical usefulness of such data or findings.
3. Lenski, Ref. 1, p. 368.
4. Paul F. Lazarsfeld and Morris Rosenberg, The Language of Social Research, New York: Free Press, 1955, pp. 115-124. Herbert H. Hyman, Survey Design and Analysis, New York: Free Press, 1955, pp. 242-329.
5. Morris Rosenberg, The Logic of Survey Analysis, New York: Basic Books, 1968. Hans Zeisel, Say It With Figures, 5th ed., revd., New York: Harper & Row, 1968, pp. 118-189.