Module 4: Two-Variable Contingency Table
Introduction:
Scientists studying the physical world or the social world begin with an assumption: relationships in the world are not haphazard and there are regularities that can be ascertained. The scientific endeavour involves discovering the regularity or order of the world, whether physical or social. Scientific research follows a rationale in studying the world. This rationale is best illustrated in the classical experimental approach and its underlying logic.
In social research, it is difficult to study social relations in a laboratory setting using the experimental approach, due to ethical and practical considerations. This does not mean that social scientists cannot follow the same rationale as physical scientists do in their experimental approach. Social scientists adopt a method which allows them to approximate the logic used in the experimental design. It is essentially a logic that involves setting up the condition of ceteris paribus, which is the Latin phrase that simply means “other things being the same or equal”. We learn how to do that using a contingency table for two variables. In later modules, we shall learn how to do this in three-variable contingency table analysis and other analyses.
The Classical Experimental Approach
The classical experimental design provides a basic framework for understanding the logic of scientific research; it involves conducting an experiment using an experimental group and a control group. Let us begin with an example. Suppose a professor teaches two identical classes of second-year students majoring in sociology and wishes to find out whether using a PowerPoint presentation in class helps students to learn. The professor arranges an identical lecture for both classes and, before the lecture, asks the students to write a test to find out how much they know about the subject matter of the lecture. The professor then has the same person deliver the same lecture to both classes, but in one class the lecture includes a PowerPoint presentation, whereas in the other class the same lecture is delivered without it. After the lecture, the professor asks the students to write another test to see how much they have learned from the lecture. If the PowerPoint presentation does help students to learn better, then one should expect the class exposed to it to show a greater improvement from the first test to the second test than the class that was not. On the other hand, if the PowerPoint presentation makes no difference in learning, then the improvement from the first test to the second test should be about the same in both classes.
The above example illustrates well the rationale of the classical experimental design. By using two identical classes to find out whether a PowerPoint presentation helps students to learn, the professor sets up a condition that allows her to test the effectiveness of the PowerPoint presentation in class learning. The condition has to do with the only difference between the two classes: the experimental class is exposed to the PowerPoint presentation and the control class is not. With this condition, there are two possible outcomes: (1) the experimental class performs better than the control class when the two test results are compared, or (2) the experimental class performs the same as the control class. If (1) is found, it indicates that whatever differs in the experimental process makes a difference in the outcome; if (2) is found, then whatever differs in the experimental process makes no difference in the outcome. In other words, outcome (1) indicates that the PowerPoint presentation is related to improved learning; outcome (2) indicates that the PowerPoint presentation is not related to improved learning.
Figure 4-1 summarizes all the elements in the classical experimental design. The process begins with "randomization". It refers to the process of assigning individuals (or subjects) to the experimental group and the control group so that each individual has the same chance of being assigned to either group. Random assignment ensures that individuals in one group are as likely to possess any given feature as individuals in the other group. In the example discussed before, one could argue that the experimental group learns more not because of the PowerPoint presentation, but because the individuals in the experimental group happen to be more intelligent. However, if the process of randomization is followed, then the chance of intelligent students showing up in the experimental group is the same as the chance of intelligent students showing up in the control group.
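If one wished to see randomization in miniature, the following Python sketch shuffles a list of subjects and splits it in half, so that every subject has the same chance of landing in either group. The subject labels and group sizes are purely hypothetical; this is an illustration, not part of the module's procedures.

```python
import random

# Hypothetical pool of 40 subjects to be randomly assigned.
subjects = [f"student_{i}" for i in range(1, 41)]

random.seed(42)            # fixed seed so the example is reproducible
random.shuffle(subjects)   # every ordering is equally likely

midpoint = len(subjects) // 2
experimental_group = subjects[:midpoint]   # first half after shuffling
control_group = subjects[midpoint:]        # second half after shuffling

print(len(experimental_group), len(control_group))  # 20 20
```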
Figure 4-1: Classical Experimental Design.
Source: Eva Xiaoling Li, University of Saskatchewan, module author.
Figure 4-1 shows that observations of the phenomenon under study are taken at t0, the point before the application of treatment or no treatment, and the results are recorded as E0 for the experimental group and C0 for the control group. The next stage is the application of "treatment" to the experimental group, and the application of "no treatment" to the control group. When these are completed, a second observation is taken at t1, and the observations are given by E1 for the experimental group and C1 for the control group. The results are then compared. The difference C1 – C0 indicates the "natural change" that happens in the process, and the difference E1 – E0 indicates the change due to the treatment plus the "natural change". The difference between (E1 – E0) and (C1 – C0) therefore indicates the change due to the treatment alone, over and above the "natural change".
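As a worked example, the arithmetic of this comparison can be sketched in a few lines of Python; the test scores below are purely hypothetical and serve only to illustrate the calculation described above.

```python
# Hypothetical mean test scores for the two groups in Figure 4-1.
E0, E1 = 55.0, 72.0   # experimental group: before (t0) and after (t1)
C0, C1 = 54.0, 63.0   # control group: before (t0) and after (t1)

natural_change = C1 - C0                          # change without treatment
total_change = E1 - E0                            # treatment change plus natural change
treatment_effect = total_change - natural_change  # (E1 - E0) - (C1 - C0)

print(f"Natural change:   {natural_change}")    # 9.0
print(f"Total change:     {total_change}")      # 17.0
print(f"Treatment effect: {treatment_effect}")  # 8.0
```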
The Contingency Table
Social research involves studying people in natural settings in society, which makes it hard to apply the classical experimental design. In fact, unlike physical scientists who conduct experiments and observe how things unfold in the laboratory, social scientists often have to collect their observations of individuals in the social world from reports of past events. This point is well illustrated in a questionnaire survey, in which the person conducting the survey asks respondents questions about the past, such as when they were born, how they voted in the last election, and how many years of schooling they have completed. Since these events have already occurred, a researcher can never go back in time to assign individuals to an experimental group and a control group before the events unfolded. The question, then, is this: without randomization to ensure that "other things are equal or the same", what do researchers do about this condition?
The condition of ceteris paribus central to the experimental design cannot be established in the survey methods used by social scientists. Instead, the best social scientists can do is to group people with a certain level of a feature and compare them with people at a different level of the same feature to see how these groups differ in something else. We have learned how to describe a feature using the concept of a variable. For example, if we want to find out the effect of education on income, and we suspect that more education brings higher income, then we can group people into different levels of education to see whether income differs across the levels. If income differs at different levels of education, then we can say that a change in education from one level to another brings an increase or decrease in income. In essence, putting individuals with the same level of a social feature in one group is a way to create, after the fact, the condition that these individuals are equal with respect to that feature.
Contingency table analysis provides a convenient way for researchers to compare individuals in one condition with individuals in a different condition with respect to a variable. The comparison shows how individuals sharing one level of a variable differ from individuals sharing another level of the same variable in terms of what the researcher is interested in studying (the dependent variable). In short, contingency table analysis involves comparing individuals in one category (or level) of an independent variable with individuals in another category (or level) of the same independent variable with respect to a dependent variable.
To construct a bivariate (two-variable) contingency table, we begin with a data matrix. A data matrix is simply an arrangement of data into rows and columns. It may also be seen as a summary arrangement of information obtained from a study. Table 4-1 illustrates a data matrix in rows and columns. There are ten rows of data in the matrix; each row represents information or data for one unit. In this case one unit is an individual. There are ten units or ten individuals in the data matrix. For each individual, there are two pieces of information, gender and type of job. Gender is a nominal variable, and it is coded as a dummy variable (0 = Female; 1 = Male). Type of job is also a nominal variable but it may be treated as an ordinal variable if one wishes to argue that a managerial job (coded as 2) has a higher status than a non-managerial job (coded as 1). We can look down the columns of “Gender” and “Type of Job” to see how the numbers change from one row to another. In general, if gender is 0, the type of job is likely to be 1. However, individuals C and J are exceptions. The same can be said when gender is 1, in which case the type of job is likely to be 2. Again, individual G is an exception. Even visually going down the list, one can roughly see how changes in one variable correspond with changes in the other variable. However, this task becomes difficult when the data set is large.
Individual | Gender | Type of Job |
A | 1 | 2 |
B | 0 | 1 |
C | 0 | 2 |
D | 0 | 1 |
E | 0 | 1 |
F | 1 | 2 |
G | 1 | 1 |
H | 0 | 1 |
I | 1 | 2 |
J | 0 | 2 |
Gender: 0 = Female; 1 = Male
Type of job: 1 = non-managerial; 2 = managerial
Table 4-1: Example of a Data Matrix.
Source: Eva Xiaoling Li, University of Saskatchewan, module author.
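To make the idea of a data matrix concrete, the sketch below enters Table 4-1 in Python using the pandas library. Neither Python nor pandas is part of this module, and the column names are our own choice; the point is simply that a data matrix is one row per unit and one column per variable.

```python
import pandas as pd

# The data matrix in Table 4-1, one row per individual.
# Codes follow the table: Gender 0 = Female, 1 = Male;
# TypeOfJob 1 = non-managerial, 2 = managerial.
data_matrix = pd.DataFrame(
    {
        "Individual": list("ABCDEFGHIJ"),
        "Gender":     [1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
        "TypeOfJob":  [2, 1, 2, 1, 1, 2, 1, 1, 2, 2],
    }
)
print(data_matrix)   # ten rows, one per unit, with two substantive variables
```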
Learning Activity 1
Based on the materials from "The Classical Experimental Approach" and "The Contingency Table", answer the following questions:
1. Use your own words to explain the process of the classical experimental design.
2. Explain how the condition of control differs between the classical experimental design and the social science survey method.
We can display the information in a data matrix in a contingency table. Using the variables gender and type of job in Table 4-1, we construct the contingency table in Table 4-2. There are 4 data cells in this table, plus the marginal totals. The last column, labeled "Total", shows the total number of individuals in each row, or simply the row total. From this information, we know that 5 individuals hold non-managerial jobs and 5 hold managerial ones. Similarly, the last row labeled "Total" shows the total number of individuals in each column, or simply the column total. There are 6 individuals who are female and 4 who are male. The row totals and column totals are called marginals because they show the total number of cases in the margins. Information in the margins is redundant in the sense that it adds no new information to the table; it is simply the sum of the numbers in the respective row or column. For this reason, we refer to a contingency table with 2 variables, each with 2 categories, as a 2 by 2 table. In other words, we do not count the margins when we refer to the contingency table in this way.
Type of Job | Female (0) | Male (1) | Total |
Non-Managerial (1) | 4 | 1 | 5 |
Managerial (2) | 2 | 3 | 5 |
Total | 6 | 4 | 10 |
Table 4-2: Contingency Table of Type of Job by Gender.
There are 4 cells in a 2 by 2 table, as in Table 4-2. The number in each cell represents the number of individuals who share the same values on the two variables. For example, the first cell in the upper left shows that there are 4 individuals who are female (gender = 0) and hold a non-managerial job (job = 1). The lower right-hand cell shows that there are 3 individuals whose gender is coded as 1 and whose job is coded as 2. In short, where an individual lands in the table is contingent upon the individual's values on the two variables. This is similar to plotting a graph using information from two variables, in the sense that the location of a dot in the graph is determined by (or contingent upon) the values of the two coordinates. From Table 4-2, we can see that 4 out of 6 women have non-managerial jobs, but 2 have managerial jobs. Most men (3 out of 4) have managerial jobs. In short, the contingency table provides us with a more systematic and convenient way to read the pattern in the data than going through the data matrix by visual inspection.
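If one were working with the data matrix sketched earlier, a cross-tabulation like Table 4-2 could be produced with a single call. This is only a sketch; pandas is again an assumption rather than part of the module.

```python
import pandas as pd

# The Table 4-1 data matrix (Gender 0 = Female, 1 = Male;
# TypeOfJob 1 = non-managerial, 2 = managerial).
data_matrix = pd.DataFrame(
    {
        "Gender":    [1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
        "TypeOfJob": [2, 1, 2, 1, 1, 2, 1, 1, 2, 2],
    }
)

# pd.crosstab counts how many individuals fall into each combination of
# values; margins=True adds the row and column totals (the marginals).
table = pd.crosstab(data_matrix["TypeOfJob"], data_matrix["Gender"], margins=True)
print(table)
# Gender     0  1  All
# TypeOfJob
# 1          4  1    5
# 2          2  3    5
# All        6  4   10
```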
Learning Activity 2
Education, age, gender, and income are four variables. The measurements of those variables are given below:
Education: 1 = below high school; 2 = high school; 3 = bachelor degree; 4 = above bachelor degree.
Age: 1 = young, 2 = old.
Gender: 1 = male, 2 = female.
Income: 1 = low income; 2 = high income.
Based on the above information, choose two variables and build a data matrix for 10 individuals.
Bivariate Relationship in a Contingency Table
How do we interpret the relationship in a contingency table? Let us begin with a bivariate contingency table with 2 variables, each with 2 categories. Table 4-3 is such a 2 by 2 table, cross-classifying the distribution of "job position" and "gender". Job position is a nominal variable with two categories, executive position and worker position, and "gender" is also a two-category nominal variable. The numbers in the margin show that there are 200 men and 50 women, making up a total of 250 cases. In addition, we know that there are 50 executive positions and 200 worker positions.
The numbers in the 4 cells indicate that 45 of the 200 men, or about 23 percent, hold an executive position. In contrast, only 5 out of 50 women, or 10 percent, hold an executive position. In other words, these numbers tell us that men are more likely than women to hold an executive position. Similarly, we can say that 155 out of 200 men, or roughly three-quarters of men, hold a worker position, whereas 45 out of 50 women, or 90 percent of women, hold a worker position. In other words, women are more likely than men to hold a worker position. The two statements "men are more likely than women to hold an executive position" and "women are more likely than men to hold a worker position" are complements in the sense that if we know one statement, we also know the other, because there are only two categories in the variable "job position". This is the same as saying that if we know the value of p in probability, we also know the value of q, since p and q make up the total probability of 1. Hence, if p is large, q has to be small.
Male | Female | |
Executive position | 45 | 5 |
Worker position | 155 | 45 |
Total | 200 | 50 |
Table 4-3: Contingency Table of Job Position by Gender, In Number.
We can see now that using the raw numbers in the contingency table to perform the analysis is still a bit cumbersome, as it requires us to take the total number in the margin into account when we interpret the number in a cell. When we do so, we are inevitably using percentages to interpret the relationship. Table 4-4 uses the same data, but this time includes both the raw numbers and the percentages by column.
Male | Female | Total | |
Executive positions | 45 (23%) | 5 (10%) | 50 |
Worker positions | 155 (77%) | 45 (90%) | 200 |
Total | 200 (100%) | 50 (100%) | 250 |
Table 4-4: Contingency Table of Job Position by Gender, In Number and Percent.
The percentages in Table 4-4 clearly show the relationship between gender and job position. Looking across the first row, Table 4-4 indicates that 23 percent of men hold an executive position, compared to 10 percent of women. In short, men are more than twice as likely as women (23 percent versus 10 percent) to hold an executive position, a difference of 13 percentage points. Without even looking at the numbers in the second row, we know that women (90%) are more likely than men (77%) to hold a worker position. In fact, the difference here is also 13 percentage points.
Using percentages in a contingency table to do the analysis removes the problem of having unequal numbers of men and women in the study, as in Table 4-4, because percentages convert the absolute numbers into relative numbers (percent is relative to a base of 100).
There are two ways to run the percentages in a contingency table: by row or by column. In Table 4-4, the percentages are run by column in the sense that they add up to 100 percent in each column. The rule is: if the independent variable is at the top (that is, the categories of the independent variable are listed by column), the percentages should always add to 100 by column. The reason is simply that we would like to compare one category of the independent variable (say, male) to another category of the independent variable (say, female) to see how the percentage in one category (executive position) of the dependent variable changes. We should remember that the comparison is always between 2 or more categories of the independent variable regarding the change in percent in ONE given category of the dependent variable.
It should be clear by now that for Table 4-4, if we run the percentages by row, we are treating "job position" as the independent variable and "gender" as the dependent variable. This, however, does not make sense, as the type of job position cannot affect the gender of a person.
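The column-percentage rule can be sketched directly from the counts in Table 4-3. The Python snippet below (using pandas, an assumption of this sketch) divides each cell by its column total, which is exactly what "percentaging in the direction of the independent variable" means.

```python
import pandas as pd

# Raw counts from Table 4-3: columns are the categories of the
# independent variable (gender), rows are the dependent variable.
counts = pd.DataFrame(
    {"Male": [45, 155], "Female": [5, 45]},
    index=["Executive position", "Worker position"],
)

# Divide each cell by its column total so the percentages add to 100
# within each column, i.e., within each category of the independent variable.
column_percent = counts.div(counts.sum(axis=0), axis=1) * 100
print(column_percent.round(1))
#                     Male  Female
# Executive position  22.5    10.0
# Worker position     77.5    90.0
# (Table 4-4 rounds the male column to 23% and 77%.)
```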
Learning Activity 3
Below is a contingency table. Calculate the percentage for each cell based on the numbers given and explain the relationship between the two variables.
Male | Female | |
Executive positions | 50 | 5 |
Worker positions | 150 | 45 |
Total | 200 (100%) | 50 (100%) |
Table 4-5: Learning Activity 3 Contingency Table.
Summary
In this module, you were introduced to the rationale of the classical experimental design. Such a design emphasizes the random assignment of subjects to the experimental group and the control group to achieve the condition of "other things being equal or the same". This condition ensures that the two groups are the same except for what is being tested: the experimental group is exposed to the treatment and the control group is not. The logic of the classical experimental design is to observe the change in the dependent variable in the experiment and be able to attribute the change to the treatment effect and nothing else.
It is often not practical or feasible in social research to adopt the experimental design. Instead, researchers use the survey method to gather information about individuals and group them according to similarities in social features or variables. The contingency table is one method that allows survey data to be arranged in a table format for convenient interpretation. A contingency table has a number of cells determined by the number of categories in the dependent and independent variables. In addition, the margins in a contingency table provide useful, but redundant, information about the number of cases in each row or column. When comparisons are made in contingency table analysis, it is important to use relative numbers, or percentages, since the importance or weight of an absolute number in a cell is affected by the total number in the margin. The percentages should be run in the direction of the independent variable so that they add up to 100 within each of its categories.
Module 5: Three-Variable Contingency Table
Overview:
In Module 4, we covered the classical experimental design, and learned how researchers in social science adopt the logic of the experimental approach in survey research. The contingency table analysis allows researchers to group individuals who share the same level of a variable as one group and compare them to another group of individuals who share a different level of the same variable. The purpose of the comparison is to see how the dependent variable changes in these groups. A bivariate contingency table provides a convenient way to summarize how variables change in the data from one individual to another, as well as to study the relationship between two variables. We have learned how to use the frequencies in the cells of a contingency table to perform analysis using both absolute numbers and percentages.
In this module, we extend the analysis to three variables, using a three variable contingency table.
Introduction
In Module 4, we learned how to compare individuals at different levels of an independent variable to see how they differ in the dependent variable. If we change from one level of an independent variable to another level and see a corresponding change in the dependent variable, then we know that there is a relationship between the independent and the dependent variable. This is an adaptation of the logic of the experimental design. In such a design, the presence and absence of “treatment” in the experimental and the control group allow a researcher to conclude that if there is a net change in the variable under study (dependent variable), it would have to come from the treatment effect and nothing else. Such a conclusion can be drawn because of randomization—the random assignment of subjects to the experimental or the control group to ensure that the probability of having a particular feature in one group is the same as that of the other. Such a process creates the condition of ceteris paribus—other things being the same or equal.
In survey research, the best a researcher can do is to match people on the same characteristic for comparison and then attribute the change in the dependent variable to the variable being matched. In other words, the researcher can conclude that the change in the dependent variable is due to the independent variable being matched and compared, but cannot say for sure that the change is not due to "something else". The reason is simply that, besides the independent variable being matched and compared, other variables that may make a difference in the dependent variable have not been controlled. This is why one of the assumptions a researcher often makes in survey research is that all the variables being considered in the analysis constitute a closed system, and that the influence of "other" variables not considered is random.
In a three-variable contingency table, also called a 3-way contingency table, we learn how to control for a third variable to see how the relationship changes under this condition. If we need to control for yet another independent variable, then we move to a 4-way or, more generally, an n-way contingency table analysis.
Logic of Control in Survey Method
In Module 4, a contingency table showed that there is a relationship between "job position" (executive or worker position) and gender (male or female). The relationship shows that men are more likely than women to hold an executive position. But the researcher cannot say that the difference in the likelihood of holding an executive position is attributable to gender and nothing else. Indeed, it is very possible that other variables not controlled in the table produced the difference in the dependent variable. One such possibility is that the men being studied may have a higher educational level than the women. If this is the case, then the men being studied may be more likely to hold an executive position simply because they have more education than the women. If this can be established, then the proper conclusion is that education, and not gender (or gender only to a lesser extent), makes the difference in the likelihood of holding an executive position.
This example illustrates that in contingency table analysis it is not sufficient to examine only the relationship between a dependent variable and an independent variable. We also need to "control" for the variation of other variables that we suspect may make a difference in the dependent variable. This method of control is adopted to compensate for not having the condition of "other things being equal" that the experimental design provides. We now turn to a three-variable contingency table analysis.
Learning Activity 1
Answer the following questions based on the module Learning Material "Logic of Control in Survey Method".
1. What is the logic of control in the survey method?
2. Explain the logic of control in the survey method by using an example with three variables, one dependent and two independent.
The Analysis of a Three-Variable (3-Way) Contingency Table
Table 5-1 is a table of hypothetical data that shows the relationship between respiratory symptoms and rural or urban residence in a country. The study aims to find out what variables influence respiratory ailment. The marginal cells indicate that there are 1,800 individuals in the hypothetical study, of whom 620 have symptoms of respiratory ailment and 1,180 do not. In addition, of the 1,800 individuals in the study, 800 are rural residents and 1,000 are urban residents. To interpret the relationship in the table, we first calculate the percentages in the direction of the categories of the independent variable.
Rural (N) | Urban (N) | Total | |
Symptoms of respiratory ailment | 320 | 300 | 620 |
No symptom | 480 | 700 | 1180 |
Total | 800 | 1000 | 1800 |
Table 5-1: Respiratory Ailment by Rural and Urban Residents in Number.
Table 5-2 shows the relationship between respiratory ailment and rural or urban residence in percent. From the table, it is clear that 40 percent of rural residents have symptoms of respiratory ailment compared to 30 percent of urban residents. Thus, the table indicates that residing in a rural or an urban area influences the likelihood of having symptoms of respiratory ailment. However, if we suspect that rural residents also smoke cigarettes more frequently than urban residents, perhaps it is the frequency of smoking that makes the difference in symptoms of respiratory ailment. To answer this question, we need to further separate the rural and urban resident groups according to their frequency of smoking and compare them under different conditions of smoking.
Rural (%) | Urban (%) | |
Symptoms of respiratory ailment | 40 | 30 |
No symptom | 60 | 70 |
Total | 100 | 100 |
Table 5-2: Respiratory Ailment by Rural and Urban Residents in Percent.
Let us suppose we measure the frequency of smoking in two categories (a nominal variable): those who are daily smokers, and those who are not daily smokers. Table 5-3 introduces this third variable as a control variable. Column 1 of Table 5-3 includes rural residents who are daily smokers; column 2, urban residents who are also daily smokers. These numbers are the same! When we compare column 3 (rural non-daily smokers) and column 4 (urban non-daily smokers), we find that the numbers are not the same, but we should note that the column totals are also not equal. As before, we convert the raw numbers to percentages to interpret the relationships.
Daily Smokers | Non-Daily Smokers | ||||
Rural | Urban | Rural | Urban | Total | |
Symptoms of respiratory ailment | 253 | 253 | 38 | 76 | 620 |
No symptom | 347 | 347 | 162 | 324 | 1180 |
Total | 600 | 600 | 200 | 400 | 1800 |
Table 5-3: Respiratory Ailment by Rural and Urban Residents and by Type of Smokers, in Number.
Source: Eva Xiaoling Li, University of Saskatchewan, module author.
Table 5-4 shows that 42 percent of rural daily smokers have symptoms of respiratory ailment, compared to the same percentage of urban daily smokers. In other words, when we look only at those who smoke daily, we find the percentage with respiratory ailment to be the same among rural and urban residents. When we compare rural non-daily smokers (column 3) to urban non-daily smokers (column 4), we find the same 19 percent having symptoms of respiratory ailment. These comparisons clearly indicate that the original relationship between respiratory ailment and rural/urban residency disappears when the frequency of smoking is controlled.
Daily Smokers | Non-Daily Smokers | |||
Rural (%) | Urban (%) | Rural (%) | Urban (%) | |
Symptoms of respiratory ailment | 42 | 42 | 19 | 19 |
No symptom | 58 | 58 | 81 | 81 |
Total | 100 | 100 | 100 | 100 |
Number | 600 | 600 | 200 | 400 |
Table 5-4: Respiratory Ailment by Rural and Urban Residents and by Type of Smokers, in Percent.
The important aspect of control in a contingency table analysis is to make sure that when we compare two columns of numbers, "only one thing varies at a time". For example, in columns 1 and 2, the level of smoking is the same (daily smokers) and the only thing that varies is rural or urban residence. It follows that whatever difference there is in the dependent variable, the difference can only come from the variable that varies. Conversely, if there is no change in the dependent variable when we vary an independent variable, then the independent variable cannot be said to influence the dependent variable. Using the same logic, we can find out the effect of frequency of smoking on respiratory ailment in Table 5-4. If we compare columns 1 and 3, we are comparing two groups of people who are both rural residents but who differ in their frequency of smoking. The table indicates that 42 percent of rural daily smokers, compared to 19 percent of rural non-daily smokers, have respiratory ailment. Therefore, the difference in respiratory ailment (42 percent versus 19 percent) must have come from the element that varies, namely whether residents are daily smokers or not.
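To show where the percentages in Table 5-4 come from, the sketch below percentages the counts of Table 5-3 within each residence-by-smoking column. Python and pandas are assumptions of this sketch, and the two-level column labels are our own layout choice.

```python
import pandas as pd

# Counts from Table 5-3. Each column is one residence-by-smoking group;
# percentaging runs within each column, i.e., within each level of the
# control variable (frequency of smoking).
counts = pd.DataFrame(
    {
        ("Daily", "Rural"):     [253, 347],
        ("Daily", "Urban"):     [253, 347],
        ("Non-daily", "Rural"): [38, 162],
        ("Non-daily", "Urban"): [76, 324],
    },
    index=["Symptoms", "No symptom"],
)

column_percent = counts.div(counts.sum(axis=0), axis=1) * 100
print(column_percent.round(0))
# Within daily smokers, rural and urban both show about 42% with symptoms;
# within non-daily smokers, both show about 19%, as in Table 5-4.
```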
Learning Activity 2
Below is a three-way contingency table; explain the relationship among the three variables.
Daily Smokers | Non-Daily Smokers | |||
Rural (%) | Urban (%) | Rural (%) | Urban (%) | |
Symptoms of respiratory ailment | 42 | 42 | 19 | 19 |
No symptom | 58 | 58 | 81 | 81 |
Total | 100 | 100 | 100 | 100 |
Number | 600 | 600 | 200 | 400 |
Table 5-5: A three-way contingency table.
Spurious Relations and Models
We can summarize the relationship among the three variables in Table 5-3 as follows. The original relationship between respiratory ailment and rural/urban residency is really false, or spurious, because it disappears under the influence of a third variable, the frequency of smoking. We use the term spurious relation to refer to an apparent relationship between two variables that is in fact not there. The relations in Table 5-3 imply that
Rural/urban residency → frequency of smoking → respiratory ailment
This is why, when the middle variable is controlled, the original relationship between the first and last variables disappears: the middle variable intervenes in the relationship. For this reason, the middle variable in such a model is also called an intervening variable. The model above, with the middle variable acting as an intervening factor, is formally called an interpretation model. When we have three variables, there are three possible models, depending on how the original relationship changes under the influence of a third variable. Figure 5-1 summarizes these models.
Figure 5-1: Diagram Showing Original Relationship, Relationship Under Control, and Implied Model.
Source: Eva Xiaoling Li, University of Saskatchewan, module author.
In the first model, there is an original relationship between two variables, Y and X, but the relationship disappears when a third variable, T, is controlled (r_yx.t = 0). If we think that the middle variable acts as an intervening variable, as we have seen in Table 5-3, then the model is called interpretation. However, if we suspect the middle variable acts as an antecedent variable (a variable that precedes, or is prior to, the others), the model is called explanation. An example is the relationship between smoke and heat, which would disappear when we control for fire, a variable that precedes both smoke and heat. A researcher cannot decide whether a model is interpretation or explanation based on the changes in the original relationship alone, since the statistical conditions are the same for both models. To do so requires the researcher to rely on common sense or theory. In the case of fire, smoke, and heat, it would make no sense to think that smoke causes fire, which in turn causes heat. Therefore, we rule out the model of interpretation and opt for the model of explanation. Finally, the most common model in actual social research is the model of specification, also called refinement. It is a model in which the original relation changes but does not disappear under the influence of T (r_yx.t ≠ 0), indicating that both X and T affect the dependent variable Y.
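The pattern behind the interpretation model can be seen in a small simulation. The probabilities below are hypothetical and chosen only so that residence affects smoking and smoking alone affects ailment, roughly echoing Tables 5-1 to 5-4; Python with numpy and pandas is an assumption of this sketch, not part of the module.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Interpretation model X -> T -> Y with hypothetical probabilities:
# residence (X) affects daily smoking (T), and smoking alone affects ailment (Y).
rural = rng.random(n) < 0.45                                  # X: rural vs urban
daily_smoker = rng.random(n) < np.where(rural, 0.75, 0.60)    # T depends on X
ailment = rng.random(n) < np.where(daily_smoker, 0.42, 0.19)  # Y depends only on T

df = pd.DataFrame({"rural": rural, "daily_smoker": daily_smoker, "ailment": ailment})

# Zero-order relationship: ailment rates differ by residence ...
print(df.groupby("rural")["ailment"].mean().round(3))

# ... but within each level of the control variable T, the difference vanishes,
# which is the signature of a relationship explained away by the middle variable.
print(df.groupby(["daily_smoker", "rural"])["ailment"].mean().round(3))
```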
Learning Activity 3
Answer the following questions based on the Learning Material "Spurious Relations and Models" in the module.
1. What is a spurious relationship?
2. What is an interpretation model?
3. What is an explanation model?
4. What is a specification model?
Summary
In this module, we learned the logic of control in survey research. The control involves comparing people with a similar feature (the independent variable) to see how the phenomenon being studied (the dependent variable) changes. We use a three-way contingency table to analyze the relationship among three variables. We begin with a dependent variable and an independent variable to see what the bivariate relationship looks like. We then subject the relationship to the influence of a third variable, the control variable, to see how the original relationship changes. Using the columns of a contingency table, we are able to hold one variable constant while allowing another to vary. This way, we know that whatever changes there are in the dependent variable, they must come from the variable that varies, and not from the variable that is being held constant.
In this module, we also learned what a spurious relationship is. We end the module by introducing three possible models to describe the relationships among three variables.