SPSS Bivariate Analysis
Soc 360, Spring 06
Don Barrett
Warning: The following glosses over many important statistical points. For this class these points can be ignored, but the incompleteness of these statements should be remembered when performing statistical analysis for others. All bivariate analysis is performed using SPSS, which is only available on the lab computers. Instructions on using SPSS are in Using SPSS in Soc 360.
Bivariate analysis is used to determine whether there is a statistical relationship between two variables. This document contains four sections:
Determining which test to use
Controlling
Reading Output
Understanding Probability
Determining which test to use:
1) Determine whether your variables are categorical or continuous (same rules as in the basic analysis exercise):
Continuous variables: For continuous variables, the order of the values should be clear, easily discerned, and rise or fall in a steady direction. Examples of this are measures of age, income, and measures that range from strongly disagree to strongly agree. In other words, a 20 on an age variable means 20 years old, while 30 means 30 years old, and clearly 30 is older than 20. We describe continuous variables with measures of central tendency and spread, most typically means and standard deviations (see further below).
One additional way to tell whether a variable is continuous or not is to ask whether it makes sense to talk about the ‘average’ of the variable. It does make sense to talk about the average age in a group, or the average educational level, or the average level of depression. But it doesn’t make sense to talk about average sex (e.g., ‘the average sex in the sample is 1.2’), average race, or the average on the heavy drinker variable.
Categorical variables: For categorical variables, the values are simply codes assigned to a person’s status, and there is no implied order in the values. Variables measuring race, sex, or marital status are typically categorical variables. Take, for example, a data set in which race is coded 1 (white), 2 (black), and 3 (other). While 3 is greater than 1 on a number scale, when using this variable we have to think of 3 as indicating ‘other’ and 1 as indicating ‘white’. Since ‘other race’ is not greater than or less than ‘white race’, but simply different, we have to treat this variable as a categorical variable. Average race would not make sense, but saying the sample is 75% white does make sense.
2) Then select from the following for analysis (see further below for more information on how to read output):
Independent Samples T-test: Compares a categorical variable that has only 2 categories to a continuous variable. Can be found in SPSS under 'Analyze' then 'Compare Means'. The 'test variable' is the continuous variable, the 'grouping variable' is the categorical. Under 'Define Groups' you enter the two codes of the categorical variable. Finding the probability for determining whether the groups are different takes two steps: a) if Levene's test is not significant (its 'Sig.' is greater than .05), use the number under ‘Sig. (2-tailed)’ on the ‘Equal variances assumed’ line, b) if Levene's test is significant (its 'Sig.' is less than .05), use the significance on the ‘Equal variances not assumed’ line.
Crosstabs: Compares two categorical variables. Can be found in SPSS under 'Analyze' then 'Descriptive Statistics'. It doesn't matter which variable you put in the row box and which in the column box. When using this be sure to click on Statistics and check Chi-square, and click on Cells and check the Row and Column percentages.
One-way ANOVA: Compares a categorical variable (with more than 2 categories) to a continuous variable. Can be found in SPSS under 'Analyze' then 'Compare Means'. Be sure to click on Options and then check Descriptives. Your categorical variable goes in the 'factor' box and your continuous in the 'dependent list' box. For probability, use the ‘significance’ next to the ‘F’.
Bivariate Correlation: Compares two continuous variables. Can be found in SPSS under 'Analyze' then 'Correlate'. Only one significance is printed, which is the one you use.
Controlling:
In many analyses in the class you will be asked to ‘control for’ a third variable. For example, you might have been asked to examine the relationship between race and income, which would be either a t-test or a one-way ANOVA (depending on how race is measured). If you were then asked to measure the relationship between race and income controlling for sex, that means that you would want to look at the relationship between race and income for men, and then for women. Note that when you do this, you are actually doing two separate analyses, and the results simply tell you whether the relationship you initially found still holds true when you break the sample down into two groups: men and women. Note also that this doesn’t tell you anything about whether men make more than women.
To do this, use the Split File command under Data, check ‘organize the output by groups’, place your grouping variable (in this case the sex variable) in the box, and make sure the ‘sort the file’ dot is checked. Then run the bivariate analysis that you need to run, which in this case would be either a t-test or oneway of race and income.
NOTE: Once you have performed the 'split file' command, all subsequent analyses will be split by the grouping variable. If you want to do some other analysis that does not control for the grouping variable, turn off Split File by going to the Split File command and checking ‘analyze all cases’.
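The Split File steps above can also be run from a syntax window. The following is a minimal sketch rather than a required method for this class. The variable names sex, race, and income and the codes 1 and 2 are assumptions made for illustration, so replace them with the names and codes in your own data set. SEPARATE is the syntax keyword that corresponds to ‘organize the output by groups’.

```spss
* Sort by the control variable, then organize the output by groups.
SORT CASES BY sex.
SPLIT FILE SEPARATE BY sex.

* Run the bivariate test once and SPSS reports it separately for each sex.
* This sketch assumes race has only two codes, 1 and 2.
T-TEST GROUPS=race(1 2)
  /MISSING=ANALYSIS
  /VARIABLES=income
  /CRITERIA=CI(.95).

* Turn Split File off so that later analyses use all cases together.
SPLIT FILE OFF.
```

If race had more than two categories, the T-TEST command in the middle would be swapped for a ONEWAY command, while the Split File lines around it would stay the same.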
Reading output:
(Highlighted sentences are examples of how to describe statistical results when reporting them)
Crosstabs and t-tests: The following describes how to use the output from Crosstabs and from T-test. In these examples the first test describes divorce rates by sex. The second tests whether those who are divorced typically differ in educational level from those who are not divorced.
CROSSTABS
/TABLES=sex BY divorce
/FORMAT= AVALUE TABLES
/STATISTIC=CHISQ
/CELLS= COUNT ROW COLUMN
/COUNT ROUND CELL .



T-TEST
GROUPS = divorce(1 2)
/MISSING = ANALYSIS
/VARIABLES = educ
/CRITERIA = CI(.95) .


There is more information in the above than you need. To make the table below comparing the groups, use only the following. From the Crosstabs, use the column percentages for each group and the PEARSON chi-square p-value. From the t-test, use the MEAN for each group. For the p-value in a t-test, first look at the 'Sig.' under "Levene's Test": if it is > .05, continue across on the first line ('Equal variances assumed'); if it is < .05, use the second line ('Equal variances not assumed'). In either case, the p-value is in the column labeled "Sig. (2-tailed)".
Comparison of History of Divorce
| Variable | Ever divorced | Never divorced | p-value |
| Male (%) | 24.5 | 75.5 | |
| Female (%) | 22.5 | 77.5 | .209 |
| Education (average years) | 12.87 | 13.29 | .015 |
The 'p-value' tells you whether the differences between the groups are more than would be expected by chance. If the p-value is less than .05, then you can consider the groups to be different. Thus in this sample there is no statistically significant relationship between sex of the respondent and divorce. Men and women had roughly equal rates of divorce, with 24.5% of men and 22.5% of women having been divorced, which is not a significant difference (p=.209). On the other hand, there is a statistically significant relationship between education and divorce, with people who have been divorced having less education on average than people who have not (12.87 versus 13.29 years, p=.015). Always report both the statistic and the p-value in your table and text.
ONEWAY: The following uses Oneway to see if there is a relationship between years of education and race.
ONEWAY
educ BY race
/STATISTICS DESCRIPTIVES
/MISSING ANALYSIS .
Oneway


Level of Education, by Race
| Race | Years of Education | p-value |
| White | 13.43 | |
| Black | 12.35 | |
| Other minority | 13.42 | .000 |
The p-value for the ONEWAY is the number under 'Sig' next to the F; means and standard deviations are easy to find in the output. So, with this you can say that there is a statistically significant difference (p=.000) in education by racial group in this sample, with whites having an average education of 13.43 years, blacks 12.35 years, and other minorities 13.42 years. (SPSS prints .000 when the p-value rounds to less than .001.)
CORRELATIONS: The following examines the relationship between age and income.
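For reference, the menu choice 'Analyze' then 'Correlate' then 'Bivariate' corresponds to syntax along the following lines. This is a sketch of what SPSS typically pastes for that dialog with its default settings. It is not a copy of the syntax actually used for the output below, which is an assumption on my part. The variable names incomeR and age are taken from the output table.

```spss
* Pearson correlation between income and age, with two-tailed significance.
CORRELATIONS
  /VARIABLES=incomeR age
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```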
Correlations
| | | incomeR | age |
| incomeR | Pearson Correlation | 1 | .222(*) |
| | Sig. (2-tailed) | | .044 |
| | N | 83 | 83 |
| age | Pearson Correlation | .222(*) | 1 |
| | Sig. (2-tailed) | .044 | |
| | N | 83 | 83 |
* Correlation is significant at the 0.05 level (2-tailed).
The correlation indicates that there is a significant positive relationship between age and income in this sample. A correlation is a measure of the extent to which increases in one variable are accompanied, on average, by increases (or decreases) in the other variable. A positive correlation is an indication that as the values on one variable go up, so do the values on the other variable. A negative correlation is an indication that as the values on one variable go up, the values on the other variable go down. In this example, the positive correlation indicates that as the values for age go up, the values for income also go up. The correlation is reported as r=.222, p=.044.
The graph below illustrates this relationship, where the upward slope of the line corresponds to the positive r of .222 in the above results. Graphs are not essential for this class, but they do help in understanding. To get a graph: 1) go to the Graphs menu in SPSS and choose Scatterplot, 2) choose 'simple', 3) enter your two variables in the X and Y axis boxes, 4) click 'OK', 5) when the scatterplot is in your output, double click on it to open the graph editor, 6) click on 'elements' and 'add fit line at total'.
GRAPH
/SCATTERPLOT(BIVAR)=age WITH incomeR
/MISSING=LISTWISE .
Graph

Understanding Probability
How do we know that a difference between two groups isn’t just due to random chance? We use statistical testing. Statistical testing takes into consideration a number of different characteristics of the data we’re looking at, and calculates a number that is an estimate of the chance that we would see these same results if we just took a random sample and there was no real difference between the groups. For this class, we always use less than .05 (< .05) as the cutoff.
Example: Theory on sexual identity politics suggests that gay males who are otherwise members of the mainstream (i.e., white) are more likely to be comfortable with their sexuality and open about it than are minority gay males. Suppose we find the following (see tables below) using data that include minority status and a measure of outness as gay (a scale that ranges from 1 to 5): white males have an average score of 3.98 on the measure of outness, while minority males have an average score of 3.45. Does this difference support our hypothesis? We need to know how likely we would be to find such a difference in similarly drawn samples of gay males if there were no real difference between the groups. The probability is what tells us this. Because Levene's test is not significant here (Sig. = .391), we read the 'Equal variances assumed' line, so the probability for this test is p = .029. Thus the statistical test tells us that, if white and minority gay males did not really differ in outness, we would find a difference this large only about 2.9% of the time if we just kept sampling gay males. Since this is fairly unlikely, and less than our .05 cutoff, we can believe that these data support the hypothesis that minorities are less comfortable openly expressing their identity as gay.
Group Statistics
| | minority | N | Mean | Std. Deviation | Std. Error Mean |
| outness | 1.00 | 126 | 3.975 | 1.0640 | .0948 |
| | 2.00 | 24 | 3.450 | 1.0946 | .2234 |
Independent Samples Test
| | | Levene's Test for Equality of Variances: F | Levene's Test for Equality of Variances: Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% Confidence Interval of the Difference: Lower | 95% Confidence Interval of the Difference: Upper |
| outness | Equal variances assumed | .740 | .391 | 2.206 | 148 | .029 | .5250 | .2380 | .0546 | .9954 |
| | Equal variances not assumed | | | 2.163 | 31.834 | .038 | .5250 | .2427 | .0305 | 1.0195 |
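As a rough check on where the t values in the Independent Samples Test table come from, each t is the mean difference divided by its standard error. The arithmetic below uses only numbers printed in the two tables above. The symbols for the two group means are notation introduced here for illustration and do not appear in the SPSS output.

```latex
\begin{aligned}
\bar{x}_1 - \bar{x}_2 &= 3.975 - 3.450 = 0.525 \\
t_{\text{equal variances assumed}} &= \frac{0.525}{0.2380} \approx 2.206 \\
t_{\text{equal variances not assumed}} &= \frac{0.525}{0.2427} \approx 2.163
\end{aligned}
```

The two lines share the same mean difference and differ only in the standard error, which is why their t values and p-values are close but not identical.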