Week 3 -> 4: Are student responses to Pretest Q3 related to student grade on Midterm 1 in class A?

April 21, 2017

Introduction

For this research question, we are trying to determine whether there is a relationship between a student's pretest response and midterm grade, or whether these variables are independent of each other. Comparing a continuous variable (midterm grade) with a categorical variable (pretest response) typically calls for an ANOVA or ANCOVA, but if we bin the midterm grade into upper, middle, and lower thirds, we can also use a contingency table. Regardless of method, we found no statistically significant relationship between pretest response and midterm grade. The full analysis is in Google Drive.

Flowchart

1) More than one variable of interest: Yes (go to circle 4)

2) Interested in the relationship between two variables: Yes

3) Both variables continuous: No

4) One variable continuous and one variable categorical: Yes

Analysis-of-variance (ANOVA)

5) Number of ways which the categorical variable can be classified: 1

6) Outcome variable normal or can central-limit theorem be assumed to hold: Yes

7) Other covariates to be controlled for: Possibly

One-way ANOVA or Analysis of covariance (ANCOVA)

Alternate:

4) One variable continuous and one categorical: No

5) Ordinal data: No

6) Interested in tests of association

Contingency tables

One-way ANOVA:

Analysis of variance (ANOVA) is a statistical model that attempts to separate the “total” variance of a result’s distribution into “component” variances in order to measure the effect of each component on the result. We are doing a one-way ANOVA because we are interested in the effect of one category (student response to Pretest Q3) on the result (Midterm 1 scores). In this case, our component variances would be “due to Pretest Q3 response” and “variables unaccounted for”.

By my understanding, a one-way ANOVA is done by first splitting the parent population (all of Class A) into sub-populations based on the category (Ans A, Ans B, Ans C). By looking at the spread of each individual sub-population compared to the parent population, one can determine how much of the variance is due to the categorization. In the extreme cases: (1) if the sub-populations individually form tight groups but are spread apart from each other, this suggests a strong effect of the category on the response; (2) if the sub-populations are individually spread out and overlap with each other, this suggests a strong effect of variables that are not accounted for. Based on the visual from the box plots, it appears that our data will be closer to case (2).
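The splitting described above can be sketched in a few lines of Python. The scores below are made up for illustration and are not the Class A data; the point is only to show how the total sum of squares separates into a between-groups part and a within-groups part.

```python
from statistics import mean

# Hypothetical midterm scores grouped by pretest answer (NOT the Class A data).
groups = {
    "A": [62.0, 71.0, 55.0, 80.0],
    "B": [68.0, 74.0, 59.0, 77.0, 66.0],
    "C": [70.0, 58.0, 65.0],
}

all_scores = [s for scores in groups.values() for s in scores]
grand_mean = mean(all_scores)

# Between groups: squared distance of each group mean from the grand mean,
# weighted by the number of students in the group.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# Within groups: squared distance of each score from its own group mean.
ss_within = sum((s - mean(g)) ** 2 for g in groups.values() for s in g)

# Total: squared distance of each score from the grand mean.
ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"SS total   = {ss_total:.2f}, F = {f_stat:.3f}")
```

In case (1) nearly all of the total would land in the between-groups term; in case (2) nearly all of it lands in the within-groups term.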

Excel has a data-analysis tool called “ANOVA: single factor” which will quickly do the analysis.

ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	205.6207	2	102.8104	0.454596	0.63605	3.090187
Within Groups	21937.29	97	226.1576

Total	22142.91	99

The “Source of Variation” column labels the two effects: between groups is the effect due to the categorization, and within groups is the effect due to variables unaccounted for. The “SS” between groups is the sum, over the groups, of the squared difference between the group mean and the overall mean, weighted by the number of students in the group. The “SS” within groups is the sum of the squared differences between each student’s score and the mean of that student’s group. “MS” is the mean square, the quotient of the SS and its degrees of freedom; it is an estimate of a variance, so its square root has the units of a standard deviation. The primary comparison is the F-statistic, which is the ratio of the two MS. A large F corresponds to a strong effect due to the categorical variable; an F near or below 1 corresponds to little or no effect. The F-statistic has its own probability distribution and can be matched to a p-value, which determines whether the effect due to the categorical variable is statistically significant.
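As a check on the arithmetic, the MS, F, and p-value columns can be recomputed from the SS and df columns alone. This is a minimal sketch, not what Excel does internally; the p-value line uses a closed form for the F distribution that holds only when the between-groups df is 2, as it is here with three answer choices.

```python
# Values copied from the Excel "ANOVA: single factor" output above.
ss_between, df_between = 205.6207, 2
ss_within, df_within = 21937.29, 97

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the two mean squares.
f_stat = ms_between / ms_within

# Upper-tail probability of the F distribution. This closed form is valid
# ONLY when the numerator degrees of freedom equal 2:
#   P(F > f) = (1 + 2 f / df_within) ** (-df_within / 2)
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"MS between = {ms_between:.4f}")
print(f"MS within  = {ms_within:.4f}")
print(f"F = {f_stat:.6f}, p = {p_value:.5f}")
```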

As we guessed from the box plot, the effect of students’ responses to Pretest Q3 was insignificant compared to variables unaccounted for. Our null hypothesis is that students’ scores on the midterm are independent of their response to Pretest Q3, and the high p-value (0.64) means that we cannot reject the null hypothesis.

One-way ANCOVA:

The ANOVA suggests that any effect of students’ response to Pretest Q3 on Midterm 1 scores is small, and that much of the variance is unaccounted for. We hypothesized that tutorial participation may be a strong influence on Midterm 1 scores. If we try to account for the effect of tutorial participation on Midterm 1 scores and then look at the relation between our two variables of interest, we are doing a one-way analysis of covariance.

t-test: t = -1.8712, df = 83.715, p-value = 0.06482

We did a little bit of digging, but couldn’t find a well-defined way to “account for the effects of the covariate.” A quick t-test between students who attended tutorial and students who did not shows a difference that is suggestive, although not statistically significant at the 0.05 level (p = 0.065). We tried curving the grades of the students who did not attend by finding how many standard deviations each score was from the non-attendee mean, and then assigning the score at the same number of standard deviations on the distribution of those who attended tutorial. Then, we used Excel's ANOVA function on the adjusted scores.

ANCOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	431.8773	2	215.9386	1.133506	0.326129	3.090187
Within Groups	18478.99	97	190.505

Total	18910.86	99

It appears that the “SS within groups” has decreased, as the ANCOVA is supposed to do. However, there is a noticeable increase in the “SS between groups.” This means that after adjusting the scores, the separation between the groups increased, for whatever reason. Both of these effects contribute to an increased F-statistic, but the result is still not statistically significant (p = 0.33).

Contingency Table:

A contingency table cross-classifies two or more categorical variables (one as a set of rows and the other as a set of columns) to create a matrix of cells in which the observations are counted. Any individual observation falls in one and only one cell; for instance, each student is in only one of the class-standing bins and chose only one answer for the pretest question. This can be done for arbitrarily many variables, though the resulting tensor is much harder to represent.

The contingency table contains an extra row and column with marginal totals, and the bottom-right cell is the grand total.

		Pretest response
		A	B	C	Totals
Score	Upper	6	24	3	33
	Middle	10	14	10	34
	Lower	8	20	5	33
	Totals	24	58	18	100

It appears that the middle third may be different from the upper and lower thirds. However, to determine whether this is so, we need to use Pearson’s chi-squared test. This test assesses how likely it is that the difference between the two sets (data sorted by pretest response and data sorted by class standing) comes from chance.

Null hypothesis: that the two sets are statistically independent.

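Under the null hypothesis, we can generate predicted table entries based on the distributions of the Totals row and the Totals column: each expected count is the row total times the column total divided by the grand total (for example, Upper/A is 33 × 24 / 100 = 7.92). The expected values are: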

		Pretest response
		A	B	C	Totals
Score	Upper	7.92	19.14	5.94	33
	Middle	8.16	19.72	6.12	34
	Lower	7.92	19.14	5.94	33
	Totals	24	58	18	100
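The expected table can be reproduced from the marginal totals of the observed counts. This is a small sketch; the dictionary layout is just one convenient way to hold the table.

```python
# Observed counts from the contingency table above.
observed = {
    "Upper":  {"A": 6,  "B": 24, "C": 3},
    "Middle": {"A": 10, "B": 14, "C": 10},
    "Lower":  {"A": 8,  "B": 20, "C": 5},
}
answers = ("A", "B", "C")

# Marginal totals.
row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
col_totals = {c: sum(observed[r][c] for r in observed) for c in answers}
grand_total = sum(row_totals.values())

# Under independence: expected = row total * column total / grand total.
expected = {
    r: {c: row_totals[r] * col_totals[c] / grand_total for c in answers}
    for r in observed
}

for r in expected:
    cells = "\t".join(f"{expected[r][c]:.2f}" for c in answers)
    print(f"{r}\t{cells}\t{row_totals[r]}")
```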

Assumptions:

Simple random sample: the data are drawn from a larger population such that any member of the population is equally likely to have been selected.
Expected cell values: no expected value is <1, and no more than 20% of all expected values are <5. (For a 2x2 contingency table, no expected value <5).
Independence: the observations are assumed to be independent of each other. This test cannot be used for matched data, for example.

We then compute a χ² statistic by summing (observed − expected)² / expected over the nine cells, and compare it to a tabulated critical value. With (3 − 1)(3 − 1) = 4 degrees of freedom at the 0.05 significance level, the critical value is χ²_{4, 0.95} = 9.49. Our calculated value is χ² = 7.88, which is below the critical value, so we cannot reject the null hypothesis. There is not sufficient evidence to suggest that the differences between the sets did not come from chance.
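The χ² statistic can be recomputed directly from the observed counts. This is a minimal sketch in plain Python; the p-value line relies on a closed form for the chi-squared upper tail that holds only for 4 degrees of freedom, and the critical value 9.49 is taken from the text rather than computed.

```python
import math

# Observed counts: rows are Upper, Middle, Lower; columns are answers A, B, C.
observed = [
    [6, 24, 3],
    [10, 14, 10],
    [8, 20, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson's statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp) ** 2 / exp

dof = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability, valid ONLY for 4 degrees of freedom:
#   P(X > x) = exp(-x / 2) * (1 + x / 2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

critical = 9.49  # tabulated value for 4 dof at the 0.05 level (from the text)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
print("reject null" if chi2 > critical else "cannot reject null")
```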
