(Week 2->3) Analysis of MT2 Scores: Are the 2 Classes Identical?

April 13, 2017


Introduction

In this investigation, we compare the Midterm 2 (MT2) scores of classes A and B. Our main question is whether the students in the two classes are drawn from the same (larger) general population of students. Using a set of complementary analyses, we conclude that, based on their MT2 scores, the students in classes A and B do not originate from the same initial pool of students. A brief synopsis of our findings follows. For a more thorough account of our process, a document combining explanatory text and raw code is available as a PDF on Google Drive.

Exploratory Analysis

Having already navigated the twisting turns of the flowchart in Rosner's Fundamentals of Biostatistics (led courageously by our Week 1->2 group!), we know that it first requires several descriptors of our dataset:

Number of Variables (1...2?)
Number of Samples (2...0?)
Data Normality (Requires Testing)
Sample Independence (They are Independent)
Data Variance Similarity (Requires Testing)

We also need to explore the dataset visually. Below are the box-and-whisker plots, histograms, and Q-Q plots of each class's MT2 scores.

R's boxplot function identifies five outliers at the low end of Class A's distribution.
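For reference, here is a minimal Python sketch of the outlier rule we understand R's boxplot() to apply: boxplot.stats() flags points lying more than 1.5 times the hinge spread beyond Tukey's hinges, with the hinges computed as in fivenum(). The function names are ours, the index rule is reproduced from our reading of R's fivenum(), and this is not the code from our analysis.

```python
import math

def fivenum(values):
    """Tukey's five-number summary, assuming R's fivenum() index rule."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based (possibly half-integer) positions of min, lower hinge, median, upper hinge, max
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Half-integer positions average the two neighbouring order statistics
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]

def boxplot_outliers(values, coef=1.5):
    """Values more than coef * (upper hinge - lower hinge) beyond the hinges."""
    _, lower_hinge, _, upper_hinge, _ = fivenum(values)
    spread = upper_hinge - lower_hinge
    low, high = lower_hinge - coef * spread, upper_hinge + coef * spread
    return sorted(v for v in values if v < low or v > high)

print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
```

Because the hinges are close to, but not always identical with, the quartiles from R's default quantile(), counts of flagged points can differ slightly between the two definitions.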

Qualitatively, the data for each class look roughly normal. The two notable exceptions are a slight peak near an MT2 score of 20 at the low end of Class A's distribution, and a deviation at the upper end of Class B's distribution (though less pronounced than the one in Class A).

Qualitatively, the bulk of each class's data falls along a straight line when plotted against the theoretical normal quantiles. The major exception to the linear trend in these Q-Q plots is at the low end of Class A, where a small subset of the lowest scores lies on a slightly shallower slope than the rest of the Class A data.
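A Q-Q plot pairs each sorted score with the standard normal quantile at a matching plotting position. The sketch below computes those pairs in Python, assuming the ppoints() convention that R's qqnorm() uses (p_i = (i - a) / (n + 1 - 2a), with a = 3/8 for n <= 10 and a = 1/2 otherwise); the function name and the five scores are made up for illustration.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical normal quantile, observed value) pairs for a normal Q-Q plot.

    Plotting positions assume R's ppoints() convention.
    """
    x = sorted(values)
    n = len(x)
    a = 3 / 8 if n <= 10 else 0.5
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - a) / (n + 1 - 2 * a)), xi)
            for i, xi in enumerate(x, start=1)]

# Made-up scores, only to show the output shape
for theoretical, observed in qq_points([62, 71, 55, 80, 68]):
    print(f"{theoretical:6.2f}  {observed}")
```

If the data are normal, these pairs lie close to a straight line whose intercept and slope approximate the mean and standard deviation; a bend at one end, like the one in Class A, shows up as a run of points off that line.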

We can quantify the normality of each dataset using the Shapiro-Wilk and Anderson-Darling normality tests. The null hypothesis of both tests is that the data are normally distributed; a p-value below 0.05 leads us to reject that hypothesis, indicating non-normality. The p-values of these tests for each class's MT2 data are presented in the table below:

These tests indicate that we may proceed assuming the Class B data are normally distributed, but we must reject normality for the Class A data.
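Shapiro-Wilk needs tabulated or approximated coefficients, so we only sketch the Anderson-Darling test here. This minimal Python version estimates the mean and standard deviation from the data, applies the small-sample adjustment A2 * (1 + 0.75/n + 2.25/n^2), and converts it to a p-value with a piecewise approximation attributed to Stephens; the constants follow our understanding of R's nortest::ad.test() and should be treated as an assumption to verify against a statistics library before relying on the p-value.

```python
import math
from statistics import NormalDist, fmean, stdev

def anderson_darling_normal(values):
    """Anderson-Darling normality test with mean and sd estimated from the data.

    Returns (A2, adjusted A2, approximate p-value).
    """
    x = sorted(values)
    n = len(x)
    if n < 8:
        raise ValueError("sample size must be at least 8")
    mu, sd = fmean(x), stdev(x)
    std_normal = NormalDist()
    z = [(xi - mu) / sd for xi in x]
    # log F(z_i), and log(1 - F(z_i)) computed as log F(-z_i) for numerical stability
    log_cdf = [math.log(std_normal.cdf(zi)) for zi in z]
    log_sf = [math.log(std_normal.cdf(-zi)) for zi in z]
    total = sum((2 * i - 1) * (log_cdf[i - 1] + log_sf[n - i]) for i in range(1, n + 1))
    a2 = -n - total / n
    aa = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Piecewise p-value approximation (assumed constants; verify before relying on them)
    if aa < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * aa - 223.73 * aa ** 2)
    elif aa < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * aa - 59.938 * aa ** 2)
    elif aa < 0.6:
        p = math.exp(0.9177 - 4.279 * aa - 1.38 * aa ** 2)
    elif aa < 10:
        p = math.exp(1.2937 - 5.709 * aa + 0.0186 * aa ** 2)
    else:
        p = 3.7e-24
    return a2, aa, p

# Synthetic, perfectly normal-shaped scores (not our MT2 data)
scores = [NormalDist(70, 10).inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
print(anderson_darling_normal(scores))
```

Anderson-Darling weights the tails more heavily than many alternatives, which makes it a reasonable companion check for a deviation like the one in Class A's lower tail.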

Although the Class A data are non-normal, we still intend to perform a t-test on the two datasets, for three reasons: the t-test is considered relatively robust to non-normality, Class A's data do not look strongly non-normal, and we want to check whether the t-test actually works in this case by comparing it against the other tests. The t-test comes in two forms, one assuming equal variances and one allowing unequal variances, so we first need to check whether the two datasets' variances are equal. The F test serves this purpose; its null hypothesis is that the ratio of the two variances equals 1. The F test on these data gives F = 1.9 with a p-value of 0.0008. R's var.test function reports the two-sided alternative hypothesis (the ratio of the variances is not equal to 1) and a 95% confidence interval of 1.3 to 2.8 for the true ratio. We therefore treat our data as having unequal variances.
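To make the F test concrete, here is a minimal Python sketch of the two-sided variance-ratio test in the style of R's var.test() default. It assumes the standard identity P(F <= f) = I_x(d1/2, d2/2) with x = d1*f / (d1*f + d2), and evaluates the regularized incomplete beta function I_x with the usual continued-fraction (modified Lentz) method; the helper names are ours, and the confidence interval that var.test also reports is omitted.

```python
import math
from statistics import variance

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even and then the odd continued-fraction term
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction on whichever side converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_test(sample_a, sample_b):
    """Two-sided F test of equal variances; returns (F, p-value)."""
    df_a, df_b = len(sample_a) - 1, len(sample_b) - 1
    f = variance(sample_a) / variance(sample_b)
    cdf = betainc(df_a / 2, df_b / 2, df_a * f / (df_a * f + df_b))
    return f, 2 * min(cdf, 1 - cdf)

print(f_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # F = 0.25, p approximately 0.208
```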

Permutation Test

Our first path through the Rosner flowchart leads to nonparametric methods, namely the permutation test. This path is similar to the Week 1->2 decision tree, except that here we have a non-normal distribution.

There are a number of nonparametric tests, but the only one that seems to apply to our situation is the permutation test. In essence, the permutation test asks how likely the observed arrangement of students into classes would be, given the pooled data as a whole. With 220 students in total (100 in Class A and 120 in Class B), there are "220 choose 100" (about 3.7e64) ways of splitting the students into the two classes. An exact permutation test would step through every one of these splits and calculate the difference between the two classes' mean MT2 scores; because that is computationally infeasible at this size, the permutation distribution is approximated in practice, either by sampling random splits or by a large-sample normal approximation. Just as in thermodynamics, some mean differences can be produced by many more arrangements than others, so a random split of this population is much more likely to yield some mean differences than others. The null hypothesis for this test is that the class labels are exchangeable (both classes are drawn from the same population), so the expected difference between the class means is zero.
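The Monte Carlo version of the permutation test can be sketched in a few lines of Python: shuffle the pooled scores, split them into groups of the original sizes, and count how often the shuffled mean difference is at least as extreme as the observed one. This is a minimal sketch under our own choices (function name, 10,000 resamples, fixed seed, add-one correction); it is not necessarily the procedure our R analysis used.

```python
import math
import random
from statistics import fmean

def permutation_test_mean_diff(sample_a, sample_b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns (observed mean(A) - mean(B), estimated p-value).
    """
    rng = random.Random(seed)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    observed = fmean(sample_a) - fmean(sample_b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of students into the two classes
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance so splits tying the observed value are not lost to rounding
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value from being exactly zero
    return observed, (extreme + 1) / (n_resamples + 1)

# Number of possible splits: far too many to enumerate one by one
print(f"{math.comb(220, 100):.2e}")
```

With 10,000 resamples the smallest reportable p-value is about 1e-4, so a result as extreme as ours needs either many more resamples or the normal approximation to resolve precisely.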

The difference in mean MT2 scores between Class A and Class B gives a Z statistic of -3.9 with a p-value of 0.0001, so we reject the null hypothesis in favor of the two-sided alternative that the difference between the class means is not zero. We conclude from this test that the two classes are not drawn from the same larger population.
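One common way a permutation test reports a Z statistic is by standardizing the observed mean difference with the exact mean and variance of its permutation distribution: over all relabellings the mean difference has expectation 0 and variance s^2 * (1/n_A + 1/n_B), where s^2 is the sample variance of the pooled scores. The Python sketch below assumes that standardization (the function name is ours); we have not confirmed it matches the R routine behind the -3.9 above.

```python
import math
from statistics import NormalDist, fmean, variance

def permutation_z(sample_a, sample_b):
    """Normal approximation to the permutation distribution of mean(A) - mean(B).

    Returns (Z, two-sided p-value).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = list(sample_a) + list(sample_b)
    # Permutation variance of the mean difference: s^2 * (1/n_a + 1/n_b)
    se = math.sqrt(variance(pooled) * (1 / n_a + 1 / n_b))
    z = (fmean(sample_a) - fmean(sample_b)) / se
    return z, 2 * NormalDist().cdf(-abs(z))
```

As a consistency check, 2 * NormalDist().cdf(-3.9) is about 9.6e-5, in line with the p-value of 0.0001 reported above.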

Kruskal-Wallis Test

Our second path through the Rosner flowchart leads to the Kruskal-Wallis test. Here we question our earlier assumption that the class designation identifies two samples, and instead treat the designation as a second, binary variable (Class A or not Class A).

The Kruskal-Wallis test is usually used to compare more than two samples, but it still works for our two. It is a nonparametric alternative to the one-way analysis of variance (ANOVA) and tests whether the samples originate from the same distribution. With exactly two groups, it reduces to the two-sided Wilcoxon rank-sum (Mann-Whitney) test, apart from details of the p-value approximation.
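For two groups, the Kruskal-Wallis statistic is easy to compute by hand: rank all scores together (averaging ranks for ties), then H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), divided by the tie correction 1 - sum(t^3 - t) / (N^3 - N). With two groups H has 1 degree of freedom, and the chi-square(1) upper tail equals erfc(sqrt(H / 2)). A minimal Python sketch, restricted to two groups by our own choice (the function name is ours):

```python
import math
from collections import Counter

def kruskal_wallis_two_groups(sample_a, sample_b):
    """Tie-corrected Kruskal-Wallis H and its chi-square(1) p-value for two groups."""
    labelled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    values = [v for v, _ in labelled]
    n = len(values)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at i and give each the average rank
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    rank_sums = [0.0, 0.0]
    for (_, group), rank in zip(labelled, ranks):
        rank_sums[group] += rank
    sizes = (len(sample_a), len(sample_b))
    h = 12 / (n * (n + 1)) * sum(r * r / m for r, m in zip(rank_sums, sizes)) - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied; H is undefined")
    h = max(h / correction, 0.0)  # clamp tiny negative rounding error
    return h, math.erfc(math.sqrt(h / 2))

print(kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6]))  # H = 27/7, p about 0.0495
```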

T-Test with Unequal Variances

Our final path through the Rosner flowchart leads to the t-test with unequal variances. We believe the data are close enough to normal, and the t-test robust enough to non-normality, for this analysis to be valid. Moreover, after the outliers are removed from Class A, that dataset is normally distributed, so the test clearly applies there.

Since the t-test was covered last week, we jump straight to the numbers. The null hypothesis is that the difference between the two classes' mean MT2 scores is zero. The test gives a p-value of 1.4e-4, which is below 0.05, so we reject the null hypothesis in favor of the alternative that the difference in means is not zero (i.e., the means differ significantly).
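The unequal-variance (Welch) t-test divides the difference in means by sqrt(s_A^2/n_A + s_B^2/n_B) and uses the Welch-Satterthwaite degrees of freedom. A minimal Python sketch with names of our own; to stay within the standard library, the p-value helper treats t as standard normal, which is only a large-sample approximation (R's t.test uses the exact t distribution, whose heavier tails give slightly larger p-values). For class sizes of 100 and 120 the Welch degrees of freedom lie between 99 and 218, where the two distributions are close.

```python
import math
from statistics import NormalDist, fmean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (fmean(sample_a) - fmean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df

def approx_p_two_sided(t):
    """Large-df approximation: treats t as standard normal (assumption, not exact)."""
    return 2 * NormalDist().cdf(-abs(t))

t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(t, df, approx_p_two_sided(t))
```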

We can also run this test after removing the outliers from Class A, which makes that dataset normally distributed. The details of this analysis are in the full analysis file on Google Drive; the results are similar to those above.

Summary

Using several tests, we conclude that the MT2 scores of Classes A and B are statistically different, which indicates that the students in the two classes originate from distinctly different student populations.

The tests agree with one another, which also gives us a qualitative check on the t-test's sensitivity to non-normal distributions: in this case, the t-test appears to work well despite the slight non-normality of Class A's MT2 score distribution.
