1. The data set on sheet #1 gives data on GPA category and number of hours studied. Construct comparative box plots of the data by GPA category. Then conduct a two-sample t-test on the data for whether GPA category influences the number of hours studied. Be prepared to explain the results of the test and the meaning of the box plots and how they relate to each other. Then redo the analysis by replacing the ordinal GPA category with a numerical dummy variable with Low=0, High=1. Run a regression analysis on how study hours (x) influence GPA category (y). Include the scatterplot. Compare the results of the two tests. Be able to state the null and alternative hypotheses.
Student | GPA | Hours per week |
1 | Low | 6 |
2 | Low | 18 |
3 | Low | 16 |
4 | Low | 14 |
5 | High | 0 |
6 | Low | 22 |
7 | Low | 15 |
8 | Low | 12 |
9 | High | 6 |
10 | Low | 7 |
11 | Low | 5 |
12 | High | 20 |
13 | High | 9 |
14 | High | 9 |
15 | Low | 22 |
16 | Low | 23 |
17 | High | 8 |
18 | Low | 7 |
19 | Low | 14 |
20 | Low | 12 |
21 | Low | 0 |
22 | High | 7 |
23 | High | 4 |
24 | Low | 9 |
25 | Low | 0 |
26 | Low | 0 |
27 | High | 6 |
28 | High | 14 |
29 | Low | 10 |
30 | Low | 9 |
31 | High | 5 |
32 | High | 7 |
33 | High | 4 |
34 | High | 16 |
35 | High | 0 |
36 | Low | 20 |
37 | Low | 13 |
38 | High | 0 |
39 | High | 4 |
40 | Low | 6 |
41 | Low | 17 |
42 | Low | 8 |
43 | High | 4 |
44 | Low | 0 |
45 | High | 16 |
46 | Low | 17 |
47 | Low | 4 |
48 | High | 11 |
49 | Low | 14 |
50 | Low | 16 |
51 | High | 11 |
52 | High | 7 |
53 | High | 4 |
54 | Low | 11 |
55 | Low | 8 |
56 | High | 2 |
57 | Low | 0 |
58 | Low | 0 |
59 | High | 13 |
60 | Low | 18 |
61 | Low | 28 |
62 | High | 1 |
63 | Low | 20 |
64 | Low | 13 |
65 | Low | 4 |
66 | Low | 7 |
67 | High | 11 |
68 | Low | 12 |
69 | High | 5 |
70 | Low | 7 |
71 | Low | 22 |
72 | High | 8 |
73 | Low | 19 |
74 | Low | 8 |
75 | High | 2 |
76 | High | 11 |
77 | Low | 18 |
78 | Low | 20 |
79 | High | 7 |
80 | High | 4 |
81 | High | 4 |
82 | High | 16 |
83 | High | 15 |
84 | Low | 9 |
85 | High | 8 |
86 | High | 10 |
87 | Low | 13 |
88 | High | 9 |
89 | Low | 2 |
90 | Low | 22 |
91 | Low | 12 |
92 | High | 6 |
93 | High | 9 |
94 | Low | 20 |
95 | Low | 14 |
96 | High | 7 |
97 | High | 15 |
98 | High | 9 |
99 | High | 2 |
100 | Low | 23 |
Answer:
Box plot:
From the box plots we can observe that the mean number of hours studied by Low-GPA students is greater than the mean number of hours studied by High-GPA students. Now we need to test this observation using a two-sample t-test.
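The numbers behind the comparative box plots can be checked with a short sketch like the one below. It uses only the Python standard library; the list names `low` and `high` and the helper `five_number_summary` are illustrative, and the "inclusive" quartile method is an assumption (other software may use a slightly different quartile rule).

```python
import statistics

# Hours per week, split by GPA category (copied from the table, in student order).
low = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
high = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11,
        11, 7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6,
        9, 7, 15, 9, 2]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    # Assumption: "inclusive" quartiles; plotting tools may differ slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


summary_low = five_number_summary(low)
summary_high = five_number_summary(high)

for name, values, summary in (("Low", low, summary_low),
                              ("High", high, summary_high)):
    print(name, "n =", len(values),
          "mean =", round(statistics.mean(values), 3),
          "five-number summary =", summary)
```

A box that sits higher for the Low group, with a higher median line, is the visual counterpart of the difference the t-test then checks formally.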
2 sample t-test:
Null hypothesis H0: There is no difference between the mean number of hours studied by Low-GPA students and the mean number of hours studied by High-GPA students ($\mu_{Low} = \mu_{High}$).
Alternative hypothesis H1: The mean number of hours studied by Low-GPA students is greater than the mean number of hours studied by High-GPA students ($\mu_{Low} > \mu_{High}$).
(So this is a right-tailed, i.e. one-tailed, test.)
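Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
is the pooled standard deviation, group 1 is the Low-GPA students and group 2 is the High-GPA students.
By the usual definitions of mean and standard deviation we get,
$n_1 = 55$, $\bar{x}_1 = 12.109$, $s_1 = 7.218$ and $n_2 = 45$, $\bar{x}_2 = 7.689$, $s_2 = 4.842$, so that $s_p = 6.264$.
Substituting the above values in the test statistic equation we get,
t = 3.511
and degrees of freedom $n_1 + n_2 - 2 = 98$.
Now to draw the conclusion, we need to compare the t value (3.511) with the t-distribution value at the 5% level of significance ($\alpha = 0.05$) with 98 degrees of freedom (it's called the critical value),
i.e. from the t-distribution table we get $t_{0.05,\,98} \approx 1.661$.
Since $t = 3.511 > 1.661$, we reject the null hypothesis at the 5% level of significance.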
This means “the mean number of hours studied by Low-GPA students is greater than the mean number of hours studied by High-GPA students.”
Or
“GPA category influences the number of hours studied.”
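The pooled two-sample t-test can be reproduced with a minimal standard-library sketch. The variable names are illustrative, and the critical value `T_CRIT` is hard-coded as an assumed table value (one-tailed, 5% level, 98 degrees of freedom) rather than computed.

```python
import math
import statistics

# Hours per week, split by GPA category (copied from the table, in student order).
low = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
high = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11,
        11, 7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6,
        9, 7, 15, 9, 2]

n1, n2 = len(low), len(high)
mean1, mean2 = statistics.mean(low), statistics.mean(high)
var1, var2 = statistics.variance(low), statistics.variance(high)  # sample variances

# Pooled variance assumes both groups share a common population variance.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Assumption: one-tailed critical value read from a t-table (alpha = 0.05, df = 98).
T_CRIT = 1.661
reject_null = t_stat > T_CRIT

print("t =", round(t_stat, 3), "df =", df, "reject H0:", reject_null)
```

The two sample standard deviations differ noticeably here, so an unequal-variance (Welch) test would also be a reasonable choice; the pooled form is used because it matches the 98 degrees of freedom in the answer.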
Scatter Plot:
We can observe from the scatter plot above that there is no linear relationship between the number of hours studied and GPA category. Since the dependent variable, GPA category, is binary (0 or 1), we can try to fit a logistic regression.
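For the comparison between the two tests that the question asks for, a simple linear regression of the dummy variable on study hours can be sketched as follows. This is an illustrative least-squares fit with hypothetical variable names, not necessarily the software output the question expects. For a two-group dummy, the slope's t statistic has the same magnitude as the pooled two-sample t statistic, so the two analyses agree.

```python
import math

# Hours per week, split by GPA category (copied from the table, in student order).
low = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
high = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11,
        11, 7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6,
        9, 7, 15, 9, 2]

hours = low + high                          # x: study hours per week
gpa = [0] * len(low) + [1] * len(high)      # y: dummy variable, Low=0, High=1

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(gpa) / n
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in gpa)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, gpa))

# Ordinary least squares: y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# t statistic for H0: slope = 0, with n - 2 degrees of freedom.
rss = syy - slope * sxy                     # residual sum of squares
se_slope = math.sqrt(rss / (n - 2) / sxx)
t_slope = slope / se_slope

print("slope =", round(slope, 4), "intercept =", round(intercept, 4),
      "t =", round(t_slope, 3), "df =", n - 2)
```

The slope is negative because High is coded as 1 and the High group studies fewer hours on average.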
Logistic regression Model:
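With y the GPA dummy variable (Low=0, High=1) and x the number of hours studied per week, the logistic regression model is given by,
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$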
and we get the model,
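One way to obtain the fitted coefficients is a short Newton-Raphson fit of the two-parameter logistic model. The sketch below uses only the standard library; the function name `fit_logistic`, the starting values and the stopping rule are illustrative assumptions, and it is not necessarily the method or software used for the answer above.

```python
import math

# Hours per week, split by GPA category (copied from the table, in student order).
low = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
high = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11,
        11, 7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6,
        9, 7, 15, 9, 2]

hours = low + high                          # x: study hours per week
gpa = [0] * len(low) + [1] * len(high)      # y: dummy variable, Low=0, High=1


def predict(b0, b1, x):
    """P(y = 1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Maximum-likelihood fit of (b0, b1) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = predict(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p                     # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                        # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det    # Newton step: (X'WX)^-1 * gradient
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic(hours, gpa)
fitted = [predict(b0, b1, x) for x in hours]
print("b0 =", round(b0, 4), "b1 =", round(b1, 4))
```

A negative `b1` means the estimated probability of being in the High category falls as study hours rise, the same direction as the t-test result.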