# 1. The data set on sheet #1 gi

1. The data set on sheet #1 gives data on GPA category and number of hours studied. First construct comparative box plots of the data by GPA category. Then conduct a two-sample t-test on the data for whether GPA category influences the number of hours studied. Be prepared to explain the results of the test and the meaning of the box plots and how they relate to each other. Then redo the analysis by replacing the ordinal GPA category with a numerical dummy variable with Low=0, High=1. Run a regression analysis on how study hours (x) influence GPA category (y). Include the scatter plot. Compare the results of the two tests. Be able to state the null and alternative hypotheses.

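| Student | GPA | Hours per week |
|---|---|---|
| 1 | Low | 6 |
| 2 | Low | 18 |
| 3 | Low | 16 |
| 4 | Low | 14 |
| 5 | High | 0 |
| 6 | Low | 22 |
| 7 | Low | 15 |
| 8 | Low | 12 |
| 9 | High | 6 |
| 10 | Low | 7 |
| 11 | Low | 5 |
| 12 | High | 20 |
| 13 | High | 9 |
| 14 | High | 9 |
| 15 | Low | 22 |
| 16 | Low | 23 |
| 17 | High | 8 |
| 18 | Low | 7 |
| 19 | Low | 14 |
| 20 | Low | 12 |
| 21 | Low | 0 |
| 22 | High | 7 |
| 23 | High | 4 |
| 24 | Low | 9 |
| 25 | Low | 0 |
| 26 | Low | 0 |
| 27 | High | 6 |
| 28 | High | 14 |
| 29 | Low | 10 |
| 30 | Low | 9 |
| 31 | High | 5 |
| 32 | High | 7 |
| 33 | High | 4 |
| 34 | High | 16 |
| 35 | High | 0 |
| 36 | Low | 20 |
| 37 | Low | 13 |
| 38 | High | 0 |
| 39 | High | 4 |
| 40 | Low | 6 |
| 41 | Low | 17 |
| 42 | Low | 8 |
| 43 | High | 4 |
| 44 | Low | 0 |
| 45 | High | 16 |
| 46 | Low | 17 |
| 47 | Low | 4 |
| 48 | High | 11 |
| 49 | Low | 14 |
| 50 | Low | 16 |
| 51 | High | 11 |
| 52 | High | 7 |
| 53 | High | 4 |
| 54 | Low | 11 |
| 55 | Low | 8 |
| 56 | High | 2 |
| 57 | Low | 0 |
| 58 | Low | 0 |
| 59 | High | 13 |
| 60 | Low | 18 |
| 61 | Low | 28 |
| 62 | High | 1 |
| 63 | Low | 20 |
| 64 | Low | 13 |
| 65 | Low | 4 |
| 66 | Low | 7 |
| 67 | High | 11 |
| 68 | Low | 12 |
| 69 | High | 5 |
| 70 | Low | 7 |
| 71 | Low | 22 |
| 72 | High | 8 |
| 73 | Low | 19 |
| 74 | Low | 8 |
| 75 | High | 2 |
| 76 | High | 11 |
| 77 | Low | 18 |
| 78 | Low | 20 |
| 79 | High | 7 |
| 80 | High | 4 |
| 81 | High | 4 |
| 82 | High | 16 |
| 83 | High | 15 |
| 84 | Low | 9 |
| 85 | High | 8 |
| 86 | High | 10 |
| 87 | Low | 13 |
| 88 | High | 9 |
| 89 | Low | 2 |
| 90 | Low | 22 |
| 91 | Low | 12 |
| 92 | High | 6 |
| 93 | High | 9 |
| 94 | Low | 20 |
| 95 | Low | 14 |
| 96 | High | 7 |
| 97 | High | 15 |
| 98 | High | 9 |
| 99 | High | 2 |
| 100 | Low | 23 |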

Box plot:

We can observe that GPA-Low students appear to study more hours on average than GPA-High students. A box plot displays medians and quartiles rather than means, so we now need to test this statement about the means using a two-sample t-test.
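The numbers behind the comparative box plots can be checked with a five-number summary per group. The following is a minimal sketch using only the Python standard library; the `RAW` shorthand (`L`/`H` followed by the hours, in student order) and the helper name `five_number_summary` are assumptions made for illustration, and the quartile convention (`method="inclusive"`) may differ slightly from the one used by the plotting software.

```python
import statistics

# Data from the table, in student order: L = Low GPA, H = High GPA,
# followed by hours studied per week (shorthand encoding is an assumption).
RAW = """
L6 L18 L16 L14 H0 L22 L15 L12 H6 L7
L5 H20 H9 H9 L22 L23 H8 L7 L14 L12
L0 H7 H4 L9 L0 L0 H6 H14 L10 L9
H5 H7 H4 H16 H0 L20 L13 H0 H4 L6
L17 L8 H4 L0 H16 L17 L4 H11 L14 L16
H11 H7 H4 L11 L8 H2 L0 L0 H13 L18
L28 H1 L20 L13 L4 L7 H11 L12 H5 L7
L22 H8 L19 L8 H2 H11 L18 L20 H7 H4
H4 H16 H15 L9 H8 H10 L13 H9 L2 L22
L12 H6 H9 L20 L14 H7 H15 H9 H2 L23
"""
records = [(tok[0], int(tok[1:])) for tok in RAW.split()]
low = [hours for gpa, hours in records if gpa == "L"]
high = [hours for gpa, hours in records if gpa == "H"]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


for name, group in (("Low", low), ("High", high)):
    print(name, len(group), five_number_summary(group))
```

If matplotlib is available, `plt.boxplot([low, high])` would draw the two boxes side by side from the same lists.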

Two-sample t-test:

Null hypothesis $H_0$: There is no difference between the mean number of hours studied by GPA-Low students and the mean number of hours studied by GPA-High students.

Alternative hypothesis $H_1$: The mean number of hours studied by GPA-Low students is greater than the mean number of hours studied by GPA-High students.

(So this is a right-tailed, i.e. one-tailed, test.)

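Test statistic:

$$t = \frac{\bar{x}_L - \bar{x}_H}{s_p \sqrt{\dfrac{1}{n_L} + \dfrac{1}{n_H}}}$$

where

$$s_p = \sqrt{\frac{(n_L - 1)\,s_L^2 + (n_H - 1)\,s_H^2}{n_L + n_H - 2}}$$

is the pooled standard deviation, and $\bar{x}$, $s$ and $n$ are the sample mean, sample standard deviation and sample size of each group (L = GPA-Low, H = GPA-High).

By the usual definitions of mean and standard deviation we get

$$n_L = 55,\quad \bar{x}_L \approx 12.109,\quad s_L \approx 7.218, \qquad n_H = 45,\quad \bar{x}_H \approx 7.689,\quad s_H \approx 4.842, \qquad s_p \approx 6.264$$

Substituting the above values into the test statistic equation we get

$$t = 3.511$$

and degrees of freedom

$$df = n_L + n_H - 2 = 55 + 45 - 2 = 98$$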

Now, to draw the conclusion, we need to compare the t value (3.511) with the t-distribution value at the 5% level of significance ($\alpha = 0.05$) with 98 degrees of freedom. (This is called the critical value.)

That is, from the t-distribution table we get $t_{0.05,\,98} \approx 1.66$.

Since $t = 3.511 > 1.66$, we reject the null hypothesis at the 5% level of significance.

This means: “The mean number of hours studied by GPA-Low students is greater than the mean number of hours studied by GPA-High students.”

Or

“GPA category influences the number of hours studied.”
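The t-test above can be reproduced from the data table. This is a minimal sketch that assumes the pooled-variance (equal-variance) form of the test, which is what 98 degrees of freedom implies; the `RAW` shorthand, the helper name `pooled_t` and the hard-coded table value `T_CRIT` are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Data from the table, in student order: L = Low GPA, H = High GPA,
# followed by hours studied per week (shorthand encoding is an assumption).
RAW = """
L6 L18 L16 L14 H0 L22 L15 L12 H6 L7
L5 H20 H9 H9 L22 L23 H8 L7 L14 L12
L0 H7 H4 L9 L0 L0 H6 H14 L10 L9
H5 H7 H4 H16 H0 L20 L13 H0 H4 L6
L17 L8 H4 L0 H16 L17 L4 H11 L14 L16
H11 H7 H4 L11 L8 H2 L0 L0 H13 L18
L28 H1 L20 L13 L4 L7 H11 L12 H5 L7
L22 H8 L19 L8 H2 H11 L18 L20 H7 H4
H4 H16 H15 L9 H8 H10 L13 H9 L2 L22
L12 H6 H9 L20 L14 H7 H15 H9 H2 L23
"""
records = [(tok[0], int(tok[1:])) for tok in RAW.split()]
low = [hours for gpa, hours in records if gpa == "L"]
high = [hours for gpa, hours in records if gpa == "H"]


def pooled_t(sample1, sample2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, df


t_stat, df = pooled_t(low, high)
T_CRIT = 1.66  # assumed one-tailed 5% critical value for df = 98, from a t-table
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {t_stat > T_CRIT}")
```

If SciPy is installed, `scipy.stats.ttest_ind(low, high, equal_var=True, alternative="greater")` should give the same statistic together with a one-sided p-value.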

Scatter Plot:

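We can observe from the scatter plot above that there is no clear linear relationship between the number of hours studied and GPA category: because GPA category takes only the values 0 and 1, the points lie on two horizontal bands. Since the dependent variable, GPA category, is binary (0 or 1), we can try to fit a logistic regression.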
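To put numbers on the scatter plot and to compare the regression with the t-test, one can still fit an ordinary least-squares line of the dummy variable on hours. The sketch below uses only the standard library; the `RAW` shorthand and all variable names are assumptions made for illustration. For a 0/1 variable, the t statistic of the slope has the same magnitude as the pooled two-sample t statistic, so the two tests lead to the same conclusion.

```python
from math import sqrt

# Data from the table, in student order: L = Low GPA, H = High GPA,
# followed by hours studied per week (shorthand encoding is an assumption).
RAW = """
L6 L18 L16 L14 H0 L22 L15 L12 H6 L7
L5 H20 H9 H9 L22 L23 H8 L7 L14 L12
L0 H7 H4 L9 L0 L0 H6 H14 L10 L9
H5 H7 H4 H16 H0 L20 L13 H0 H4 L6
L17 L8 H4 L0 H16 L17 L4 H11 L14 L16
H11 H7 H4 L11 L8 H2 L0 L0 H13 L18
L28 H1 L20 L13 L4 L7 H11 L12 H5 L7
L22 H8 L19 L8 H2 H11 L18 L20 H7 H4
H4 H16 H15 L9 H8 H10 L13 H9 L2 L22
L12 H6 H9 L20 L14 H7 H15 H9 H2 L23
"""
records = [(tok[0], int(tok[1:])) for tok in RAW.split()]
x = [float(hours) for _, hours in records]             # study hours
y = [1.0 if gpa == "H" else 0.0 for gpa, _ in records]  # dummy: Low=0, High=1

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                    # least-squares slope of y on x
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)            # correlation between hours and dummy
t_slope = r * sqrt(n - 2) / sqrt(1 - r * r)  # t statistic for H0: slope = 0

print(f"y = {intercept:.3f} + {slope:.4f} x, r = {r:.3f}, t = {t_slope:.3f}")
```

The slope is negative because High is coded as 1 and the High group studies fewer hours on average.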

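Logistic regression model:

The logistic regression model is given by

$$P(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

where $y$ is the GPA dummy variable (Low=0, High=1), $x$ is the number of hours studied, and $\beta_0$ and $\beta_1$ are the intercept and slope to be estimated,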

and we get the model,
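One way to estimate the intercept and slope of the logistic model from the data table is maximum likelihood via Newton-Raphson iterations. The sketch below is an assumption about how the fit could be carried out, not necessarily the software used for the answer above; the `RAW` shorthand and the function name `fit_logistic` are hypothetical.

```python
from math import exp

# Data from the table, in student order: L = Low GPA, H = High GPA,
# followed by hours studied per week (shorthand encoding is an assumption).
RAW = """
L6 L18 L16 L14 H0 L22 L15 L12 H6 L7
L5 H20 H9 H9 L22 L23 H8 L7 L14 L12
L0 H7 H4 L9 L0 L0 H6 H14 L10 L9
H5 H7 H4 H16 H0 L20 L13 H0 H4 L6
L17 L8 H4 L0 H16 L17 L4 H11 L14 L16
H11 H7 H4 L11 L8 H2 L0 L0 H13 L18
L28 H1 L20 L13 L4 L7 H11 L12 H5 L7
L22 H8 L19 L8 H2 H11 L18 L20 H7 H4
H4 H16 H15 L9 H8 H10 L13 H9 L2 L22
L12 H6 H9 L20 L14 H7 H15 H9 H2 L23
"""
records = [(tok[0], int(tok[1:])) for tok in RAW.split()]
hours = [float(h) for _, h in records]
is_high = [1.0 if gpa == "H" else 0.0 for gpa, _ in records]


def predict(b0, b1, x):
    """Logistic model: estimated P(y = 1 | x)."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * x)))


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of a one-predictor logistic model."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [predict(b0, b1, xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Negative Hessian (information matrix), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


beta0, beta1 = fit_logistic(hours, is_high)
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")
print(f"P(High | 0 hours) = {predict(beta0, beta1, 0):.3f}")
```

A negative `beta1` means the estimated probability of being in the High category decreases as study hours increase, which agrees in direction with the t-test result.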

