Review Correlation and Regression

Name

Institution

  1. Exploratory data analysis
  2. Exploratory data analysis on variables

[Scatterplot: Student Agreeableness vs. Lecturer Agreeableness]

[Scatterplot: Student Extroversion vs. Lecturer Extroversion]

[Scatterplot: Student Agreeableness vs. Lecturer Extroversion]

[Scatterplot: Student Extroversion vs. Lecturer Agreeableness]

  1. Give a one- to two-paragraph write-up of the data once you have done this.

The scatterplots of student personality scores against the traits students want in lecturers all show widely dispersed points and weak relationships. Student Agreeableness and Lecturer Agreeableness are only weakly related (r² = 0.03), and several outliers are visible. Student Extroversion and Lecturer Extroversion show a similarly wide spread and a weak relationship (r² = 0.023). Student Agreeableness and Lecturer Extroversion are also widely dispersed, with visible outliers and a very weak relationship (r² = 0.002). Student Extroversion and Lecturer Agreeableness show essentially no linear relationship (r² = 1.8e-05).
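The r² values quoted for each scatterplot are the square of the Pearson correlation between the two plotted variables. The following is a minimal sketch of that calculation, assuming missing scores are stored as `None`; the function name `pearson_r` and the scores are made up for illustration and are not the study data.

```python
import math

def pearson_r(x, y):
    """Pearson r over the pairs in which both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores (assumption: None marks a missing response)
student = [30, 25, 41, None, 28, 35]
lecturer = [12, 15, 18, 9, None, 10]

r = pearson_r(student, lecturer)
r_squared = r ** 2  # share of variance the fitted line accounts for
```

An r² close to zero, as in all four scatterplots, means the fitted line accounts for almost none of the spread in the points.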

  1. Create an APA style table that presents descriptive statistics for the sample.

 

Descriptive Statistics

Variable                                  N     Minimum   Maximum   Mean      Std. Deviation
Student Extroversion                      418   5.00      46.00     30.1029   6.31897
Student Agreeableness                     413   25.00     73.00     46.5157   7.45295
Student wants Extroversion in lecturers   283   -6.00     28.00     12.9576   6.94494
Student wants Agreeableness in lecturers  417   -21.00    29.00     8.8825    9.57577
Valid N (listwise)                        271
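The columns of the descriptive statistics table can be reproduced for any score list with a few lines of code. This is a sketch only, assuming missing responses are stored as `None`; the helper name `describe` and the scores are hypothetical. The standard deviation uses the n - 1 denominator, which is the sample standard deviation.

```python
import statistics

def describe(values):
    """N, minimum, maximum, mean and sample SD, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "Minimum": min(present),
        "Maximum": max(present),
        "Mean": statistics.mean(present),
        "Std. Deviation": statistics.stdev(present),  # n - 1 denominator
    }

# Made-up scores for illustration, not the study data
scores = [2, 4, 4, None, 4, 5, 5, 7, 9]
summary = describe(scores)
```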

 

  1. Make a decision about the missing data. How are you going to handle it, and why?

Cases with missing values will be excluded from the analyses in which those values are needed. The varying N in the correlation table shows that this exclusion is pairwise: each correlation uses every case that has both scores, rather than only the 271 cases with complete data on all four variables (listwise). Missing values can distort the findings, especially when there are many of them, so excluding them keeps the data behind each result consistent and maintains the reliability of the findings.
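The difference between the two ways of excluding missing data can be sketched as follows. The records, the column names and the helper names `listwise` and `pairwise` are all hypothetical; the point is only that listwise deletion keeps a case when every variable is present, while pairwise deletion keeps it whenever the two variables in the current analysis are present.

```python
# Hypothetical records; None marks a missing response
rows = [
    {"stu_ext": 30, "stu_agr": 46, "lec_ext": 12, "lec_agr": 9},
    {"stu_ext": 25, "stu_agr": None, "lec_ext": 15, "lec_agr": 4},
    {"stu_ext": 41, "stu_agr": 52, "lec_ext": None, "lec_agr": 11},
    {"stu_ext": 28, "stu_agr": 39, "lec_ext": 8, "lec_agr": None},
    {"stu_ext": 35, "stu_agr": 50, "lec_ext": 20, "lec_agr": 15},
]

def listwise(records, columns):
    """Keep only records with no missing value in any listed column."""
    return [r for r in records if all(r[c] is not None for c in columns)]

def pairwise(records, a, b):
    """Keep the (a, b) pairs in which both values are present."""
    return [(r[a], r[b]) for r in records
            if r[a] is not None and r[b] is not None]

all_columns = ["stu_ext", "stu_agr", "lec_ext", "lec_agr"]
n_listwise = len(listwise(rows, all_columns))
n_pairwise = len(pairwise(rows, "stu_ext", "lec_ext"))
```

In this toy data two cases survive listwise deletion, while four are available for the single pair of extroversion scores, which mirrors why the listwise N in the descriptive table is smaller than the N for any one variable.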

  1. Correlation

 

Correlations

Variable                                      Statistic             1        2        3        4
1. Student Extroversion                       Pearson Correlation   1        .080     .153*    .004
                                              Sig. (2-tailed)                .106     .010     .932
                                              N                     418      406      281      411
2. Student Agreeableness                      Pearson Correlation   .080     1        .050     .164**
                                              Sig. (2-tailed)       .106              .412     .001
                                              N                     406      413      276      405
3. Student wants Extroversion in lecturers    Pearson Correlation   .153*    .050     1        .118*
                                              Sig. (2-tailed)       .010     .412              .049
                                              N                     281      276      283      280
4. Student wants Agreeableness in lecturers   Pearson Correlation   .004     .164**   .118*    1
                                              Sig. (2-tailed)       .932     .001     .049
                                              N                     411      405      280      417

Columns 1 to 4 correspond to the numbered row variables.
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

 

The test is two-tailed because no direction was predicted in advance: the null hypothesis is that the correlation between each pair of variables is zero, and the alternative is that it differs from zero in either direction.

Results interpretation

A Pearson correlation analysis was conducted to determine whether the variables were significantly correlated. There was a weak, positive, statistically significant correlation between student agreeableness and the agreeableness students want in lecturers, r(403) = .164, p = .001. There was also a weak, positive, statistically significant correlation between student extroversion and the extroversion students want in lecturers, r(279) = .153, p = .010. The two lecturer-preference scores were weakly correlated with each other, r(278) = .118, p = .049; the remaining correlations were not significant.
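The significance of a Pearson correlation is tested by converting r to a t statistic with N - 2 degrees of freedom. The sketch below applies that conversion to the two significant correlations, using the r and N values from the correlation table; the function name `t_from_r` is illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic and degrees of freedom for testing H0: correlation = 0."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df

# r and N taken from the correlation table
t_ext, df_ext = t_from_r(0.153, 281)  # student vs. lecturer extroversion
t_agr, df_agr = t_from_r(0.164, 405)  # student vs. lecturer agreeableness
```

For the extroversion pair this gives t of about 2.59 on 279 degrees of freedom, the same t that the regression output reports for the Student Extroversion coefficient.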

 

 

 

 

  1. Regression
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .153ᵃ   .023       .020                6.82989

a. Predictors: (Constant), Student Extroversion

ANOVAᵃ

Model 1      Sum of Squares   df    Mean Square   F       Sig.
Regression   311.947          1     311.947       6.687   .010ᵇ
Residual     13014.630        279   46.647
Total        13326.577        280

a. Dependent Variable: Student wants Extroversion in lecturers
b. Predictors: (Constant), Student Extroversion

Coefficientsᵃ

Model 1                B (unstd.)   Std. Error   Beta (std.)   t       Sig.
(Constant)             8.220        1.866                      4.405   .000
Student Extroversion   .160         .062         .153          2.586   .010

a. Dependent Variable: Student wants Extroversion in lecturers

 

Two-tailed or one-tailed

The test was two-tailed because the analysis asked whether a student's extroversion score predicts how extroverted the student wants a lecturer to be, without specifying in advance whether the effect would be positive or negative.

The assumptions

Regression analysis rests on assumptions that must be met. The dependent variable must be measured on a continuous scale; the lecturer extroversion score is a continuous variable measured on a ratio scale, so this assumption is fulfilled. The independent variables must be continuous or categorical; the student's extroversion score is continuous. Linear regression further assumes a linear relationship between predictor and outcome, independent observations, and residuals that are approximately normal with constant variance.

Results

A simple regression analysis was conducted to examine whether the extroversion a student wants in a lecturer can be predicted from the student's own extroversion score. Student extroversion was a statistically significant predictor at the .05 level (B = 0.160, t = 2.586, p = .010), and the overall model was significant, F(1, 279) = 6.687, p = .010. However, the model explained only 2.3% of the variance in lecturer extroversion (R² = .023), so the effect is small.

These results are consistent with the correlation analysis, which found a weak positive correlation between the student's extroversion score and the lecturer extroversion score; with a single predictor, the standardized coefficient (Beta = .153) equals the Pearson correlation.
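The figures in the regression output are linked by simple arithmetic, which is worth checking before interpreting them. The sketch below recomputes R² and F from the sums of squares in the ANOVA table and evaluates the fitted equation from the coefficients table; the `predict` helper and the example score of 30 are illustrative assumptions.

```python
# Values copied from the ANOVA table of the simple regression
ss_regression = 311.947
ss_residual = 13014.630
df_regression, df_residual = 1, 279

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total          # proportion of variance explained
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

def predict(student_extroversion, intercept=8.220, slope=0.160):
    """Predicted lecturer-extroversion score from the fitted line."""
    return intercept + slope * student_extroversion

example = predict(30)  # a student scoring 30 on extroversion
```

R² comes out at about .023, that is 2.3% of the variance, and F at about 6.69, matching the output.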

  1. Multiple regression analysis
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .168ᵃ   .028       .018                6.82934

a. Predictors: (Constant), Gender, Student Extroversion, Age

ANOVAᵃ

Model 1      Sum of Squares   df    Mean Square   F       Sig.
Regression   373.952          3     124.651       2.673   .048ᵇ
Residual     12872.615        276   46.640
Total        13246.568        279

a. Dependent Variable: Student wants Extroversion in lecturers
b. Predictors: (Constant), Gender, Student Extroversion, Age

Coefficientsᵃ

Model 1                B (unstd.)   Std. Error   Beta (std.)   t       Sig.
(Constant)             7.560        2.844                      2.658   .008
Student Extroversion   .161         .062         .155          2.607   .010
Age                    .019         .109         .010          .169    .866
Gender                 1.036        .927         .066          1.118   .265

a. Dependent Variable: Student wants Extroversion in lecturers

A two-tailed or one-tailed test

The multiple regression used two-tailed tests, because the analysis sought to determine whether age, gender, and student extroversion score predict lecturer extroversion without specifying a direction for any effect.

Assumptions

Multiple regression additionally requires two or more independent variables; three are included here. It also assumes that the independent variables are not highly correlated with each other (no multicollinearity).

Results

A multiple regression analysis was conducted to determine whether age, gender, and the student's extroversion score predict lecturer extroversion. The overall model was significant but explained little variance, F(3, 276) = 2.673, p = .048, R² = .028. Only the student's extroversion score was a significant predictor at the .05 level (B = 0.161, p = .010); age (p = .866) and gender (p = .265) were not.

This agrees with the correlation analysis above, in which the student's extroversion score was significantly correlated with lecturer extroversion. The multiple regression shows that student extroversion continues to predict lecturer extroversion when age and gender are included in the model.
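Adding age and gender raises R² only from .023 to .028, and the adjusted R² actually falls from .020 to .018, because the adjustment penalizes each extra predictor. A minimal sketch of that adjustment, using the sums of squares and sample sizes from the two ANOVA tables; the function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n cases and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# R-squared = regression SS / total SS; n = total df + 1
simple = adjusted_r_squared(311.947 / 13326.577, n=281, k=1)
multiple = adjusted_r_squared(373.952 / 13246.568, n=280, k=3)
```

The drop in adjusted R² is consistent with age and gender contributing nothing beyond student extroversion.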

Part B. Applying Analytical Strategies to an Area of Research Interest. 

    1. Briefly restate your research area of interest.

The number of confirmed coronavirus cases has continued to increase significantly. Several projection models have been developed that estimate the expected number of cases at different times. The variables assessed here are therefore the actual number of positive cases and the projected number of positive cases.

  1. Pearson correlation
Correlations

Variable    Statistic             Actual   Projected
Actual      Pearson Correlation   1        .986**
            Sig. (2-tailed)                .000
            N                     20       20
Projected   Pearson Correlation   .986**   1
            Sig. (2-tailed)       .000
            N                     20       20

**. Correlation is significant at the 0.01 level (2-tailed).

 

A Pearson correlation analysis was conducted to determine whether there was a relationship between the actual and projected coronavirus cases. There was a strong positive relationship between the actual and projected data, r(18) = .986, p < .001. The coefficient of determination is r² = 0.972, so the projected data explain 97.2% of the variance in the actual data.

 

 

 

 

  1. Spearman correlation

 

Correlations (Spearman's rho)

Variable    Statistic                 Actual    Projected
Actual      Correlation Coefficient   1.000     1.000**
            Sig. (2-tailed)           .         .
            N                         20        20
Projected   Correlation Coefficient   1.000**   1.000
            Sig. (2-tailed)           .         .
            N                         20        20

**. Correlation is significant at the 0.01 level (2-tailed).

 

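A Spearman rank-order correlation analysis was conducted to determine whether there was a relationship between the actual and projected coronavirus cases. The rank orders of the two variables were identical, rs(18) = 1.00, which SPSS flags as significant at the 0.01 level. Because Spearman's coefficient is computed on ranks, a value of 1 means the relationship is perfectly monotonic: every increase in projected cases is matched by an increase in actual cases. It does not mean that the projected values explain 100% of the variance in the actual values; the Pearson r² of 0.972 is the appropriate figure for that.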
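Spearman's coefficient is the Pearson correlation computed on ranks, which is why it can equal 1 while Pearson's r stays below 1. The sketch below illustrates this with made-up values that rise together but not along a straight line; the helper names and the data are assumptions for illustration, not the study data.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

# Hypothetical monotonic but non-linear data
projected = [10, 20, 40, 80, 160]
actual = [12, 18, 30, 45, 60]

rho = spearman_rho(projected, actual)  # identical rank order
r = pearson_r(projected, actual)       # below 1: not a straight line
```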

  1. Partial Correlation vs Semi-Partial Correlation.

Partial correlation

The variables assessed here are the actual data, the projected data, and the average age of those who tested positive for coronavirus. All three are continuous variables measured on a ratio scale.

 

Correlations

Control Variables   Variable    Statistic                 Actual   Projected   Avg_age
-none-ᵃ             Actual      Correlation               1.000    .986        .467
                                Significance (2-tailed)   .        .000        .038
                                df                        0        18          18
                    Projected   Correlation               .986     1.000       .499
                                Significance (2-tailed)   .000     .           .025
                                df                        18       0           18
                    Avg_age     Correlation               .467     .499        1.000
                                Significance (2-tailed)   .038     .025        .
                                df                        18       18          0
Avg_age             Actual      Correlation               1.000    .982
                                Significance (2-tailed)   .        .000
                                df                        0        17
                    Projected   Correlation               .982     1.000
                                Significance (2-tailed)   .000     .
                                df                        17       0

a. Cells contain zero-order (Pearson) correlations.

 

A partial correlation was run to determine the relationship between actual and projected data while controlling for average age. There was a strong positive partial correlation between actual and projected data, r(17) = .982, p < .001, almost unchanged from the zero-order correlation of .986, so average age accounts for very little of the relationship.
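A first-order partial correlation can be computed directly from the three zero-order correlations in the table. The sketch below uses the rounded values from the output, so the result agrees with the reported .982 only approximately; the function name `partial_r` is illustrative.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from both."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Zero-order correlations from the table (x = Projected, y = Actual, z = Avg_age)
r_partial = partial_r(r_xy=0.986, r_xz=0.499, r_yz=0.467)
```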

Semi partial correlation

 

Coefficientsᵃ

Model 1      B (unstd.)   Std. Error   Beta (std.)   t        Sig.   Zero-order   Partial   Part
(Constant)   40.616       48.319                     .841     .412
Projected    .761         .035         1.002         21.512   .000   .986         .982      .869
Avg_age      -.562        .785         -.033         -.717    .483   .467         -.171     -.029

a. Dependent Variable: Actual

 

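A semi-partial (part) correlation analysis was conducted to determine the relationship between actual and projected data with average age removed from the projected data only. The semi-partial correlation between actual and projected cases was strong, sr = .869, and projected cases remained a significant predictor (t = 21.512, p < .001).

Comparing the two analyses, the results agree that the relationship is strong. The partial correlation (.982) removes the influence of average age from both variables, while the semi-partial correlation (.869) removes it from projected cases only; its square, .755, is the share of the total variance in actual cases uniquely explained by projected cases. In this case the partial correlation presents a better understanding of the relationship between the variables being assessed.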
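The Part value in the coefficients table can be recovered from the zero-order correlations, which makes the contrast with the partial correlation concrete: the control variable is removed from the predictor only, so the denominator has a single term. A minimal sketch, assuming the rounded correlations from the output; the function name `semipartial_r` is illustrative.

```python
import math

def semipartial_r(r_yx, r_yz, r_xz):
    """Correlation of y with the part of x that is independent of z."""
    return (r_yx - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)

# y = Actual, x = Projected, z = Avg_age
sr = semipartial_r(r_yx=0.986, r_yz=0.467, r_xz=0.499)
unique_variance = sr ** 2  # share of variance in Actual unique to Projected
```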

  1. Simple regression analysis

The variables considered in the simple regression are the actual cases and the projected cases, both measured on a ratio scale. The outcome variable is actual cases and the predictor variable is projected cases, because the analysis assesses whether the actual cases can be predicted from the projected cases.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .986ᵃ   .971       .970                9.517

a. Predictors: (Constant), Projected

ANOVAᵃ

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   55454.702        1    55454.702     612.290   .000ᵇ
Residual     1630.248         18   90.569
Total        57084.950        19

a. Dependent Variable: Actual
b. Predictors: (Constant), Projected

Coefficientsᵃ

Model 1      B (unstd.)   Std. Error   Beta (std.)   t        Sig.
(Constant)   6.063        3.192                      1.899    .074
Projected    .749         .030         .986          24.744   .000

a. Dependent Variable: Actual

 

The model summary shows a coefficient of determination of R² = .971, so projected cases explain about 97% of the variance in actual cases, indicating a strong relationship. Projected cases are a significant predictor of actual coronavirus cases (B = 0.749, t = 24.744, p < .001), and the overall model is significant, F(1, 18) = 612.290, p < .001.
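The intercept and slope in the coefficients table are ordinary least-squares estimates. The sketch below shows how such estimates are obtained for one predictor and then applies the reported equation, Actual = 6.063 + 0.749 × Projected; the function name `fit_line`, the toy data and the example value of 100 projected cases are assumptions for illustration, not the study data.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data lying exactly on y = 6 + 0.75x, so the fit should recover it
intercept, slope = fit_line([0, 10, 20, 30], [6, 13.5, 21, 28.5])

# Prediction from the coefficients reported in the output
predicted_actual = 6.063 + 0.749 * 100  # for 100 projected cases
```

A slope below 1 means that, in this sample, actual cases rose by about three quarters of a case for each additional projected case.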

 

  1. Multiple regression

The variables considered in the multiple regression are actual cases, projected cases, and the average age of the patients. All three are continuous variables measured on a ratio scale. A multiple linear regression with the Enter method is used, because two predictor variables (projected cases and average age) are entered together to predict a single continuous outcome (actual cases).

 

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .986ᵃ   .972       .969                9.648

a. Predictors: (Constant), Avg_age, Projected

ANOVAᵃ

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   55502.517        2    27751.258     298.130   .000ᵇ
Residual     1582.433         17   93.084
Total        57084.950        19

a. Dependent Variable: Actual
b. Predictors: (Constant), Avg_age, Projected

Coefficientsᵃ

Model 1      B (unstd.)   Std. Error   Beta (std.)   t        Sig.
(Constant)   40.616       48.319                     .841     .412
Projected    .761         .035         1.002         21.512   .000
Avg_age      -.562        .785         -.033         -.717    .483

a. Dependent Variable: Actual

 

The model summary shows a coefficient of determination of R² = .972, so projected cases and average age together explain about 97% of the variance in actual cases; the overall model is significant, F(2, 17) = 298.130, p < .001. Only projected cases are a significant predictor of actual coronavirus cases (B = 0.761, p < .001); average age is not (B = -0.562, p = .483).
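How little average age adds can be quantified by comparing the two models. The sketch below computes the F statistic for the change in regression sum of squares when Avg_age is added, using values from the two ANOVA tables; for a single added predictor this F equals the square of that predictor's t, which serves as a consistency check. The variable names are illustrative.

```python
# Regression sums of squares from the simple and multiple regression ANOVA tables
ss_reg_simple = 55454.702     # predictor: Projected
ss_reg_multiple = 55502.517   # predictors: Projected, Avg_age
ss_total = 57084.950
ms_residual_multiple = 93.084
added_predictors = 1

ss_added = ss_reg_multiple - ss_reg_simple
f_change = (ss_added / added_predictors) / ms_residual_multiple
r_squared_change = ss_added / ss_total

t_avg_age = -0.717            # t for Avg_age in the coefficients table
t_squared = t_avg_age ** 2
```

The change in R² is under 0.001 and F for the change is about 0.51, in line with the non-significant coefficient for average age.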

  1. Logistic regression

The three variables included in the analysis are projected cases, average age, and the setting. Projected cases and average age are continuous predictor variables measured on a ratio scale, while the setting is a categorical outcome variable measured on a nominal scale. Binary logistic regression with the Enter method is used, because the outcome variable has only two groups (urban and rural).

Variables in the Equation

Step 1ᵃ     B       S.E.     Wald   df   Sig.   Exp(B)
Projected   -.004   .008     .282   1    .595   .996
Avg_age     -.076   .168     .206   1    .650   .927
Constant    4.922   10.344   .226   1    .634   137.209

a. Variable(s) entered on step 1: Projected, Avg_age.

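Exp(B) is an odds ratio, not a probability. Each additional projected case multiplies the odds of an urban rather than a rural setting by 0.996, and each additional year of average age multiplies those odds by 0.927. Neither effect is statistically significant (p = .595 and p = .650), so the model provides no evidence that projected cases or average age distinguish urban from rural settings.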
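The Exp(B) column is the exponential of the B column, and an odds ratio below 1 corresponds to a percentage decrease in the odds per one-unit increase in the predictor. A minimal sketch using the B values from the table; the dictionary names are illustrative.

```python
import math

# B coefficients from the Variables in the Equation table
coefficients = {"Projected": -0.004, "Avg_age": -0.076}

# Odds ratio for a one-unit increase in each predictor
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

# Percentage change in the odds per one-unit increase
percent_change = {name: (ratio - 1) * 100 for name, ratio in odds_ratios.items()}
```

This gives a decrease of roughly 0.4% in the odds per additional projected case and roughly 7% per additional year of average age, neither of which is significant in this model.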