Review Correlation and Regression
Name
Institution
- Exploratory data analysis
- Exploratory data analysis on variables
Student Agreeableness/Lecture Agreeableness
Student Extroversion/Lecture Extroversion
Student Agreeableness/Lecture Extroversion
Student Extroversion/Lecture Agreeableness
- Give a one- to two-paragraph write-up of the data once you have done this.
The scatterplots of student and lecturer agreeableness and extroversion show widely dispersed data and weak relationships between every pair of variables considered in the study. The relationship between Student Agreeableness and Lecture Agreeableness is weak (r2 = 0.03), and outliers are visible in the scatterplot. The scatterplot of Student Extroversion and Lecture Extroversion shows a similarly wide spread of the data, and the relationship is again weak (r2 = 0.023). Student Agreeableness and Lecture Extroversion also show a broad distribution with visible outliers and a very weak relationship (r2 = 0.002). The scatterplot of Student Extroversion and Lecture Agreeableness shows essentially no linear relationship between the variables (r2 = 1.8e-05).
- Create an APA style table that presents descriptive statistics for the sample.
Descriptive Statistics
Variable | N | Minimum | Maximum | Mean | Std. Deviation |
Student Extroversion | 418 | 5.00 | 46.00 | 30.1029 | 6.31897 |
Student Agreeableness | 413 | 25.00 | 73.00 | 46.5157 | 7.45295 |
Student wants Extroversion in lecturers | 283 | -6.00 | 28.00 | 12.9576 | 6.94494 |
Student wants Agreeableness in lecturers | 417 | -21.00 | 29.00 | 8.8825 | 9.57577 |
Valid N (listwise) | 271 | | | | |
- Make a decision about the missing data. How are you going to handle it, and why?
Cases with missing values will be excluded from the analysis. Excluding them keeps the data behind each result consistent, because every statistic is computed only from complete observations. Missing values can bias the findings, especially when there are many of them, so exclusion is a simple way to protect the reliability of the results. The cost is a smaller sample: the correlation table reports between 276 and 411 cases per pair of variables, and only 271 cases are complete on all four variables (Valid N, listwise).
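The difference between excluding a case from every analysis (listwise) and excluding it only from the analyses that need the missing value (pairwise) can be sketched as follows. This is a minimal illustration only: the records and the column names `stu_extro`, `stu_agree` and `lec_extro` are hypothetical and are not taken from the data file.

```python
# Hypothetical records: None marks a missing score. The column names are
# illustrative and are not taken from the original data file.
records = [
    {"stu_extro": 30, "stu_agree": 45, "lec_extro": 12},
    {"stu_extro": 28, "stu_agree": None, "lec_extro": 15},
    {"stu_extro": None, "stu_agree": 50, "lec_extro": 9},
    {"stu_extro": 35, "stu_agree": 47, "lec_extro": None},
    {"stu_extro": 22, "stu_agree": 41, "lec_extro": 7},
]


def listwise(rows, columns):
    """Keep only rows that are complete on every listed column."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def pairwise(rows, col_a, col_b):
    """Keep rows that are complete on just the two columns being compared."""
    return [r for r in rows if r[col_a] is not None and r[col_b] is not None]


all_columns = ["stu_extro", "stu_agree", "lec_extro"]
n_listwise = len(listwise(records, all_columns))
n_pairwise = len(pairwise(records, "stu_extro", "stu_agree"))
print(n_listwise, n_pairwise)
```

Pairwise exclusion keeps more cases for each individual correlation, while listwise exclusion guarantees that every statistic is based on the same set of cases.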
- Correlation
Correlations
Variable | Statistic | Student Extroversion | Student Agreeableness | Student wants Extroversion in lecturers | Student wants Agreeableness in lecturers |
Student Extroversion | Pearson Correlation | 1 | .080 | .153* | .004 |
| Sig. (2-tailed) | | .106 | .010 | .932 |
| N | 418 | 406 | 281 | 411 |
Student Agreeableness | Pearson Correlation | .080 | 1 | .050 | .164** |
| Sig. (2-tailed) | .106 | | .412 | .001 |
| N | 406 | 413 | 276 | 405 |
Student wants Extroversion in lecturers | Pearson Correlation | .153* | .050 | 1 | .118* |
| Sig. (2-tailed) | .010 | .412 | | .049 |
| N | 281 | 276 | 283 | 280 |
Student wants Agreeableness in lecturers | Pearson Correlation | .004 | .164** | .118* | 1 |
| Sig. (2-tailed) | .932 | .001 | .049 | |
| N | 411 | 405 | 280 | 417 |
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
The test is two-tailed because no direction was hypothesized in advance: the question is whether each correlation differs from zero, in either the positive or the negative direction.
Results interpretation
A Pearson correlation analysis was conducted to determine whether the variables were significantly correlated. There was a weak, positive, statistically significant correlation between student agreeableness and lecturer agreeableness (r = 0.164, p = 0.001). There was also a weak, positive, statistically significant correlation between student extroversion and lecturer extroversion (r = 0.153, p = 0.010).
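The significance of a Pearson correlation depends on both r and the sample size, through the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below applies this standard formula to the two reported correlations, using the pairwise N from the correlation table. The function name `t_from_r` is chosen here for illustration.

```python
import math


def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# r and pairwise N as reported in the correlation table.
t_agree = t_from_r(0.164, 405)  # student vs. lecturer agreeableness
t_extro = t_from_r(0.153, 281)  # student vs. lecturer extroversion
print(round(t_agree, 2), round(t_extro, 2))
```

Both values lie well above the two-tailed 5% critical value, which is roughly 1.97 for samples of this size. This is why correlations this weak still reach significance here.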
- Regression
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
1 | .153(a) | .023 | .020 | 6.82989 |
a. Predictors: (Constant), Student Extroversion
ANOVA(a)
Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
1 | Regression | 311.947 | 1 | 311.947 | 6.687 | .010(b) |
| Residual | 13014.630 | 279 | 46.647 | | |
| Total | 13326.577 | 280 | | | |
a. Dependent Variable: Student wants Extroversion in lecturers
b. Predictors: (Constant), Student Extroversion
Coefficients(a)
Model | Predictor | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
1 | (Constant) | 8.220 | 1.866 | | 4.405 | .000 |
| Student Extroversion | .160 | .062 | .153 | 2.586 | .010 |
a. Dependent Variable: Student wants Extroversion in lecturers
Two-tailed or one-tailed
The test conducted was two-tailed because it sought to determine whether the student’s extroversion score predicts how extroverted the student wants a lecturer to be, without assuming the direction of the effect in advance. A two-tailed test can detect a relationship in either direction.
The assumptions
Regression analysis rests on assumptions that must be met. The first is that the dependent variable is measured on a continuous scale. The lecturer extroversion score is a continuous variable measured on a ratio scale, which fulfils this assumption. The second is that the independent variables are continuous or categorical. The independent variable here, the student’s extroversion score, is a continuous variable.
Results
A regression analysis was conducted to examine whether the extroversion a student wants in a lecturer can be predicted from the student’s own extroversion score. The student’s extroversion score was a statistically significant predictor of lecturer extroversion at the 5% significance level (B = 0.160, p = 0.010). However, the model explained only 2.3% of the variance in lecturer extroversion (R Square = 0.023), so the predictive value of the student’s extroversion score is small.
The results are similar to the correlation analysis results, which found that there was a weak positive correlation between a student’s extroversion score and lecturer extroversion score.
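The link between the ANOVA table, the model summary and the correlation can be checked with a few lines of arithmetic. The sketch below recomputes R Square, adjusted R Square and F from the reported sums of squares. The function name `regression_summary` is illustrative, and n = 281 is inferred from the total degrees of freedom (280) plus one.

```python
import math


def regression_summary(ss_regression, ss_residual, n, k):
    """Return R^2, adjusted R^2 and F for a model with k predictors."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))
    return r2, adj_r2, f_stat


# Sums of squares from the ANOVA table; n = total df + 1, k = 1 predictor.
r2, adj_r2, f_stat = regression_summary(311.947, 13014.630, n=281, k=1)
print(round(r2, 3), round(adj_r2, 3), round(f_stat, 3))

# With a single predictor, sqrt(F) equals the absolute t of the slope.
print(round(math.sqrt(f_stat), 3))
```

With one predictor, R Square is the square of the Pearson correlation (0.153 squared is about 0.023), which is why the simple regression and the correlation analysis lead to the same conclusion.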
- Multiple regression analysis
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
1 | .168(a) | .028 | .018 | 6.82934 |
a. Predictors: (Constant), Gender, Student Extroversion, Age
ANOVA(a)
Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
1 | Regression | 373.952 | 3 | 124.651 | 2.673 | .048(b) |
| Residual | 12872.615 | 276 | 46.640 | | |
| Total | 13246.568 | 279 | | | |
a. Dependent Variable: Student wants Extroversion in lecturers
b. Predictors: (Constant), Gender, Student Extroversion, Age
Coefficients(a)
Model | Predictor | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
1 | (Constant) | 7.560 | 2.844 | | 2.658 | .008 |
| Student Extroversion | .161 | .062 | .155 | 2.607 | .010 |
| Age | .019 | .109 | .010 | .169 | .866 |
| Gender | 1.036 | .927 | .066 | 1.118 | .265 |
a. Dependent Variable: Student wants Extroversion in lecturers
A two-tailed or one-tailed test
The regression analysis used a two-tailed test. The analysis sought to determine whether age, gender, and student extroversion score predict lecturer extroversion, without assuming the direction of any effect.
Assumptions
Multiple regression adds two assumptions. First, there must be two or more independent variables; three are included in this analysis. Second, the independent variables must not be highly correlated with each other (no multicollinearity).
Results
A multiple regression analysis was conducted to determine whether age, gender, and the student’s extroversion score predict lecturer extroversion. Only the student’s extroversion score was a significant predictor of lecturer extroversion at the 5% significance level (B = 0.161, p = 0.010); age (p = 0.866) and gender (p = 0.265) were not. The overall model was significant but explained little of the variance (R Square = 0.028, F(3, 276) = 2.673, p = 0.048).
In the correlation analysis above, the student’s extroversion score was significantly correlated with lecturer extroversion. The multiple regression shows that the student’s extroversion score still predicts lecturer extroversion when age and gender are included in the model.
Part B. Applying Analytical Strategies to an Area of Research Interest.
- Briefly restate your research area of interest.
The number of confirmed coronavirus cases has increased significantly in the recent past. Different projection models have been developed that present the expected number of coronavirus cases at different times. The variables assessed in this case are therefore the projected number of positive cases and the actual number of positive cases.
- Pearson correlation
Correlations
Variable | Statistic | Actual | Projected |
Actual | Pearson Correlation | 1 | .986** |
| Sig. (2-tailed) | | .000 |
| N | 20 | 20 |
Projected | Pearson Correlation | .986** | 1 |
| Sig. (2-tailed) | .000 | |
| N | 20 | 20 |
**. Correlation is significant at the 0.01 level (2-tailed).
A Pearson correlation analysis was conducted to determine whether there was a relationship between the actual and projected coronavirus cases. There was a strong positive relationship between actual and projected cases (r = 0.986, p < 0.001). The coefficient of determination (r2) is about 0.97, so the projected cases account for about 97% of the variance in the actual cases.
- Spearman correlation
Correlations (Spearman’s rho)
Variable | Statistic | Actual | Projected |
Actual | Correlation Coefficient | 1.000 | 1.000** |
| Sig. (2-tailed) | . | . |
| N | 20 | 20 |
Projected | Correlation Coefficient | 1.000** | 1.000 |
| Sig. (2-tailed) | . | . |
| N | 20 | 20 |
**. Correlation is significant at the 0.01 level (2-tailed).
A Spearman rank-order correlation analysis was conducted to determine whether there was a monotonic relationship between the actual and projected coronavirus cases. There was a perfect positive rank-order relationship between actual and projected cases (rs(18) = 1.00, p < 0.01): the two variables rank the observations in exactly the same order. Because Spearman’s rho is computed on ranks, the squared coefficient of 1 means that the ranks of the projected cases fully account for the ranks of the actual cases. It does not mean that the projected values equal the actual values.
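A Spearman coefficient of 1 alongside a Pearson coefficient below 1 is what a monotonic but not perfectly linear relationship produces. The sketch below shows this on small hypothetical data. The `projected` and `actual` lists are invented for illustration and are not the study data.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: actual rises with projected, but not in a straight line.
projected = [10, 20, 30, 40, 50, 60]
actual = [2, 9, 15, 40, 90, 200]
r_pearson = pearson(projected, actual)
r_spearman = spearman(projected, actual)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Spearman’s rho rewards any consistently increasing pattern, while Pearson’s r rewards only a straight-line pattern, so the two coefficients answer slightly different questions.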
- Partial Correlation vs Semi-Partial Correlation.
Partial correlation
The variables assessed in this case are the actual cases, the projected cases, and the average age of those who tested positive for coronavirus. All three are continuous variables measured on a ratio scale.
Correlations
Control Variables | Variable | Statistic | Actual | Projected | Avg_age |
-none-(a) | Actual | Correlation | 1.000 | .986 | .467 |
| | Significance (2-tailed) | . | .000 | .038 |
| | df | 0 | 18 | 18 |
| Projected | Correlation | .986 | 1.000 | .499 |
| | Significance (2-tailed) | .000 | . | .025 |
| | df | 18 | 0 | 18 |
| Avg_age | Correlation | .467 | .499 | 1.000 |
| | Significance (2-tailed) | .038 | .025 | . |
| | df | 18 | 18 | 0 |
Avg_age | Actual | Correlation | 1.000 | .982 | |
| | Significance (2-tailed) | . | .000 | |
| | df | 0 | 17 | |
| Projected | Correlation | .982 | 1.000 | |
| | Significance (2-tailed) | .000 | . | |
| | df | 17 | 0 | |
a. Cells contain zero-order (Pearson) correlations.
A partial correlation was run to determine the relationship between actual cases and projected cases while controlling for average age. There was a strong, positive, statistically significant partial correlation between actual and projected cases, r(17) = 0.982, p < 0.001.
Semi-partial correlation
Coefficients(a)
Model | Predictor | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. | Zero-order | Partial | Part |
1 | (Constant) | 40.616 | 48.319 | | .841 | .412 | | | |
| Projected | .761 | .035 | 1.002 | 21.512 | .000 | .986 | .982 | .869 |
| Avg_age | -.562 | .785 | -.033 | -.717 | .483 | .467 | -.171 | -.029 |
a. Dependent Variable: Actual
A semi-partial (part) correlation analysis was conducted to determine the relationship between actual and projected cases after removing the influence of average age from the projected cases only. There was a strong, statistically significant semi-partial correlation between actual and projected cases (sr = 0.869, p < 0.001).
Comparing the two correlation analyses, the results are closely related. The partial correlation (0.982) removes average age from both variables, whereas the semi-partial correlation (0.869) removes it from the projected cases only. The partial correlation therefore gives the clearer picture of the relationship between the two variables themselves.
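Both coefficients can be approximated from the three zero-order correlations in the partial correlation table, using the standard first-order formulas. The sketch below takes x = Actual, y = Projected and z = Avg_age. The function names are illustrative, and because the inputs are rounded to three decimals the results match the reported values only to within rounding.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of x with the part of y that is independent of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)


# Zero-order correlations as reported (rounded to three decimals).
r_actual_projected = 0.986
r_actual_age = 0.467
r_projected_age = 0.499

pr = partial_r(r_actual_projected, r_actual_age, r_projected_age)
sr = semipartial_r(r_actual_projected, r_actual_age, r_projected_age)
print(round(pr, 2), round(sr, 2))
```

The two formulas share a numerator and differ only in the denominator, so the semi-partial coefficient can never be larger in absolute value than the partial coefficient.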
- Simple regression analysis
The variables considered in the simple regression analysis are the actual cases and the projected cases. The outcome variable is actual cases and the predictor variable is projected cases, because the analysis assesses whether the actual cases can be predicted from the projected cases. Both variables are measured on a ratio scale.
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
1 | .986(a) | .971 | .970 | 9.517 |
a. Predictors: (Constant), Projected
ANOVA(a)
Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
1 | Regression | 55454.702 | 1 | 55454.702 | 612.290 | .000(b) |
| Residual | 1630.248 | 18 | 90.569 | | |
| Total | 57084.950 | 19 | | | |
a. Dependent Variable: Actual
b. Predictors: (Constant), Projected
Coefficients(a)
Model | Predictor | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
1 | (Constant) | 6.063 | 3.192 | | 1.899 | .074 |
| Projected | .749 | .030 | .986 | 24.744 | .000 |
a. Dependent Variable: Actual
The model summary shows that the coefficient of determination is R Square = 0.971, so projected cases explain about 97% of the variance in actual cases. There is a strong relationship between actual and projected cases, and projected cases are a significant predictor of actual coronavirus cases (B = 0.749, t = 24.744, p < 0.001).
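The fitted equation implied by the coefficients table is Actual = 6.063 + 0.749 × Projected. A minimal sketch of how it would be used for prediction is given below. The input values 100 and 200 are hypothetical and serve only to illustrate the arithmetic.

```python
def predict_actual(projected, intercept=6.063, slope=0.749):
    """Predicted actual cases from the fitted simple regression line."""
    return intercept + slope * projected


# Hypothetical projected values, used only to illustrate the calculation.
for projected in (100, 200):
    print(projected, round(predict_actual(projected), 3))
```

The slope of 0.749 means each additional projected case is associated with about three quarters of an additional actual case. The intercept is not significantly different from zero (p = 0.074), so it should be read with caution.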
- Multiple regression
The variables considered in the multiple regression are actual cases, projected cases, and the average age of the patients. All three are continuous variables measured on a ratio scale. A multiple linear regression with the Enter method will be used, because there are two predictor variables and their relationship with the outcome is evaluated as linear.
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
1 | .986(a) | .972 | .969 | 9.648 |
a. Predictors: (Constant), Avg_age, Projected
ANOVA(a)
Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
1 | Regression | 55502.517 | 2 | 27751.258 | 298.130 | .000(b) |
| Residual | 1582.433 | 17 | 93.084 | | |
| Total | 57084.950 | 19 | | | |
a. Dependent Variable: Actual
b. Predictors: (Constant), Avg_age, Projected
Coefficients(a)
Model | Predictor | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
1 | (Constant) | 40.616 | 48.319 | | .841 | .412 |
| Projected | .761 | .035 | 1.002 | 21.512 | .000 |
| Avg_age | -.562 | .785 | -.033 | -.717 | .483 |
a. Dependent Variable: Actual
The model summary shows that the coefficient of determination is R Square = 0.972, so projected cases and average age together explain about 97% of the variance in actual cases. There is a strong relationship between actual and projected cases. The analysis also found that only projected cases are a significant predictor of actual coronavirus cases (p < 0.001); average age is not (p = 0.483).
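Whether adding average age improves on the model with projected cases alone can be checked with an F test on the change in the regression sum of squares. The sketch below uses the sums of squares reported in the two ANOVA tables. The variable names are illustrative.

```python
import math

ss_reg_simple = 55454.702  # regression SS, model with Projected only
ss_reg_full = 55502.517    # regression SS, model with Projected and Avg_age
ms_resid_full = 93.084     # residual mean square of the full model
added_predictors = 1       # Avg_age is the only predictor added

# F for the change: extra explained variation per added predictor,
# divided by the residual mean square of the larger model.
f_change = ((ss_reg_full - ss_reg_simple) / added_predictors) / ms_resid_full

# With one added predictor, sqrt(F change) equals |t| of that predictor.
t_equivalent = math.sqrt(f_change)
print(round(f_change, 3), round(t_equivalent, 3))
```

The square root of this F agrees with the absolute t value reported for Avg_age (0.717, p = 0.483), which confirms that average age adds almost nothing once projected cases are in the model. The adjusted R Square tells the same story, falling slightly from 0.970 to 0.969.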
- Logistic regression
The three variables included in the analysis are projected positive cases, average age, and the setting. Projected cases and average age are the predictor variables; both are continuous and measured on a ratio scale. The setting is the outcome, a categorical variable measured on a nominal scale. Binary logistic regression with the Enter method will be used, because the outcome variable has only two groups.
Variables in the Equation
Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B) |
Step 1(a) | Projected | -.004 | .008 | .282 | 1 | .595 | .996 |
| Avg_age | -.076 | .168 | .206 | 1 | .650 | .927 |
| Constant | 4.922 | 10.344 | .226 | 1 | .634 | 137.209 |
a. Variable(s) entered on step 1: Projected, Avg_age.
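The analysis shows that each additional projected case multiplies the odds of an urban rather than a rural setting by 0.996, and each additional year of average age multiplies those odds by 0.927. Both odds ratios are below 1, which indicates slightly lower odds of an urban setting as the predictors increase, but neither effect is statistically significant (p = 0.595 and p = 0.650).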
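The Exp(B) column is the exponential of each B coefficient, and it describes a change in odds, not a probability. The sketch below reproduces the odds ratios from the reported coefficients and shows how a predicted probability would be obtained. It assumes that the urban setting is the category coded 1, which the output does not state, and the input values (100 projected cases, average age 40) are hypothetical.

```python
import math

# Coefficients (B) as reported in the Variables in the Equation table.
B_CONSTANT = 4.922
B_PROJECTED = -0.004
B_AVG_AGE = -0.076


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per one-unit increase."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percentage change in the odds per one-unit increase."""
    return (math.exp(b) - 1) * 100


def predicted_probability(projected, avg_age):
    """Probability of the category coded 1 (assumed here to be urban)."""
    logit = B_CONSTANT + B_PROJECTED * projected + B_AVG_AGE * avg_age
    return 1 / (1 + math.exp(-logit))


print(round(odds_ratio(B_PROJECTED), 3), round(odds_ratio(B_AVG_AGE), 3))
print(round(percent_change_in_odds(B_AVG_AGE), 1))

# Hypothetical inputs, chosen only to illustrate the calculation.
p_example = predicted_probability(projected=100, avg_age=40)
print(round(p_example, 3))
```

An odds ratio of 0.927 corresponds to odds that are about 7% lower per additional year of average age. Because neither coefficient is significant, these figures describe the direction of the estimates and not a reliable effect.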