Review Correlation and Regression

Name

Institution

Exploratory data analysis
Exploratory data analysis on variables

Student Agreeableness/Lecture Agreeableness

Student Extroversion/Lecture Extroversion

Student Agreeableness/Lecture Extroversion

Student Extroversion/Lecture Agreeableness

Give a one- to two-paragraph write-up of the data once you have done this.

The scatterplots of student and lecturer agreeableness and extroversion show widely dispersed data and weak relationships between the variables considered in the study. The relationship between Student Agreeableness and Lecturer Agreeableness is weak (r² = 0.03), and outliers are visible in the data. The scatterplot of Student Extroversion and Lecturer Extroversion also shows a wide spread, and r² = 0.023 indicates that the relationship is weak. Student Agreeableness and Lecturer Extroversion likewise show a broad distribution with visible outliers, and the relationship is weak (r² = 0.002). The scatterplot of Student Extroversion and Lecturer Agreeableness shows no linear relationship between the variables (r² = 1.8e-05).

Create an APA style table that presents descriptive statistics for the sample.

Descriptive Statistics
	N	Minimum	Maximum	Mean	Std. Deviation
Student Extroversion	418	5.00	46.00	30.1029	6.31897
Student Agreeableness	413	25.00	73.00	46.5157	7.45295
Student wants Extroversion in lecturers	283	-6.00	28.00	12.9576	6.94494
Student wants Agreeableness in lecturers	417	-21.00	29.00	8.8825	9.57577
Valid N (listwise)	271

Make a decision about the missing data. How are you going to handle it, and why?

Cases with missing values will be excluded from the analysis so that every statistic is computed from complete observations. Missing values can bias the findings, especially when there are many of them, so excluding them is the most straightforward way to protect the reliability of the results. The cost is a smaller sample: the descriptive statistics table shows that N ranges from 283 to 418 across the variables, and only 271 cases are complete on all four.
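The difference between excluding a case from every analysis (listwise) and excluding it only from the pairs where a value is missing (pairwise) can be sketched as follows. This is a minimal illustration only: the five records and the column names are hypothetical and are not taken from the study dataset.

```python
# Hypothetical records (not the study data); None marks a missing response.
records = [
    {"stu_extro": 30, "stu_agree": 45, "lec_extro": 12, "lec_agree": 9},
    {"stu_extro": 28, "stu_agree": None, "lec_extro": 15, "lec_agree": 4},
    {"stu_extro": 35, "stu_agree": 50, "lec_extro": None, "lec_agree": 11},
    {"stu_extro": None, "stu_agree": 41, "lec_extro": 8, "lec_agree": 7},
    {"stu_extro": 26, "stu_agree": 47, "lec_extro": 10, "lec_agree": None},
]


def listwise(rows, columns):
    """Keep only rows that are complete on every listed column."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def pairwise(rows, col_a, col_b):
    """Keep rows that are complete on the two columns being compared."""
    return [r for r in rows if r[col_a] is not None and r[col_b] is not None]


all_columns = ["stu_extro", "stu_agree", "lec_extro", "lec_agree"]
n_listwise = len(listwise(records, all_columns))
n_pairwise = len(pairwise(records, "stu_extro", "lec_extro"))
print("listwise N:", n_listwise, "| pairwise N (extroversion pair):", n_pairwise)
```

Listwise deletion gives one N for every analysis, whereas pairwise deletion keeps more cases but lets N differ from one pair of variables to the next.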

Correlation

Correlations
		Student Extroversion	Student Agreeableness	Student wants Extroversion in lecturers	Student wants Agreeableness in lecturers
Student Extroversion	Pearson Correlation	1	.080	.153^*	.004
	Sig. (2-tailed)		.106	.010	.932
	N	418	406	281	411
Student Agreeableness	Pearson Correlation	.080	1	.050	.164^**
	Sig. (2-tailed)	.106		.412	.001
	N	406	413	276	405
Student wants Extroversion in lecturers	Pearson Correlation	.153^*	.050	1	.118^*
	Sig. (2-tailed)	.010	.412		.049
	N	281	276	283	280
Student wants Agreeableness in lecturers	Pearson Correlation	.004	.164^**	.118^*	1
	Sig. (2-tailed)	.932	.001	.049
	N	411	405	280	417
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

The test is two-tailed because no direction was predicted for any relationship: the null hypothesis is that the correlation between each pair of variables is zero, and a departure in either direction (positive or negative) would count as evidence against it.

Results interpretation

A Pearson correlation analysis was conducted to determine whether the variables were significantly correlated. There was a weak, positive, statistically significant correlation between student agreeableness and lecturer agreeableness (r = 0.164, p = 0.001). There was also a weak, positive, statistically significant correlation between student extroversion and lecturer extroversion (r = 0.153, p = 0.010).

Regression

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.153^a	.023	.020	6.82989
a. Predictors: (Constant), Student Extroversion

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	311.947	1	311.947	6.687	.010^b
	Residual	13014.630	279	46.647
	Total	13326.577	280
a. Dependent Variable: Student wants Extroversion in lecturers
b. Predictors: (Constant), Student Extroversion

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1	(Constant)	8.220	1.866		4.405	.000
	Student Extroversion	.160	.062	.153	2.586	.010
a. Dependent Variable: Student wants Extroversion in lecturers

Two-tailed or one-tailed

The test conducted was two-tailed because the analysis asked whether a student's extroversion score predicts the desire for an extroverted lecturer, without specifying in advance whether the effect would be positive or negative.

The assumptions

Regression analysis rests on assumptions that must be met. The first is that the dependent variable is measured on a continuous scale. The lecturer extroversion score is a continuous variable measured on a ratio scale, which satisfies this assumption. The second is that the independent variables are continuous or categorical. The independent variable, the student's extroversion score, is continuous.

Results

A simple regression analysis was conducted to examine whether a student's desire for an extroverted lecturer can be predicted from the student's extroversion score. Student extroversion was a statistically significant predictor of lecturer extroversion at the 5% significance level, F(1, 279) = 6.687, p = 0.010 (B = 0.160). The effect is small, however: the model explains only 2.3% of the variance in lecturer extroversion (R² = 0.023).
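The proportion of variance explained and the F ratio can be checked directly from the sums of squares in the ANOVA table above. The sketch below is only an arithmetic check of the reported output; the function name is illustrative and does not refit the model.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Return (R squared, F ratio) from the entries of a regression ANOVA table."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, f_ratio


# Values copied from the ANOVA table for the simple regression.
r_squared, f_ratio = anova_summary(311.947, 13014.630, 1, 279)
print(f"R squared = {r_squared:.3f} ({r_squared:.1%} of the variance), F = {f_ratio:.3f}")
```

R squared comes out at about 0.023, which corresponds to roughly 2.3% of the variance in lecturer extroversion.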

The results are similar to the correlation analysis results, which found that there was a weak positive correlation between a student’s extroversion score and lecturer extroversion score.
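With a single predictor, the regression and the correlation are two views of the same test: the t statistic for the slope equals the t statistic for r, and R Square equals r squared. A minimal check using the rounded values reported above (r = 0.153, N = 281):

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.153, 281  # correlation and pairwise N from the correlation table
t_value = t_from_r(r, n)
print("t =", round(t_value, 3), "| r squared =", round(r ** 2, 3))
```

The result agrees with the t of 2.586 and the R Square of .023 in the regression output, to within rounding of r.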

Multiple regression analysis

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.168^a	.028	.018	6.82934
a. Predictors: (Constant), Gender, Student Extroversion, Age

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	373.952	3	124.651	2.673	.048^b
	Residual	12872.615	276	46.640
	Total	13246.568	279
a. Dependent Variable: Student wants Extroversion in lecturers
b. Predictors: (Constant), Gender, Student Extroversion, Age

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1	(Constant)	7.560	2.844		2.658	.008
	Student Extroversion	.161	.062	.155	2.607	.010
	Age	.019	.109	.010	.169	.866
	Gender	1.036	.927	.066	1.118	.265
a. Dependent Variable: Student wants Extroversion in lecturers

A two-tailed or one-tailed test

The multiple regression used two-tailed tests. The analysis sought to determine whether age, gender, and student extroversion score predict lecturer extroversion, with no direction specified in advance for any predictor.

Assumptions

Multiple regression carries additional assumptions. There must be two or more independent variables; three are included in this analysis. The independent variables must also not be highly correlated with each other (no multicollinearity).

Results

A multiple regression analysis was conducted to determine whether age, gender, and student's extroversion score predict lecturer extroversion. The overall model was statistically significant, F(3, 276) = 2.673, p = 0.048, but explained only 2.8% of the variance (R² = 0.028). Only student's extroversion score was a significant predictor at the 5% significance level (B = 0.161, p = 0.010); age (p = 0.866) and gender (p = 0.265) were not.
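Each predictor's t statistic is its unstandardized coefficient divided by its standard error. The sketch below recomputes these from the rounded table values, so the results differ slightly from the printed t column. The critical value of 1.97 is an assumed approximation for a two-tailed test at the .05 level with 276 residual degrees of freedom.

```python
# (B, standard error) pairs copied from the Coefficients table above.
coefficients = {
    "Student Extroversion": (0.161, 0.062),
    "Age": (0.019, 0.109),
    "Gender": (1.036, 0.927),
}

# Assumption: approximate two-tailed critical t at alpha = .05 for 276 df.
CRITICAL_T = 1.97

t_values = {name: b / se for name, (b, se) in coefficients.items()}
significant = [name for name, t in t_values.items() if abs(t) > CRITICAL_T]

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
print("Significant predictors:", significant)
```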

In the correlation analysis above, student's extroversion score was significantly correlated with lecturer extroversion. The multiple regression shows that student extroversion remains a significant predictor of lecturer extroversion when age and gender are included in the model.

Part B. Applying Analytical Strategies to an Area of Research Interest.

1. Briefly restate your research area of interest.

The number of confirmed coronavirus cases has increased significantly in the recent past. Various projections have been modelled to estimate the expected number of coronavirus cases at different times. The variables assessed here are therefore the projected number of positive cases and the actual number of positive cases.

Pearson correlation

Correlations
		Actual	Projected
Actual	Pearson Correlation	1	.986^**
	Sig. (2-tailed)		.000
	N	20	20
Projected	Pearson Correlation	.986^**	1
	Sig. (2-tailed)	.000
	N	20	20
**. Correlation is significant at the 0.01 level (2-tailed).

A Pearson correlation analysis was conducted to determine whether there was a relationship between the actual and projected coronavirus cases. There was a strong, positive, statistically significant relationship between actual and projected cases (r = 0.986, p < 0.001). The coefficient of determination is r² = 0.97, so projected cases account for about 97% of the variance in actual cases.

Spearman correlation

Correlations
			Actual	Projected
Spearman’s rho	Actual	Correlation Coefficient	1.000	1.000^**
		Sig. (2-tailed)	.	.
		N	20	20
	Projected	Correlation Coefficient	1.000^**	1.000
		Sig. (2-tailed)	.	.
		N	20	20
**. Correlation is significant at the 0.01 level (2-tailed).

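A Spearman rank-order correlation analysis was conducted to determine whether there was a relationship between the actual and projected coronavirus cases. There was a perfect positive monotonic relationship between actual and projected cases (r_s(18) = 1.000, flagged as significant at the 0.01 level): the two variables rank the 20 observations in exactly the same order. Because Spearman's rho is computed on ranks, a value of 1 means that the rank order of actual cases is fully determined by the rank order of projected cases; it does not mean that projected cases explain 100% of the variance in the raw counts.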
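A Spearman coefficient of 1 alongside a Pearson coefficient below 1 is what a monotonic but non-linear relationship produces. The sketch below illustrates this with hypothetical numbers (not the study data); the ranking helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); assumes no tied values for simplicity."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: actual rises with projected, but along a curve.
projected = [10, 20, 30, 40, 50, 60]
actual = [3, 5, 9, 17, 33, 65]

pearson_r = pearson(projected, actual)
spearman_rho = spearman(projected, actual)
print("Pearson r:", round(pearson_r, 3), "| Spearman rho:", round(spearman_rho, 3))
```

Because every increase in the hypothetical projected values is matched by an increase in the actual values, the ranks agree perfectly even though the raw values do not fall on a straight line.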

Partial Correlation vs Semi-Partial Correlation.

Partial correlation

The variables assessed in this case are actual cases, projected cases, and the average age of those who tested positive for coronavirus. All three are continuous variables measured on a ratio scale.

Correlations
Control Variables			Actual	Projected	Avg_age
-none-^a	Actual	Correlation	1.000	.986	.467
		Significance (2-tailed)	.	.000	.038
		df	0	18	18
	Projected	Correlation	.986	1.000	.499
		Significance (2-tailed)	.000	.	.025
		df	18	0	18
	Avg_age	Correlation	.467	.499	1.000
		Significance (2-tailed)	.038	.025	.
		df	18	18	0
Avg_age	Actual	Correlation	1.000	.982
		Significance (2-tailed)	.	.000
		df	0	17
	Projected	Correlation	.982	1.000
		Significance (2-tailed)	.000	.
		df	17	0
a. Cells contain zero-order (Pearson) correlations.

A partial correlation was run to determine the relationship between actual and projected cases while controlling for average age. There was a strong, positive partial correlation between actual and projected cases, r(17) = 0.982, p < 0.001, only slightly lower than the zero-order correlation of 0.986.
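The first-order partial correlation can be reproduced from the three zero-order correlations in the table above. The sketch uses the rounded table values, so the result matches the reported coefficient only approximately.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = Actual, y = Projected, z = Avg_age (zero-order values from the table).
r_partial = partial_r(r_xy=0.986, r_xz=0.467, r_yz=0.499)
print("partial r =", round(r_partial, 2))
```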

Semi partial correlation

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.	Correlations
		B	Std. Error	Beta			Zero-order	Partial	Part
1	(Constant)	40.616	48.319		.841	.412
	Projected	.761	.035	1.002	21.512	.000	.986	.982	.869
	Avg_age	-.562	.785	-.033	-.717	.483	.467	-.171	-.029
a. Dependent Variable: Actual

A semi-partial (part) correlation analysis was conducted to determine the relationship between actual and projected cases after removing the influence of average age from projected cases only. There was a strong, positive, significant semi-partial correlation between actual and projected cases (sr = 0.869, t(17) = 21.512, p < 0.001).

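Comparing the two analyses, the results are consistent: both show a strong positive relationship between actual and projected cases after accounting for average age (partial r = 0.982, semi-partial r = 0.869). The partial correlation is larger because it removes the influence of average age from both variables, whereas the semi-partial correlation removes it from projected cases only. On that basis, the partial correlation presents a clearer picture of the relationship between the variables being assessed.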
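The semi-partial coefficient can also be recovered from the zero-order correlations, and its square gives the share of the total variance in actual cases that is uniquely attributable to projected cases. This is a minimal sketch using the rounded values from the tables above; the argument names are illustrative.

```python
import math


def semipartial_r(r_yx, r_yz, r_xz):
    """Correlation of y with the part of x that is independent of z."""
    return (r_yx - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)


# y = Actual, x = Projected, z = Avg_age (zero-order values from the tables).
part = semipartial_r(r_yx=0.986, r_yz=0.467, r_xz=0.499)
unique_variance = part ** 2
print("part r =", round(part, 3), "| unique variance =", round(unique_variance, 2))
```

Only the denominator differs from the partial correlation formula: the outcome is left unadjusted, which is why the coefficient is smaller.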

Simple regression analysis

The variables considered in the simple regression analysis are actual cases and projected cases, both measured on a ratio scale. The outcome variable is actual cases and the predictor variable is projected cases, because the analysis assesses whether actual cases can be predicted from projected cases.

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.986^a	.971	.970	9.517
a. Predictors: (Constant), Projected

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	55454.702	1	55454.702	612.290	.000^b
	Residual	1630.248	18	90.569
	Total	57084.950	19
a. Dependent Variable: Actual
b. Predictors: (Constant), Projected

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1	(Constant)	6.063	3.192		1.899	.074
	Projected	.749	.030	.986	24.744	.000
a. Dependent Variable: Actual

The model summary shows that the coefficient of determination is R² = 0.971, so projected cases explain about 97% of the variance in actual cases. There is a strong relationship between actual and projected cases. Projected cases are a significant predictor of actual coronavirus cases, F(1, 18) = 612.290, p < 0.001 (B = 0.749).
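The fitted equation implied by the Coefficients table is Actual = 6.063 + 0.749 × Projected. A short sketch of how it would be used for prediction follows; the projected values passed in are hypothetical, and the equation should only be trusted within the range of the observed data.

```python
INTERCEPT = 6.063  # (Constant) B from the Coefficients table
SLOPE = 0.749      # Projected B from the Coefficients table


def predict_actual(projected_cases):
    """Predicted actual cases for a given number of projected cases."""
    return INTERCEPT + SLOPE * projected_cases


for projected in (50, 100, 200):  # hypothetical projected counts
    print(projected, "->", round(predict_actual(projected), 1))
```

Because the slope is below 1, the model predicts fewer actual cases than projected once the projection exceeds about 24 cases.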

Multiple regression

The variables considered in the multiple regression are actual cases, projected cases, and the average age of the patients. All three are continuous variables measured on a ratio scale. A multiple linear regression using the Enter method will be used, because there are two predictor variables whose linear relationship with the outcome is evaluated simultaneously.

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.986^a	.972	.969	9.648
a. Predictors: (Constant), Avg_age, Projected

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	55502.517	2	27751.258	298.130	.000^b
	Residual	1582.433	17	93.084
	Total	57084.950	19
a. Dependent Variable: Actual
b. Predictors: (Constant), Avg_age, Projected

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1	(Constant)	40.616	48.319		.841	.412
	Projected	.761	.035	1.002	21.512	.000
	Avg_age	-.562	.785	-.033	-.717	.483
a. Dependent Variable: Actual

The model summary shows that the coefficient of determination is R² = 0.972, so projected cases and average age together explain about 97% of the variance in actual cases. The overall model is significant, F(2, 17) = 298.130, p < 0.001. Only projected cases are a significant predictor of actual coronavirus cases (B = 0.761, p < 0.001); average age is not (B = -0.562, p = 0.483).
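Adjusted R Square penalizes each added predictor, which makes it a convenient way to judge whether average age earns its place in the model. The sketch below recomputes it for the simple and multiple regressions from their ANOVA tables; the function name is illustrative.

```python
def adjusted_r_squared(ss_regression, ss_total, n, k):
    """Adjusted R squared for a model with k predictors fitted to n cases."""
    r_squared = ss_regression / ss_total
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


N = 20                 # cases in the coronavirus dataset
SS_TOTAL = 57084.950   # total sum of squares (same for both models)

adj_simple = adjusted_r_squared(55454.702, SS_TOTAL, N, k=1)    # Projected only
adj_multiple = adjusted_r_squared(55502.517, SS_TOTAL, N, k=2)  # Projected + Avg_age
print("simple:", round(adj_simple, 3), "| multiple:", round(adj_multiple, 3))
```

Adding average age raises R Square slightly but lowers Adjusted R Square, which is consistent with average age not being a significant predictor.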

Logistic regression

The three variables included in the analysis are projected positive cases, average age, and the setting. Projected cases and average age are the predictor variables; both are continuous and measured on a ratio scale. The setting (urban or rural) is the outcome variable; it is categorical and measured on a nominal scale. Binary logistic regression with the Enter method will be used because the outcome variable has only two groups.

Variables in the Equation
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1^a	Projected	-.004	.008	.282	1	.595	.996
	Avg_age	-.076	.168	.206	1	.650	.927
	Constant	4.922	10.344	.226	1	.634	137.209
a. Variable(s) entered on step 1: Projected, Avg_age.

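The Exp(B) column gives odds ratios. For each additional projected case, the odds of the setting being urban rather than rural are multiplied by 0.996 (p = 0.595), and for each additional year of average age they are multiplied by 0.927 (p = 0.650). Neither predictor is statistically significant, so the model provides no evidence that projected cases or average age distinguish urban from rural settings.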
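Exp(B) is simply the exponential of the B coefficient, and the coefficients can be combined into a predicted probability through the logistic function. The sketch below assumes that urban is the category coded 1, which the output does not state, and uses hypothetical input values.

```python
import math

# B coefficients copied from the Variables in the Equation table.
B_PROJECTED, B_AVG_AGE, B_CONSTANT = -0.004, -0.076, 4.922


def odds_ratio(b):
    """Odds ratio, Exp(B), for a one-unit increase in the predictor."""
    return math.exp(b)


def predicted_probability(projected, avg_age):
    """P(outcome = 1); assumes urban is the category coded 1."""
    logit = B_CONSTANT + B_PROJECTED * projected + B_AVG_AGE * avg_age
    return 1 / (1 + math.exp(-logit))


or_projected = odds_ratio(B_PROJECTED)
or_avg_age = odds_ratio(B_AVG_AGE)
p_example = predicted_probability(projected=100, avg_age=50)  # hypothetical inputs
print(round(or_projected, 3), round(or_avg_age, 3), round(p_example, 2))
```

An odds ratio just below 1 means the odds fall slightly with each one-unit increase in the predictor; it is not itself a probability.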