Social Vulnerability Analysis for CDC
Introduction
There has long been a disparity between the capital spent on disaster response and the capital dedicated to disaster prevention and mitigation. The global community and most nations have tended to address the problem by responding to, rather than anticipating, emergencies (Carley & Spapens, 2017). There are several reasons why this stance has become increasingly difficult to maintain. First of all, awareness of hazards is now widespread on a global scale, and increasingly so at the local level in many parts of the world. The plea of ignorance therefore no longer carries weight. About 257 million people worldwide are infected with the hepatitis B virus (HBV), to name one disease among many, and around 700,000 die each year as a consequence of long-term, chronic complications of HBV, such as cirrhosis and liver cancer (Sarin et al., 2016). Nowadays, however, such suffering can be avoided through vaccination or adequate preparation. More devastating still, newborns infected by their mothers at birth, and children infected below the age of 5, are at the greatest risk of chronic health complications and mortality.
However, protection can be secured with at least three doses of the hepatitis B vaccine, the first given within 24 hours of birth and followed by two or three additional doses during childhood (Sarin et al., 2016). Such infections can be prevented by running enough simulations of various infections and putting measures in place that enable quick countermeasures in case of spread. The main contribution of healthcare professionals to the health of the global population involves helping states adopt WHO recommendations on vaccination against infections and on disaster response. Some nations want to know the number of people diagnosed with a particular disease, either before a vaccination programme is implemented or to ensure that everyone has access to the vaccine. Many countries, together with the WHO, have set targets to eliminate serious infections. Given the extensive data collected by the CDC and other major organizations, avoiding, curbing or mitigating disasters and infectious diseases is now possible (Coker et al., 2011).
Current risk evaluation takes into account that people and societies have varying levels of access to the services needed to plan for, cope with and recover from the impact of a hazard. A number of factors contribute to risk and vulnerability, including but not limited to gender, ethnicity, socio-economic status, age and culture (Alizadeh et al., 2018). This paper aims to answer the following two questions. First, is there an impact on the prediction of the CDC's SVI when the language and minority metrics are excluded from the model? Second, are there features that can be excluded without any effect on the predictability of the SVI?
Methodology and analysis
The data used in this analysis is the svi2018 us comma-separated (CSV) file sourced from the CDC website.
The dataset has 124 features and 72837 observations. However, some of the features are redundant and therefore unsuitable for fitting a model. Thus the original dataset will be scaled down to 13 features through subsetting; this column selection by itself leaves the number of observations unchanged.
For the analysis, the data will first go through an exploratory data analysis to make it easier to understand. This will be followed by handling missing values and splitting the cleaned data into training and testing datasets. Using the training dataset, different models will be fitted to answer the questions posed in the introduction. The random forest model will be used throughout the analysis.
Results and discussion
Exploratory data analysis
The dataset has a dimension of 72837 by 124, with the head of the data as;
After selecting the necessary variables and removing observations with values that are not greater than or equal to zero, the dimension changes to 28782 by 13.
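The subsetting and filtering step can be sketched as follows. This is a minimal illustration only: it assumes that a row is dropped when any retained numeric column is negative, the small in-memory table and the EXTRA column are invented, and -999 is used here as an assumed negative placeholder value. Only STATE, E_TOTPOP, E_POV and E_MOBILE are names taken from the text.

```python
import csv
import io

# Hypothetical miniature stand-in for the SVI file; all values are invented.
RAW = """STATE,E_TOTPOP,E_POV,E_MOBILE,EXTRA
Texas,1200,300,98,1
Georgia,-999,150,20,2
Texas,800,-999,10,3
Georgia,950,120,0,4
"""

KEEP = ["STATE", "E_TOTPOP", "E_POV", "E_MOBILE"]  # columns to retain
NUMERIC = [c for c in KEEP if c != "STATE"]        # columns checked for negatives


def subset_and_filter(text, keep, numeric):
    """Keep only the chosen columns and drop rows with any negative numeric value."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {c: record[c] for c in keep}   # column subsetting
        for c in numeric:
            row[c] = float(row[c])           # numeric columns become floats
        if all(row[c] >= 0 for c in numeric):
            rows.append(row)                 # row passes the >= 0 filter
    return rows


clean = subset_and_filter(RAW, KEEP, NUMERIC)
print(len(clean), "rows by", len(KEEP), "columns")
```

Column selection alone never changes the row count; only the value filter removes observations, which is why the number of rows drops while the number of columns is fixed by the chosen subset.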
Looking at the structure of the dataset, all the variables are of type double except the STATE feature, which is of type factor. Using the features other than STATE, one can predict which state an observation comes from.
The summary of the dataset makes it easy to determine the spread of the various features. For instance, for the feature E_MOBILE, the mean number of mobile homes across the states is 206, with a median of 98, and the STATE feature has levels based on state names such as Texas and Georgia, among others.
From the cleaned dataset, there are no missing values, as seen in the figure below.
Modelling
Before modelling, the data must be split into training and testing datasets so that the accuracy of the fitted model can be validated on data it has not seen. In this case, the data is split in a 70:30 ratio, with 70% of the observations used for training and the remaining 30% for testing.
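A 70:30 split of this kind can be sketched as below. The function name, the fixed seed and the placeholder data are assumptions made for illustration; the text does not say how the split was randomised.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


data = list(range(100))                        # placeholder for the cleaned observations
train, test = train_test_split(data)
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, so that differences in accuracy between models come from the features used and not from a different partition of the data.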
Part 1
Model one
All variables are included in the first model.
Model summary
Model accuracy
Although the model includes all the independent variables, its accuracy is only approximately 27.87%.
Model two
In the second model, the minority and language features are excluded.
Model summary
Model accuracy
When the two variables are removed, the model accuracy decreases by approximately 3 percentage points, to 24.86%, which is lower than that of the first model. This indicates that excluding the minority and language features does have an impact on the predictability of the CDC's SVI.
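The comparison between the two models can be made explicit with a short worked calculation. The two accuracy figures are the ones reported in the text; the `accuracy` helper and the toy labels are hypothetical and only show how such a figure is computed from predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with invented labels: three of four predictions are correct.
toy_accuracy = accuracy(["Texas", "Georgia", "Texas", "Ohio"],
                        ["Texas", "Texas", "Texas", "Ohio"])

# Accuracies reported in the text, in percent.
acc_full = 27.87      # model with all variables
acc_reduced = 24.86   # model without the minority and language features

drop_points = acc_full - acc_reduced            # absolute drop in percentage points
relative_drop = drop_points / acc_full * 100    # drop relative to the full model, in %

print(f"absolute drop: {drop_points:.2f} percentage points")
print(f"relative drop: {relative_drop:.1f}% of the full model's accuracy")
```

The absolute drop is about 3 percentage points, which corresponds to roughly a tenth of the full model's accuracy. Whether a difference of this size is statistically significant would need repeated splits or a formal test, which the text does not report.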
Part 2
Model two
The correlation matrix helps to identify which variables are critical to the model and which could be removed with little or no effect.
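A correlation matrix of this kind can be computed as in the sketch below. It assumes Pearson correlation, which the text does not state explicitly, and the column values are invented; only the feature names come from the analysis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict holding the pairwise correlations of all columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented values: E_HU is exactly proportional to E_TOTPOP, E_POV is unrelated.
columns = {
    "E_TOTPOP": [1000, 2000, 3000, 4000],
    "E_HU": [400, 800, 1200, 1600],
    "E_POV": [300, 100, 400, 200],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In this toy data, E_HU duplicates the information in E_TOTPOP, so one of the two would be a candidate for removal, while E_POV carries independent information. Correlation between predictors only flags redundancy; the effect on accuracy still has to be checked by refitting the model.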
From the correlation table above, E_TOTPOP, E_POV, E_AGE65, E_AGE17, E_MINRTY, E_LIMENG and E_HU are selected.
Despite reducing the variables on the basis of the correlation matrix, the model accuracy decreased. The same pattern was observed in Part 1: after the minority and language variables were excluded, the model accuracy decreased noticeably. Hence, the results suggest that all variables are critical to the model, and that none of them can be excluded without affecting its predictability.
Conclusion
In conclusion, all the variables are critical to the model; hence, to achieve the best accuracy, all of them must be present. However, even with every variable included, the model accuracy remains low. The data collection technique and the method of analysis could therefore be changed in future work to obtain a more reliable model.
References
Alizadeh, M., Alizadeh, E., Asadollahpour Kotenaee, S., Shahabi, H., Beiranvand Pour, A., Panahi, M., … & Saro, L. (2018). Social vulnerability assessment using an artificial neural network (ANN) model for earthquake hazard in Tabriz city, Iran. Sustainability, 10(10), 3376.
Carley, M., & Spapens, P. (2017). Sharing the world: sustainable living and global equity in the 21st century. Routledge.
Coker, R. J., Hunter, B. M., Rudge, J. W., Liverani, M., & Hanvoravongchai, P. (2011). Emerging infectious diseases in southeast Asia: regional challenges to control. The Lancet, 377(9765), 599-609.
Sarin, S. K., Kumar, M., Lau, G. K., Abbas, Z., Chan, H. L. Y., Chen, C. J., … & Dokmeci, A. K. (2016). Asian-Pacific clinical practice guidelines on the management of hepatitis B: a 2015 update. Hepatology International, 10(1), 1-98.