BUSINESS INTELLIGENCE FINAL EXAM PART B
Question One (10 Marks)
Decision support systems have evolved into data analytics and now play a key role in supporting data-driven decision making in the big data era and the Fourth Industrial Revolution. Identify and discuss three critical drivers of the Fourth Industrial Revolution underpinned by data analytics and machine learning.
Over time, decision support systems have evolved into data analytics, which is now crucial to supporting decision-making processes in the Fourth Industrial Revolution.
The advent of the Fourth Industrial Revolution can be partly attributed to analytics, which underpins three of its primary drivers: the automation of industrial processes, the reorganization of business units, and the role of analytics in changing employee and manager attitudes towards work.
Automation is the most valued contribution of analytics to the Fourth Industrial Revolution. As technologies change rapidly, companies have evolved quickly to keep pace, for example by using analytics to improve productivity. Many tasks that once could be done only by humans are now performed by machines that do not tire and deliver considerably higher output, further smoothing processes. Artificial intelligence and cognitive computing, both segments of analytics, have enabled this shift.
The reorganization and restructuring of business units and departments have also been enabled by analytics. New departments charged with making organizational processes smoother are being created, for example business intelligence (BI), analytics, or data science departments, all heavily shaped by analytics. As a result, large corporations have separated the decision-making function from other operations, improving efficiency and producing more accurate decisions, a vital aspect of the Fourth Industrial Revolution.
Organizational functions are also being reorganized through analytics, with research and people analytics as key areas. By analysing business objectives and targets, organizations can shift employee and manager attitudes to better align them with those objectives, also a vital component of the Fourth Industrial Revolution.
Question Two (10 Marks)
Identify and describe the four main steps in data preprocessing. For each of these four main steps of data preprocessing, identify and discuss one key method used.
Data preprocessing refers to the time-demanding and tedious process of converting raw, real-world data into a refined, machine-understandable, algorithm-ready format, which eases analysis. The premise of preprocessing is the elimination of out-of-range and inconsistent values emanating from the data collection process. The whole process is subdivided into four main steps: data consolidation, data cleansing, data transformation, and data reduction. Data consolidation refers to the collection, filtering, and grouping of the collected data. A key method here is the use of SQL queries, which enable filtering and selection of only the required data as well as grouping of raw data for analysis.
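As an illustrative sketch of consolidation (assuming Python with the pandas library; the file names and columns below are hypothetical), collection, filtering, and grouping might look like this:

```python
import pandas as pd

# Collect: combine raw data from two assumed sources into one data set.
sales_q1 = pd.read_csv("sales_q1.csv")   # hypothetical source files
sales_q2 = pd.read_csv("sales_q2.csv")
raw = pd.concat([sales_q1, sales_q2], ignore_index=True)

# Filter: select only the required records and attributes.
selected = raw.loc[raw["amount"] > 0, ["region", "product", "amount"]]

# Group: aggregate the filtered data for analysis.
summary = selected.groupby("region")["amount"].sum()
print(summary)
```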
Data cleaning involves handling missing and redundant data values. Data noise is also identified and reduced within this step, and erroneous data is identified and eliminated. Throughout this process, outliers are replaced using statistical predictors such as averages, through techniques such as binning and regression. Missing values are filled in with the most appropriate statistical predictor, and erroneous data is likewise detected and replaced using statistical predictors.
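A minimal sketch of these cleaning methods (assuming pandas and NumPy; the patient ages below are made up) could be:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value and an out-of-range outlier.
df = pd.DataFrame({"age": [23, 25, np.nan, 27, 400]})

# Fill the missing value with a statistical predictor (here, the mean).
df["age"] = df["age"].fillna(df["age"].mean())

# Replace erroneous, out-of-range values with another predictor (the median).
df.loc[df["age"] > 120, "age"] = df["age"].median()

# Smooth remaining noise by binning values into equal-width bins.
df["age_bin"] = pd.cut(df["age"], bins=3)
print(df)
```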
Data transformation is the third step in preprocessing and usually involves normalizing the data with a variety of normalization tools. The second sub-step is the aggregation of data using the binning technique. New, representative attributes, critical in analytics, are then created from the existing data sets using simple mathematical functions such as addition, division, or multiplication. The last step in preprocessing is data reduction, the process of balancing skewed data and reducing the number of records and attributes in the data set. This is done using random stratified sampling techniques and chi-square sampling.
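A short sketch of one method from each of these last two steps (assuming pandas; the income figures and class labels are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "income": [20_000, 35_000, 50_000, 55_000, 90_000, 120_000],
    "class":  ["low", "low", "mid", "mid", "high", "high"],
})

# Transformation: min-max normalization rescales income into [0, 1].
df["income_norm"] = (df["income"] - df["income"].min()) / (
    df["income"].max() - df["income"].min())

# A new representative attribute created with a simple function (division).
df["income_thousands"] = df["income"] / 1_000

# Reduction: stratified sampling halves the records while keeping
# the class proportions of the original data set.
sample = df.groupby("class", group_keys=False).sample(frac=0.5)
print(sample)
```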
Question Three (10 Marks) The following confusion matrix was generated to evaluate the performance of a classification model using a Decision Tree for predicting whether a patient has Alzheimer’s disease or not: YES (True) meaning a patient has Alzheimer’s disease, and NO (False) meaning a patient does not have Alzheimer’s disease.
State the formula used, calculate and discuss what the Accuracy Rate tells us about the above model for predicting if a patient has Alzheimer’s disease.
Accuracy = (TP + TN)/(TP + TN + FP + FN);
where TP is the true positive,
TN is true negative,
FP is false positive and
FN is false negative
Therefore, accuracy = (170+105)/(170+105+36+37)
=275/348
=0.7902
This value means that the model correctly predicts whether a patient has Alzheimer’s disease in 79.02% of cases, i.e., 0.7902 of every 1.
- State the formula used, calculate and discuss what the Sensitivity Rate tells us about the above model for predicting if a patient has Alzheimer’s disease.
Sensitivity rate =TP/(TP+FN)
=170/ (170+37)
=170/207
=0.8213
The sensitivity rate is the ratio of correctly predicted positive observations to all actual positive observations. From the above calculation, the value is 0.8213, meaning the model correctly identifies 82.13% of the patients who actually have Alzheimer’s disease.
Since a sensitivity of 0.8213 is well above 0.5, the model performs far better than chance at detecting the disease, which is good for the above model.
- State the formula used, calculate and discuss what the Specificity Rate tells us about the above model for predicting if a patient has Alzheimer’s disease.
Specificity rate =TN/(TN+FP)
=105/ (105+36)
=105/141
=0.7447
The specificity rate tests the capability of the above model to correctly identify patients without Alzheimer’s disease. A value of 0.7447 means the model correctly identifies 74.47% of patients who do not have the disease, so it is reasonably effective at ruling out the condition.
- State the formula used, calculate and discuss what the F1 Score tells us about the above model for predicting if a patient has Alzheimer’s disease.
F1 Score = 2 * ((precision * recall) / (precision + recall))
Precision =TP/(TP+FP)
=170/ (170+36)
=170/206
=0.8252
Recall= TP/(TP+FN)
=170/ (170+37)
=0.8213
F1 Score = 2*((0.8252*0.8213)/ (0.8252+0.8213))
=2*(0.6777/1.6465)
=0.8232
The F1 score is the harmonic mean of recall and precision. The F1 value of this model is 0.8232, indicating that the model is useful in predicting whether a patient has Alzheimer’s disease. The higher the F1 score the better, and since ours is close to 1, the model can be considered accurate.
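As a quick cross-check of the four figures above (a minimal sketch using only the stated confusion-matrix counts):

```python
# Counts taken from the confusion matrix in the question.
TP, TN, FP, FN = 170, 105, 36, 37

accuracy    = (TP + TN) / (TP + TN + FP + FN)   # 0.7902
sensitivity = TP / (TP + FN)                    # recall, 0.8213
specificity = TN / (TN + FP)                    # 0.7447
precision   = TP / (TP + FP)                    # 0.8252
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # 0.8232

print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f} precision={precision:.4f} f1={f1:.4f}")
```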
Question Four (10 Marks)
Describe the term Social Media Analytics and discuss the impact social media sites like Facebook and Twitter can have on customer sentiment and how this can be measured using social media analytics.
Social media analytics refers to a branch of analytics that studies the social interactions of people through technology. It examines how people create, share, and receive information through virtual networks and communities, which include Twitter, Facebook, and Instagram, among others.
In a technologically advanced world, social media analytics is today a critical tool for companies, especially in marketing, since it allows producers and manufacturers to study consumption rates and patterns within the potential market. It is therefore an integrated decision support tool within companies.
The exponential growth in the use of social media is a chief reason why analytics tools that enable conversation between the global market and companies are in high demand. It is currently estimated that more than 70% of multinational companies within the US are continually investing in social media as part of a plan to increase customer engagement. Through advertising on active social media channels, customers gain access to and become aware of the products offered by a particular company. Social media also offers a platform for customers to give companies feedback so they can maintain or improve service, making it an integral part of increasing sales.
The impact of social media on customer sentiment can be measured using analytics.
Analytical tools are broadly divided into three classes: descriptive, social network, and advanced analytics. Descriptive analytics uses simple statistical indicators to explain trends and characteristics, such as the number of followers an account has or how many reviews a company’s products have received. Social network analytics uses the connections between users and their acquaintances to determine preferences and sources of influence. Advanced analytics applies predictive techniques to uncover insights that descriptive analysis alone could not reveal. Using these tools, companies can make sound decisions that help achieve set targets.
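As a toy illustration of descriptive sentiment measurement (a sketch only; the word lists and posts are assumptions, and real tools use far richer lexicons and models):

```python
# Score each post by counting positive and negative words.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "slow"}

posts = [
    "Love the new phone, excellent camera",      # hypothetical posts
    "Terrible battery, slow and bad support",
]

def sentiment(post: str) -> int:
    words = post.lower().replace(",", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

for p in posts:
    print(sentiment(p), p)   # +2 for the first post, -3 for the second
```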
Question Five (10 Marks)
Describe what is meant by the term Stream Analytics and provide an example of how stream analytics is being used in a real-world situation. Discuss two key benefits that are being realized in your example.
Stream analytics, also known as real-time data analysis, refers to a type of analytics that extracts information from streaming, continuously flowing data. Speed and accuracy are critical aspects of industrial processes, and streaming assures both. Streaming analytics uses a technology known as stream processing, which continuously processes large amounts of data as a stream. The premise of streaming is to present an analyst with up-to-date data. Streaming analytics can, for example, be used in the healthcare industry.
The healthcare industry is a fragile sector that requires constant monitoring and involvement to save lives. Its ultimate goals are to save lives, shorten hospital stays, alleviate pain, and ensure healthy, working communities. To achieve this vital mandate, streaming analysis can be used for real-time monitoring of patients, clinical risk assessment, and feeding alerts to personnel, which reduces the pressure on healthcare workers. Using streaming, healthcare practitioners can monitor trends and patterns, especially of the current Covid-19 pandemic ravaging the world, and use this up-to-date information to make prudent decisions and prepare for situations, alleviating the impact of the disease while achieving the mandate above.

Several advantages can be realized through stream analytics, including highly personalized care, reduced treatment costs, and relief of pressure on health workers, ultimately making the whole exercise more efficient and safe. In a fragile sector such as healthcare, accuracy is a critical value that places a burden on healthcare workers; with streaming analytics, this pressure is relieved and medical procedures are made safer and more effective.
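A minimal sketch of the monitoring-and-alerting idea (the window size, threshold, and readings are assumed values, not clinical guidance):

```python
from collections import deque

WINDOW, THRESHOLD = 5, 120   # assumed sliding window and heart-rate limit

def monitor(readings):
    window = deque(maxlen=WINDOW)
    for bpm in readings:                  # each reading arrives continuously
        window.append(bpm)
        avg = sum(window) / len(window)
        if avg > THRESHOLD:               # feed an alert to personnel
            print(f"ALERT: moving average {avg:.1f} bpm exceeds {THRESHOLD}")

# Simulated stream of incoming heart-rate readings.
monitor([88, 95, 110, 130, 145, 150, 142])
```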
Question Six (10 Marks)
Describe the four major aspects of the Internet of Things (IoT) Technology Infrastructure and explain how IoT technology could be used in a hospital ward to improve and complement patient care, with two specific examples.
IoT technology infrastructure refers to an interconnected and interrelated set of computing devices, people, and networks, provided with unique identifiers, that can send data without requiring human control or interaction. The Internet of Things is grouped into four main aspects: applications, hardware, connectivity, and software backend.
Hardware refers to the physical objects and devices, such as actuators and sensors, where information is created and recorded; these devices are usually controlled and monitored. Connectivity refers to the base station or hub that collects data from the input hardware and sends it to cloud storage.
Through gateways, output devices can communicate with the input devices via the cloud. The software backend presents the layers in which data is collected and managed; it performs data integration, managing all networks and devices connected through the cloud. The last aspect of IoT is applications, most of which run on platforms such as Android, Windows, and iOS. These applications form the link between users and the other components, such as cloud storage and input devices, through network connections.
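A compact sketch tying the four aspects together (all names and values are hypothetical; real deployments would use protocols such as MQTT or HTTP rather than an in-memory list):

```python
import json, random, time

def sensor_read():                        # hardware: create and record data
    return {"device_id": "ward-7-temp",
            "celsius": round(random.uniform(36.0, 39.0), 1),
            "ts": time.time()}

def gateway_forward(reading, backend):    # connectivity: hub sends to cloud
    backend.append(json.dumps(reading))   # serialized as on the wire

def app_latest(backend):                  # application: present data to users
    return json.loads(backend[-1])

backend_store = []                        # software backend: managed storage
gateway_forward(sensor_read(), backend_store)
print(app_latest(backend_store))
```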
In the healthcare sector, IoT can be used in the patient ward to make service provision more effective and safer. One example is Dina Katabi’s device, which works on the IoT principle: a biosensor used in medical wards to automatically record patient attributes such as heart rate, breathing patterns, and temperature.
It can also infer the emotional states of patients. Owing to this, nurses can remotely access patient data and respond to any emergencies accurately and on time. A second example is RFID technology, critical in tracking the administration of medicines as well as medical equipment. Using this technology, optimum conditions of humidity, temperature, and safety can be maintained in hospital wards, ultimately resulting in better care and service delivery.
Question Seven (10 Marks)
Analytics and Artificial Intelligence (AI) rely on identifying patterns in large amounts of data to accurately predict outcomes, and are often used in the automation of operational decision making. However, exceptional events like the recent bush fires in Australia and, globally, the COVID-19 Pandemic can play havoc with the accuracy and reliability of the algorithms underpinning analytics and data-driven decision making. Discuss how these analytics algorithms should be evaluated and made more reliable and robust in dealing with exceptional events, changing contexts, and norms.
Analytical algorithms are indeed not foolproof. Current events, such as the Covid-19 pandemic and the bush fires witnessed in Australia in early 2020, have wreaked havoc on the industry. The predictive capacity of algorithms has been severely undermined, and the volatility of current situations has undermined the accuracy of the data such technology provides.

For algorithms to be more accurate and robust in predicting situations, several changes need to be made. It is pertinent to note that current algorithms produce results based on current and past events: currencies, for example, fluctuate depending on past and present happenings, so algorithms map the past and the present and consequently make predictions. Slight future departures from the norm, such as bush fires and disease pandemics like Covid-19, result in massive deviations from the predicted values. In 2020, for example, the economies of the most affected countries, such as Australia, were severely hit, leading to a drop in currency value against stronger economies like the UK, contrary to what the algorithms predicted.

While little can be done to lessen the effect of natural disasters themselves, analytics can be made more robust by including exceptions for natural disasters in predictions. The final outputs would then be multiple, accounting for whether any calamities occur within the predicted timeline. While the predicted information may not be perfectly accurate, it will be more realistic about world situations and less idealistic and theoretical than what algorithms present today. Using constructs such as “if” branches when coding algorithms, administrators can get a glimpse of the financial predictions in case of an outbreak of fire or disease.
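A toy sketch of this “exception branch” idea (the forecasting rule and the 20% shock are assumptions for illustration, not a real economic model):

```python
def baseline_forecast(history):
    # Naive trend extrapolation from the last two observations.
    return history[-1] + (history[-1] - history[-2])

def forecast(history, disaster=False, shock=0.8):
    base = baseline_forecast(history)
    if disaster:                  # the "if" exception for calamities
        return base * shock       # assumed 20% downward shock
    return base

rate = [1.00, 1.02, 1.04]         # hypothetical exchange-rate history
print(forecast(rate))                  # normal scenario: 1.06
print(forecast(rate, disaster=True))   # disaster scenario: ~0.85
```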