Big data requirements
The tourism industry has been hit hard by the worldwide coronavirus pandemic, to the extent that ZSL has considered closing the zoos under its management. Revenue from visitor fees has fallen, and donations from well-wishers are scarce because people are still dealing with the pandemic. However, even before the pandemic broke out, the number of visits to the zoos was in decline. This implies that the zoos were already struggling to generate revenue for their conservation efforts. This problem justifies the adoption of a big data strategy.
For a big data strategy to be viable, certain elements should be present. Schmarzo (2013) derived five supporting requirements for a data strategy that help guide the process of data collection. The five requirements are described below.
Key Performance Indicators
Performance indicators rate the performance of the implemented data strategy. They are based on the objectives that the strategy was rolled out to fulfill. For this study, the main objective is to increase revenue. Techniques for increasing revenue include cutting management costs in some areas, increasing the number of visits, and introducing a new pricing strategy. The indicators for these will be the return on investment for advertisement, the success of policies aimed at increasing the number of visits, and the cost-benefit analysis of changing the pricing strategy. The suggested new price and the old price will be compared by forecasting the number of expected visits and the expected revenue if the price changes.
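The price comparison described above reduces to simple arithmetic once visit forecasts are available. The sketch below is only an illustration of that calculation, not ZSL's method: the function names and every price and visit count are hypothetical values.

```python
def expected_revenue(price, expected_visits):
    """Expected ticket revenue for a given price and forecast number of visits."""
    return price * expected_visits


def revenue_change(old_price, old_visits, new_price, new_visits):
    """Difference in expected revenue between the new and the old price."""
    return expected_revenue(new_price, new_visits) - expected_revenue(old_price, old_visits)


# Hypothetical figures: a price rise from 25 to 28 that costs 50 visits.
old_price, old_visits = 25, 1200
new_price, new_visits = 28, 1150

change = revenue_change(old_price, old_visits, new_price, new_visits)
print(f"Expected change in revenue: {change}")
```

A positive result means the forecast loss in visits is outweighed by the higher price; a negative result means the price change would reduce revenue.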
Business Questions
Business questions are developed based on the problem that ZSL wants to solve. This implies that answers to the business questions will provide recommendations that ZSL can adopt to have a brighter future. For ZSL the business questions include: What should be done to generate more revenue? Which service is best rated? Which is the most productive advertisement channel? Which category of people should be targeted for direct education to enhance visits?
Business Decisions
Business decisions help in the selection of models that help in reaching the goals of a big data strategy. Below is a list of some of the business decisions that were identified.
- Conduct case summaries on different categories of people based on the number of visits and various unique features that each customer has.
- Summary statistics which include the measures for central tendency and measures of variability will be calculated for all features in the data.
- From the total number of visits over the years a simple regression model will be developed.
- ANOVA analysis will be used to determine the performance of the regression model.
From the first decision it will be possible to determine how various features in the data interact. It will be possible to determine which particular groups are lagging and can be selected to increase revenues. The visitors whose data is being analyzed can be categorized according to their type, the number of years they have been visiting the park, the number of family members, and where the tickets were booked. From the second business decision, the summary statistics of various features help describe the source of the problem and how it can be mitigated. From the regression model it is possible to determine whether the following year’s visits will decline. This will also justify the implementation of a big data strategy. The fourth business decision gauges the performance of the linear regression model.
Analytic algorithms and modeling
Several algorithms are suggested for the analysis of this data set. The first is correlation analysis, which helps test for relationships between features that are likely to influence each other. Examples in this data set include the number of visits and age; the amount one is likely to spend at the zoo and the number of family members; and the average spend per year and the number of years that the visitor has been visiting.
A simple linear regression model is also suggested for determining whether the trend of a decrease in visitors will continue over time. The simple linear regression is facilitated by the linearity in the total number of visitors per year. ANOVA will be used to gauge the relevance of this model. Case summaries for various categories of visitors will be analyzed, one distinguishing feature at a time. This will help in making decisions based on facts and not on a hunch. A multiple regression model is also required. It will be used to show the relationship between the amount of money that customers spend per year and other independent factors.
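As a minimal sketch of how a simple linear regression of visits on year can be fitted by ordinary least squares, the example below uses only the Python standard library. The yearly counts are hypothetical placeholders, not ZSL data, and the function name is an assumption made for illustration.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical yearly visit counts, for illustration only.
years = [2014, 2015, 2016, 2017, 2018]
visits = [1000, 1100, 1050, 1200, 1150]

slope, intercept, r_squared = fit_simple_ols(years, visits)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R squared={r_squared:.2f}")
```

The slope is the estimated change in visits per year, and R squared is the share of the variation in visits that the fitted line explains.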
Supporting data sources
The last requirement is determining the data sources that support the adoption of a big data strategy. The data used in this report comes from ZSL's records of past visitors. There is also survey data on the ratings of the different products offered at the zoo, with 30 participants and 30 products reviewed, and data on the prices currently charged and the new prices being suggested. Data from social media is also collected and will be used to review the publicity of ZSL. The visitor data contains demographic information and other features, including the number of times the visitor has visited the zoo and the average amount spent per visit.
Data analysis
Simple linear regression.
A simple linear model is fitted on the data to facilitate the forecasting of the future count of visits for the next few years.
The figure below shows a trend line for the number of visits for the years between 2009 and 2018.
From the trend line, an upward trend is observable, which reveals that the number of visitors at the zoo is generally increasing over the years. However, the line is not steep, indicating a very low gradient. The gradient represents the rate of increase in the number of visitors per year, so the increase has been relatively slow. Measures to speed up this increase should be implemented.
The table below shows the summary output from a linear regression model.
Regression statistics | |
Multiple R | 0.359404863 |
R squared | 0.129171856 |
Adjusted R square | 0.004767835 |
Standard error | 123.4004065 |
The model represents the data poorly but can still be used for prediction. The poor fit is shown by the R squared value of 0.1292 (the value 0.3594 is the multiple R, not R squared). This implies that the model explains about 12.9% of the variation in the number of visits.
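The relationship between the values in the regression statistics table can be checked with a few lines of arithmetic. In the sketch below the function names are illustrative, and the sample size of 9 is an assumption: the report does not state it, and it is inferred only because it reproduces the reported adjusted R squared.

```python
def r_squared_from_multiple_r(multiple_r):
    """For a regression with an intercept, R squared is the square of multiple R."""
    return multiple_r ** 2


def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R squared penalises R squared for the number of predictors."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (n_observations - n_predictors - 1)


MULTIPLE_R = 0.359404863   # from the regression statistics table
ASSUMED_N = 9              # assumption: inferred, not stated in the report
N_PREDICTORS = 1           # year is the only predictor

r2 = r_squared_from_multiple_r(MULTIPLE_R)
adj_r2 = adjusted_r_squared(r2, ASSUMED_N, N_PREDICTORS)
print(f"R squared = {r2:.6f}, adjusted R squared = {adj_r2:.6f}")
```

Under that assumption both reported values (0.129172 and 0.004768) are reproduced, which confirms that roughly 13% of the variation is explained and that almost none of it survives the adjustment for sample size.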
The table below shows the co-efficient of the regression model.
Term | Coefficients | Standard Error | t Stat | P-value |
Intercept | -31538.04444 | 32084.91 | -0.98296 | 0.358375 |
Year | 16.23333333 | 15.93092 | 1.018983 | 0.34213 |
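The equation for predicting the future number of visitors, with the coefficients above rounded to the values used for the forecasts, is y = 16.2x - 31538, where y is the predicted number of visitors and x is the year.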
This equation can be used to predict the future number of visitors by replacing x with the year. The predictions from this model are not very reliable, because the model leaves most of the variation in visits unexplained and the coefficient for year (p = 0.342) is not significant at the 5% level. The predicted numbers of visitors for 2019, 2020, and 2021 are 1169.8, 1186, and 1202.2 respectively. These values show that the number of visitors is expected to increase over the next few years, but at a very slow rate. Visitors in the first year are expected to increase by about 30. This increase is not enough to dig ZSL out of its revenue troubles.
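The forecasts can be reproduced with a short calculation. The sketch below is illustrative only: the function and constant names are assumptions, and the observation that the quoted forecasts follow from rounding the slope to 16.2 and the intercept to -31538 is an inference from the numbers, not something the report states.

```python
def predict_visitors(year, intercept, slope):
    """Predicted number of visitors for a given year under a linear trend."""
    return intercept + slope * year


# Rounded coefficients: these reproduce the forecasts quoted in the text.
ROUNDED = {"intercept": -31538.0, "slope": 16.2}
# Unrounded coefficients as printed in the coefficient table.
TABLE = {"intercept": -31538.04444, "slope": 16.23333333}

for year in (2019, 2020, 2021):
    rounded = predict_visitors(year, **ROUNDED)
    unrounded = predict_visitors(year, **TABLE)
    print(f"{year}: rounded={rounded:.1f}, table coefficients={unrounded:.1f}")
```

Because the year is a large number, a small rounding of the slope shifts each forecast by several dozen visitors, so the unrounded coefficients are the safer choice when the exact level of the forecast matters.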
Descriptive summary statistics
Statistic | Age | Salary (£000) | Number of family members | Number of visits per year | Average spending per visit |
Mean | 35 | 32.15384615 | 3.025641 | 3.282051 | 101.3077 |
Standard Error | 1.767528372 | 1.76199895 | 0.227902 | 0.269825 | 11.8582 |
Median | 37 | 36 | 3 | 3 | 80 |
Mode | 45 | 36 | 2 | 4 | 30 |
Standard Deviation | 11.03821114 | 11.00367991 | 1.42325 | 1.685054 | 74.05444 |
Sample Variance | 121.8421053 | 121.0809717 | 2.025641 | 2.839406 | 5484.061 |
Kurtosis | -1.227266052 | -1.160565308 | -0.71714 | -0.74779 | -1.31559 |
Skewness | -0.24737012 | -0.260817867 | 0.299014 | 0.329453 | 0.448966 |
Range | 35 | 36 | 5 | 6 | 225 |
Minimum | 18 | 14 | 1 | 1 | 25 |
Maximum | 53 | 50 | 6 | 7 | 250 |
Sum | 1365 | 1254 | 118 | 128 | 3951 |
Count | 39 | 39 | 39 | 39 | 39 |
The table above shows descriptive statistics for the different features of the visitors. The oldest visitor is 53 while the youngest is 18, a range of 35 years. The ages show a high variance (121.84), which implies that ages are widely spread across this range. The mean age of the visitors is 35 while the median is 37, and the most common age is 45. These statistics indicate that the zoo mainly attracts people who are not young: the mean, median, and mode suggest that the majority of visitors are 35 years and above.
The mean and median salaries are £32,153.85 and £36,000 respectively. The mode is equal to the median, which indicates that the salaries are spread fairly evenly over the range of £36,000. There are no outliers in the salary. However, the variance in the salary indicates that the zoo serves people in different salary groups. The minimum salary of the people who visit the park is £14,000 while the maximum is £50,000. The most common salary among visitors is £36,000.
The mean number of family members per visitor is 3.03 while the median is 3. This indicates that visitors have an average of 3 family members and rarely visit the zoo alone. Together with conservation activities, the zoo acts as an entertainment spot where exhibitions can be held. The highest number of family members is 6 while the lowest is 1. The variance in the number of family members is also relatively low (2.03), with all values falling within a range of 5.
Visitors spend different amounts of money depending on various factors such as the purpose of their visit. The mean spending per visit is 101.3077 while the median is 80. This large difference between the mean and the median suggests that a small group of visitors spends far more per visit than the rest. The variance is 5484.061, which is relatively high and indicates that spending differs widely from one visitor to another.
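The rows of the descriptive statistics table are linked by two simple identities: the mean is the sum divided by the count, and the standard error is the standard deviation divided by the square root of the count. The sketch below recomputes both from the tabulated sums, standard deviations, and counts. The dictionary layout and function name are illustrative assumptions; the numbers are copied from the table.

```python
from math import sqrt

# Sum, standard deviation, and count as reported in the table above.
TABLE = {
    "Age": {"sum": 1365, "sd": 11.03821114, "count": 39},
    "Salary (GBP 000)": {"sum": 1254, "sd": 11.00367991, "count": 39},
    "Number of family members": {"sum": 118, "sd": 1.42325, "count": 39},
    "Number of visits per year": {"sum": 128, "sd": 1.685054, "count": 39},
    "Average spending per visit": {"sum": 3951, "sd": 74.05444, "count": 39},
}


def mean_and_standard_error(total, sd, count):
    """Return (mean, standard error of the mean) from summary values."""
    return total / count, sd / sqrt(count)


results = {
    name: mean_and_standard_error(row["sum"], row["sd"], row["count"])
    for name, row in TABLE.items()
}

for name, (mean, se) in results.items():
    print(f"{name}: mean={mean:.4f}, standard error={se:.4f}")
```

A check of this kind is a quick way to confirm which mean belongs to which column: 3.03 is the mean number of family members and 3.28 is the mean number of visits per year.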
The table below shows summary statistics for the length of time the visitor has been visiting the zoo and the total spending per year.
Statistic | How long have been visiting? (years) | Total spending (£/year) |
Mean | 3.641026 | 317.2564 |
Standard Error | 0.280945 | 46.31525 |
Median | 4 | 200 |
Mode | 2 | 840 |
Standard Deviation | 1.754501 | 289.2386 |
Sample Variance | 3.078273 | 83658.99 |
Range | 6 | 950 |
Minimum | 1 | 50 |
Maximum | 7 | 1000 |
The minimum number of years a visitor has been coming to the zoo is one, and the maximum is seven. This implies that the zoo serves both long-term and short-term visitors. The mean is 3.64 years while the median is 4, and the most common duration is two years. This implies that most of the visitors are people who have already been to the zoo at least once.
From a survey on preference, it is possible to analyze the ratings to come up with a new package. 30 products were rated by a group of 30 participants. Summary statistics indicate that the most preferred product is product 29, which has a mean rating of 27 and a median of 28. This means that the product received high ratings from all visitors; its minimum rating is 21, which is generally high. The lowest rated product is number 11, which has a mean rating of 8.2333 and a median of 6.5. The gap between the mean and the median implies that there are some outliers in the ratings of this product, which range from 1 to 20. Products 1 to 20 receive low ratings, with means generally below 20, and are dragging down the overall ratings. Products 21 to 30 show positive ratings.
Correlation analysis
Calculating correlation with techniques such as the Spearman method helps in testing for the presence of a relationship between two variables.
The table below shows the correlation values for different characteristics of the visitors.
Variable | Age | Salary (£000) | No. of family member | Number of visits /per year | How long have been visiting? (years) | Total spending (£/year) |
Age | 1 | |||||
Salary (£000) | 0.939817 | 1 | ||||
No. of family member | 0.768266 | 0.640092 | 1 | |||
Number of visits /per year | 0.488672 | 0.411227 | 0.506875 | 1 | ||
How long have been visiting? (years) | 0.778229 | 0.670296 | 0.777438 | 0.670061 | 1 | |
Total spending (£/year) | 0.884241 | 0.824546 | 0.943972 | 0.518772 | 0.824015 | 1 |
Various characteristics interact in different ways. For this analysis, however, the problem is based on revenue. To increase revenue, ZSL has to increase the amount of money that visitors are willing to part with, which is indicated by the total spending in pounds per year. The highest correlations with total spending are shown by age (0.88), salary (0.82), and number of family members (0.94), with the number of family members being the highest. This strong positive relationship implies that the more family members a visitor has, the more that visitor is likely to spend per year.
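As a rough sketch of how such correlation values can be computed, the example below implements the Pearson coefficient and the Spearman rank coefficient (Pearson applied to ranks, with tied values given their average rank) using only the standard library. The data values are hypothetical, and the report does not say which coefficient produced the table above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical visitor records, for illustration only.
family_members = [1, 2, 2, 3, 4, 5, 6]
total_spending = [50, 120, 150, 200, 420, 840, 1000]

r = pearson(family_members, total_spending)
rho = spearman(family_members, total_spending)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Spearman depends only on the ordering of the values, so it is less sensitive than Pearson to a few very high spenders.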
ZSL previously ran a campaign through different advertisement media. Different media yield varying results because they differ in how reliably they pass on the message.
The figure below shows a bar graph for the adoptions according to print, television, and FM channels.
The main aim of advertising is to increase the size of the market. To achieve this, the strategy should reach the highest number of potential customers. The highest viewership or readership in this set of advertisement media is that of National Geographic. The highest prices for advertising are charged by Heart Radio. BBC Wildlife is the cheapest, but it has a small readership and therefore yields the least revenue from adoption. ITV led to the highest revenue generated by the adoption of animals.
Another form of advertisement is online advertising, which has been made possible by the huge strides taken in technology. Some of the channels for advertising are social media platforms such as Facebook, which has a very large fan base.
The figure below shows the analysis of the adoption levels from online channels where promotion was carried out.
From the bar graph above, the channels capable of reaching the highest number of people are YouTube and Facebook. However, YouTube produces the lowest number of unit sales from the adoption of animals. Facebook has the highest number of clicks on the advertisement, but these translate to a relatively low count of units sold. The Animal Fact Guide leads to the highest number of unit sales, which implies that this channel is the most effective.
As stated earlier, one of the ways of increasing revenue is changing prices. ZSL aims to change its pricing model, so it is important to evaluate the impact of the new strategy on revenue and visits. In the hospitality market demand is seasonal, implying that there are low, standard, and peak seasons.
The average revenues in standard seasons exceed those of the other two seasons. ZSL earns the least money during the peak seasons. The margin by which ZSL proposes to increase prices will still leave it in a good competitive position, because the increased prices do not exceed the current market price.
The figure below shows the analysis of data mined from Twitter.
The company uses different tags to pass on its information. Retweets also indicate ZSL's engagement on social media. The number of retweets fluctuates considerably, but it peaks when certain tags are used. These tags allow interaction between ZSL and social media users, and among the users themselves.
Case processing summary
The figure below shows a summary of the amount of money spent and the type of visitor.
From the figure above, local family day-trippers provide the main source of revenue, followed by overseas visitors. Educators contribute the least per day.
The purpose of the visit should also be analyzed to determine the activity which generates the highest income.
The figure below shows a bar chart for activity and spending per day.
Zoo experience is the product that generates the most revenue. The library generates the least.
Multiple regression model.
The multiple regression model can be used to analyze the relationship between different promotional techniques and the number of visitors they yield.
The table below shows a summary of the multiple regression model.
Regression Statistics | |
Multiple R | 0.394748358 |
R Square | 0.155826266 |
Adjusted R Square | 0.12304282 |
Standard Error | 2.515207018 |
The model explains only about 15.58% of the variation in the number of visitors (R squared = 0.1558). This poor fit indicates that the model is weak for forecasting the number of visitors from advertisement data alone and that factors outside the model account for most of the variation. However, the model is still significant according to the ANOVA table.
The table below shows the performance of the model using ANOVA.
ANOVA | |||||
Source | df | SS | MS | F | Significance F |
Regression | 4 | 120.2800298 | 30.07001 | 4.7532 | 0.001468 |
Residual | 103 | 651.6054331 | 6.326266 | ||
Total | 107 | 771.885463 |
The significance F of the model is 0.001468, which is below the 5% significance level. This implies that the promotional variables, taken together, are significant in determining the number of visitors to the parks.
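The F statistic and R squared in the ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them from the tabulated values; the function name and the returned dictionary keys are illustrative assumptions.

```python
def anova_summary(ss_regression, df_regression, ss_residual, df_residual):
    """Recompute mean squares, F, R squared, and adjusted R squared."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    r_squared = ss_regression / ss_total
    adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
    return {
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f_statistic": ms_regression / ms_residual,
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted_r_squared,
    }


# Values copied from the ANOVA table for the advertisement model.
advert_model = anova_summary(
    ss_regression=120.2800298, df_regression=4,
    ss_residual=651.6054331, df_residual=103,
)

for name, value in advert_model.items():
    print(f"{name}: {value:.6f}")
```

The recomputed values agree with the reported F of 4.7532, R squared of 0.1558, and adjusted R squared of 0.1230: the model is statistically significant yet explains only a small share of the variation.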
The table below shows the coefficients of the multiple regression model.
Term | Coefficients | Standard Error | t Stat | P-value |
Intercept | 80.69413431 | 1.252142922 | 64.44483 | 4.64E-85 |
No. of web unique visits | 0.001577747 | 0.000801145 | 1.969366 | 0.051597 |
No. media/PR exposure | 0.07260865 | 0.021858813 | 3.321711 | 0.001239 |
Service attractiveness | -0.044966663 | 0.177268188 | -0.25366 | 0.80026 |
Adverts Budget (£000) | 0.00044171 | 0.000478308 | 0.923484 | 0.357913 |
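Each t statistic in the coefficient table is the coefficient divided by its standard error, and a predictor is individually significant at the 5% level when its p-value is below 0.05. The sketch below applies both rules to the tabulated values. The variable names are illustrative, and the p-values are taken from the table rather than recomputed.

```python
ALPHA = 0.05  # 5% significance level, as used elsewhere in the report

# (coefficient, standard error, p-value) copied from the coefficient table.
PREDICTORS = {
    "No. of web unique visits": (0.001577747, 0.000801145, 0.051597),
    "No. media/PR exposure": (0.07260865, 0.021858813, 0.001239),
    "Service attractiveness": (-0.044966663, 0.177268188, 0.80026),
    "Adverts Budget (GBP 000)": (0.00044171, 0.000478308, 0.357913),
}


def t_statistic(coefficient, standard_error):
    """t statistic for testing whether a coefficient differs from zero."""
    return coefficient / standard_error


t_stats = {name: t_statistic(c, se) for name, (c, se, _) in PREDICTORS.items()}
significant = [name for name, (_, _, p) in PREDICTORS.items() if p < ALPHA]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}")
print("Significant at 5%:", significant)
```

On these figures only media/PR exposure is individually significant, and web visits fall just short of the threshold, which suggests that the significance of the overall model is driven mainly by PR exposure.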
Another important model is the multiple regression for predicting the average spending per visit, which depends on different characteristics of the visitor.
The table below shows the summary of this multiple regression model.
Regression Statistics | |
Multiple R | 0.958162 |
R Square | 0.918075 |
Adjusted R Square | 0.899576 |
Standard Error | 23.46771 |
The R squared value indicates how much of the variation in the data the fitted model explains. The R squared value is 0.918, so this model explains 91.8% of the variation in average spending per visit.
The table below shows the ANOVA which tests for the significance of a model.
ANOVA | |||||
Source | df | SS | MS | F | Significance F |
Regression | 7 | 191321.6 | 27331.65 | 49.62774 | 4.37E-15 |
Residual | 31 | 17072.73 | 550.7334 | ||
Total | 38 | 208394.3 |
The p-value of the test (4.37E-15) is far below the 5% significance level, indicating that the independent factors, taken together, affect the average spend per visit.
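As a worked check, the figures reported for this model are consistent with one another. Here n = 39 is the count from the descriptive statistics table and k = 7 is the regression degrees of freedom; the symbols are introduced only for this illustration.

```latex
\begin{align*}
F &= \frac{SS_{\text{reg}} / df_{\text{reg}}}{SS_{\text{res}} / df_{\text{res}}}
   = \frac{191321.6 / 7}{17072.73 / 31}
   = \frac{27331.65}{550.7334} \approx 49.63 \\
R^2 &= \frac{SS_{\text{reg}}}{SS_{\text{total}}}
   = \frac{191321.6}{208394.3} \approx 0.918 \\
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
   = 1 - (1 - 0.918075)\,\frac{38}{31} \approx 0.8996
\end{align*}
```

The small gap between R squared and adjusted R squared shows that the good fit is not simply a result of using seven predictors on 39 observations.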
Recommendations
Strategies that should increase the number of visitors to z
The first recommendation is to increase advertisement activities in order to venture into new markets. According to the summary statistics, most visitors to the park have already visited at least once. There is therefore a need to expand the ZSL brand in order to gain new customers. Targeting new visitors through effective new marketing strategies will increase the market size for the zoo, and with it the number of visits and the revenue generated.
The way ZSL brands itself is crucial. The importance of the brand is shown by the significance of the multiple regression model, which points to the value of a good promotion strategy for ZSL. The number of visitors to the park partially depends on the public face of the company. The strength of the brand is reflected in PR exposure, service attractiveness, the number of unique visits recorded by the ZSL website, and the advertisement budget. To increase the number of visitors, it is recommended that ZSL find a balance between these promotion strategies so that together they yield the highest number of new visitors. Choosing the advertisement media is also crucial, since an effective advertisement strategy can significantly affect the number of visits.
It is highly recommended that products with a rating above 20 be grouped into one package. Products in this category have very high ratings, so the package is likely to appeal to most customers. Products with ratings below 20 should be discarded or improved, because they are bringing down the average rating of the services offered at the zoo. Grouping the high-performing products together and improving or eliminating the rest will raise the ratings of the zoo and hence increase the number of visitors.
Improvement of the services at ZSL is highly recommended. For ZSL, the number of visitors dwindles in the peak season, a season in which competitors see their visitor numbers increase. This implies that the services of ZSL are not highly preferred, which justifies the need for a new customer package that is likely to increase the number of visits.