4.0 CHAPTER

4.1 DESCRIPTIVE STATISTICS

Descriptive statistics shed light on the basic features of the data in a study and provide simple summaries. They help reduce large amounts of data to an understandable form. The descriptive statistics for Safaricom share prices are as follows:

STATISTIC                          VALUE
Mean                               15.26305
Minimum                            5.3
1st quartile                       12.2
Median                             15.55
3rd quartile                       18.4
Maximum                            28.0
Variance                           28.2893
Standard deviation (volatility)    5.318769
Range                              22.7
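The reported figures are internally consistent: the square root of 28.2893 is approximately 5.3188, and 28.0 minus 5.3 gives the range of 22.7. As a minimal sketch of how such a summary could be computed, the following uses Python's standard library. The function name `describe`, the sample prices and the choice of the "inclusive" quartile method are assumptions made for illustration; the study's own software and quartile convention are not stated.

```python
import statistics

def describe(prices):
    """Return the summary statistics listed in the table for a list of prices."""
    # Quartile convention is an assumption; other software may interpolate differently.
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    variance = statistics.variance(prices)  # sample variance (n - 1 denominator)
    return {
        "mean": statistics.mean(prices),
        "minimum": min(prices),
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": max(prices),
        "variance": variance,
        "std_dev": variance ** 0.5,  # volatility is the square root of the variance
        "range": max(prices) - min(prices),
    }

# Hypothetical prices for illustration only, not the study's data.
sample_prices = [5.3, 12.2, 15.0, 16.1, 18.4, 28.0]
summary = describe(sample_prices)
for name, value in summary.items():
    print(f"{name:10s} {value:.4f}")
```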

4.2 BOX PLOTS OF SAFARICOM SHARE PRICES

The study produced quarterly box plots for each of five years and compared them. A box plot displays the five-number summary (minimum, 1st quartile, median, 3rd quartile and maximum) and is used to identify outliers in the dataset. Box plots are also used to assess skewness and the degree of dispersion. The following are the box plots for the share prices.

BOXPLOT 2013 QUARTERLY

BOXPLOT 2014 QUARTERLY

BOXPLOT 2015 QUARTERLY

BOXPLOT 2016 QUARTERLY

BOXPLOT 2017 QUARTERLY

The above figures represent the box plots for the years 2013, 2014, 2015, 2016 and 2017.

We considered the first quarter of each year and compared them. Clearly, 2017 had the highest mean, followed by 2016, 2015, 2014 and 2013 respectively.

Observing the skewness, 2013 and 2014 were positively skewed while 2015, 2016 and 2017 were negatively skewed, as confirmed by the density plot for the first quarter.

The study then considered the second quarter of each year and compared them. Clearly, 2017 had the highest mean, followed by 2016, 2015, 2014 and 2013 respectively.

Observing the skewness, 2013 and 2014 were negatively skewed, 2015 and 2016 were positively skewed, and 2017 was negatively skewed, as confirmed by the density plot for the second quarter.

The study also considered the third quarter of each year and compared them. Clearly, 2017 had the highest mean, followed by 2016, 2015, 2014 and 2013 respectively.

Observing the skewness, 2013 was positively skewed, 2014, 2015 and 2016 were negatively skewed, and 2017 was positively skewed, as confirmed by the density plot for the third quarter.

The study also considered the fourth quarter of each year and compared them. Clearly, 2017 had the highest mean, followed by 2016, 2015, 2014 and 2013 respectively.

Observing the skewness, 2013 was positively skewed while 2014, 2015, 2016 and 2017 were negatively skewed, as confirmed by the density plot for the fourth quarter.
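How a box plot flags outliers and indicates skewness can be sketched in a few lines. The sketch below assumes the conventional 1.5 x IQR fences and uses Bowley's quartile-based skewness coefficient as a numeric stand-in for reading skewness off the box; neither choice is confirmed by the study. The function name and the quarterly prices are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Five-number summary, 1.5 x IQR outlier fences and Bowley skewness."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    # Bowley skewness: positive when the upper part of the box is longer
    # than the lower part, negative in the opposite case.
    bowley_skew = (q3 + q1 - 2 * median) / iqr if iqr else 0.0
    return {
        "five_number": (data[0], q1, median, q3, data[-1]),
        "fences": (lower_fence, upper_fence),
        "outliers": outliers,
        "bowley_skew": bowley_skew,
    }

# Hypothetical prices for one quarter, for illustration only.
quarter_prices = [10.0, 10.5, 11.0, 11.2, 11.5, 12.0, 12.4, 13.0, 19.0]
result = boxplot_summary(quarter_prices)
print(result["five_number"], result["outliers"], round(result["bowley_skew"], 3))
```

In this made-up quarter the value 19.0 lies above the upper fence and is flagged, and the positive coefficient corresponds to a positively skewed box.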

4.3 TIME PLOT OF SAFARICOM SHARE PRICES

We produced a time plot of Safaricom share prices to establish whether or not the prices were stationary. The figure below shows our time plot.

The time plot displays a significant trend, i.e. non-stationarity, which agrees with empirical findings that financial price series are non-stationary. It may therefore be interpreted that Safaricom share prices have increased significantly since 2013. This could be attributed to the significant contribution that Safaricom has made to the Kenyan economy through innovation, which might have attracted more investors and thereby increased the demand for the company's shares. Having found that the prices were non-stationary, we examined the ACF and PACF plots to confirm this finding.

4.4 ACF AND PACF PLOT OF SHARE PRICES

The idea behind this check is that if there is serial correlation among the data points, the ACF will take positive values for a large number of lags. In other words, if there is significant serial correlation in the data, the autocorrelations will fall outside the confidence lines. The following is our ACF and PACF plot output:

The ACF plot above clearly indicates that the data contains significant serial correlation, since the autocorrelations at the plotted lags fall outside the confidence lines, and the series is therefore non-stationary. In addition, the Augmented Dickey-Fuller unit root test was conducted to further confirm that the share prices were non-stationary.
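The reading of an ACF plot described here can be sketched numerically: compute the sample autocorrelation at each lag and compare it with the approximate 95% confidence lines at plus or minus 1.96 divided by the square root of n. The function names and the trending series below are hypothetical and serve only to illustrate the check; they are not the study's data or code.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf

def confidence_bound(n, z=1.96):
    """Approximate 95% confidence line for a white-noise series of length n."""
    return z / math.sqrt(n)

# Hypothetical upward-trending series, for illustration only.
trending = [10 + 0.1 * t for t in range(100)]
acf_values = sample_acf(trending, max_lag=10)
bound = confidence_bound(len(trending))
significant_lags = [k for k, r in enumerate(acf_values, start=1) if abs(r) > bound]
print(f"bound = {bound:.3f}, significant lags = {significant_lags}")
```

For a trending series like this one, every autocorrelation up to lag 10 lies outside the lines, which is the slowly decaying pattern associated with non-stationarity.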

4.5 AUGMENTED DICKEY FULLER TEST

This is a unit root test for stationarity.

Under this test, if the resulting p-value is greater than 0.05 the data is taken to be non-stationary, and if the p-value is less than 0.05 the data is taken to be stationary. The test has the following hypotheses:

H_0: The data has a unit root (non-stationary)

H_1: The data has no unit root (stationary)

The test statistic of the ADF test is a t-statistic, shown below, where $\hat{\alpha}$ is the estimated coefficient on the lagged level of the series in the ADF regression:

$t = \hat{\alpha} / \mathrm{s.e.}(\hat{\alpha})$
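To make the t-statistic concrete, the sketch below fits the simplest Dickey-Fuller regression, in which the first difference of the series is regressed on a constant and the lagged level, and returns the estimated coefficient divided by its standard error. This is a simplification under stated assumptions: it omits the lagged-difference terms that make the test "augmented", and the resulting statistic has to be compared with Dickey-Fuller critical values rather than the usual t table. The function name and the simulated series are hypothetical.

```python
import math
import random

def dickey_fuller_t(series):
    """t-statistic for alpha in: diff(x)_t = c + alpha * x_(t-1) + e_t."""
    y = [series[t] - series[t - 1] for t in range(1, len(series))]  # first differences
    z = series[:-1]                                                 # lagged levels
    m = len(y)
    z_bar = sum(z) / m
    y_bar = sum(y) / m
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    alpha = sxy / sxx                      # OLS slope on the lagged level
    c = y_bar - alpha * z_bar              # OLS intercept
    ssr = sum((yi - c - alpha * zi) ** 2 for zi, yi in zip(z, y))
    se_alpha = math.sqrt(ssr / (m - 2) / sxx)
    return alpha / se_alpha

# Simulated series for illustration only (fixed seed for repeatability).
rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(500)]      # stationary white noise
walk = []
level = 0.0
for _ in range(500):
    level += rng.gauss(0, 1)                       # random walk has a unit root
    walk.append(level)

t_noise = dickey_fuller_t(noise)
t_walk = dickey_fuller_t(walk)
print(f"white noise t = {t_noise:.2f}, random walk t = {t_walk:.2f}")
```

A strongly negative statistic, as for the white-noise series, is evidence against a unit root; a statistic close to zero, as is typical for a random walk, is not.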

ADF summary

Data                      Returns. Price
Dickey-Fuller statistic   -2.0451
Lag order                 10
p-value                   0.5592
Alternative hypothesis    Stationary

Since the p-value is greater than 0.05, we fail to reject the null hypothesis and conclude that the data is not stationary, i.e. it has a unit root.

Therefore, from the three checks above, it is clear that the Safaricom share prices were non-stationary, which called for intervention to remove the non-stationarity. One way to remove it is to compute the log returns of the price data, since it is a stylized fact that returns are stationary.

Computation of returns, where $x_t$ is the share price at time t:

$r_t = \ln(x_t / x_{t-1})$
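The log-return formula can be applied directly to a list of prices, as in the short sketch below. The function name and the price values are hypothetical and are used only to illustrate the computation.

```python
import math

def log_returns(prices):
    """Log returns r_t = ln(x_t / x_(t-1)) for consecutive prices."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

# Hypothetical prices for illustration only.
prices = [20.00, 20.00, 19.95, 20.25]
returns = log_returns(prices)
print([round(r, 5) for r in returns])
```

A series of n prices yields n - 1 returns, and the returns sum to the log of the last price over the first, which is one reason log returns are convenient to work with.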

We then conducted tests to confirm the earlier premise that the returns are stationary. The tests were as follows:

4.6 TEST OF AUTOCORRELATION OF RETURNS

We first examined the ACF of the returns, which gave the results shown below.

From the ACF plot above it was clear that the returns were stationary, since the autocorrelations lie within the confidence lines, indicating no significant serial correlation.

We conducted an ADF test to confirm this finding.

4.7 ADF TEST OF RETURNS

This is the test for stationarity.

Under this test, if the resulting p-value is greater than 0.05 the data is taken to be non-stationary, and if the p-value is less than 0.05 the data is taken to be stationary. The test has the following hypotheses:

H_0: The returns have a unit root (non-stationary)

H_1: The returns have no unit root (stationary)

The test statistic of the ADF test is a t-statistic, shown below, where $\hat{\alpha}$ is the estimated coefficient on the lagged level of the series in the ADF regression:

$t = \hat{\alpha} / \mathrm{s.e.}(\hat{\alpha})$

Our ADF results for the returns were as shown below:

Data                      Returns
Dickey-Fuller statistic   -12.231
Lag order                 10
p-value                   0.01
Alternative hypothesis    Stationary

Warning message: In adf.test(RETURNS): p-value smaller than printed p-value

The p-value of the test was less than 0.05 (the warning indicates that the true p-value is below the printed 0.01), hence we rejected the null hypothesis and concluded that the returns were indeed stationary, the same conclusion we drew from the ACF plot.

We also plotted the returns against time to observe their behavior over time.

4.8 TIME PLOT OF RETURNS

To further confirm that the returns were stationary, we produced a time plot of the returns and examined it for a trend. The plot is displayed below:

The plot shows no discernible upward or downward trend; the returns fluctuate around a constant level. There is therefore no trend in the returns, and hence the returns are stationary.

Having confirmed that the returns were stationary, a normality test was carried out.

4.9 NORMALITY TEST

It is a stylized fact that financial returns are non-normal, with heavy-tailed distributions; the heaviness of the tails is measured by kurtosis. To test for normality of the returns, we used quantile-quantile plots (QQ-plots). For the data to be normal, the points should lie along the straight line. The following is our QQ-plot for the returns:

Consistent with this expectation, the returns were found to be non-normal and hence suitable for use in the modelling.
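Since heavy tails are measured by kurtosis, a numeric complement to the QQ-plot is the sample excess kurtosis, which is zero for a normal distribution and positive for heavy-tailed data. The sketch below is an illustration only; the function name and both data sets are hypothetical, and the study itself reports the QQ-plot rather than a kurtosis value.

```python
def excess_kurtosis(values):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical data: mostly zeros with two extreme values (heavy tails).
heavy_tailed = [0.0] * 18 + [-5.0, 5.0]
# Hypothetical data with no extreme values at all (light tails).
light_tailed = [-1.0, 1.0] * 10

print(excess_kurtosis(heavy_tailed))
print(excess_kurtosis(light_tailed))
```

The first data set gives a positive excess kurtosis and the second a negative one, mirroring how heavy-tailed returns bend away from the straight line at the ends of a QQ-plot.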

5.0 MODEL ESTIMATION FOR THE SHARE PRICE RETURNS

To identify the ARIMA model order, the study split the data points into trial data and forecasting data. Following common practice, 75% of the data points were used as trial data, i.e. the study used 75% of the data to find the model order. On this 75% of the data, the ACF plot was used to find the order of the moving average (MA) component and the PACF plot was used to find the order of the autoregressive (AR) component.

Using the ACF and PACF plots above, various combinations of ARIMA(p,d,q) were fitted and their respective AIC values recorded, as shown below:

ARIMA(p,d,q)    AIC
(1,1,1)         -5301.81
(1,1,4)         -5300.65
(1,1,18)        -5292.12
(2,1,1)         -5304.67
(2,1,4)         -5301.19
(2,1,18)        -5299.11
(3,1,4)         -5302.7
(3,1,18)        -5295.8
(3,1,1)         -5306.57

The study chose the model with the lowest AIC, which was ARIMA(3,1,1) with an AIC of -5306.57. Among the candidate orders suggested by the ACF and PACF plots, no other combination has a lower AIC.
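The selection rule is simply "pick the order with the smallest AIC". Using the AIC values reported in the table, the rule can be written as the following sketch; the variable names are illustrative, and the code does not refit any model.

```python
# AIC values as reported in the table of candidate ARIMA(p,d,q) orders.
candidate_aic = {
    (1, 1, 1): -5301.81,
    (1, 1, 4): -5300.65,
    (1, 1, 18): -5292.12,
    (2, 1, 1): -5304.67,
    (2, 1, 4): -5301.19,
    (2, 1, 18): -5299.11,
    (3, 1, 4): -5302.7,
    (3, 1, 18): -5295.8,
    (3, 1, 1): -5306.57,
}

# The preferred model minimises the AIC.
best_order = min(candidate_aic, key=candidate_aic.get)

# Rank all candidates from lowest (best) to highest AIC.
ranking = sorted(candidate_aic.items(), key=lambda item: item[1])
for order, aic in ranking:
    print(order, aic)
```

The ranking also shows that the runner-up, ARIMA(2,1,1), is within about two AIC units of the chosen model.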

The fitted ARIMA(3,1,1) model is displayed in the following output.

Data     Prices (log return price); 937 data points
Model    ARIMA(3,1,1)

TERM     COEFFICIENT    STANDARD ERROR
AR1       1.1200        0.0519
AR2      -0.2399        0.0497
AR3       0.0018        0.0351
MA1      -0.9194        0.0403
Drift     0.0153        0.0043

sigma^2 estimated    0.03702
Log likelihood       217
AIC                  -421.99
AICc                 -421.9
BIC                  -392.94

The fitted ARIMA(3,1,1) model becomes:

$R_t = 1.1200 R_{t-1} - 0.2399 R_{t-2} + 0.0018 R_{t-3} - 0.9194 e_{t-1} + e_t$

where $e_t$ is the error term at time t.

5.1 VALIDATION OF THE MODEL

To test the fit of the model, we employed the following tests:

5.1.1 LJUNG-BOX STATISTIC

The fit of the model was evaluated using the Ljung-Box test as follows.

The hypotheses:

H_0: The model fits the data well.

H_a: The model does not fit the data well.

Test                      Ljung-Box
Data                      RESID
X-squared (chi-square)    1.1307e-06
Degrees of freedom        1
p-value                   0.9992

Since the p-value is greater than 0.05, the study failed to reject the null hypothesis; hence our model fits the data well. The adequacy of the model was also checked using the ACF plot of the residuals.
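The Ljung-Box statistic is Q = n(n+2) times the sum over lags k of the squared residual autocorrelation divided by (n - k), and under the null hypothesis it is compared with a chi-square distribution. The sketch below computes Q for a residual series and, for the one-degree-of-freedom case reported here, converts a statistic to a p-value using the complementary error function. The function names and the alternating residual series are hypothetical; only the statistic 1.1307e-06 comes from the output above.

```python
import math

def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic using autocorrelations at lags 1..lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, lags + 1):
        num = sum((residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, n))
        rho_k = num / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

def chi2_upper_tail_df1(q):
    """P(X > q) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))

# p-value implied by the reported statistic with 1 degree of freedom.
p_value = chi2_upper_tail_df1(1.1307e-06)
print(round(p_value, 4))

# Hypothetical, strongly autocorrelated residuals for contrast.
alternating = [1.0, -1.0] * 50
q_alternating = ljung_box_q(alternating, lags=1)
print(round(q_alternating, 2), chi2_upper_tail_df1(q_alternating))
```

The first calculation agrees with the reported p-value of 0.9992 to four decimal places, while the alternating series produces a large Q and a p-value near zero, the situation in which the null hypothesis would be rejected.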

5.1.2 ACF OF THE RESIDUALS

The adequacy of the model was also checked by plotting the ACF of the residuals.

For our model to be adequate, there should be no significant serial correlation among the residuals. In addition, the residuals of the model should be white noise. The following was the plot:

Clearly, the residuals are uncorrelated apart from a minor serial correlation at lag 1. The study also sought to validate the distribution of the error terms by first plotting their density, as displayed in the following figure:

Clearly, the distribution of the error term is symmetric about zero, which is a characteristic feature of a white noise error term. The normality of the error term was then checked, with the following results.

Clearly, the error term is normally distributed, in line with the white noise property. Since the error terms of the model are uncorrelated, normal and white noise, our model is fit for forecasting.

5.2 FORECAST

The study used the remaining 25% (312) of the data points for forecasting. The following is a sample of the first 30 forecasted values:

FORECASTED VALUES

19.94978, 19.92717, 19.90268, 19.88162, 19.86474, 19.85155, 19.84136, 19.83352,

19.82749, 19.82286, 19.81931, 19.81658, 19.81449, 19.81288, 19.81165, 19.81070,

19.80997, 19.80941, 19.80899, 19.80866, 19.80841, 19.80821, 19.80806, 19.80795,

19.80786, 19.80780, 19.80774, 19.80770, 19.80767, 19.80765

The first 30 actual prices and forecasted prices were compared as shown below

ACTUAL PRICES FORECASTED PRICES

20.00 19.94978

20.00 19.92717

20.00 19.90268

19.95 19.88162

20.25 19.86474

20.25 19.85155

20.00 19.84136

19.90 19.83352

19.95 19.82749

19.95 19.82286

19.85 19.81931

19.75 19.81658

19.70 19.81449

19.70 19.81288

20.00 19.81165

20.00 19.81070

20.00 19.80997

19.95 19.80941

19.90 19.80899

19.85 19.80866

20.25 19.80841

20.75 19.80821

21.25 19.80806

21.25 19.80795

21.00 19.80786

20.75 19.80780

20.25 19.80774

20.75 19.80770

20.75 19.80767

20.25 19.80765
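The comparison in the table can be summarised with standard forecast-error measures. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) over the 30 pairs listed above. These measures are not reported in the study and are added here only as an illustration; the first forecast is taken as 19.94978, as in the list of forecasted values.

```python
import math

# Actual and forecasted prices as listed in the comparison table.
actual = [
    20.00, 20.00, 20.00, 19.95, 20.25, 20.25, 20.00, 19.90, 19.95, 19.95,
    19.85, 19.75, 19.70, 19.70, 20.00, 20.00, 20.00, 19.95, 19.90, 19.85,
    20.25, 20.75, 21.25, 21.25, 21.00, 20.75, 20.25, 20.75, 20.75, 20.25,
]
forecast = [
    19.94978, 19.92717, 19.90268, 19.88162, 19.86474, 19.85155, 19.84136,
    19.83352, 19.82749, 19.82286, 19.81931, 19.81658, 19.81449, 19.81288,
    19.81165, 19.81070, 19.80997, 19.80941, 19.80899, 19.80866, 19.80841,
    19.80821, 19.80806, 19.80795, 19.80786, 19.80780, 19.80774, 19.80770,
    19.80767, 19.80765,
]

errors = [a - f for a, f in zip(actual, forecast)]
abs_errors = [abs(e) for e in errors]

mae = sum(abs_errors) / len(errors)                         # mean absolute error
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # root mean squared error

print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}, largest error = {max(abs_errors):.5f}")
```

The largest gaps occur where the actual price rises to 21.25 while the forecast has already flattened out near 19.81.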

We also plotted the forecasted prices against the actual prices, giving the following graphs: