Numerical data
Chapter Three Questions
3.44 The properties of numerical data fall into three major groups. The first group is central tendency, which includes the mode, mean, and median. The second group is variation, which includes the range, interquartile range, variance, and standard deviation. The last group is relative standing, which includes z-scores and percentiles.
3.45 The central tendency property is the summary statistic that represents the central value of a dataset. It is a measure that indicates the point where most values of a data set fall.
3.46
Mean
Difference: The mean is calculated by dividing the sum of all observations by the number of observations.
Advantage: It can be used for both continuous and discrete numerical data.
Disadvantage: It cannot be used with categorical data, and it is affected by outliers.
Median
Difference: After the observations in a data set are arranged in order, the median is the value in the middle.
Advantage: It is less affected by skewed data and outliers, so it works well with non-symmetrical data.
Disadvantage: It cannot be identified for nominal categorical data.
Mode
Difference: The mode is the value that occurs most often in the data set.
Advantage: It can be identified for both categorical and numerical data.
Disadvantage: It may not reflect the center of the data in some distributions.
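As a rough illustration of how the three measures respond to an outlier, the sketch below uses Python's standard `statistics` module on a made-up sample (the values are hypothetical, chosen only for illustration).

```python
import statistics

# Hypothetical sample with one large outlier (illustrative values only).
values = [2, 3, 3, 4, 5, 6, 40]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle value after sorting
mode_value = statistics.mode(values)      # most frequent value

# The mode is also defined for categorical data, unlike the mean.
colour_mode = statistics.mode(["red", "blue", "red", "green"])

print(mean_value, median_value, mode_value, colour_mode)
```

In this sample the outlier pulls the mean well above the median, while the median and mode stay close to the bulk of the data.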
3.47 After arranging the data in ascending order, the first quartile is the middle number of the lower half of the data set. The median is the center of the entire set of data, while the center of the upper half of the data set is represented by the third quartile.
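The median-of-halves description above can be sketched as follows. The function name `quartiles` and the sample values are hypothetical, and the sketch assumes that with an odd number of observations the overall median is left out of both halves. Other quartile conventions exist and can give slightly different values.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, the overall median belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

q1, q2, q3 = quartiles([7, 1, 3, 9, 5, 11, 13])
print(q1, q2, q3)
```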
3.48 The property of variation measures how widely the values in a data set are spread, or dispersed, around the center.
3.49 A z-score measures the distance between a value in a data set and the mean of the data, expressed in standard deviations. It gives the number of standard deviations that a data point lies below or above the mean.
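A minimal sketch of the z-score calculation, assuming the sample standard deviation is the intended divisor. The function name `z_score` and the data are hypothetical.

```python
import statistics

def z_score(value, data):
    """Number of sample standard deviations that value lies from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return (value - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; the mean is 5

z_high = z_score(9, data)  # positive: 9 is above the mean
z_low = z_score(2, data)   # negative: 2 is below the mean
print(round(z_high, 3), round(z_low, 3))
```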
3.50
Range
Difference: It is the difference between the largest and the smallest values in a data set.
Advantage: It is very simple to calculate.
Disadvantage: It is highly sensitive to outliers.
Interquartile range
Difference: It is the difference between the third quartile and the first quartile, so it covers the middle 50% of the data.
Advantage: It is not affected by outliers or skewness.
Disadvantage: It cannot be used for categorical data.
Variance
Difference: While the range and the interquartile range use only some of the values, the variance uses all values in the data set.
Advantage: Deviations from the mean in either direction are treated equally.
Disadvantage: It is difficult to interpret because it is expressed in squared units.
Standard deviation
Difference: It uses all observations in the data set and is expressed in the original units of the data.
Advantage: It is easy to understand and interpret.
Disadvantage: It does not show the full range of the data.
Coefficient of variation
Difference: It is the ratio of the standard deviation to the mean.
Advantage: It makes it easy to compare the degree of variation between two series of data.
Disadvantage: It is very sensitive to small changes in the value of the mean.
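The five measures above can be computed side by side. This is a sketch on made-up data using Python's `statistics` module. The quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions, and the coefficient of variation is expressed as a percentage here by assumption.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

data_range = max(data) - min(data)

# quantiles(n=4) returns the three cut points Q1, Q2, Q3.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

variance = statistics.variance(data)  # sample variance, squared units
std_dev = statistics.stdev(data)      # square root of the variance
cv = std_dev / statistics.mean(data) * 100  # percent of the mean

print(data_range, iqr, round(variance, 3), round(std_dev, 3), round(cv, 1))
```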
3.51 The empirical rule is useful for examining variability in bell-shaped distributions. It describes how data values are distributed below and above the mean: approximately 68% of values lie within one standard deviation of the mean, 95% within two, and 99.7% within three. It also helps identify outliers. From the empirical rule, only about 1 in 20 values lies more than two standard deviations from the mean in either direction.
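The percentages behind the empirical rule can be checked against a standard normal distribution. The sketch below uses `statistics.NormalDist` (available in Python 3.8 and later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))

# Share lying more than two standard deviations from the mean.
outside_two = 1 - share_within(2)
```

The share outside two standard deviations comes out a little under 5%, which is where the "about 1 in 20" figure comes from.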
3.52 The empirical rule applies only to bell-shaped data, while the Chebyshev rule applies to any data set regardless of its shape, including data that are not bell-shaped.
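For comparison, the Chebyshev rule gives a guaranteed minimum share of values within k standard deviations of the mean, namely 1 - 1/k^2 for any k greater than 1. A small sketch, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations, for any shape."""
    if k <= 1:
        raise ValueError("the bound is only informative for k greater than 1")
    return 1 - 1 / k**2

bound_two = chebyshev_lower_bound(2)    # at least 75% within two SDs
bound_three = chebyshev_lower_bound(3)  # at least about 88.9% within three SDs
print(bound_two, round(bound_three, 4))
```

These bounds are lower than the empirical rule's figures because they must hold for every distribution, not only bell-shaped ones.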
3.53 The property of shape describes the pattern of the distribution of data values, which can be either symmetrical or skewed.
3.54 Covariance only measures the direction of the linear relationship between variables, while the coefficient of correlation measures both direction and strength of the linear relationship.
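The contrast can be seen by rescaling one variable: the covariance changes with the units, while the coefficient of correlation stays the same. The sketch below computes both by hand on made-up data; the function names are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Coefficient of correlation: covariance scaled by both standard deviations."""
    sd_x = math.sqrt(sample_covariance(x, x))
    sd_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y_small = [2, 4, 6, 8, 10]           # illustrative values
y_large = [200, 400, 600, 800, 1000]  # same relationship, different units

print(sample_covariance(x, y_small), sample_covariance(x, y_large))
print(correlation(x, y_small), correlation(x, y_large))
```

Both covariances are positive, which gives the direction, but their sizes differ by a factor of 100. Only the correlation, which is bounded between -1 and +1, indicates how strong the linear relationship is.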
Chapter Four Questions
4.52 The difference between a priori, empirical, and subjective probability lies in how each is obtained. A priori probability is based on prior knowledge of the process and relies on deductive reasoning. In contrast, empirical probability is based on observed or historical data. Finally, subjective probability is based on an individual’s opinions.
4.53 The difference is that a simple event is described by a single characteristic, while a joint event is described by two or more characteristics.
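4.54 Using the general addition rule, we calculate the probability of event A or event B by adding the probability of A and the probability of B and then subtracting the probability of A and B occurring together. When the events are mutually exclusive, that joint probability is zero, so the probability of A or B is simply the sum of the two probabilities.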
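A short sketch of the addition rule, P(A or B) = P(A) + P(B) - P(A and B), using a standard 52-card deck as the example. The function name `prob_a_or_b` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def prob_a_or_b(p_a, p_b, p_a_and_b=Fraction(0)):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Overlapping events: drawing a heart or an ace (one card is both).
heart_or_ace = prob_a_or_b(Fraction(13, 52), Fraction(4, 52), Fraction(1, 52))

# Mutually exclusive events: drawing a heart or a spade (no card is both),
# so the joint probability is zero and the rule reduces to a plain sum.
heart_or_spade = prob_a_or_b(Fraction(13, 52), Fraction(13, 52))

print(heart_or_ace, heart_or_spade)
```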
4.55 Let’s assume we have two events, A and B. When the events are mutually exclusive, the occurrence of event A means event B cannot occur; they cannot happen at the same time. Collectively exhaustive events, on the other hand, are events of which at least one must occur, so together they cover every possible outcome.
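One way to picture the two ideas is with sets of outcomes for a single roll of a die. In the sketch below (function and variable names are hypothetical), events are mutually exclusive when they share no outcome and collectively exhaustive when together they cover the whole sample space.

```python
sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of one roll of a die

def mutually_exclusive(a, b):
    """True when the two events have no outcome in common."""
    return a.isdisjoint(b)

def collectively_exhaustive(a, b, space):
    """True when at least one of the two events must occur."""
    return (a | b) == space

even, odd = {2, 4, 6}, {1, 3, 5}
low, high = {1, 2}, {5, 6}
at_most_four, at_least_three = {1, 2, 3, 4}, {3, 4, 5, 6}

print(mutually_exclusive(even, odd), collectively_exhaustive(even, odd, sample_space))
print(mutually_exclusive(low, high), collectively_exhaustive(low, high, sample_space))
```

Even and odd are both mutually exclusive and collectively exhaustive; `low` and `high` are mutually exclusive only; `at_most_four` and `at_least_three` are collectively exhaustive only.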
4.56 The conditional probability of A given B is the probability that A occurs given that B has already occurred. Conditional probability relates to the independence of events as follows: if the probability of A given B equals the probability of A, knowing that B occurred does not change the likelihood of A, and the events are independent. Otherwise, the probability of one event depends on the occurrence of the other.
4.57 The general multiplication rule states that, for dependent events, the probability of both events occurring is the conditional probability of one event given the other multiplied by the probability of the other event. The multiplication rule for independent events states that the probability of both events occurring is found by multiplying the probabilities of the two events.
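Conditional probability, the independence check, and the two multiplication rules can be worked through on a small cross-classification table. The counts below are hypothetical and were chosen so that A and B turn out to be dependent.

```python
from fractions import Fraction

# Hypothetical counts for 100 observations, cross-classified by A and B.
counts = {
    ("A", "B"): 30,
    ("A", "not B"): 20,
    ("not A", "B"): 10,
    ("not A", "not B"): 40,
}
total = sum(counts.values())

p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)
p_a_and_b = Fraction(counts[("A", "B")], total)

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

# Independence check: A and B are independent only if P(A | B) equals P(A).
independent = p_a_given_b == p_a

# General multiplication rule: P(A and B) = P(A | B) * P(B).
general_rule = p_a_given_b * p_b

# The simplified rule P(A) * P(B) is valid only for independent events.
simplified_rule = p_a * p_b

print(p_a_given_b, independent, general_rule, simplified_rule)
```

Because the events are dependent here, the general rule recovers the joint probability while the simplified product does not.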
4.58 To revise the probability of an event using Bayes’ theorem, we combine the prior probabilities with the new data, expressed as the conditional probability of observing that data, to obtain the revised probabilities.
4.59 The prior probability is the probability of an event occurring before additional data is included in the calculations, while the revised probability is the probability of an event occurring after new data is included in the calculation.
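The step from a prior to a revised probability can be sketched with Bayes' theorem for a single event and its complement. The function name and all three input probabilities below are hypothetical, chosen only to show the mechanics.

```python
def revised_probability(prior, p_data_given_event, p_data_given_not_event):
    """Bayes' theorem: P(event | data) for an event and its complement."""
    numerator = p_data_given_event * prior
    denominator = numerator + p_data_given_not_event * (1 - prior)
    return numerator / denominator

# Hypothetical inputs: a 3% prior, and new data that is observed 90% of the
# time when the event is true but only 2% of the time when it is not.
prior = 0.03
posterior = revised_probability(prior, 0.90, 0.02)

print(round(posterior, 4))
```

With these inputs the revised probability is far larger than the prior, because the new data is much more likely when the event is true than when it is not.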