Reading 2: Organizing, Visualizing, and Describing Data

50 questions available

Data Organization and Types (10 min)
Data is the raw material of analysis, classified as numerical or categorical. Numerical data supports arithmetic operations and is either discrete (countable values, such as a number of days) or continuous (any value within a range, such as returns). Categorical data labels attributes and is either nominal (no inherent order, e.g., fund categories) or ordinal (ranked, e.g., star ratings). Structured data is organized in arrays: a one-dimensional array holds a single variable, either as a time series (one unit observed over time) or as a cross-section (many units observed at one point in time), while a two-dimensional array forms a data table such as panel data. Unstructured data has no predefined model or layout and often comes from text or sensors.

Key Points

  • Numerical data: Discrete vs. Continuous.
  • Categorical data: Nominal vs. Ordinal.
  • Time series track one variable over time; cross-sectional data observe many units (such as companies) at a single point in time.
  • Panel data combines time series and cross-sectional dimensions.
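The three layouts above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed storage format: the fund names, dates, and return figures are hypothetical, and the helper functions time_slice and cross_slice are invented for this example.

```python
# All figures below are hypothetical and exist only for illustration.

# Time series: one variable (a fund's monthly return) observed over time.
time_series = {"2023-01": 0.012, "2023-02": -0.004, "2023-03": 0.009}

# Cross-section: several units (funds) observed at one point in time.
cross_section = {"Fund A": 0.012, "Fund B": 0.007, "Fund C": -0.002}

# Panel data: the same units observed over several periods,
# stored here as a two-dimensional table keyed by (period, unit).
panel = {
    ("2023-01", "Fund A"): 0.012,
    ("2023-01", "Fund B"): 0.007,
    ("2023-02", "Fund A"): -0.004,
    ("2023-02", "Fund B"): 0.003,
}


def time_slice(panel, unit):
    """Extract one unit's time series from the panel."""
    return {period: value for (period, u), value in panel.items() if u == unit}


def cross_slice(panel, period):
    """Extract one period's cross-section from the panel."""
    return {u: value for (p, u), value in panel.items() if p == period}


fund_a_series = time_slice(panel, "Fund A")
january_cross_section = cross_slice(panel, "2023-01")
print(fund_a_series)
print(january_cross_section)
```

Fixing the unit and varying the period recovers a time series; fixing the period and varying the unit recovers a cross-section.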

Visualizing Data (15 min)
Visualization techniques are chosen based on the data type and the analytical goal. Frequency distributions summarize numerical data into intervals and are visualized with histograms or frequency polygons. Cumulative distributions show the proportion of observations below a value. For categorical data, bar charts (grouped or stacked) and tree maps compare relative sizes. Relationships between variables are depicted using scatter plots or heat maps. Time series are best shown with line charts; a bubble line chart adds a third dimension by varying the size of the marker at each point.

Key Points

  • Histograms visualize frequency distributions of numerical data.
  • Scatter plots reveal relationships between two numerical variables.
  • Heat maps use color intensity to show frequency or correlation.
  • Tree maps visualize relative sizes of categories.
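The numbers behind a histogram or a cumulative distribution chart come from a frequency table. The sketch below builds one with absolute, relative, and cumulative relative frequencies. It assumes equal-width intervals that cover every observation; the function name frequency_distribution and the return figures are hypothetical.

```python
from collections import Counter


def frequency_distribution(values, low, width, n_bins):
    """Return rows of (interval, absolute, relative, cumulative relative).

    Assumes every value lies between low and low + n_bins * width.
    """
    counts = Counter()
    for v in values:
        # Index of the interval [low + k*width, low + (k+1)*width);
        # a value equal to the upper bound goes in the last interval.
        k = min(int((v - low) // width), n_bins - 1)
        counts[k] += 1

    total = len(values)
    rows, cumulative = [], 0
    for k in range(n_bins):
        cumulative += counts[k]
        interval = (low + k * width, low + (k + 1) * width)
        rows.append((interval, counts[k], counts[k] / total, cumulative / total))
    return rows


# Hypothetical returns, in percent.
returns = [-4.0, -1.5, 0.5, 1.0, 1.2, 2.5, 3.0, 3.5, 6.0, 8.0]
table = frequency_distribution(returns, low=-5.0, width=5.0, n_bins=3)
for interval, absolute, relative, cum_relative in table:
    print(interval, absolute, relative, cum_relative)
```

The absolute frequencies are the bar heights of a histogram, and the last column is what a cumulative relative frequency chart plots.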

Measures of Central Tendency (15 min)
Central tendency identifies the center of a dataset. The arithmetic mean is the sum of the values divided by their count and serves as the center of gravity. The median is the middle value and is robust to outliers. The mode is the most frequent value. The geometric mean gives the compound growth rate; it never exceeds the arithmetic mean and is strictly lower whenever the observations are not all equal. The harmonic mean, used for the average cost per share when a fixed amount is invested each period, is the lowest of the three Pythagorean means when the (positive) observations vary.

Key Points

  • Arithmetic mean is sensitive to outliers; Median is not.
  • Geometric mean is used for compound returns over time.
  • Harmonic mean is used for average price per share (cost averaging).
  • Order for positive data that varies: Harmonic Mean < Geometric Mean < Arithmetic Mean.
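The three means can be compared directly with Python's standard library (version 3.8 or later is assumed for math.prod and statistics.geometric_mean). The figures reuse the returns and share prices from Questions 9 and 10 of this reading; the helper geometric_mean_return is a name introduced only for this sketch.

```python
from math import prod
from statistics import geometric_mean, harmonic_mean, mean


def geometric_mean_return(returns):
    """Compound per-period return: (product of (1 + r)) ** (1/n) - 1."""
    growth = prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Annual returns of 10 percent, 20 percent, and -10 percent.
returns = [0.10, 0.20, -0.10]
arithmetic_return = mean(returns)
compound_return = geometric_mean_return(returns)

# Equal amounts invested at share prices of 10, 15, and 20:
# the average cost per share is the harmonic mean of the prices.
prices = [10.0, 15.0, 20.0]
average_cost = harmonic_mean(prices)

print(f"arithmetic mean return: {arithmetic_return:.4f}")
print(f"geometric mean return:  {compound_return:.4f}")
print(
    "harmonic < geometric < arithmetic for prices: "
    f"{average_cost:.2f} < {geometric_mean(prices):.2f} < {mean(prices):.2f}"
)
```

Returns can be negative, so the geometric mean is taken over the growth factors 1 + r rather than over the raw returns. The harmonic mean matches total dollars invested divided by total shares bought.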

Measures of Dispersion and Distribution Shape (20 min)
Dispersion measures variability. The range is the difference between the maximum and the minimum. Variance is the average squared deviation from the mean; the sample variance divides the sum of squared deviations by n-1 rather than n. Standard deviation is the square root of variance and is expressed in the same units as the data. The Coefficient of Variation (CV) expresses dispersion relative to the mean, that is, risk per unit of return. Skewness describes asymmetry; a positively skewed distribution has a long right tail, which typically pulls the mean above the median. Kurtosis measures tail thickness; leptokurtic distributions have fat tails, indicating a higher probability of extreme outcomes than under a normal distribution.

Key Points

  • Sample variance divides by n-1 so that it is an unbiased estimator of the population variance.
  • Coefficient of Variation (CV) = Standard Deviation / Mean.
  • Positive Skew (typical for a unimodal distribution): Mean > Median > Mode.
  • Leptokurtic distributions have excess kurtosis > 0 (fat tails).
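The measures above can be computed from first principles as follows. This is a sketch under stated assumptions: skewness and excess kurtosis use the simple large-sample approximations (average cubed or fourth-power deviation divided by the matching power of the sample standard deviation), which is one common convention rather than the only one. The function name describe is hypothetical, the first sample comes from Question 24, and the second sample is invented.

```python
from math import sqrt


def describe(sample):
    """Range, sample variance, standard deviation, CV, skewness, excess kurtosis."""
    n = len(sample)
    mean = sum(sample) / n
    deviations = [x - mean for x in sample]

    variance = sum(d ** 2 for d in deviations) / (n - 1)  # n-1 divisor
    std_dev = sqrt(variance)

    return {
        "range": max(sample) - min(sample),
        "variance": variance,
        "std_dev": std_dev,
        "cv": std_dev / mean,  # assumes a nonzero mean
        # Approximations: assumed convention, reasonable for large n.
        "skewness": sum(d ** 3 for d in deviations) / (n * std_dev ** 3),
        "excess_kurtosis": sum(d ** 4 for d in deviations) / (n * std_dev ** 4) - 3,
    }


symmetric = describe([2, 4, 6])             # dataset from Question 24
right_skewed = describe([1, 2, 2, 3, 12])   # hypothetical, one large value

print(symmetric)
print(right_skewed)
```

In the second sample the single large observation lifts the mean (4) above the median (2) and produces positive skewness, consistent with the long-right-tail description.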

Questions

Question 1

Which of the following best describes discrete numerical data?

Question 2

A dataset contains the daily closing prices of a specific stock over the past year. This is best classified as:

Question 3

In a frequency distribution, the relative frequency of an interval is calculated as:

Question 4

A contingency table is primarily used to analyze:

Question 5

Which visualization tool is most appropriate for identifying whether a nonlinear relationship exists between two numerical variables?

Question 6

The sum of the deviations of observations from their arithmetic mean is always:

Question 7

Calculate the weighted mean return of a portfolio consisting of 60 percent Asset A (return 10 percent) and 40 percent Asset B (return 5 percent).

Question 8

Which measure of central tendency is least affected by outliers?

Question 9

Calculate the geometric mean return for three years with returns of 10 percent, 20 percent, and -10 percent.

Question 10

An investor purchases 1,000 USD of stock each month. The share prices paid were 10, 15, and 20. The average cost per share is best calculated using the:

Question 11

If a distribution is positively skewed, the relationship between the mean, median, and mode is typically:

Question 12

The third quartile (Q3) of a dataset represents the value below which what percentage of observations lie?

Question 13

What is the Mean Absolute Deviation (MAD) of the returns 2 percent, 5 percent, and -1 percent?

Question 14

The sample variance is calculated using a denominator of:

Question 15

Calculate the coefficient of variation (CV) for a stock with a mean return of 8 percent and a standard deviation of 12 percent.

Question 16

A distribution with excess kurtosis of 2.0 is best described as:

Question 17

The correlation coefficient between two variables ranges from:

Question 18

Which chart type is best for displaying the joint frequency of two categorical variables?

Question 19

If the harmonic mean, geometric mean, and arithmetic mean are calculated for a dataset with variable positive values, which inequality is correct?

Question 20

A winsorized mean is calculated by:

Question 21

Unstructured data is best described as:

Question 22

For a dataset with 9 observations, the position of the 3rd quartile is calculated using the formula (n+1)y/100. What is the position?

Question 23

A box and whisker plot specifically highlights which measure of dispersion?

Question 24

Calculate the sample variance for a dataset: 2, 4, 6. (Mean is 4).

Question 25

Target downside deviation differs from standard deviation because it:

Question 26

Which word cloud feature indicates the frequency of a word in a text dataset?

Question 27

Ordinal data allows for which of the following operations?

Question 28

A confusion matrix is a type of:

Question 29

Which chart is best suited for comparing categories by size where area represents value?

Question 30

The harmonic mean of 2 and 8 is:

Question 31

In a unimodal distribution, if the Mode < Median < Mean, the distribution is:

Question 32

Spurious correlation refers to:

Question 33

What is the joint frequency of 'Monday' and 'Front Street' if the marginal frequency for Monday is 19 and Front Street is 25, given the cell value is 7?

Question 34

A histogram is essentially a bar chart of:

Question 35

If the standard deviation of a dataset is 5 and the mean is 20, the coefficient of variation is:

Question 36

Excess kurtosis is calculated as:

Question 37

Which measure is calculated as the sum of squared deviations from the mean divided by (n-1)?

Question 38

Given returns of 10 percent, 10 percent, 40 percent, 10 percent. Which principle allows us to sum the present values of these cash flows?

Question 39

A bubble line chart adds a third dimension to a line chart by modifying the:

Question 40

Which of the following is true for a normal distribution?

Question 41

If covariance is 0.0058, standard deviation of A is 0.0529, and standard deviation of B is 0.1114, what is the correlation coefficient?

Question 42

Panel data typically combines which two data types?

Question 43

In a box and whisker plot, the vertical line within the box represents the:

Question 44

To calculate a compound annual rate of return over three years, one should use the:

Question 45

Which of the following is true regarding the arithmetic mean and geometric mean for variable data?

Question 46

Covariance measures:

Question 47

A heat map uses which visual element to display data frequency?

Question 48

Which of the following is NOT a property of the arithmetic mean?

Question 49

To construct a frequency polygon, one plots points using:

Question 50

Given a sample of returns: 30 percent, 12 percent, 25 percent, 20 percent, 23 percent. What is the median?
