Quantitative Methods, CFA Program Curriculum, 2026, Level I, Volume 1

Learning Module 9 Parametric and Non-Parametric Tests of Independence

Overview and Hypothesis Formulation

This chapter explains hypothesis tests for assessing relationships and independence between variables. It begins with three generic hypothesis formats: two-sided (H0: theta = theta0 versus Ha: theta != theta0), one-sided right (H0: theta <= theta0 versus Ha: theta > theta0), and one-sided left (H0: theta >= theta0 versus Ha: theta < theta0).

Parametric tests concern population parameters or require specific distributional assumptions (for example, the Pearson product-moment correlation test assumes bivariate normality). Nonparametric tests make minimal distributional assumptions or are used when the hypothesis is not about a parameter. Typical reasons to use nonparametric methods include nonnormal data, the presence of outliers, data given in ranks, or hypotheses that do not concern a parameter.

For testing correlation, the parametric approach uses the sample Pearson correlation r and the t-statistic t = r sqrt((n-2)/(1 - r^2)), which follows a t-distribution with n-2 degrees of freedom under the null hypothesis that the population correlation rho = 0. A key practical point: the magnitude of r required for significance decreases as sample size increases because, for a given r, the t-statistic grows with sqrt(n-2) while the critical t-value shrinks as the degrees of freedom rise.

The Spearman rank correlation rs is a nonparametric alternative computed from ranks: rs = 1 - (6 sum(di^2)) / (n (n^2 - 1)), where di is the difference between the two ranks of observation pair i. For large n (roughly n > 30), rs can be tested with the same t formula (substituting rs for r) and n-2 degrees of freedom; for small n, specialized tables of critical values are used. The chapter shows how to compute ranks, handle ties (by assigning average ranks), and then compute di^2 and rs, with examples and conversions to t-statistics and p-values.

For categorical or discrete data, the chi-square test of independence uses a contingency (two-way) table of observed frequencies Oij. Expected frequencies under independence are Eij = (row i total * column j total) / overall total.
The test statistic is chi-square = sum over all cells of (Oij - Eij)^2 / Eij, with df = (r - 1)(c - 1), where r and c are the numbers of rows and columns in the table. Because every squared term is nonnegative, large values of the statistic signal departures from independence, so the rejection region lies in the right tail. The chapter gives a step-by-step example with ETFs classified by size and investment type: compute Eij, compute the scaled squared deviations, sum them to obtain chi-square, compare the result with the chi-square critical value (or compute a p-value), and interpret.

The chapter also introduces standardized residuals (Pearson residuals) for each cell, (Oij - Eij) / sqrt(Eij), which identify the cells that contribute most to chi-square, and suggests visualization (mosaic plots) for further insight. Multiple worked examples cover degrees of freedom, the calculation of chi-square, the evaluation of one-sided and two-sided alternatives, and the alternative way of forming an F-statistic by inverting its numerator and denominator.

Practical advice includes using software (Excel, R, Python) for critical values and p-values. Note that the chi-square test of independence is one-sided (right-tailed) by construction, so the full alpha sits in the right tail; alpha is split between the tails only in a genuinely two-tailed test.

The chapter emphasizes when to prefer nonparametric tests: when distributional assumptions fail, when outliers contaminate parametric measures, when data are ordinal or ranked, or when the hypothesis is about ranks or randomness (e.g., the runs test). It compares parametric tests for a single mean and for paired or independent means with their nonparametric counterparts (e.g., Wilcoxon signed-rank, Mann-Whitney U, sign test).
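The chi-square steps described above can be sketched in Python using only the standard library. This is a minimal illustration, not the chapter's own example: the function name chi_square_independence and the 2x2 counts are hypothetical, and the 5 percent critical value of 3.841 for 1 degree of freedom is read from a standard chi-square table rather than computed.

```python
import math

def chi_square_independence(observed):
    """Return (chi2, df, expected, residuals) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / overall total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Standardized (Pearson) residual for each cell: (O - E) / sqrt(E)
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    # The chi-square statistic is the sum of squared standardized residuals
    chi2 = sum(res ** 2 for row in residuals for res in row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, residuals

# Hypothetical counts for illustration only (not the chapter's ETF data)
observed = [[30, 10],
            [20, 40]]

chi2, df, expected, residuals = chi_square_independence(observed)
critical_value = 3.841  # 5% right-tail critical value for df = 1 (standard table)

print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject independence" if chi2 > critical_value else "fail to reject independence")
for row in residuals:
    print([round(res, 2) for res in row])
```

Because the statistic is the sum of the squared residuals, the cells with the largest absolute residuals are the ones driving any rejection. In practice the critical value or p-value would come from software rather than a hard-coded table entry.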
In summary: use the Pearson parametric t-test for correlation when the bivariate normality assumption is plausible; use the Spearman rank correlation when it is not; use the chi-square test of independence for contingency tables; inspect standardized residuals to see which cells deviate most; rely on software for exact critical values and p-values; and always consider sample size, effect size, assumptions, and the costs of Type I versus Type II errors when interpreting results.
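The two correlation tests named in the summary can be sketched as follows. This is a minimal standard-library sketch under stated assumptions: the helper names average_ranks, spearman_rs, and correlation_t_stat are hypothetical, the data are made up, and the di^2 shortcut formula is exact only when there are no ties (with average ranks for ties it serves as an approximation).

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """rs = 1 - 6 * sum(di^2) / (n * (n^2 - 1)), with di the rank differences."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def correlation_t_stat(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared against t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical paired observations; x contains one tie
x = [1.2, 0.5, 3.1, 2.0, 2.0]
y = [10, 8, 15, 11, 14]
print("ranks of x:", average_ranks(x))
print("rs =", round(spearman_rs(x, y), 3))

# Hypothetical larger sample: r = 0.45 from n = 40 pairs
print("t =", round(correlation_t_stat(0.45, 40), 2), "with", 40 - 2, "df")
```

The same correlation_t_stat function serves both tests: pass the Pearson r directly, or pass rs when n is large enough for the t approximation. The five-pair sample here only demonstrates the ranking arithmetic; a sample that small would need a table of Spearman critical values.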

Key Points

Hypotheses can be two-sided or one-sided; formulation must be in population-parameter terms.
Parametric Pearson correlation test uses t = r * sqrt((n-2)/(1-r^2)) with n-2 df.
Spearman rank correlation is computed from ranks as rs = 1 - 6 sum(di^2) / (n(n^2 - 1)); for large n it can be tested with the same t-formula.
Chi-square test of independence compares observed and expected frequencies; df = (rows-1)(cols-1).
Use standardized residuals to identify which cells most deviate from independence.
Nonparametric tests are preferred when assumptions fail, with ranked or ordinal data, or with outliers.
Always consider sample size, p-values, alpha choice, and trade-offs between Type I and Type II errors.
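The role of sample size in the correlation test can be made concrete by inverting the t-formula to find the smallest |r| that reaches significance. The sketch below is illustrative only: the function name critical_r and the chosen sample sizes are hypothetical, and the two-sided 5 percent critical t-values are read from a standard t-table rather than computed.

```python
import math

def critical_r(t_crit, n):
    """Smallest significant |r|, from solving t = r * sqrt((n-2)/(1-r^2)) for r."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit ** 2)

# Two-sided 5% critical t-values from a standard t-table, keyed by sample size n
# (df = n - 2). The sample sizes are arbitrary illustrations.
t_table = {12: 2.228, 32: 2.042, 102: 1.984}

for n, t_crit in t_table.items():
    print(f"n = {n:3d}  df = {n - 2:3d}  critical |r| = {critical_r(t_crit, n):.3f}")
```

Under these table values, the threshold falls from roughly 0.58 at n = 12 to about 0.35 at n = 32 and about 0.19 at n = 102, so a correlation that is insignificant in a small sample can be significant in a larger one.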

Questions

Question 1

You compute a Pearson sample correlation r = 0.35 from n = 12 paired observations. Using the parametric t-test for correlation, calculate the t-statistic and determine whether r is significantly different from zero at the two-sided 5 percent level (critical t approximately ±2.228 with n - 2 = 10 degrees of freedom).
