Quantitative Methods, CFA Program Curriculum, 2026 • Level I • Volume 1

Learning Module 7 Estimation and Inference

Overview and Sampling Methods

This learning module explains how analysts obtain information about a population from samples, how to quantify sampling error, and how to perform statistical inference. Sampling methods fall into two groups: probability sampling (simple random, systematic, stratified random, cluster) and non-probability sampling (convenience, judgmental). Simple random sampling gives every population member an equal probability of selection. Stratified random sampling partitions the population into strata and draws a simple random sample from each stratum, typically in proportion to the stratum's share of the population; this increases precision when the subgroups differ. Cluster sampling divides the population into clusters and randomly selects whole clusters (one-stage) or randomly samples within the selected clusters (two-stage); it is often cost-effective for large populations but generally less precise. Non-probability sampling is faster and cheaper but risks producing a nonrepresentative sample.
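
The probability sampling methods above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the population of 1,000 numbered securities, the three sectors and their sizes, and all function names are hypothetical. Systematic sampling is shown in its usual form (every k-th member after a random start), and the cluster sampler is one-stage (all members of each selected cluster are kept).

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    # Every member has the same probability n/N of being selected.
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    # Pick a random start in [0, k), then take every k-th member.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(population, n, stratum_of, rng):
    # Group members by stratum, then draw a simple random sample from each
    # stratum in proportion to its population share (rounding can make the
    # total differ slightly from n when shares do not divide evenly).
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, n_h))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    # One-stage cluster sampling: keep every member of the chosen clusters.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]


# Hypothetical population: securities 0..999 in three invented sectors.
population = list(range(1000))


def sector(i):
    return "A" if i < 600 else ("B" if i < 900 else "C")


rng = random.Random(42)
srs = simple_random_sample(population, 50, rng)
sys_sample = systematic_sample(population, 50, rng)
strat = stratified_sample(population, 50, sector, rng)
clusters = [population[i:i + 100] for i in range(0, 1000, 100)]
clustered = cluster_sample(clusters, 2, rng)
print(len(srs), len(sys_sample), len(strat), len(clustered))  # 50 50 50 200
```

With sector shares of 60%, 30%, and 10%, the stratified sample of 50 contains exactly 30, 15, and 5 members from the three sectors.
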
Sampling error is the difference between a sample statistic and the corresponding population parameter that arises because only a subset of the population is observed. The sampling distribution of a statistic describes the distribution of that statistic across repeated random samples of the same size. The central limit theorem (CLT) states that for a population with finite variance, the sampling distribution of the sample mean is approximately normal for sufficiently large n, with mean equal to the population mean and variance equal to the population variance divided by the sample size (sigma squared over n). The standard error of the sample mean is sigma/sqrt(n) when the population sigma is known, or s/sqrt(n) when sigma is estimated from the sample. The CLT justifies the use of normal-based confidence intervals and hypothesis tests for large samples; n >= 30 is a common rule of thumb, but more observations may be needed for heavily skewed distributions.
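
A small simulation can make the CLT concrete. The sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1 (chosen only for illustration) and uses Python's standard library; the function names are hypothetical. It checks that the sample means center on mu, that their spread is close to sigma/sqrt(n), and that roughly 95% of them fall within 1.96 standard errors of mu, as the normal approximation predicts.

```python
import math
import random
import statistics


def standard_error(sample):
    # s / sqrt(n): estimated standard error of the mean when sigma is unknown.
    return statistics.stdev(sample) / math.sqrt(len(sample))


def simulate_sample_means(draw, n, reps, rng):
    # Draw `reps` independent samples of size n and record each sample mean.
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def exponential_draw(r):
    # Right-skewed population with mean 1 and standard deviation 1.
    return r.expovariate(1.0)


rng = random.Random(7)
n = 50
means = simulate_sample_means(exponential_draw, n, 5000, rng)
theoretical_se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

print(statistics.fmean(means))  # should be close to mu = 1
print(statistics.stdev(means))  # should be close to 1/sqrt(50), about 0.141
# Share of sample means within 1.96 standard errors of mu; near 0.95 if the
# sampling distribution of the mean is approximately normal.
print(sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means))

one_sample = [exponential_draw(rng) for _ in range(n)]
print(standard_error(one_sample))  # single-sample estimate of the same SE
```

Raising n for the same skewed population should tighten the spread of the sample means in proportion to 1/sqrt(n).
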

Key Points

Probability sampling methods include simple random, stratified, cluster, and systematic sampling.
Non-probability sampling includes convenience and judgmental sampling; these can be biased.
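Sampling error arises because a sample is only a subset of the population; the sampling distribution describes how a statistic varies across repeated samples.
Central Limit Theorem: for a population with finite variance, the sample mean is approx. normal with mean mu and variance sigma^2/n for large n.
Standard error of the sample mean is sigma / sqrt(n) when sigma is known and s / sqrt(n) when sigma is unknown.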

Resampling: Bootstrap and Jackknife

Resampling methods create empirical sampling distributions when analytic formulas are difficult to derive or their assumptions are uncertain. The bootstrap repeatedly draws samples with replacement from the observed sample (each resample the same size as the original), computes the statistic of interest for each resample, and uses the variation across resamples to estimate standard errors and confidence intervals. The bootstrap estimate of a standard error is the sample standard deviation of the B resampled statistics, computed with a divisor of B - 1. The jackknife removes one observation at a time (leave-one-out) to estimate bias and standard errors; it uses n replications for a sample of size n and, unlike the bootstrap, is deterministic for given data.
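
The bootstrap steps can be sketched as follows. The returns are invented for illustration, the function names are hypothetical, and the percentile interval is just one simple way to turn bootstrap replicates into a confidence interval (other bootstrap interval methods exist). For the sample mean, the bootstrap standard error should come out close to s/sqrt(n), slightly smaller by a factor of about sqrt((n - 1)/n), which is a useful sanity check before applying the same code to a statistic such as the median that has no simple analytic standard error.

```python
import math
import random
import statistics


def bootstrap(data, statistic, B, rng):
    # Each resample has the same size n as the original and is drawn with
    # replacement; the statistic is recomputed on every resample.
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(B)]


def bootstrap_se(replicates):
    # statistics.stdev uses the B - 1 divisor from the Key Points formula.
    return statistics.stdev(replicates)


def percentile_interval(replicates, level=0.95):
    # Percentile interval: the empirical (1 - level)/2 and (1 + level)/2
    # quantiles of the bootstrap replicates.
    ordered = sorted(replicates)
    B = len(ordered)
    lo = ordered[int(B * (1 - level) / 2)]
    hi = ordered[int(B * (1 + level) / 2) - 1]
    return lo, hi


# Hypothetical monthly returns, invented for illustration.
returns = [0.012, -0.004, 0.021, 0.008, -0.015, 0.030, 0.005, -0.002, 0.017, 0.009]
rng = random.Random(1)

mean_reps = bootstrap(returns, statistics.fmean, 5000, rng)
print(bootstrap_se(mean_reps))                              # bootstrap SE of the mean
print(statistics.stdev(returns) / math.sqrt(len(returns)))  # analytic s / sqrt(n)

median_reps = bootstrap(returns, statistics.median, 5000, rng)
print(bootstrap_se(median_reps), percentile_interval(median_reps))
```
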

Key Points

Bootstrap draws many resamples with replacement from the observed data to estimate sampling distributions.
Bootstrap standard error = sqrt( (1/(B - 1)) sum_{b=1..B} (theta_hat_b - mean_theta_hat)^2 ).
Jackknife leaves one observation out at a time to estimate bias and variance; requires n replications.
Resampling is valuable when analytic standard errors are hard to obtain or distributional assumptions are weak.
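
The jackknife can be sketched in the same style. The bias and standard-error formulas in the code are the standard textbook jackknife estimates, bias = (n - 1)(theta_bar - theta_hat) and SE = sqrt(((n - 1)/n) sum (theta_(i) - theta_bar)^2), where theta_(i) is the statistic with observation i left out and theta_bar is the average of the theta_(i); these formulas are not stated in the text above. The data and function names are hypothetical. Two checks follow from the formulas: for the sample mean the jackknife SE equals s/sqrt(n), and bias-correcting the divide-by-n variance recovers the divide-by-(n - 1) sample variance.

```python
import math
import statistics


def jackknife(data, statistic):
    # Leave-one-out: n replicates, fully determined by the data (no randomness).
    n = len(data)
    theta_hat = statistic(data)
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = statistics.fmean(replicates)
    # Standard jackknife estimates of bias and standard error.
    bias = (n - 1) * (theta_bar - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in replicates))
    return theta_hat - bias, bias, se


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations

# Sample mean: estimated bias is 0 and the SE equals s / sqrt(n).
print(jackknife(data, statistics.fmean))

# Divide-by-n variance (biased): the bias-corrected value equals the
# divide-by-(n - 1) sample variance.
corrected, bias, _ = jackknife(data, statistics.pvariance)
print(corrected, statistics.variance(data))
```
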

Hypothesis Testing Fundamentals

Hypothesis testing framework: (1) state the null (H0) and alternative (Ha) hypotheses, which must be mutually exclusive and exhaustive; (2) identify the appropriate test statistic and its sampling distribution; (3) choose the significance level alpha (the probability of a Type I error); (4) state the decision rule (critical values or p-value); (5) compute the test statistic from the data; (6) decide whether to reject or fail to reject H0. A Type I error (probability alpha) is rejecting a true H0; a Type II error (probability beta) is failing to reject a false H0; power = 1 - beta is the probability of correctly rejecting a false H0. Common test statistics in finance include t for means and regression coefficients, chi-square for variances and contingency tables, F for variance ratios and overall regression fit, and t with df = n - 2 for correlation coefficients.
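
The six steps can be traced in a z-test of a population mean. To stay within Python's standard library, which has a normal distribution (statistics.NormalDist) but no t distribution, the sketch assumes the population sigma is known; with sigma unknown, the same steps apply with a t statistic and n - 1 degrees of freedom. The inputs and function names are hypothetical. The power function evaluates 1 - beta for a two-sided test at a chosen true mean; when the true mean equals the hypothesized one, the rejection probability equals alpha.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    # Step 2: z = (xbar - mu0) / (sigma / sqrt(n)) is standard normal under H0.
    z = (xbar - mu0) / (sigma / n ** 0.5)
    # Steps 3-4: two-sided decision rule at significance level alpha.
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    # Step 6: reject H0 if |z| exceeds the critical value (equivalently p < alpha).
    return z, z_crit, p_value, abs(z) > z_crit


def power_two_sided(mu_true, mu0, sigma, n, alpha=0.05):
    # Power = 1 - beta: probability that |z| > z_crit when the true mean is mu_true.
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical inputs: H0: mu = 0.5 vs Ha: mu != 0.5, sigma = 2 known, n = 100.
z, z_crit, p, reject = z_test_mean(xbar=0.9, mu0=0.5, sigma=2.0, n=100)
print(round(z, 3), round(z_crit, 3), round(p, 4), reject)  # 2.0 1.96 0.0455 True

print(power_two_sided(mu_true=0.5, mu0=0.5, sigma=2.0, n=100))  # approximately alpha = 0.05
print(power_two_sided(mu_true=1.0, mu0=0.5, sigma=2.0, n=100))
```

Increasing n or moving mu_true further from mu0 raises the shift term and therefore the power, which is how sample size reduces the Type II error rate at a fixed alpha.
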

Key Points

Always specify H0 and Ha, the test statistic and its distribution, alpha, and the decision rule; then compute the statistic and decide.
Type I error = alpha (false positive); Type II error = beta (false negative); power = 1 - beta.
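t-tests are used for means and regression coefficients when sigma is unknown (df = n - 1 for a single mean; df = n - k - 1 for a regression coefficient with k independent variables).
Chi-square tests apply to a single variance (df = n - 1) and to contingency-table independence (df = (r - 1)(c - 1)).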
F-test for ratio of variances and overall regression significance (MSR/MSE).
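
The chi-square and F statistics for variances reduce to a few lines. This sketch computes only the test statistics and degrees of freedom; critical values still come from chi-square or F tables, since Python's standard library has neither distribution. Placing the larger sample variance in the numerator of F is a common convention that lets the test use only the upper tail; the data and function names are hypothetical.

```python
import statistics


def chi_square_variance(sample, sigma0_sq):
    # H0: sigma^2 = sigma0^2. Statistic (n - 1) s^2 / sigma0^2 with df = n - 1.
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq, n - 1


def f_variance_ratio(sample1, sample2):
    # H0: sigma1^2 = sigma2^2. F = larger s^2 / smaller s^2, so only the upper
    # tail of the F distribution is needed; df = (n_num - 1, n_den - 1).
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


# Hypothetical data, invented for illustration.
a = [1.0, 2.0, 3.0, 4.0, 5.0]   # s^2 = 2.5
b = [2.0, 4.0, 6.0, 8.0, 10.0]  # s^2 = 10.0
print(chi_square_variance(a, sigma0_sq=2.0))  # (5.0, 4)
print(f_variance_ratio(a, b))                 # (4.0, 4, 4)
```
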

Parametric and Nonparametric Tests

Parametric tests rely on distributional assumptions and concern parameters such as the mean or variance. Nonparametric tests (e.g., Wilcoxon signed-rank, Mann-Whitney U, sign test, Spearman rank correlation) make fewer assumptions, work with ranks or ordinal data, are robust to outliers and non-normality, and are useful for small samples. However, parametric tests are typically more powerful when their assumptions hold. The module also gives guidance on when to use nonparametric alternatives to t-tests and correlation tests.

Key Points

Use nonparametric tests when data violate parametric assumptions, contain outliers, or are ordinal.
Spearman rank correlation uses ranks and can be tested similarly to Pearson via t for large n.
Nonparametric alternatives include Wilcoxon signed-rank for single-sample or paired tests and Mann-Whitney U for two independent samples.
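
The rank-based statistics behind these tests can be sketched as follows. Tied values receive the average of their ranks; the Mann-Whitney function reports the smaller of U_x and U_y, which is one common convention, and the Wilcoxon function drops zero differences and returns W+ (the sum of ranks of the positive differences) together with the number of nonzero differences. Function names are hypothetical, and critical values (from tables or a normal approximation) are not shown.

```python
def average_ranks(values):
    # Rank from 1 (smallest); tied values share the average of their ranks.
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    # Rank the pooled sample, then U_x = R_x - n_x (n_x + 1) / 2.
    ranks = average_ranks(list(x) + list(y))
    r_x = sum(ranks[:len(x)])
    u_x = r_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


def wilcoxon_signed_rank(differences):
    # Drop zero differences, rank |d|, and sum the ranks of positive differences.
    d = [v for v in differences if v != 0]
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    return w_plus, len(d)


print(average_ranks([10, 20, 20, 30]))       # [1.0, 2.5, 2.5, 4.0]
print(mann_whitney_u([1, 3, 5], [2, 4, 6]))  # 3.0
print(wilcoxon_signed_rank([1, -2, 3, 0]))   # (4.0, 3)
```

For paired data, the differences passed to wilcoxon_signed_rank would be the within-pair differences, which is what makes the same function serve the single-sample and paired cases.
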

Testing Correlation and Contingency-Table Independence

Testing correlation: for the Pearson correlation r, the test of H0: rho = 0 uses t = r sqrt((n - 2)/(1 - r^2)) with df = n - 2. The Spearman rank correlation is r_s = 1 - 6 sum(d_i^2) / (n(n^2 - 1)), where d_i is the difference between the ranks of the i-th pair (exact when there are no ties); for large samples it can be tested with the same t statistic. Contingency-table tests: the expected count is E_{ij} = (row_i_total * col_j_total)/grand_total; the chi-square statistic is sum (O_{ij} - E_{ij})^2 / E_{ij} with df = (r - 1)(c - 1). Standardized residuals (O_{ij} - E_{ij})/sqrt(E_{ij}) indicate which cells differ most from independence.
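
The two correlation calculations can be sketched as below. The Spearman helper assumes no ties, because the 1 - 6 sum(d_i^2) / (n(n^2 - 1)) formula is exact only in that case; the data and function names are hypothetical. The resulting t statistic is compared with a critical t value with n - 2 degrees of freedom; for Spearman that t approximation is meant for large samples, so the five-observation example only illustrates the rank arithmetic.

```python
import math


def correlation_t(r, n):
    # H0: rho = 0. t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2.
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2


def ranks_no_ties(values):
    # Rank 1 = smallest value; assumes all values are distinct.
    ranks = [0] * len(values)
    order = sorted(range(len(values)), key=lambda k: values[k])
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def spearman_rs(x, y):
    # r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), with d_i the rank difference.
    rx, ry = ranks_no_ties(x), ranks_no_ties(y)
    n = len(x)
    d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))


print(correlation_t(0.4, 27))  # t about 2.18 with 25 df

x = [1.2, 2.5, 3.1, 4.8, 5.0]  # hypothetical paired observations
y = [0.3, 0.1, 0.9, 0.7, 1.4]
print(spearman_rs(x, y))       # 0.8
```
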

Key Points

Pearson correlation significance uses t with df = n - 2.
Spearman correlation is a rank-based nonparametric alternative; for n > 30, test it with the t approximation (df = n - 2).
Chi-square independence test compares observed vs expected cell frequencies; df = (r-1)(c-1).
Standardized residuals help identify which cells drive rejection of independence.
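
The contingency-table calculations can be sketched in Python. The 2 x 2 counts are hypothetical and the function name is illustrative. Each expected count is 15, so the statistic is 4 x (5^2 / 15), about 6.67 with df = 1; that exceeds the 5% chi-square critical value of 3.841, so independence would be rejected for this invented table. The standardized residuals (about -1.29 for the diagonal cells and +1.29 off the diagonal) show that every cell contributes equally here.

```python
def chi_square_independence(observed):
    # observed: list of rows of counts O_ij (r rows, c columns).
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    # Expected counts under independence: E_ij = row_i_total * col_j_total / grand_total.
    expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]
    chi2 = 0.0
    residuals = []
    for o_row, e_row in zip(observed, expected):
        # Standardized residual (O_ij - E_ij) / sqrt(E_ij) for each cell.
        residuals.append([(o - e) / e ** 0.5 for o, e in zip(o_row, e_row)])
        chi2 += sum((o - e) ** 2 / e for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected, residuals


# Hypothetical 2 x 2 table of counts, invented for illustration.
table = [[10, 20],
         [20, 10]]
chi2, df, expected, residuals = chi_square_independence(table)
print(round(chi2, 4), df)  # 6.6667 1
print(expected)            # [[15.0, 15.0], [15.0, 15.0]]
print(residuals)
```

Because each squared standardized residual equals that cell's contribution to the chi-square statistic, the residuals sum in squares to the statistic itself.
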

Questions

Question 1

Which sampling method guarantees that every member of a population has an equal chance of selection?
