Learning Module 7 Estimation and Inference
50 questions available
Sampling error is the difference between a sample statistic and the population parameter it estimates; it arises because a sample is only a subset of the population. The sampling distribution of a statistic is the distribution of that statistic across repeated random samples of the same size. The central limit theorem (CLT) states that, for random samples from a population with finite variance, the sampling distribution of the sample mean is approximately normal for sufficiently large n, with mean equal to the population mean and variance equal to the population variance divided by the sample size (sigma^2/n). The standard error of the sample mean is sigma/sqrt(n) when the population sigma is known, or s/sqrt(n) when sigma is estimated by the sample standard deviation s. The CLT justifies normal-based confidence intervals and hypothesis tests for large samples; n >= 30 is a common rule of thumb, but larger samples may be needed for heavily skewed distributions.
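The CLT claim can be checked with a small simulation. The sketch below is illustrative only: the exponential population (mean 1, variance 1), the sample size of 50, the 5,000 repetitions, and the function name simulate_sample_means are all assumptions chosen for the example, not values from the module.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from a skewed (exponential)
    population with mean 1 and variance 1; return the sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

n = 50  # assumed sample size for the illustration
means = simulate_sample_means(n, num_samples=5000)

# CLT prediction: the sample means centre on mu = 1 and their
# standard deviation is close to sigma / sqrt(n).
predicted_se = 1.0 / n ** 0.5
print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"std of sample means:  {statistics.stdev(means):.3f}")
print(f"sigma / sqrt(n):      {predicted_se:.3f}")
```

Even though the population is strongly skewed, the spread of the simulated sample means should sit close to sigma/sqrt(n), which is the standard error described above.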
Key Points
- Probability sampling methods include simple random, stratified, cluster, and systematic sampling.
- Non-probability sampling includes convenience and judgmental sampling; these can be biased.
- Sampling error arises because a sample is only a subset of the population; the sampling distribution describes how a statistic varies across repeated samples.
- Central Limit Theorem: sample mean approx. normal with mean mu and variance sigma^2/n for large n.
- Standard error of sample mean is s / sqrt(n) when sigma unknown.
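As a worked illustration of the standard error formulas, the sketch below computes s / sqrt(n) from a sample and inverts sigma / sqrt(n) to find the smallest sample size that meets a target standard error. The function names and all numbers are hypothetical and chosen only for the example.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def min_sample_size(sigma, target_se):
    """Smallest n such that sigma / sqrt(n) <= target_se,
    i.e. n >= (sigma / target_se)^2, rounded up."""
    return math.ceil((sigma / target_se) ** 2)

# Hypothetical observations (illustrative values only).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(f"estimated standard error: {standard_error(sample):.4f}")

# Hypothetical planning problem: sigma = 6.0, target standard error 0.75.
print(f"minimum n: {min_sample_size(6.0, 0.75)}")
```

Because the standard error shrinks with sqrt(n), halving it requires four times as many observations.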
Key Points
- Bootstrap draws many resamples with replacement from the observed data to estimate sampling distributions.
- Bootstrap standard error = sqrt( (1/(B - 1)) sum_{b=1..B} (theta_hat_b - mean_theta_hat)^2 ).
- Jackknife leaves one observation out at a time to estimate bias and variance; requires n replications.
- Resampling is valuable when analytic standard errors are hard to obtain or distributional assumptions are weak.
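The two resampling schemes can be sketched with the standard library alone. This is a minimal illustration, not a production implementation: the return series, B = 2000, the seed, and the function names are assumptions. The jackknife standard error uses the usual sqrt((n - 1)/n * sum of squared deviations) form, which the key points do not spell out.

```python
import math
import random
import statistics

def bootstrap_se(data, stat, B=2000, seed=0):
    """Bootstrap standard error: resample with replacement B times and
    take the standard deviation of the B estimates (1/(B - 1) divisor)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [stat(rng.choices(data, k=n)) for _ in range(B)]
    return statistics.stdev(estimates)

def jackknife_se(data, stat):
    """Jackknife standard error: n leave-one-out replications."""
    n = len(data)
    estimates = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_est = statistics.fmean(estimates)
    return math.sqrt((n - 1) / n * sum((e - mean_est) ** 2 for e in estimates))

# Hypothetical monthly returns (illustrative values only).
returns = [0.012, -0.004, 0.021, 0.007, -0.015, 0.030,
           0.002, 0.011, -0.008, 0.018, 0.005, -0.001]

print(f"bootstrap SE of median: {bootstrap_se(returns, statistics.median):.5f}")
print(f"bootstrap SE of mean:   {bootstrap_se(returns, statistics.fmean):.5f}")
print(f"jackknife SE of mean:   {jackknife_se(returns, statistics.fmean):.5f}")
```

For the sample mean, the jackknife standard error reproduces s / sqrt(n) exactly, which makes a convenient sanity check. The bootstrap is the more natural choice for statistics such as the median, where no simple analytic formula is available.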
Key Points
- Follow the standard steps: state H0 and Ha, identify the test statistic and its distribution, choose alpha, state the decision rule, compute the statistic, and make the decision.
- Type I error is rejecting a true H0 (false positive) and has probability alpha; Type II error is failing to reject a false H0 (false negative) and has probability beta; power = 1 - beta.
- t-tests are used for means and regression coefficients when sigma unknown (df = n-1 or n-k-1).
- Chi-square tests are used for a single variance (df = n - 1) and for contingency-table independence (df = (r-1)(c-1)).
- F-test for ratio of variances and overall regression significance (MSR/MSE).
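The test statistics listed above can be computed in a few lines. The sketch below is a hedged illustration: the function names, the data, and the hypothesized values are all made up for the example, and the critical value 2.262 is the tabulated two-sided 5% t value for df = 9 rather than something computed by the code.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / se, n - 1

def chi_square_variance(s2, sigma0_sq, n):
    """Chi-square statistic for H0: sigma^2 = sigma0^2, with df = n - 1."""
    return (n - 1) * s2 / sigma0_sq, n - 1

def variance_ratio_f(s2_1, n1, s2_2, n2):
    """F statistic for equality of two variances, df = (n1 - 1, n2 - 1)."""
    return s2_1 / s2_2, (n1 - 1, n2 - 1)

# Hypothetical sample of 10 observations, testing H0: mu = 3.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
t_stat, df = one_sample_t(sample, mu0=3.0)

# Decision rule: reject H0 if |t| exceeds the table critical value.
T_CRIT_DF9_5PCT = 2.262  # two-sided, alpha = 0.05, df = 9 (from a t-table)
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > T_CRIT_DF9_5PCT}")

# Hypothetical variance tests (illustrative numbers only).
chi2_stat, chi2_df = chi_square_variance(s2=0.03, sigma0_sq=0.02, n=21)
f_stat, f_df = variance_ratio_f(s2_1=0.05, n1=16, s2_2=0.02, n2=21)
print(f"chi-square = {chi2_stat:.2f} (df = {chi2_df}), F = {f_stat:.2f} (df = {f_df})")
```

Passing the critical value in as a constant keeps the example free of third-party dependencies; in practice a statistics library would supply critical values and p-values.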
Key Points
- Use nonparametric tests when data violate parametric assumptions, contain outliers, or are ordinal.
- Spearman rank correlation uses ranks and can be tested similarly to Pearson via t for large n.
- Nonparametric alternatives include Wilcoxon signed-rank for single-sample or paired tests and Mann-Whitney U for two independent samples.
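To make the rank-based ideas concrete, the sketch below computes Spearman's rank correlation as the Pearson correlation of average ranks, and the Mann-Whitney U statistic by direct pair counting. The helper names and example data are assumptions for illustration; the Wilcoxon signed-rank test is not shown.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def mann_whitney_u(a, b):
    """U for sample a: number of (a, b) pairs with a > b; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical data: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
print(f"U: {mann_whitney_u([3, 4, 5], [1, 2])}")
```

Spearman's coefficient depends only on the ordering of the data, so a perfectly monotonic relationship scores 1 even when it is not linear. The same property makes it robust to outliers.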
Key Points
- Pearson correlation significance uses t with df = n - 2.
- Spearman correlation is a rank-based nonparametric alternative; for n > 30 use t approximation.
- Chi-square independence test compares observed vs expected cell frequencies; df = (r-1)(c-1).
- Standardized residuals help identify which cells drive rejection of independence.
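The correlation and independence tests above can be sketched as follows. This is an illustration under stated assumptions: the 2x2 table and the values of r and n are hypothetical, the function names are made up, and "standardized residual" is taken here to mean (observed - expected) / sqrt(expected), which is one common definition.

```python
import math

def chi_square_independence(observed):
    """Return (statistic, df, expected, residuals) for a table of counts.
    Expected count = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    stat = sum(res ** 2 for row in residuals for res in row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected, residuals

def correlation_t(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2

# Hypothetical 2x2 table of counts.
table = [[30, 20], [20, 30]]
stat, df, expected, residuals = chi_square_independence(table)
CHI2_CRIT_DF1_5PCT = 3.841  # alpha = 0.05, df = 1 (from a chi-square table)
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {stat > CHI2_CRIT_DF1_5PCT}")
print("standardized residuals:", residuals)

# Hypothetical correlation: r = 0.5 from n = 27 observations.
t_stat, t_df = correlation_t(0.5, 27)
print(f"t = {t_stat:.3f}, df = {t_df}")
```

The chi-square statistic is the sum of the squared residuals, so cells with large absolute residuals are the ones contributing most to a rejection of independence.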
Questions
Which sampling method guarantees that every member of a population has an equal chance of selection?
You want the standard error of a sample mean to be at most 0.5 units. If the population standard deviation is estimated to be 4.0, what minimum sample size n do you need (use the population formula)?
Which statement best describes the central limit theorem as used for sample means?
An analyst has a sample of n = 25 independent observations from a population with unknown variance. The sample mean is 10 and the sample standard deviation is 5. What is the estimated standard error of the sample mean?
Which sampling method is most appropriate when you need to ensure representation across known subgroups (for example, bond duration buckets) and to improve precision?
An analyst has a sample of 50 returns and computes a 95 percent confidence interval for the mean. Which significance level alpha corresponds to this interval?
Which statement about bootstrap resampling is true?
You compute a t-statistic of 2.5 with df = 20 for a two-sided test. Which statement is correct at alpha = 0.05?
An analyst wants to test whether two independent samples have equal means and assumes equal variances. What test should she use?
What is the formula for an expected cell frequency Eij in a contingency table under the null hypothesis of independence?
Which of the following is a valid reason to use a nonparametric test instead of a parametric t-test?
When performing a bootstrap with B resamples to estimate the standard error of a statistic theta_hat, which expression gives the bootstrap standard error estimate?
You test H0: sigma^2 = 0.04 for a normally distributed variable using a sample of n = 16 and obtain sample variance s^2 = 0.02. Which test statistic should you use?
Which of the following best describes a Type I error in hypothesis testing?
An analyst uses a two-sample t-test with pooled variance and obtains a p-value of 0.03. At alpha = 0.05 what should she conclude?
Which test is appropriate to examine whether the variance of returns changed after a policy event using two independent samples (before and after) of normally distributed returns?
A researcher has a single sample of size n = 12 of monthly returns and wishes to estimate the standard error of the sample median but no analytic formula is available. Which method is most appropriate?
Which of the following is the correct test statistic to test whether a sample Pearson correlation r differs from zero for n observations?
Which approach is best if you have ranked (ordinal) performance scores for fund managers and you want to test whether two groups have different central tendency?
You observe two time periods with very different volatilities in a regression residual plot, indicating heteroskedasticity. Which statement is true?
In a simple linear regression Y = b0 + b1 X + e, what is the meaning of b1?
Which decomposition relates total variation of Y into explained and unexplained parts in regression?
In simple linear regression, how is the coefficient of determination R^2 related to the sample correlation r between X and Y?
You estimated a simple regression with n = 30 observations and obtained SSR = 45 and SSE = 155. What is R^2?
In regression output, the F-statistic tests which null hypothesis for a simple linear regression?
Which regression assumption is violated if residuals plotted against time display a clear seasonal pattern?
You run a simple regression Y on X and obtain slope b1_hat = 0.8, se(b1_hat) = 0.2, n = 25. What is the t-statistic to test H0: b1 = 0?
Which transformation yields a model where the slope approximates an elasticity (percent change in Y per percent change in X)?
If residuals of a regression are not normally distributed but the sample size is very large, what does the chapter recommend regarding inference?
Which of the following is an advantage of cluster sampling relative to simple random sampling?
An analyst computes sample correlation r = 0.3 with n = 12. Using t = r sqrt((n - 2)/(1 - r^2)), what is the approximate t-statistic (round to two decimals)?
Which of the following correctly describes the jackknife resampling method?
If a regression's residuals have a mean not equal to zero, what does the chapter say about that result?
Which of these is a correct expression for the standard error of the forecast for a new Xf in SLR (prediction standard error)?
Which functional transformation would you try if plotting Y versus X shows curvature suggesting exponential growth of Y?
Which of the following is TRUE about the relationship between the t-test for slope and the F-test of overall fit in a simple linear regression?
Which of the following best describes the purpose of a paired (dependent) samples t-test?
A contingency table has 4 rows and 3 columns. What are the degrees of freedom for the chi-square test of independence?
Which statement about the bootstrap and jackknife methods is consistent with the chapter?
You conduct a chi-square test of independence on a 3x3 contingency table and obtain chi-square statistic = 12.5. The critical value at alpha = 0.05 with df = 4 is 9.49. What is your decision?
When is the pooled variance estimator used in two-sample t-tests?
You construct a 95 percent prediction interval for an SLR forecast and find it wide. According to the chapter, which factor would contribute MOST to the interval width?
Which of the following is an example of non-probability sampling?
When comparing means for two dependent samples, which distribution does the test statistic follow under normality assumptions?
An analyst performs stratified sampling with k strata and samples proportional to stratum sizes. Which effect does stratification typically have compared with simple random sampling of the same overall sample size?
Which of the following best describes the p-value reported by software for a regression coefficient?
Which of the following is NOT an advantage of bootstrap resampling mentioned in the chapter?
If you want to test whether two categorical classifications are independent using a sample of 1,000 observations in a 2x3 table, which test and degrees of freedom are appropriate?
Which diagnostic plot is most helpful to detect heteroskedasticity in a regression model?
Which of these is a correct interpretation of an R^2 value of 0.80 in a simple linear regression of ROA on CAPEX, as illustrated in the chapter example?