Learning Module 9: Parametric and Non-Parametric Tests of Independence

50 questions available

Overview and Hypothesis Formulation
This chapter explains hypothesis testing approaches for assessing relationships and independence between variables. It begins with the three generic hypothesis formats: two-sided (theta = theta0 versus theta not equal to theta0), one-sided right (theta <= theta0 versus theta > theta0), and one-sided left (theta >= theta0 versus theta < theta0). Parametric tests either concern population parameters or require specific distributional assumptions (for example, the Pearson product-moment correlation test assumes bivariate normality). Nonparametric tests make minimal distributional assumptions or are used when the hypothesis is not about a parameter. Typical reasons to use nonparametric methods include nonnormal data, the presence of outliers, data given in ranks, or hypotheses that do not concern a parameter.

For testing correlation, the parametric approach uses the sample Pearson correlation r and the t-statistic t = r * sqrt((n - 2) / (1 - r^2)), which follows a t-distribution with n - 2 degrees of freedom under the null hypothesis that the population correlation rho = 0. A key practical point: the magnitude of r required for significance decreases as the sample size increases, because the statistic grows with sqrt(n - 2) while the critical t value shrinks.

The Spearman rank correlation rs is a nonparametric alternative computed from ranks: rs = 1 - (6 * sum(di^2)) / (n * (n^2 - 1)), where di is the difference between the ranks of the two variables for observation i. For large samples (roughly n > 30), rs can be tested with the same t formula (substituting rs for r) on n - 2 degrees of freedom; for small samples, specialized critical-value tables are used. The chapter shows how to compute ranks, handle ties (tied observations receive the average of the ranks they occupy), and then compute di^2 and rs, with worked conversions to t-statistics and p-values.

For categorical or discrete data, the chi-square test of independence uses a contingency (two-way) table of observed frequencies Oij. Expected frequencies under independence are Eij = (row i total * column j total) / overall total. The test statistic is chi-square = sum over all cells of (Oij - Eij)^2 / Eij, with df = (r - 1)(c - 1). Because squaring makes every term nonnegative, the rejection region lies entirely in the right tail. A practical step-by-step example classifies ETFs by size and investment type: compute each Eij, compute the scaled squared deviations, sum them to obtain chi-square, compare the result with the chi-square critical value (or compute a p-value), and interpret. The chapter also introduces standardized (Pearson) residuals for each cell, (Oij - Eij) / sqrt(Eij), which identify the cells contributing most to chi-square, and suggests visualization (for example, mosaic plots) for additional insight.

Multiple worked examples cover degrees of freedom, the calculation of chi-square, the evaluation of one-sided and two-sided alternatives, and the alternative formation of F-statistics by inverting the numerator and denominator for symmetry. Practical advice includes using software (Excel, R, Python) for critical values and p-values, and noting that the chi-square test of independence is right-tailed by construction, so there is no need to split alpha across two tails. The chapter emphasizes when to prefer nonparametric tests: when distributional assumptions fail, when outliers contaminate parametric measures, when data are ordinal or ranked, or when the hypothesis concerns ranks or randomness (for example, a runs test). It also compares parametric tests for a single mean and for paired or independent means with their nonparametric counterparts (the Wilcoxon signed-rank test, the Mann-Whitney U test, and the sign test).
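
As a minimal sketch of the two correlation tests described above (Pearson and Spearman), assuming made-up illustrative data x and y rather than anything from the chapter, the calculations might look like this in Python:

```python
# Sketch of both correlation tests using illustrative data (not from the chapter).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
x = rng.normal(size=36)
y = 0.5 * x + rng.normal(size=36)
n = len(x)

# Parametric Pearson test: t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2
r, _ = stats.pearsonr(x, y)
t_pearson = r * np.sqrt((n - 2) / (1 - r**2))
p_pearson = 2 * stats.t.sf(abs(t_pearson), df=n - 2)

# Nonparametric Spearman alternative: rank each series (ties would receive
# average ranks), then rs = 1 - 6 * sum(di^2) / (n * (n^2 - 1)).
d = stats.rankdata(x) - stats.rankdata(y)
rs = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
t_spearman = rs * np.sqrt((n - 2) / (1 - rs**2))   # large-sample t approximation

print(f"Pearson  r  = {r:.4f}, t = {t_pearson:.3f}, p = {p_pearson:.4f}")
print(f"Spearman rs = {rs:.4f}, t = {t_spearman:.3f}")
```

For reference, scipy.stats.spearmanr(x, y) returns rs and a p-value directly; the manual version above simply mirrors the rank-difference formula.
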
In summary: use the parametric Pearson t-test for correlation when bivariate normality is plausible; use the Spearman rank correlation when it is not; use the chi-square test of independence for contingency tables; inspect standardized residuals to see which cells deviate most from independence; rely on software for exact critical values and p-values; and always consider sample size, effect size, assumptions, and the costs of Type I versus Type II errors when interpreting results.
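
For the contingency-table side, a short sketch of the chi-square test of independence and the standardized residuals (the counts below are illustrative, not the chapter's ETF data) could be:

```python
# Chi-square test of independence with expected frequencies and
# standardized (Pearson) residuals; the observed counts are illustrative.
import numpy as np
from scipy import stats

observed = np.array([[50, 40, 30],
                     [30, 60, 45],
                     [20, 25, 35]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
total = observed.sum()

expected = row_totals @ col_totals / total          # E_ij = row_i total * col_j total / total
chi_sq = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(chi_sq, df)                 # right-tail p-value

residuals = (observed - expected) / np.sqrt(expected)   # flags the cells driving the association

print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.4f}")
print(residuals.round(2))
```

scipy.stats.chi2_contingency(observed) performs the same test in a single call and also returns the expected-frequency table.
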

Key Points

  • Hypotheses can be two-sided or one-sided; formulation must be in population-parameter terms.
  • Parametric Pearson correlation test uses t = r * sqrt((n-2)/(1-r^2)) with n-2 df.
  • Spearman rank correlation rs is computed from ranks: rs = 1 - 6 * sum(di^2) / (n * (n^2 - 1)); for large n, the same t-formula can be used with rs in place of r.
  • Chi-square test of independence compares observed and expected frequencies; df = (rows-1)(cols-1).
  • Use standardized residuals to identify which cells most deviate from independence.
  • Nonparametric tests preferred when assumptions fail, with ranked or ordinal data, or with outliers.
  • Always consider sample size, p-values, alpha choice, and trade-offs between Type I and Type II errors.

Questions

Question 1

You compute a Pearson sample correlation r = 0.35 from n = 12 paired observations. Using the parametric t-test for correlation, calculate the t-statistic and determine whether r is significantly different from zero at the two-sided 5 percent level (critical t approximately ±2.228 for df = 10).
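
A quick numeric check of this calculation (taking the question's r and n as given) could be:

```python
import numpy as np
from scipy import stats

r, n = 0.35, 12
t_stat = r * np.sqrt((n - 2) / (1 - r**2))        # roughly 1.18
t_crit = stats.t.ppf(0.975, df=n - 2)             # roughly 2.228 for df = 10
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value
print(f"t = {t_stat:.3f}, critical = ±{t_crit:.3f}, p = {p_value:.3f}")
```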

Question 2

You have paired data on two variables for n = 35 observations and compute the Spearman rank correlation coefficient rs = 0.6916. Using the large-sample t-approximation (t = rs * sqrt((n-2)/(1-rs^2))), compute the t-statistic and decide whether rs is significantly different from zero at the two-sided 5 percent level (critical t ≈ ±2.0345).

Question 3

You have a 3x3 contingency table (three rows, three columns) of counts. Which formula gives the expected frequency Eij for cell (i,j) under the null hypothesis of independence, and what are the degrees of freedom for the chi-square test of independence?

Question 4

A 3x3 contingency table of 1,594 ETFs by Size (small, medium, large) and Investment Type (value, growth, blend) yields a chi-square statistic = 32.08 and df = 4. Using a 5 percent significance level (chi-square critical ≈ 9.4877), what conclusion should you draw?

Question 5

You calculate Pearson correlation r = 0.8277 between Fund 1 monthly returns and the S&P 500 using n = 36 months. Using the parametric t-test for correlation, is r different from zero at the two-sided 5 percent level? (critical t ≈ ±2.032).

Question 6

In computing the Spearman rank correlation coefficient rs for n = 9 observations you find tied ranks for two observations in one variable (a tie for the 3rd and 4th largest values). How should you assign ranks for the tied observations before computing di and di^2?
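
One way to see how average ranks behave with a tie, using illustrative values (not the question's data):

```python
import numpy as np
from scipy import stats

# Illustrative values; two observations tie for the 3rd and 4th largest.
values = np.array([9.1, 8.4, 7.7, 7.7, 6.2, 5.9, 4.8, 3.3, 2.0])

# Rank from largest to smallest by ranking the negated values; tied
# observations receive the average of the ranks they occupy: (3 + 4) / 2 = 3.5.
ranks = stats.rankdata(-values, method='average')
print(ranks)   # the two 7.7s both receive rank 3.5
```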

Question 7

You compute a Spearman rank correlation rs = -0.20417 for n = 9 funds comparing alpha and expense ratio. The t-statistic (using t = rs * sqrt((n-2)/(1 - rs^2))) equals about -0.552. With critical t-values of ±2.365 (df = 7) at the 0.05 level, what decision is appropriate?

Question 8

In a contingency table test of independence, the standardized residual for a cell is (Oij - Eij)/sqrt(Eij). If for a particular cell you get Oij = 122, Eij = 87.48, what is the standardized residual and what does a value above +2 imply?
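
The arithmetic for the residual itself, using the question's counts:

```python
import math

O, E = 122, 87.48
residual = (O - E) / math.sqrt(E)   # standardized (Pearson) residual, about 3.69
print(round(residual, 2))
```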

Question 9

Which situation is LEAST appropriate for applying a parametric Pearson correlation test?

Question 10

You have two independent samples and wish to test for association between group membership and a categorical outcome (a 2-by-3 contingency table). Which test should you use, and what is the degrees-of-freedom formula?

Question 11

In the ETF contingency example, the medium-growth cell has observed count O = 122 and expected count E = 87.48, producing a scaled squared deviation (O - E)^2 / E of about 13.62. What percentage of the total chi-square value of 32.080 does this cell contribute?

Question 12

Which of the following correctly states when to prefer nonparametric tests over parametric tests?

Question 13

You have a 2x3 contingency table with row totals [120, 130] and column totals [100, 90, 60]; the overall total is 250. What is the expected frequency for cell (row 1, column 2)?
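
A short check of the expected-frequency formula with these totals:

```python
import numpy as np

row_totals = np.array([[120], [130]])        # shape (2, 1)
col_totals = np.array([[100, 90, 60]])       # shape (1, 3)
total = 250

expected = row_totals @ col_totals / total   # E_ij = row_i total * col_j total / total
print(expected[0, 1])                        # cell (row 1, column 2)
```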

Question 14

An analyst computes the Pearson sample correlation r between two variables as 0.31 with n = 36. The t-statistic calculated for the correlation is 1.903. Given critical t ≈ ±2.032 at the two-sided 5 percent level, what is the approximate p-value range and what is the decision?

Question 15

Which test statistic is used in a chi-square test of independence and what main inputs are required?

Question 16

An analyst computes Pearson correlation r = 0.3102 between two funds with n = 36 and finds t = 1.903 and p approximately 0.06 (two-sided). What interpretation is correct at the 5 percent level?

Question 17

When constructing a chi-square test of independence, which of the following is a required assumption for the usual chi-square approximation to be valid?

Question 18

You test independence in a 3x3 table and find chi-square = 5.5 with df = 4. For alpha = 0.05, the critical chi-square is 9.4877. Which conclusion is correct?

Question 19

Which statement about Spearman rank correlation is TRUE?

Question 20

In a 3x3 contingency table, you find several cells have small expected values (<5). Which action is MOST appropriate before performing a chi-square test?

Question 21

You have a contingency table and compute chi-square = 12.8 with df = 4. What is the approximate p-value range (chi-square critical values for df = 4: 13.277 at 0.01 and 11.143 at 0.025)?
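
Rather than bracketing the p-value with table entries, software gives the exact right-tail probability; for example:

```python
from scipy import stats

p_value = stats.chi2.sf(12.8, df=4)   # right-tail probability for chi-square = 12.8, df = 4
print(round(p_value, 4))              # lands between 0.01 and 0.025
```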

Question 22

Which of the following is TRUE about degrees of freedom for the chi-square test of independence in an r by c table?

Question 23

Suppose you compute a chi-square statistic of 3.57 for a 2x2 contingency table (df = 1). What is the approximate p-value and what is the decision at alpha = 0.05? (The chi-square critical value at 0.05 with df = 1 is 3.841.)

Question 24

A dataset for two categorical variables yields expected cell counts that are all above 10, and the total sample is large. Which test is appropriate, and why might the parametric vs nonparametric labels be confusing here?

Question 25

You perform a chi-square test of independence on a 3x3 table and get chi-square = 46.3223 with df = 4. If the critical value at alpha = 0.05 is 9.4877, what do you conclude and what might be next steps to understand the association?

Question 26

A 3x3 contingency table yields chi-square = 32.08 with df = 4 and p < 0.001. Which statement about Type I and Type II errors aligns with this outcome?

Question 27

Which of these statements about the chi-square test of independence is FALSE?

Question 28

In the ETF example, a mosaic plot shows a large dark cell for medium-growth ETFs. The standardized residual for that cell is +3.69. How should you interpret this?

Question 29

When computing Spearman rs across many pairs (e.g., 10 unique currency pairs) and testing each at the 5 percent level, which multiplicity issue should you consider and why?

Question 30

Which of the following best describes the relationship between sample size and the ability to detect a nonzero Pearson correlation?

Question 31

A researcher tests independence in an r x c table using chi-square and obtains chi-square = 18.63 with df = 6. Using the chi-square critical values table, what is the approximate p-value range (chi-square critical for df=6: at 0.05 -> 12.592, at 0.01 -> 16.812)?

Question 32

Which of the following descriptions correctly distinguishes the Pearson parametric correlation test from the Spearman nonparametric test?

Question 33

A researcher has a contingency table of 500 companies classified by environmental rating (3 levels) and governance rating (3 levels). She computes chi-square = 35.744 and df = 4. With chi-square critical at 0.05 equal to 9.4877, interpret the result.

Question 34

If you wish to test whether a contingency table classification shows an association in a one-sided direction (e.g., more of category A in row1 than expected), how can you use standardized residuals in addition to the overall chi-square?

Question 35

You compute Pearson r between two series over many sample sizes. Which trend regarding the required magnitude of r to achieve significance is correct?

Question 36

Which software functions are cited in the chapter as ways to compute t critical values for Pearson correlation and Spearman tests in Excel, R, and Python?
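
Whatever functions the chapter itself cites, one common Python route (shown purely as an illustration, not as the chapter's cited functions) uses scipy:

```python
from scipy import stats

n = 36
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # two-sided 5% critical value, df = 34
p_value = 2 * stats.t.sf(1.903, df=n - 2)      # two-sided p-value for a computed t
print(round(t_crit, 3), round(p_value, 4))
```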

Question 37

An analyst wants to compare two period return distributions (Period 1 and Period 2) using daily returns where samples are independent and not paired. Which approach is appropriate to test whether mean returns differ?

Question 38

You have computed the Spearman rs matrix for five exchange rates over 180 days and found all pairwise correlations significant at the 5 percent level. Which of the following is a reasonable next-step interpretation?

Question 39

Which of these calculations do you need to compute the chi-square statistic for a contingency table cell (i,j)?

Question 40

An analyst runs Spearman correlations among many pairs and obtains several small p-values around 0.04. What is the best general advice regarding interpretation?

Question 41

When using Spearman rank correlation on five currencies over 180 days (10 unique pairs), the analyst finds all pairwise t-statistics well above the critical value. Which interpretation about independence is correct at the 5 percent level?

Question 42

Which of the following correctly states the null hypothesis commonly tested for the Pearson correlation in a financial time-series context?

Question 43

A 3x3 contingency table test returns chi-square = 1.902 with df = 4. For alpha = 0.05, what is the decision?

Question 44

If you compute Spearman rs on small samples (n <= 30), what special consideration should you take when testing significance?

Question 45

Which of the following is a correct reason to prefer a Spearman test over a Pearson test for correlation?

Question 46

In a 3x3 contingency table of ETFs, you compute the expected frequency for the small-value cell as 46.703, and the observed count is O = 50. The cell's scaled squared deviation equals (O - E)^2 / E. Compute that value roughly.

Question 47

If you find a statistically significant chi-square for independence in a large contingency table, which practical step helps prioritize which deviations are most important?

Question 48

Which is the correct degrees of freedom for a chi-square test on a 4x5 contingency table?

Question 49

You have observed that pooling categorical labels into fewer categories before a chi-square test increases expected counts and may improve the test's validity. Which caution should you heed when doing this?

Question 50

Which measure complements chi-square results by quantifying effect size for association in contingency tables, particularly when sample sizes are large?
