Sampling Methods (5 min)
Sampling is the process of selecting a subset of a population to estimate characteristics of the whole. Probability sampling ensures every member has a known, nonzero chance of selection. Simple random sampling gives every member an equal chance. Systematic sampling selects every nth member of the population. Stratified random sampling divides the population into subgroups (strata) based on one or more characteristics and draws a random sample from each stratum, which often improves precision. Cluster sampling divides the population into clusters, randomly selects some clusters, and then samples all or some of the members of the selected clusters. Non-probability sampling includes convenience sampling (based on accessibility) and judgmental sampling (based on researcher expertise); both may introduce bias.

Key Points

  • Probability sampling allows for error estimation; non-probability sampling relies on judgment or accessibility and may be biased.
  • Simple random sampling: Equal probability of selection for each item.
  • Stratified random sampling: Random samples drawn from every subgroup (stratum); ensures each is represented.
  • Cluster sampling: Samples entire groups (clusters); cost-effective but potentially less precise.
  • Systematic sampling: Selecting every nth member of a population.
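
The four probability sampling methods above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production sampler. The population of 100 numbered items, the two strata ("small", "large"), the ten equal-sized clusters, and all function names are assumptions made for the example.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has an equal chance of selection (without replacement)."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from each stratum."""
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def one_stage_cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every member of each one."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population: items numbered 1..100.
rng = random.Random(42)
population = list(range(1, 101))
strata = {"small": list(range(1, 61)), "large": list(range(61, 101))}
clusters = {c: list(range(c * 10 + 1, c * 10 + 11)) for c in range(10)}

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)
clus = one_stage_cluster_sample(clusters, 2, rng)
```

In this sketch the stratified sample always contains 6 "small" and 4 "large" items, mirroring the 60/40 split of the population, whereas the simple random sample carries no such guarantee. A two-stage cluster sample would add a second random draw within each chosen cluster instead of keeping every member.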

Central Limit Theorem and Standard Error (6 min)
The Central Limit Theorem (CLT) is foundational for hypothesis testing. It states that, for a population with finite variance, the sampling distribution of the sample mean is approximately normal when the sample size is large, regardless of the population's distribution; n >= 30 is the usual rule of thumb. The mean of this sampling distribution equals the population mean. Its standard deviation is called the standard error of the sample mean and is calculated by dividing the population standard deviation (or the sample standard deviation if the former is unknown) by the square root of the sample size. Larger samples therefore give smaller standard errors and more precise estimates.

Key Points

  • CLT rule of thumb: with n >= 30, the sampling distribution of the sample mean is approximately normal.
  • Standard Error (SE) measures the dispersion of sample means around the population mean.
  • SE formula: Population standard deviation divided by the square root of n.
  • If population variance is unknown, use the sample standard deviation (s) to estimate SE.
  • Larger sample sizes reduce the standard error.
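
A small simulation can make the standard error formula concrete. The sketch below is an illustration under stated assumptions, not a proof: the population is taken to be exponential with mean 1 and standard deviation 1 (chosen only because it is clearly non-normal), and the sample size of 50, the 5,000 repetitions, and the function name `standard_error` are arbitrary choices.

```python
import math
import random
import statistics

def standard_error(std_dev, n):
    """Standard error of the sample mean: standard deviation / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Assumed population: exponential with mean 1 and standard deviation 1 (skewed).
rng = random.Random(0)
n = 50
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(5000)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to the population mean, 1
sd_of_means = statistics.stdev(sample_means)    # empirical standard error
theoretical_se = standard_error(1.0, n)         # 1 / sqrt(50)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"empirical SE: {sd_of_means:.3f}, formula SE: {theoretical_se:.3f}")
```

Because the standard error depends on the square root of n, quadrupling the sample size only halves it: `standard_error(12, 36)` is 2.0, while `standard_error(12, 144)` is 1.0.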

Properties of Estimators (4 min)
Point estimates are single values used to estimate population parameters. A good estimator should possess specific statistical properties. An unbiased estimator has an expected value equal to the true parameter. An efficient estimator is unbiased and has the smallest variance among all unbiased estimators. A consistent estimator yields estimates that converge to the true parameter value as the sample size increases, meaning the standard error approaches zero.

Key Points

  • Unbiasedness: Expected value of estimator equals the parameter.
  • Efficiency: Smallest variance among unbiased estimators.
  • Consistency: Probability of estimate being close to the parameter increases with sample size.
  • The sample mean is an unbiased and efficient estimator of the population mean.
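
The three properties can be checked empirically with a simulation. The sketch below assumes a normal population with mean 5 and standard deviation 2, and compares the sample mean with the sample median as a second estimator of the population mean. The median comparison is an illustration added here, not something taken from the text above, and every number in it is hypothetical.

```python
import random
import statistics

rng = random.Random(1)
MU, SIGMA = 5.0, 2.0  # assumed population parameters

def simulate(n, trials=4000):
    """Draw `trials` samples of size n; return the sample means and medians."""
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means_small, medians_small = simulate(n=10)
means_large, _ = simulate(n=160)

# Unbiasedness: the average of the sample means sits near the true mean.
bias_of_mean = statistics.fmean(means_small) - MU

# Efficiency: for this normal population the mean varies less than the median.
var_mean = statistics.variance(means_small)
var_median = statistics.variance(medians_small)

# Consistency: the spread of the estimates shrinks as n grows.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"bias of sample mean: {bias_of_mean:+.4f}")
print(f"variance of mean: {var_mean:.3f}, variance of median: {var_median:.3f}")
print(f"spread at n=10: {spread_small:.3f}, at n=160: {spread_large:.3f}")
```

With n increased sixteenfold, the spread of the sample means should fall to roughly a quarter of its earlier value, which is the square-root relationship from the standard error formula.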

Confidence Intervals and t-Distribution (7 min)
A confidence interval is a range of values expected to contain the population parameter with a stated degree of confidence (e.g., 95 percent). It is constructed as the point estimate plus or minus a reliability factor times the standard error. When the population variance is known, the z-statistic (standard normal distribution) is used. When it is unknown, the Student's t-statistic is used. The t-distribution depends on degrees of freedom (n - 1) and has fatter tails than the normal distribution, so for the same confidence level and standard error it gives wider intervals. As degrees of freedom increase, the t-distribution converges to the standard normal distribution.

Key Points

  • Confidence Interval = Point Estimate +/- (Reliability Factor * Standard Error).
  • Use z-statistic when population variance is known.
  • Use t-statistic when population variance is unknown (requires assumption of normality for small samples).
  • Common z-values: 1.65 (90 percent; 1.645 to three decimals), 1.96 (95 percent), 2.58 (99 percent; 2.576 to three decimals).
  • t-distribution has fatter tails; intervals are wider than z-intervals.
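
The interval formula above translates directly into code. The following is a minimal sketch using Python's standard library. The sample figures (mean 12.0, standard deviation 3.0, n of 36 and 30) are hypothetical, and the function names are illustrative. The standard library has no t-distribution, so the t reliability factor is passed in by hand, as if read from a t-table; 2.045 is the 95 percent value for 29 degrees of freedom.

```python
import math
from statistics import NormalDist

def z_reliability_factor(confidence):
    """Two-sided z reliability factor, e.g. about 1.96 for confidence=0.95."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def confidence_interval(point_estimate, reliability_factor, standard_error):
    """Point estimate +/- reliability factor * standard error."""
    margin = reliability_factor * standard_error
    return point_estimate - margin, point_estimate + margin

# Known population variance: hypothetical mean 12.0, sigma 3.0, n = 36.
se_known = 3.0 / math.sqrt(36)
z95 = z_reliability_factor(0.95)
z_low, z_high = confidence_interval(12.0, z95, se_known)

# Unknown population variance: hypothetical mean 12.0, s = 3.0, n = 30,
# with the t reliability factor for 29 degrees of freedom taken from a table.
se_est = 3.0 / math.sqrt(30)
t_low, t_high = confidence_interval(12.0, 2.045, se_est)

print(f"z interval: ({z_low:.2f}, {z_high:.2f})")
print(f"t interval: ({t_low:.2f}, {t_high:.2f})")
```

The t interval here is wider for two reasons: the reliability factor is larger (2.045 against roughly 1.96) and the smaller sample gives a larger standard error.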

Resampling and Sampling Biases (6 min)
Resampling methods like the jackknife (calculating means by removing one observation at a time) and the bootstrap (repeated sampling with replacement) are used to estimate standard errors when analytic formulas are difficult to apply. Analysts must also be aware of biases. Data snooping involves overuse of data to find patterns that may not exist. Sample selection bias occurs when data availability filters the sample non-randomly. Survivorship bias is a specific selection bias where only successful entities (e.g., surviving funds) are analyzed, inflating performance estimates. Look-ahead bias uses information not available at the time of the test. Time-period bias arises when a test is based on a specific time frame that may not be representative.

Key Points

  • Jackknife: Re-calculates statistic leaving one observation out each time.
  • Bootstrap: Repeated sampling from the original data set with replacement.
  • Data snooping bias: Finding spurious patterns due to extensive testing.
  • Survivorship bias: Overestimating performance by excluding failed firms/funds.
  • Look-ahead bias: Using future data for past simulations.
  • Time-period bias: Results valid only for a specific time period.
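
Both resampling methods can be sketched briefly. The code below is a minimal illustration, assuming a hypothetical list of ten returns (in percent) and using the sample mean as the statistic so the results can be compared with the analytic formula s / sqrt(n). The function names and the choice of 2,000 bootstrap resamples are assumptions for the example.

```python
import math
import random
import statistics

def jackknife_se(data, statistic):
    """Standard error from leave-one-out recalculations of the statistic."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in leave_one_out))

def bootstrap_se(data, statistic, n_resamples, rng):
    """Standard error as the spread of the statistic over resamples
    drawn from the original data with replacement."""
    n = len(data)
    estimates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(estimates)

# Hypothetical returns in percent.
data = [2.1, -0.4, 1.3, 0.8, 3.5, -1.2, 0.9, 1.7, 2.6, 0.2]
rng = random.Random(7)

analytic = statistics.stdev(data) / math.sqrt(len(data))
jk = jackknife_se(data, statistics.fmean)
bs = bootstrap_se(data, statistics.fmean, 2000, rng)

print(f"analytic: {analytic:.4f}, jackknife: {jk:.4f}, bootstrap: {bs:.4f}")
```

For the sample mean, the jackknife reproduces the analytic standard error exactly, and the bootstrap lands close to it. The practical value of either method lies in statistics such as the median, where no simple formula is available; passing `statistics.median` in place of `statistics.fmean` is enough to switch.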

Questions

Question 1

Which of the following best describes a simple random sample?

Question 2

In stratified random sampling, how are samples drawn?

Question 3

Which sampling method involves dividing the population into subsets and assuming each subset is representative of the overall population?

Question 4

What is the primary difference between one-stage and two-stage cluster sampling?

Question 5

Which of the following is an example of non-probability sampling?

Question 6

Sampling error is best defined as:

Question 7

According to the Central Limit Theorem, the sampling distribution of the sample mean will be approximately normal if:

Question 8

The Central Limit Theorem states that the mean of the distribution of sample means is equal to:

Question 9

If a population has a mean of 50 and a standard deviation of 10, what is the standard error of the sample mean for a sample size of 25?

Question 10

When the population standard deviation is unknown, the standard error of the sample mean is estimated by:

Question 11

A sample of 100 observations has a standard deviation of 20. The standard error of the sample mean is:

Question 12

As the sample size increases, what happens to the standard error of the sample mean?

Question 13

An estimator is considered unbiased if:

Question 14

Which property of an estimator refers to having the smallest variance among all unbiased estimators?

Question 15

A consistent estimator is one where:

Question 16

A point estimate is best described as:

Question 17

A confidence interval is constructed using which of the following formulas?

Question 18

For a normal distribution, the reliability factor for a 90 percent confidence interval is approximately:

Question 19

The reliability factor for a 95 percent confidence interval using the standard normal distribution is:

Question 20

A sample of 64 observations has a mean of 20 and a population standard deviation of 4. What is the 95 percent confidence interval for the population mean?

Question 21

When constructing a confidence interval for the population mean of a normal distribution with unknown variance, which statistic should be used?

Question 22

The degrees of freedom for a t-statistic calculated from a sample of size n is:

Question 23

Compared to the standard normal distribution, the Student's t-distribution has:

Question 24

As the degrees of freedom increase, the Student's t-distribution:

Question 25

If a population is nonnormal and the variance is unknown, which test statistic is appropriate for a large sample (n > 30)?

Question 26

If sampling from a nonnormal distribution with unknown variance and a small sample size (n < 30), which test statistic is available?

Question 27

The Jackknife method of resampling involves:

Question 28

The Bootstrap method involves:

Question 29

Data snooping bias occurs when:

Question 30

Survivorship bias is a form of:

Question 31

Which bias occurs when a study tests a relationship using sample data that was not available on the test date?

Question 32

Time-period bias results when:

Question 33

A mutual fund database that only includes funds currently in existence likely suffers from:

Question 34

A sample has a mean of 5 percent and a standard deviation of 10 percent. If the sample size is 100, what is the standard error of the sample mean?

Question 35

With a sample size of 200 and a standard deviation of 20 percent, the standard error is 1.4 percent. If the sample size increases, the standard error will:

Question 36

In a random sample of 50 items with a known population standard deviation, the standard error is 2. If the sample size is increased to 200, the new standard error will be:

Question 37

If a confidence interval is 95 percent, the significance level (alpha) is:

Question 38

Using a z-table, the probability that a standard normal random variable is less than -1.96 is:

Question 39

For a t-distribution with 29 degrees of freedom, the critical value for a 95 percent confidence interval is 2.045. A sample of 30 has a mean of 2 and a standard deviation of 20. The confidence interval is closest to:

Question 40

Systematic sampling involves:

Question 41

Which sampling method is often used in bond indexing to approximate the index without purchasing every bond?

Question 42

In the context of sampling distributions, the standard error decreases when:

Question 43

When using the t-distribution, if the degrees of freedom increase, the confidence interval for a given significance level will:

Question 44

A 99 percent confidence interval using the z-statistic includes the mean plus or minus:

Question 45

Suppose a researcher tests a trading rule using the same database repeatedly until a significant result is found. This is an example of:

Question 46

Using a price-to-book ratio based on year-end prices and year-end book values (available months later) to test a trading strategy is an example of:

Question 47

If a study covers a time period where a fundamental structural change occurred (e.g., changing inflation dynamics), the study may suffer from:

Question 48

Convenience sampling is best described as:

Question 49

Which of the following is NOT a desirable property of an estimator?

Question 50

To calculate the standard error of the sample mean when the population variance is known, one divides the population standard deviation by:
