Part I (Probability)

Q1. Random Variables

a. Gender: random, with no underlying covariate or bias introduced; b. Number of crashes: random, depending on unpredictable factors such as hardware failures and software malfunctions; c. Commute time: random, influenced by unpredictable variables: transport delays, traffic jams, accidents; d. Computer assignment: random, with no obvious assignment process known; e. Rainfall: random due to the randomness of the atmospheric system; temperature, wind speed, humidity, and other factors may vary randomly.

Q4. Sample Mean

There is only a small chance that the sample average weight of four students is exactly equal to the class (population) mean. In general, the sample mean is close to, but not equal to, the population mean because of sampling error.

For example, we might randomly draw the four heaviest students in the class, making the sample mean noticeably higher than 70 kg. Or we might happen to choose four Asian students whose average weight is below the class mean. As a result, the sample average weight is a random variable because of the randomness in the sample-selection process.
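This sampling variability can be illustrated with a small simulation (a sketch only: the 70 kg population mean comes from the question, while the class size, weight distribution, and seed below are hypothetical):

```python
import random
import statistics

random.seed(42)

# Hypothetical class of 30 students with weights centred near 70 kg.
population = [random.gauss(70, 10) for _ in range(30)]
pop_mean = statistics.mean(population)

# Repeatedly draw samples of four students; the sample mean varies draw to draw,
# but its average over many draws sits close to the population mean.
sample_means = [statistics.mean(random.sample(population, 4)) for _ in range(1000)]

print(f"population mean: {pop_mean:.2f}")
print(f"sample means range: {min(sample_means):.2f} to {max(sample_means):.2f}")
print(f"average of sample means: {statistics.mean(sample_means):.2f}")
```

The spread of the 1,000 sample means shows why a single four-student average is a random variable, while their overall average illustrates that the sample mean is unbiased.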

Q5. Statistics Computation

##            [,1]
## exp_y 0.7800000
## exp_x 0.7000000
## var_y 0.1716000
## var_x 0.2100000
## cov   0.0840000
## corr  0.4424977
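The printed moments are consistent with \(x\) and \(y\) being Bernoulli variables (\(0.7 \times 0.3 = 0.21\) and \(0.78 \times 0.22 = 0.1716\)). A Python sketch that reproduces the table from one joint distribution consistent with that output; the joint probabilities are backed out from the covariance via \(E[xy] = \mathrm{cov} + E[x]E[y] = 0.63\) and are an assumption, not the original data:

```python
import math

# Assumed joint pmf over (x, y) in {0,1}^2, backed out from the printed moments.
pmf = {(1, 1): 0.63, (1, 0): 0.07, (0, 1): 0.15, (0, 0): 0.15}

exp_x = sum(p * x for (x, y), p in pmf.items())
exp_y = sum(p * y for (x, y), p in pmf.items())
var_x = sum(p * (x - exp_x) ** 2 for (x, y), p in pmf.items())
var_y = sum(p * (y - exp_y) ** 2 for (x, y), p in pmf.items())
cov = sum(p * (x - exp_x) * (y - exp_y) for (x, y), p in pmf.items())
corr = cov / math.sqrt(var_x * var_y)

print(exp_y, exp_x, var_y, var_x, cov, corr)
# matches the table: 0.78, 0.70, 0.1716, 0.21, 0.084, 0.4425
```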

Q8. Temperature Calculation

  • Formula: \(F = 32 + 1.8C\)
  • Mean: \(70 = 32 + 1.8\bar{C} \Rightarrow \bar{C} = 21.111\)
  • A value one standard deviation above the mean: \(F = \bar{F} + 7 = 77 \Rightarrow C = 25\)
  • Standard deviation: \(\sigma_C = 25 - 21.111 = 7/1.8 = 3.889\)
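Because \(F = 32 + 1.8C\) is a linear transformation, the mean picks up both the shift and the scale, while the standard deviation is affected by the 1.8 scale only. A quick numeric check (values from the question):

```python
# Mean: invert F = 32 + 1.8*C at the mean.
mean_f = 70.0
mean_c = (mean_f - 32) / 1.8           # 21.111...

# Standard deviation: the additive 32 drops out; only the 1.8 scale survives.
sd_f = 7.0
sd_c = sd_f / 1.8                      # 3.888...

# Equivalent route used above: convert a reading one SD above the mean.
one_sd_above_c = (mean_f + sd_f - 32) / 1.8   # 77 F -> 25 C

print(mean_c, sd_c, one_sd_above_c - mean_c)  # last two values coincide
```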

Part II (Brooks, Chapter 3)

Q9. Vertical Distances

a. Why are the vertical distances between the true y and the fitted y squared before being added together when applying OLS?

We square the differences between the fitted and actual y to prevent positive and negative residuals from cancelling out, so the sum reflects the total prediction error more accurately.

b. Why are the squares of the vertical distances taken rather than the absolute values?

First, the sum of squared residuals is a smooth, differentiable function that can be minimized analytically by setting its first-order derivatives to zero. Second, by the Gauss-Markov theorem, the OLS estimator is the Best Linear Unbiased Estimator (BLUE) under the classical assumptions. Lastly, minimizing absolute residuals (least absolute deviations) estimates the conditional median rather than the conditional mean.
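The first point can be made concrete: minimizing the sum of squared residuals yields closed-form normal equations.

\[
\min_{\hat{\alpha},\hat{\beta}} \sum_t \hat{u}_t^2 = \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t)^2
\]

Setting the partial derivatives to zero,

\[
\frac{\partial}{\partial \hat{\alpha}}: \; -2\sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0, \qquad
\frac{\partial}{\partial \hat{\beta}}: \; -2\sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0,
\]

which solve to \(\hat{\beta} = \dfrac{\sum_t (x_t - \bar{x})(y_t - \bar{y})}{\sum_t (x_t - \bar{x})^2}\) and \(\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}\). With absolute values instead of squares, the objective is not differentiable at zero and no such closed form exists.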

Q11. Linear Regression using OLS

Models that can be estimated by OLS:

  • EQ 11.1: \(y_t = \alpha + \beta x_t + u_t\)
  • EQ 11.2: \(y_t = e^{\alpha} x_t^{\beta} e^{u_t} \Rightarrow \ln y_t = \alpha + \beta \ln x_t + u_t\)
  • EQ 11.4: \(\ln y_t = \alpha + \beta \ln x_t + u_t\)
  • EQ 11.5: \(y_t = \alpha + \beta x_t z_t + u_t \Rightarrow y_t = \alpha + \beta i_t + u_t\), with \(i_t \equiv x_t z_t\)

Q13. Capital Asset Pricing Model

a. \(\hat{\beta}\) is a random variable because it inherits randomness from the underlying data-generating process \((R_{it}, R_{mt})\). b. The hypothesis is stated about the true parameter but the test is implemented on the estimated coefficient, because the true parameter cannot be observed. c. Hypothesis test:

  • \(H_0: \beta = 1\)
  • \(H_1: \beta > 1\)
  • Test statistic: \(t = (1.147-1)/0.0548 = 2.682 > 1.671\) (degrees of freedom = 62 - 2 = 60, so the one-sided 5% critical value is 1.671)
  • Conclusion: we reject the null hypothesis.

d. The new degrees of freedom are 998, so the t-critical value ≈ 1.646 and the conclusion stays the same. e. We may use a richer model such as the Fama-French three-factor or five-factor model. f. Hypothesis test:

  • \(H_0: \beta = 0\)
  • \(H_1: \beta \neq 0\)
  • Test statistic: \(t = (0.214-0)/0.186 = 1.151 < 2.028\) (degrees of freedom = 36, two-sided 5% critical value)
  • Conclusion: we fail to reject the null hypothesis.
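The two t-tests above can be reproduced in a few lines (a sketch: the critical values are the tabulated ones quoted in the answers, not computed here):

```python
def t_stat(beta_hat, beta_0, se):
    """t-statistic for H0: beta = beta_0."""
    return (beta_hat - beta_0) / se

# (c) One-sided test of beta = 1 vs beta > 1; df = 60, 5% critical value 1.671.
t_c = t_stat(1.147, 1.0, 0.0548)
print(f"(c) t = {t_c:.3f}, reject H0: {t_c > 1.671}")

# (f) Two-sided test of beta = 0; df = 36, 5% critical value about 2.03.
t_f = t_stat(0.214, 0.0, 0.186)
print(f"(f) t = {t_f:.3f}, reject H0: {abs(t_f) > 2.03}")
```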