Panel Data, Endogeneity, and Diff-in-Diff Exercises

Question 4

a. Endogeneity Issue

The regression model omits an industry dummy \(D_i^{CRE}\) (1 = creative, 0 = non-creative), so the error term is \(\epsilon_{it} = c D_i^{CRE} + u_{it}\). This creates an endogeneity problem if \(c \neq 0\) and \(Cov(F_{it}, D_i^{CRE}) \neq 0\).

b. Sign of Biased Coefficient

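Assume that \(F_{it}<0.5\) on average, and that creative-industry firms have both a higher female share, \(Cov(F_{it}, D_i^{CRE}) > 0\), and a higher \(R_{it}\), \(Cov(R_{it}, D_i^{CRE}) > 0\), which we take to reflect \(c > 0\). The omitted-variable bias, \(c \cdot Cov(F_{it}, D_i^{CRE}) / Var(F_{it})\), is then positive, so the estimate of \(b\) is biased upward, i.e. \(\hat b > b\) in expectation.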
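The sign of the bias can be checked with a small simulation. The sketch below uses synthetic data and hypothetical parameter values (`b_true`, `c_true`, and the group means of `F`), none of which come from the exercise. It compares the slope from the regression that omits \(D_i^{CRE}\) with the value implied by the omitted-variable-bias formula \(b + c \cdot Cov(F, D)/Var(F)\).

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var


rng = random.Random(0)
n = 20000
b_true, c_true = 1.0, 0.5  # hypothetical true coefficients

# D = 1 for creative firms, 0 otherwise
D = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
# Female share is higher in creative firms, so Cov(F, D) > 0
F = [0.2 + 0.2 * d + rng.gauss(0, 0.05) for d in D]
# True model includes D with a positive coefficient
R = [b_true * f + c_true * d + rng.gauss(0, 0.1) for f, d in zip(F, D)]

b_short = ols_slope(F, R)            # regression that omits D
delta = ols_slope(F, D)              # Cov(F, D) / Var(F)
predicted = b_true + c_true * delta  # value implied by the bias formula

print(f"true b = {b_true}, short-regression estimate = {b_short:.3f}")
print(f"value implied by the bias formula = {predicted:.3f}")
```

Under these assumed values the short-regression slope lies well above `b_true`, in line with a positive bias.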

c. Two Stage Least Squares

  1. First stage: regress the endogenous variable on an instrument \(Z_{it}\), \(F_{it} = \pi_0 + \pi_1 Z_{it} + v_{it}\), and obtain the fitted values \(\hat F_{it}\).
  2. Second stage: estimate \(R_{it} = a + b\hat F_{it} + e_{it}\).

Because the fitted values \(\hat F_{it}\) vary only with the instrument, which is uncorrelated with the error term, regressing the dependent variable on them removes the correlation between the regressor and the error term.
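The two stages can be sketched on synthetic data. Everything in the block below is an illustrative assumption: the instrument `Z`, the unobserved confounder `U`, and all coefficient values are made up, and the helper `ols` is a hand-written simple regression rather than a library routine.

```python
import random


def ols(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


rng = random.Random(1)
n = 20000
b_true = 1.0  # hypothetical true effect of F on R

Z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
U = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
F = [0.5 * z + 0.5 * u + rng.gauss(0, 0.5) for z, u in zip(Z, U)]
R = [b_true * f + u + rng.gauss(0, 0.5) for f, u in zip(F, U)]

# Naive OLS of R on F is biased because F is correlated with U
_, b_ols = ols(F, R)

# First stage: regress F on Z and form the fitted values
pi0, pi1 = ols(Z, F)
F_hat = [pi0 + pi1 * z for z in Z]

# Second stage: regress R on the fitted values
_, b_2sls = ols(F_hat, R)

print(f"OLS estimate  = {b_ols:.3f}")
print(f"2SLS estimate = {b_2sls:.3f} (true b = {b_true})")
```

Running the second stage by hand gives the 2SLS point estimate, but the standard errors reported by that second-stage regression are not valid, so a dedicated IV routine is preferable in practice.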

d. Instrument Evaluation

  • Proposed IV: \(LinF_{it}\), the percentage of female LinkedIn contacts of the highest-ranking male on the board.
  • Relevance: the highest-ranking male (e.g. the CEO) is likely to connect with his fellow board members, so \(LinF_{it}\) should be correlated with the share of women on the board. The assumption is therefore plausibly satisfied.
  • Exclusion: the proposed IV has no obvious direct effect on \(R_{it}\); it affects \(R_{it}\) only through \(F_{it}\), so this condition is plausibly satisfied.
  • Exogeneity: \(LinF_{it}\) is unlikely to be correlated with the error term, since the highest-ranking male board member may also add his previous colleagues, his subordinates, etc.

On these grounds, \(LinF_{it}\) appears to be a valid instrument.

e. First Differencing Method

The omitted variable \(D_i^{CRE}\) is time-invariant, so first differencing removes it from the regression model: \(\Delta R_{it} = b\,\Delta F_{it} + \Delta u_{it}\). This eliminates the endogeneity caused by \(D_i^{CRE}\).
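A two-period simulation illustrates the point. The sketch below is built on assumed values throughout (`b_true`, `c_true`, the noise levels, and the way \(F\) evolves between periods are all hypothetical). It contrasts a pooled regression in levels, which omits the industry dummy, with a regression in first differences.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var


rng = random.Random(2)
n_firms = 10000
b_true, c_true = 1.0, 0.5  # hypothetical true coefficients

F1, F2, R1, R2 = [], [], [], []
for _ in range(n_firms):
    d = 1 if rng.random() < 0.5 else 0       # time-invariant industry dummy
    f1 = 0.2 + 0.2 * d + rng.gauss(0, 0.05)  # period-1 female share
    f2 = f1 + rng.gauss(0, 0.05)             # change in F is unrelated to d
    F1.append(f1)
    F2.append(f2)
    R1.append(b_true * f1 + c_true * d + rng.gauss(0, 0.05))
    R2.append(b_true * f2 + c_true * d + rng.gauss(0, 0.05))

# Pooled regression in levels: the omitted dummy biases the slope
b_pooled = ols_slope(F1 + F2, R1 + R2)

# First differences: the dummy drops out because it does not vary over time
dF = [f2 - f1 for f1, f2 in zip(F1, F2)]
dR = [r2 - r1 for r1, r2 in zip(R1, R2)]
b_fd = ols_slope(dF, dR)

print(f"pooled levels estimate   = {b_pooled:.3f}")
print(f"first-difference estimate = {b_fd:.3f} (true b = {b_true})")
```

The differenced regression recovers a slope close to `b_true` only because the simulated change in \(F\) is independent of the error; differencing does not help against time-varying omitted variables.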

f. Difference-in-Difference

Yes. Including entity fixed effects in the DiD setting controls for selection bias, i.e. systematic, time-invariant differences between industries, and thereby helps isolate the treatment effect.

Question 5

a. Multicollinearity

Given the model

$$ RA_{it} = \beta_0 + \beta_1 W_{i,t-1} + \sum_{j=2}^5\beta_{2,j}PR_i^j + \beta_3 D_i^{EXP} + \varepsilon_{it} $$

It is arguable that program type is linearly related to last year’s wealth level, i.e. \(PR_i^j = \alpha_0 + \alpha_1 W_{i,t-1} + u_{i,t}\). In that case multicollinearity may arise. It inflates the standard errors of the affected coefficients but does not cause endogeneity, because the regressors remain uncorrelated with the error term.
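The distinction can be made explicit with the standard variance expression for an OLS slope under homoskedasticity, stated here as a textbook result rather than something given in the exercise. The symbols \(R_j^2\) (the R-squared from regressing regressor \(j\) on the other regressors) and \(SST_j\) (the total sample variation of regressor \(j\)) are notation introduced for this illustration:

$$ Var(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)} $$

As \(R_j^2\) approaches 1 the variance grows without bound, yet the estimator stays unbiased as long as the regressors are uncorrelated with \(\varepsilon_{it}\).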

b. Omitted Variable

Omitting the suggested dummy \(D_i^{BIZ}\) causes an endogeneity problem if both of the following hold:

  1. \(\text{Cov}(W_{i,t-1},D_i^{BIZ}) \neq 0\): the omitted variable is correlated with the included regressor \(W_{i,t-1}\).
  2. \(\beta^{BIZ} \neq 0\): the omitted regressor has a non-zero coefficient in the true regression model.

c. Valid Instrument

A potentially valid instrument for last year’s wealth is last year’s S&P 500 growth rate, \(SP500_{t-1}\). It is plausibly valid because:

  1. It is relevant to last year’s wealth, because market movements affect the value of students’ portfolios.
  2. It does not directly affect the dependent variable \(RA_{it}\), since market movements are aggregate shocks beyond individual control.
  3. It is unlikely to be correlated with the error term.

d. Difference-in-Difference Setup

To set up a DiD regression model, we introduce a dummy \(D_t^{BAN}\) equal to 1 in periods after the short-sales ban and 0 before, a dummy \(B_i\) equal to 1 if stock \(i\) is a bank stock and 0 otherwise, and their interaction term \(D_t^{BAN}B_i\). The DiD model is:

$$ \sigma_{i,t} = \beta_0 + \beta_1 D_t^{BAN} + \beta_2B_i + \beta_3D_t^{BAN}B_i + \varepsilon_{i,t} $$
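In this saturated two-by-two specification, the OLS coefficients coincide with differences of the four group-period means, and \(\beta_3\) is the DiD estimate: the before-after change for bank stocks minus the before-after change for non-bank stocks. The sketch below illustrates this on a handful of made-up observations; the numbers are hypothetical and serve only to show the arithmetic.

```python
# Each tuple is (ban, bank, sigma): hypothetical toy observations
data = [
    (0, 0, 0.20), (0, 0, 0.22),  # non-bank stocks, before the ban
    (1, 0, 0.25), (1, 0, 0.27),  # non-bank stocks, after the ban
    (0, 1, 0.30), (0, 1, 0.32),  # bank stocks, before the ban
    (1, 1, 0.45), (1, 1, 0.47),  # bank stocks, after the ban
]


def cell_mean(ban, bank):
    """Average outcome for one (period, group) cell."""
    vals = [s for d, b, s in data if d == ban and b == bank]
    return sum(vals) / len(vals)


beta0 = cell_mean(0, 0)                    # non-bank, pre-ban level
beta1 = cell_mean(1, 0) - cell_mean(0, 0)  # time effect for non-banks
beta2 = cell_mean(0, 1) - cell_mean(0, 0)  # pre-ban gap between groups
beta3 = ((cell_mean(1, 1) - cell_mean(0, 1))
         - (cell_mean(1, 0) - cell_mean(0, 0)))  # difference-in-differences

print(f"beta0={beta0:.2f}, beta1={beta1:.2f}, "
      f"beta2={beta2:.2f}, beta3={beta3:.2f}")
```

Interpreting \(\beta_3\) as the causal effect of the ban still relies on the parallel-trends assumption, which the cell means themselves cannot verify.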