Panel Data, Endogeneity, and Diff-in-Diff Exercises
Question 4
a. Endogeneity Issue
The regression model omits an industry dummy (1 = creative, 0 = non-creative), i.e. \(\epsilon_{it} = c D_i^{CRE} + u_{it}\). This creates an endogeneity problem if \(Cov(F_{it}, D_i^{CRE}) \neq 0\).
b. Sign of Biased Coefficient
Assume that \(F_{it} < 0.5\) on average. Then \(Cov(F_{it}, D_i^{CRE}) > 0\) and \(Cov(R_{it}, D_i^{CRE}) > 0\), which leads to an upward-biased estimate of \(b\), i.e. \(\hat b > b\).
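The direction of the bias can be read off the standard omitted-variable-bias formula for a single regressor. As a sketch, assume the true model is \(R_{it} = a + bF_{it} + cD_i^{CRE} + u_{it}\) with \(u_{it}\) uncorrelated with \(F_{it}\); then

$$ \text{plim}\,\hat b = b + c\,\frac{Cov(F_{it}, D_i^{CRE})}{Var(F_{it})} $$

With \(c > 0\) (which is what \(Cov(R_{it}, D_i^{CRE}) > 0\) is taken to indicate here) and \(Cov(F_{it}, D_i^{CRE}) > 0\), the second term is positive, so \(\hat b\) overstates \(b\).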
c. Two Stage Least Squares
- First stage: estimate \(F_{it} = cD_i^{CRE} + u_{it}\) and obtain the fitted values \(\hat F_{it}\).
- Second stage: estimate \(R_{it} = a + b\hat F_{it} + e_{it}\).
Because the second stage regresses the dependent variable on the fitted values of the endogenous variable rather than on the variable itself, the correlation between the regressor and the error term is removed.
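The mechanics of the two stages can be illustrated on simulated data. This is a minimal sketch and not the exercise's data: the instrument `z`, the unobserved confounder `d`, the helper `ols`, and every coefficient are hypothetical choices made for illustration. It uses a generic instrument in the first stage and compares naive OLS with the two-stage estimate.

```python
import random

def ols(y, x):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(0)
n = 20000
true_b = 2.0  # hypothetical true effect of F on R

# Hypothetical data: z is an instrument, d an unobserved confounder.
z = [random.gauss(0, 1) for _ in range(n)]
d = [random.gauss(0, 1) for _ in range(n)]
f = [0.8 * zi + 0.6 * di + random.gauss(0, 1) for zi, di in zip(z, d)]
r = [1.0 + true_b * fi + 1.5 * di + random.gauss(0, 1)
     for fi, di in zip(f, d)]

# Naive OLS of R on F: biased because d is omitted and correlated with F.
_, b_ols = ols(r, f)

# First stage: regress F on the instrument and keep the fitted values.
a1, c1 = ols(f, z)
f_hat = [a1 + c1 * zi for zi in z]

# Second stage: regress R on the fitted values from the first stage.
_, b_2sls = ols(r, f_hat)

print(f"naive OLS slope: {b_ols:.3f}")
print(f"2SLS slope:      {b_2sls:.3f}")
```

In this simulation the naive slope is pushed away from `true_b` by the omitted confounder, while the second-stage slope should land close to it. Standard errors from a manual second stage are not valid as printed by ordinary OLS software, so this sketch reports point estimates only.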
d. Instrument Evaluation
- Proposed IV: \(LinF_{it}\) measures the percentage of female LinkedIn contacts of the highest-ranking male on the board.
- Relevance: the highest-ranking male (e.g. the CEO) is likely to add his board counterparts, so his contacts could reflect the percentage of women on the board. The IV therefore satisfies this assumption.
- Exclusion: the proposed IV does not affect \(R_{it}\) directly, and thus satisfies this condition.
- Exogeneity: \(LinF_{it}\) is unlikely to be related to the error term, as the highest-ranking male board member may also add his previous colleagues, his subordinates, etc.
As a result, the suggested \(LinF_{it}\) appears to be a valid candidate instrument.
e. First Differencing Method
The omitted variable \(D_i^{CRE}\) is a time-invariant variable; thus, applying the first differencing technique removes this variable from the regression model, hence eliminating the endogeneity problem.
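To see this explicitly, assume the true model is \(R_{it} = a + bF_{it} + cD_i^{CRE} + u_{it}\), the specification implied in part a. Subtracting the equation for period \(t-1\) from the equation for period \(t\) gives

$$ R_{it} - R_{i,t-1} = b\,(F_{it} - F_{i,t-1}) + c\,(D_i^{CRE} - D_i^{CRE}) + (u_{it} - u_{i,t-1}) $$

$$ \Delta R_{it} = b\,\Delta F_{it} + \Delta u_{it} $$

The intercept and the industry dummy both drop out, so \(b\) can be estimated from the differenced equation as long as \(\Delta F_{it}\) is uncorrelated with \(\Delta u_{it}\).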
f. Difference-in-Difference
Yes. Including entity fixed effects in the DiD setting helps control for selection bias, i.e. systematic differences between industries, and thereby isolates the treatment effect.
Question 5
a. Multicollinearity
Given the model
$$ RA_{it} = \beta_0 + \beta_1 W_{i,t-1} + \sum_{j=2}^5\beta_{2,j}PR_i^j + \beta_3 D_i^{EXP} + \varepsilon_{it} $$
It is arguable that program type can be explained linearly by last year’s wealth level, i.e. \(PR_i^j = \alpha_0 + \alpha_1 W_{i,t-1} + u_{i,t}\). In that case, multicollinearity may arise, but it does not cause endogeneity.
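One common way to gauge how strong such a linear relationship is uses the variance inflation factor, \(1/(1-R^2)\), where \(R^2\) comes from the auxiliary regression of one regressor on the others. The sketch below is an illustration on simulated data only: the variables `w` and `pr`, the helper `r_squared`, and all numbers are hypothetical. It also uses a single auxiliary regressor, whereas the full model would regress each program dummy on all remaining regressors.

```python
import random

random.seed(1)
n = 5000

# Hypothetical data: w stands in for last year's wealth, pr for one
# program-type dummy that is more likely to equal 1 when wealth is high.
w = [random.gauss(0, 1) for _ in range(n)]
pr = [1.0 if wi + random.gauss(0, 0.5) > 0 else 0.0 for wi in w]

def r_squared(y, x):
    """R^2 of a simple OLS regression of y on x (squared correlation)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)

# Auxiliary regression of the program dummy on wealth.
r2 = r_squared(pr, w)

# Variance inflation factor implied by that auxiliary R^2.
vif = 1.0 / (1.0 - r2)

print(f"auxiliary R^2: {r2:.3f}")
print(f"VIF:           {vif:.3f}")
```

A larger value signals less precise coefficient estimates, which is a problem of standard errors rather than of bias.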
b. Omitted Variable
Omitting the suggested dummy \(D_i^{BIZ}\) causes an endogeneity problem if both of the following hold:
- \(\text{Cov}(W_{i,t-1}, D_i^{BIZ}) \neq 0\) (plausibly positive here): the omitted variable is correlated with the included regressor, which thereby becomes endogenous.
- \(\beta^{BIZ} \neq 0\) (plausibly positive here): the omitted regressor has a nonzero coefficient in the true regression model.
c. Valid Instrument
A potentially valid instrument for last year’s wealth is last year’s S&P 500 growth rate, \(SP500_{t-1}\). It is valid because:
- It is relevant to last year’s wealth because market movements affect the valuation of students’ portfolios.
- It does not directly affect the dependent variable \(RA_{it}\), since market movements are aggregate shocks beyond individual control.
- It is unlikely to be correlated with the error term.
d. Difference-in-Difference Setup
To set up a DiD regression model, we need a dummy \(D_t^{BAN}\) indicating whether period \(t\) falls after the short-sales ban, a dummy \(B_i\) indicating whether stock \(i\) is a bank stock, and their interaction term \(D_t^{BAN}B_i\). The DiD model is:
$$ \sigma_{i,t} = \beta_0 + \beta_1 D_t^{BAN} + \beta_2B_i + \beta_3D_t^{BAN}B_i + \varepsilon_{i,t} $$
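In this saturated two-by-two specification, the OLS coefficients coincide with differences of the four cell means, and \(\beta_3\) is the difference-in-differences estimate. The sketch below illustrates this on simulated volatilities. Everything in it is a hypothetical assumption: the number of stocks and days, the coefficient values, the noise level, and the helper `cell_mean`.

```python
import random

random.seed(42)
true_effect = 0.5  # hypothetical effect of the ban on bank-stock volatility

# Simulate rows of (post-ban dummy, bank dummy, volatility).
rows = []
for stock in range(200):
    bank = 1 if stock < 100 else 0      # first 100 stocks are bank stocks
    for post in (0, 1):                 # before and after the ban
        for _ in range(20):             # 20 observations per period
            sigma = (1.0 + 0.3 * post + 0.2 * bank
                     + true_effect * post * bank
                     + random.gauss(0, 0.2))
            rows.append((post, bank, sigma))

def cell_mean(post, bank):
    """Average volatility for one (period, group) cell."""
    vals = [s for p, b, s in rows if p == post and b == bank]
    return sum(vals) / len(vals)

# In the saturated model the OLS coefficients equal these cell-mean contrasts.
beta0 = cell_mean(0, 0)                          # non-bank, before the ban
beta1 = cell_mean(1, 0) - cell_mean(0, 0)        # time effect for non-banks
beta2 = cell_mean(0, 1) - cell_mean(0, 0)        # pre-ban group difference
beta3 = ((cell_mean(1, 1) - cell_mean(0, 1))     # change for bank stocks
         - (cell_mean(1, 0) - cell_mean(0, 0)))  # minus change for non-banks

print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}, beta3 = {beta3:.3f}")
```

Here `beta3` should land near `true_effect`, while `beta1` absorbs the common time shift and `beta2` the permanent difference between bank and non-bank stocks.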