Panel Data, Endogeneity, and Diff-in-Diff Exercises

Question 4

a. Endogeneity Issue

The regression model omits an industry dummy $D_i^{CRE}$ (1 = creative, 0 = non-creative), so the error term is $\epsilon_{it} = c D_i^{CRE} + u_{it}$. This creates an endogeneity problem if $c \neq 0$ and $Cov(F_{it}, D_i^{CRE}) \neq 0$.

b. Sign of Biased Coefficient

Assuming that $F_{it}<0.5$ on average, we have $Cov(F_{it}, D_i^{CRE}) > 0$ and $Cov(R_{it}, D_i^{CRE}) > 0$, which leads to an upward-biased estimate of $b$, i.e. $\hat b > b$.
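
As a sketch of where the sign comes from, the standard omitted-variable-bias result for this simple regression can be written as follows, assuming $u_{it}$ is uncorrelated with $F_{it}$ and using the error decomposition from part (a):

$$ \text{plim}\,\hat b = b + c\,\frac{Cov(F_{it}, D_i^{CRE})}{Var(F_{it})} $$

The bias is positive when $c$ and $Cov(F_{it}, D_i^{CRE})$ have the same sign. This matches the positive-bias case above if creative firms also have higher $R_{it}$, i.e. $c > 0$.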

c. Two Stage Least Squares

First stage: estimate $F_{it} = cD_i^{CRE} + u_{it}$ and obtain the fitted values $\hat F_{it}$.
Second stage: estimate $R_{it} = a + b\hat F_{it} + e_{it}$.

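Because the second stage regresses the dependent variable on the fitted values of the endogenous variable, which vary only with the first-stage regressor, the correlation between the regressor and the error term is removed, provided the first-stage regressor is itself uncorrelated with the error term.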
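
The two stages can be sketched on simulated data. This is only an illustration under stated assumptions: the instrument `z`, the coefficient values, and the helper names `ols` and `two_stage_least_squares` are hypothetical and not part of the exercise. Here `z` is independent of the omitted dummy by construction, which is what makes it a usable instrument in the simulation.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z (hypothetical helper)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    x_hat = Z @ ols(Z, x)                      # first stage: fitted values of x
    X_hat = np.column_stack([np.ones(n), x_hat])
    return ols(X_hat, y)                       # second stage: y on fitted x

rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                         # hypothetical instrument
d = rng.integers(0, 2, size=n).astype(float)   # omitted dummy (plays the role of D^CRE)
f = 0.8 * z + 1.0 * d + rng.normal(size=n)     # endogenous regressor, correlated with d
r = 1.0 + 2.0 * f + 2.0 * d + rng.normal(size=n)  # assumed true b = 2.0

b_ols = ols(np.column_stack([np.ones(n), f]), r)[1]   # d omitted, so biased upward
b_2sls = two_stage_least_squares(r, f, z)[1]          # close to the assumed b

print(f"OLS estimate of b:  {b_ols:.3f}")
print(f"2SLS estimate of b: {b_2sls:.3f}")
```

The standard errors from a manually run second stage are not valid, because they ignore that $\hat F_{it}$ is itself estimated. A dedicated IV routine would be needed for inference.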

d. Instrument Evaluation

Proposed IV: $LinF_{it}$, the percentage of female LinkedIn contacts of the highest-ranking male on the board.
Relevance: the highest-ranking male board member (e.g. the CEO) is likely to connect with his fellow board members, so his share of female contacts could reflect the percentage of women on the board. The instrument therefore plausibly satisfies this assumption.
Exclusion: the proposed IV does not affect $R_{it}$ directly, so it satisfies this condition.
Exogeneity: $LinF_{it}$ is unlikely to be related to the error term, since the highest-ranking male board member may also connect with his previous colleagues, his subordinates, etc.

As a result, $LinF_{it}$ appears to be a valid candidate instrument.

e. First Differencing Method

The omitted variable $D_i^{CRE}$ is time-invariant, so first differencing removes it from the regression model and thereby eliminates this source of endogeneity.
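
To make the step explicit, here is a short derivation under the assumption that the full model is $R_{it} = a + bF_{it} + cD_i^{CRE} + u_{it}$, i.e. the regression combined with the error decomposition in part (a). Subtracting the equation for period $t-1$ from the equation for period $t$ gives:

$$ R_{it} - R_{i,t-1} = b\,(F_{it} - F_{i,t-1}) + c\,(D_i^{CRE} - D_i^{CRE}) + (u_{it} - u_{i,t-1}) $$

$$ \Delta R_{it} = b\,\Delta F_{it} + \Delta u_{it} $$

The intercept $a$ and the dummy both drop out because neither varies over time, so $b$ can be estimated by OLS on the differenced data as long as $\Delta F_{it}$ is uncorrelated with $\Delta u_{it}$.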

f. Difference-in-Difference

Yes. Including entity fixed effects in the DiD setting controls for selection bias, i.e. systematic time-invariant differences between industries, and thereby helps isolate the treatment effect.

Question 5

a. Multicollinearity

Given the model

$$ RA_{it} = \beta_0 + \beta_1 W_{i,t-1} + \sum_{j=2}^5\beta_{2,j}PR_i^j + \beta_3 D_i^{EXP} + \varepsilon_{it} $$

It is arguable that program type can be explained linearly by last year’s wealth level, i.e. $PR_i^j = \alpha_0 + \alpha_1 W_{i,t-1} + u_{i,t}$. In that case multicollinearity might arise, but it does not cause endogeneity.
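
One common way to gauge how strongly one regressor is explained by the others is the variance inflation factor, $VIF = 1/(1 - R^2)$, where $R^2$ comes from regressing that regressor on the remaining ones. The diagnostic is not named in the answer above. The sketch below uses simulated data, and the variable names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical data: lagged wealth and one program dummy that depends on it.
w = rng.normal(size=n)
pr = (0.9 * w + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Auxiliary regression of the program dummy on a constant and lagged wealth.
X = np.column_stack([np.ones(n), w])
resid = pr - X @ np.linalg.lstsq(X, pr, rcond=None)[0]

r2 = 1.0 - resid.var() / pr.var()   # R^2 of the auxiliary regression
vif = 1.0 / (1.0 - r2)              # variance inflation factor for the dummy

print(f"auxiliary R^2: {r2:.3f}, VIF: {vif:.2f}")
```

A VIF well above 1 signals inflated standard errors for that coefficient. The OLS estimates remain unbiased, which is consistent with the point that multicollinearity does not cause endogeneity.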

b. Omitted Variable

Omitting the suggested dummy $D_i^{BIZ}$ may cause an endogeneity problem if both of the following hold:

$\text{Cov}(W_{i,t-1},D_i^{BIZ}) \neq 0$ (here expected to be positive): the omitted variable is correlated with the included regressor, which thereby becomes endogenous.
$\beta^{BIZ} \neq 0$ (here expected to be positive): the omitted regressor has a non-zero coefficient in the true regression model.

c. Valid Instrument

A potentially valid instrument for last year’s wealth is last year’s S&P 500 growth rate, $SP500_{t-1}$. It is plausibly valid because:

It is relevant to last year’s wealth, since market movements affect the valuation of students’ portfolios.
It does not directly affect the dependent variable $RA_{it}$, since market movements are aggregate shocks beyond individual control.
It is unlikely to be correlated with the error term.

d. Difference-in-Difference Setup

To set up a DiD regression model, we need a dummy $D_t^{BAN}$ equal to 1 in periods after the short-sales ban and 0 before, a dummy $B_i$ equal to 1 if stock $i$ is a bank stock and 0 otherwise, and their interaction term $D_t^{BAN}B_i$. The coefficient on the interaction, $\beta_3$, is the DiD estimate. The model is:

$$ \sigma_{i,t} = \beta_0 + \beta_1 D_t^{BAN} + \beta_2B_i + \beta_3D_t^{BAN}B_i + \varepsilon_{i,t} $$
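
The regression above can be sketched on simulated data to show that the OLS coefficient on the interaction equals the difference-in-differences of the four cell means. The data, the effect size, and the names `post`, `bank`, and `cell_mean` are hypothetical, and observations are drawn as a pooled sample rather than a true panel to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4_000

post = rng.integers(0, 2, size=n)   # plays the role of D_t^BAN (1 = after the ban)
bank = rng.integers(0, 2, size=n)   # plays the role of B_i (1 = bank stock)
true_effect = -0.5                  # assumed treatment effect, for illustration only

# Simulated volatility following the DiD model plus noise.
sigma = (2.0 + 0.3 * post + 0.4 * bank
         + true_effect * post * bank
         + rng.normal(scale=0.5, size=n))

# OLS on constant, both dummies, and their interaction.
X = np.column_stack([np.ones(n), post, bank, post * bank])
beta = np.linalg.lstsq(X, sigma, rcond=None)[0]

def cell_mean(p, b):
    """Mean volatility in the (post = p, bank = b) cell."""
    return sigma[(post == p) & (bank == b)].mean()

# Change for bank stocks minus change for non-bank stocks.
did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

print(f"beta_3 from OLS:       {beta[3]:.4f}")
print(f"difference of changes: {did:.4f}")
```

Because the model with both dummies and their interaction is saturated, the two numbers coincide up to floating-point error, which is why $\beta_3$ can be read as the treatment effect under the parallel-trends assumption.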
