. drop if sat_school >= .;
(398 observations deleted)
. reg sat_school hhsize, r;
Regression with robust standard errors                 Number of obs =     692
                                                       F(  1,   690) =    3.92
                                                       Prob > F      =  0.0482
                                                       R-squared     =  0.0081
                                                       Root MSE      =  .76476
------------------------------------------------------------------------------
             |               Robust
  sat_school |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hhsize |  -.0140157    .007082    -1.98   0.048    -.0279205   -.0001109
       _cons |   3.476232   .0635027    54.74   0.000      3.35155    3.600914
------------------------------------------------------------------------------
We see a statistically significant negative relationship (p = 0.048): larger households
are less satisfied with the schooling received by their children.
We might be worried that larger families are found in poorer, more rural areas
where the overall quality of education is lower. To control for this we can add
fixed effects for the census enumeration area, or EA (this is the level at which our data
are clustered; we have 5 households in each EA). This controls
for the socio-economic status of the community and (in most cases) the school the
children attend. Thus we want the model:
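One way to write a model of this kind, as a sketch under assumed notation: only b, the coefficient on hhsize, is named elsewhere in this text; the indices i and j, the intercept a, the EA effect c_j and the error e_ij are assumptions made for illustration.

```latex
\[
\text{sat\_school}_{ij} = a + b\,\text{hhsize}_{ij} + c_j + e_{ij}
\]
```

Here i indexes households and j indexes enumeration areas, so c_j picks up anything common to all households in EA j.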
. qui tab ea_code, gen(eac_);
. reg sat_school hhsize eac_*, r;
Regression with robust standard errors                 Number of obs =     692
                                                       F(187,   484) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.4850
                                                       Root MSE      =  .65793
------------------------------------------------------------------------------
             |               Robust
  sat_school |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hhsize |  -.0194672   .0066937    -2.91   0.004    -.0326195   -.0063149
       eac_1 |   .0389344   .0212165     1.84   0.067    -.0027535    .0806222
       eac_2 |   .0778688   .3477253     0.22   0.823    -.6053689    .7611064
...
     eac_205 |   1.111936    .051018    21.79   0.000     1.011692    1.212181
     eac_206 |   .7190059   .2901567     2.48   0.014     .1488836    1.289128
     eac_207 |   1.077869   .0267748    40.26   0.000      1.02526    1.130478
       _cons |   3.038934   .0133874   227.00   0.000      3.01263    3.065239
------------------------------------------------------------------------------
If we want to test whether the fixed effects are jointly significant, we would use
. testparm eac_*;
( 1) eac_1 = 0
( 2) eac_2 = 0
...
F(187, 484) = 523.69
Prob > F = 0.0000
This method works perfectly fine, but it is unwieldy and involves three separate commands.
. xi: reg sat_school hhsize i.ea_code, r;
i.ea_code _Iea_code_11020308-42080602(naturally coded;
_Iea_code_11020308 omitted)
Regression with robust standard errors                 Number of obs =     692
                                                       F(186,   484) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.4850
                                                       Root MSE      =  .65793
------------------------------------------------------------------------------
             |               Robust
  sat_school |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hhsize |  -.0194672   .0066937    -2.91   0.004    -.0326195   -.0063149
_Ie~11040201 |   .0389344   .3466722     0.11   0.911     -.642234    .7201028
_Ie~11040503 |  -.0389344   .0212165    -1.84   0.067    -.0806222    .0027535
...
_Ie~42080508 |   .6800716   .2864624     2.37   0.018     .1172081    1.242935
_Ie~42080602 |   1.038934   .0212165    48.97   0.000     .9972465    1.080622
       _cons |   3.077869   .0314294    97.93   0.000     3.016114    3.139624
------------------------------------------------------------------------------
xi: reg is the most efficient method when you have a small number of categories and care
about the estimated value of the fixed effect for each category.
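In Stata 11 and later, factor-variable notation does the same job without the xi: prefix. The following is a minimal sketch, assuming a newer Stata than the one that produced the output above and written without the ; delimiter; it has not been run on these data.

```stata
* Sketch: assumes Stata 11 or later (factor-variable notation).
* i.ea_code expands into one indicator per EA, with a base category omitted.
regress sat_school hhsize i.ea_code, vce(robust)

* Joint test that all EA indicators are zero.
testparm i.ea_code
```

This keeps the per-EA coefficients in the output, so it suits the same small-number-of-categories case.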
. areg sat_school hhsize, a(ea_code) r;
Regression with robust standard errors                 Number of obs =     692
                                                       F(  1,   484) =    8.46
                                                       Prob > F      =  0.0038
                                                       R-squared     =  0.4850
                                                       Adj R-squared =  0.2648
                                                       Root MSE      =  .65793
------------------------------------------------------------------------------
             |               Robust
  sat_school |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hhsize |  -.0194672   .0066937    -2.91   0.004    -.0326195   -.0063149
       _cons |   3.522633   .0626253    56.25   0.000     3.399582    3.645684
-------------+----------------------------------------------------------------
     ea_code |   absorbed                                     (207 categories)
. xtreg sat_school hhsize, fe i(ea_code);
Fixed-effects (within) regression               Number of obs      =       692
Group variable (i): ea_code                     Number of groups   =       207

R-sq:  within  = 0.0188                         Obs per group: min =         1
       between = 0.0003                                        avg =       3.3
       overall = 0.0081                                        max =         5

                                                F(1,484)           =      9.29
corr(u_i, Xb)  = -0.0505                        Prob > F           =    0.0024
------------------------------------------------------------------------------
  sat_school |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hhsize |  -.0194672   .0063855    -3.05   0.002     -.032014   -.0069204
       _cons |   3.522633   .0598293    58.88   0.000     3.405075     3.64019
-------------+----------------------------------------------------------------
     sigma_u |  .56106439
     sigma_e |  .65793019
         rho |  .42103494   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(206, 484) =     2.18            Prob > F = 0.0000
Note that xtreg, in the version of Stata used here, does not allow the , r option
for robust standard errors. areg is my favorite command for fixed-effects regressions,
although it doesn't display the joint significance of the fixed effects when you
have a large number of categories.
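For reference, a sketch of the same two commands in more recent syntax, written without the ; delimiter and not run on these data. It assumes a newer Stata than the one used above, in which xtreg, fe accepts vce(robust); in those versions that option gives standard errors clustered on the group variable, so they need not match those from areg with , r.

```stata
* Sketch: assumes a recent Stata release.
* areg with the option names spelled out.
areg sat_school hhsize, absorb(ea_code) vce(robust)

* Declare the group variable, then fit the within estimator.
xtset ea_code
xtreg sat_school hhsize, fe vce(robust)
```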
. bys ea_code: egen h_m = mean(hhsize);
. bys ea_code: egen s_m = mean(sat_school);
. gen h_dm = hhsize - h_m;
(10 missing values generated)
. gen s_dm = sat_school - s_m;
. reg s_dm h_dm, r;
Regression with robust standard errors                 Number of obs =     692
                                                       F(  1,   690) =   11.93
                                                       Prob > F      =  0.0006
                                                       R-squared     =  0.0188
                                                       Root MSE      =  .55111
------------------------------------------------------------------------------
             |               Robust
        s_dm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        h_dm |  -.0194672    .005635    -3.45   0.001     -.030531   -.0084034
       _cons |   -.000289   .0209499    -0.01   0.989    -.0414222    .0408442
------------------------------------------------------------------------------
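The demeaning steps above are the within transformation. As a sketch in assumed notation (y_ij for sat_school and x_ij for hhsize of household i in EA j, a for the intercept, c_j for the EA effect, bars for EA means; only b is named in this text), averaging the fixed-effects model within each EA and subtracting removes c_j:

```latex
\begin{align*}
y_{ij} &= a + b\,x_{ij} + c_j + e_{ij} \\
\bar{y}_{j} &= a + b\,\bar{x}_{j} + c_j + \bar{e}_{j} \\
y_{ij} - \bar{y}_{j} &= b\,(x_{ij} - \bar{x}_{j}) + (e_{ij} - \bar{e}_{j})
\end{align*}
```

Regressing s_dm on h_dm therefore estimates the same b as the regressions that include the EA effects explicitly.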
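Note that all these models give exactly the same value for b, the coefficient
on hhsize. Demeaning by hand gives different standard errors, which are too small
because reg does not count the 207 EA means as estimated parameters (it uses 690
residual degrees of freedom rather than 484).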
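A sketch of how the by-hand version could be adjusted, written without the ; delimiter and not run on these data. The names insample, h_m2, s_m2, h_dm2, s_dm2, ea_tag and G are hypothetical. It demeans over the estimation sample only, uses conventional (non-robust) standard errors, and rescales from the N - 2 degrees of freedom that reg assumes to the N - G - 1 left after estimating G EA means. The result should, in principle, line up with the standard error from xtreg, fe.

```stata
* Sketch only: all new variable and scalar names are hypothetical.
* Mark the observations that actually enter the regression.
gen byte insample = !missing(sat_school, hhsize)

* Demean both variables within EA, using only those observations.
bysort ea_code: egen h_m2 = mean(hhsize) if insample
bysort ea_code: egen s_m2 = mean(sat_school) if insample
gen h_dm2 = hhsize - h_m2
gen s_dm2 = sat_school - s_m2

* Conventional (non-robust) OLS on the demeaned data.
regress s_dm2 h_dm2

* Count the EAs in the sample, then rescale the standard error
* from N - 2 to N - G - 1 degrees of freedom.
egen byte ea_tag = tag(ea_code) if insample
quietly count if ea_tag == 1
scalar G = r(N)
display "df-adjusted SE: " _se[h_dm2] * sqrt(e(df_r) / (e(N) - G - 1))
```

In practice areg or xtreg makes this adjustment automatically; the sketch only shows where the difference comes from.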
contact: djiboliz@gmail.com
last modified: 2 May 2007